Installation

Linux Mint 19 (Ubuntu bionic)

Installing packages

$ sudo apt install g++  # need to build pdftotext
$ sudo apt install libpoppler-cpp-dev  # need to buid pdftotext

Installing tesseract

$ sudo apt install tesseract-ocr
$ sudo apt install tesseract-ocr-rus  # install languages you want

Installing ghostscript

$ sudo apt install ghostscript

Installing python3.7

$ sudo apt install python3.7
$ sudo apt install python3.7-dev

Installing pip

$ sudo apt install python-pip

Installing virtualenv

$ pip install --user virtualenv
$ echo 'PATH=~/.local/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc

Installing virtualenvwrapper

$ pip install --user setuptools
$ pip install --user wheel
$ pip install --user virtualenvwrapper
$ echo 'source ~/.local/bin/virtualenvwrapper.sh' >> ~/.bashrc
$ source ~/.bashrc

Creating virtualenv for django_ocr_server

$ mkvirtualenv django_ocr_server -p /usr/bin/python3.7

Installing django-ocr-server (on virtualenv django_ocr_server). It installs Django as a dependency.

$ pip install django-ocr-server

Create your Django project (on virtualenv django_ocr_server)

$ django-admin startproject ocr_server

Go to project directory

$ cd ocr_server

Edit ocr_server/settings.py

Add applications to INSTALLED_APPS

INSTALLED_APPS = [
    ...
    'rest_framework',
    'rest_framework.authtoken',
    'django_ocr_server',
    'rest_framework_swagger',
]

Edit ocr_server/urls.py

from django.contrib import admin
from django.urls import path, include
from rest_framework.documentation import include_docs_urls

admin.site.site_header = 'OCR Server Administration'
admin.site.site_title = 'Welcome to OCR Server Administration Portal'

urlpatterns = [
    path('admin/', admin.site.urls, ),
    path('docs/', include_docs_urls(title='OCR Server API')),
    path('', include('django_ocr_server.urls'), ),
]

Perform migrations (on virtualenv django_ocr_server)

$ python manage.py migrate

Create superuser (on virtualenv django_ocr_server)

$ python manage.py createsuperuser

Run server (on virtualenv django_ocr_server), than visit http://localhost:8000/

$ python manage.py runserver

Linux Mint 19 (Ubuntu bionic) automatic installation

Clone django_ocr_server from github

$ git clone https://github.com/shmakovpn/django_ocr_server.git

Run the installation script using sudo

$sudo {your_path}/django_ocr_server/install_ubuntu.sh

The script creates OS user named ‘django_ocr_server’, installs all needed packages. Creates the virtual environment. It installs django_ocr_server (from PyPI by default, but you can create the package from cloned repository, see the topic ‘Creation a distribution package’ how to do this). Then it creates the django project named ‘ocr_server’ in the home directory of ‘django_ocr_server’ OS user. After the script changes settings.py and urls.py is placed in ~django_ocr_server/ocr_server/ocr_server/. Finally it applies migrations and creates the superuser named ‘admin’ with the same password ‘admin’.

Run server under OS user django_ocr_server, then change ‘admin’ password in the http://localhost:your_port/admin/ page.

$ sudo su
# su django_ocr_server
$ cd ~/ocr_server
$ workon django_ocr_server
$ python manage.py runserver

Centos 7

Install epel repository

$ sudo yum install epel-release

Install yum-utils

$ sudo yum install yum-utils

Install ghostscript (Interpreter for PostScript language & PDF needed for ocrmypdf)

$ sudo yum install ghostscript

Install wget (A utility for retrieving files using the HTTP or FTP protocols for download qpdf that needed for ocrmypdf)

$ sudo yum install wget

Install qpdf

$ cd /usr/local/src
$ wget https://github.com/qpdf/qpdf/releases/download/release-qpdf-9.1.0/qpdf-9.1.0.tar.gz
$ # TODO tar -zxvf qpdf-9.1.0.tar.gz
$ # TODO cd qpdf-9.1.0
$ # TODO ./Configure
$ # TODO make
$ # TODO make install

Install python 3.6

$ sudo yum install python36
$ sudo yum install python36-devel

Install gcc

$ sudo yum intall gcc
$ sudo yum install gcc-c++

Install poppler-cpp-devel (Development files for C++ wrapper for building pdftotext)

$ sudo yum install poppler-cpp-devel

Install tesseract

$ sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
$ sudo bash -c "echo 'gpgcheck=0' >> /etc/yum.repos.d/download.opensuse.org_repositories_home_Alexander_Pozdnyakov_CentOS_7*.repo"
$ sudo yum update
$ sudo yum install tesseract
$ sudo yum install tesseract-langpack-rus  # install a language pack you need

Install pip

$ sudo yum install python-pip

Install virtualenv

$ sudo pip install virtualenv

Create the virtual env for django_ocr_server

$ sudo virtualenv /var/www/ocr_server/venv -p /usr/bin/python3.6 --distribute

Give rights to the project folder to your user

$ sudo chown -R {your_user} /var/www/ocr_server/

Activate virtualenv

$ source /var/www/ocr_server/venv/bin/activate

Install postgresql 11 (The Postgresql version 9.2 that is installing in Centos 7 by default returns an error when applying migrations )

$ sudo rpm -Uvh https://yum.postgresql.org/11/redhat/rhel-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
$ sudo yum install postgresql11-server
$ sudo yum install postgresql-devel
$ sudo /usr/pgsql-11/bin/postgresql-11-setup initdb

Edit /var/lib/pgsql/11/data/pg_hba.conf

host    all             all             127.0.0.1/32            md5
host    all             all             ::1/128                 md5
$ sudo systemctl enable postgresql-11
$ sudo systemctl start postgresql-11
$ sudo -u postgres psql

Create the database and it’s user

create database django_ocr_server encoding utf8;
create user django_ocr_server with password 'django_ocr_server';
alter database django_ocr_server owner to django_ocr_server;
alter user django_ocr_server createdb;  -- if you want to run tests
\q

Install python postgres database driver

$ pip install psycopg2-binary  # (on virtualenv django_ocr_server)

Installing django-ocr-server (on virtualenv django_ocr_server). It installs Django as a dependency

$ pip install django-ocr-server

Create django project (on virtualenv django_ocr_server)

$ cd /var/www/ocr_server
$ django-admin startproject ocr_server .

Edit ocr_server/settings.py

Add applications to INSTALLED_APPS

INSTALLED_APPS = [
    ...
    'rest_framework',
    'rest_framework.authtoken',
    'django_ocr_server',
    'rest_framework_swagger',
]

Configure database connection

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'django_ocr_server',
        'USER': 'django_ocr_server',
        'PASSWORD': 'django_ocr_server',
        'HOST': 'localhost',
        'PORT': '',
    }
}

Edit ocr_server/urls.py

from django.contrib import admin
from django.urls import path, include
from rest_framework.documentation import include_docs_urls

admin.site.site_header = 'OCR Server Administration'
admin.site.site_title = 'Welcome to OCR Server Administration Portal'

urlpatterns = [
    path('admin/', admin.site.urls, ),
    path('docs/', include_docs_urls(title='OCR Server API')),
    path('', include('django_ocr_server.urls'), ),
]

Apply migrations (on virtualenv django_ocr_server)

$ python manage.py migrate

Create superuser (on virtualenv django_ocr_server)

$ python manage.py createsuperuser

Run server (on virtualenv django_ocr_server), than visit http://localhost:8000/

$ python manage.py runserver