Scanner -> FTP -> PDF/OCR pipeline container


e5 Scanner Container






For building the docker container it is nice to have:

For developing on your local system, make sure you have the following packages installed (Ubuntu Trusty package names):

  • poppler-utils
  • tesseract-ocr
  • tesseract-ocr-deu
  • pdftk
  • inotify-tools
  • realpath

Getting started

Install project dependencies

Run this command to set up all dependencies:

# Install all needed dependencies
$ make install

Start the development environment

To start the application for development, run this command:

# Start the application
$ make start


This application can be deployed with Docker. The next sections will help you build a shippable image of the app.

Quick Start

# Run the container
$ docker run -d \
    -e 'MAX_PARALLEL_JOBS=8' \

Available Configuration Parameters

General Options

  • KEEP_ORIGINAL: Switch to keep the original input file. Defaults to: false.

Process Limits

  • MAX_PARALLEL_JOBS: Maximum number of jobs run in parallel. Defaults to: 8.
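
To illustrate what a parallel job limit means in practice, here is a minimal sketch, not the container's actual implementation. It assumes the inputs are PDF files and uses a hypothetical `run_jobs` function; the `echo` stands in for the real OCR step. It relies on `find` and GNU `xargs`, whose `-P` option caps the number of concurrent processes.

```shell
#!/bin/sh
# Hypothetical sketch: run one worker per input file, with at most
# MAX_PARALLEL_JOBS workers at the same time (default 8, as documented).
run_jobs() {
    input_dir=$1
    # -print0 / -0 keep file names with spaces intact.
    # -n 1 passes one file per worker, -P limits concurrency,
    # -r (GNU xargs) skips the worker entirely when there is no input.
    find "$input_dir" -type f -name '*.pdf' -print0 |
        xargs -0 -r -n 1 -P "${MAX_PARALLEL_JOBS:-8}" \
            sh -c 'echo "processing $(basename "$1")"' _
}
```

Calling `MAX_PARALLEL_JOBS=2 run_jobs /some/dir` would process the files two at a time. The order of the output lines is not guaranteed, because the workers run concurrently.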

Data Directories

  • DATA_IN: Data directory inside the container to watch for new files. Defaults to: /app/data/in.
  • DATA_TMP: Data directory inside the container to store temporary files. Defaults to: /app/data/tmp.
  • DATA_OUT: Data directory inside the container to store all outputs. Defaults to: /app/data/out.
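
The parameters and defaults listed above could be resolved in a startup script roughly as follows. This is a sketch under assumptions: the function name `load_defaults` is hypothetical, and the real container may handle its configuration differently. It uses the standard shell expansion `${VAR:=default}`, which assigns the default only when the variable is unset or empty.

```shell
#!/bin/sh
# Hypothetical sketch: fall back to the documented defaults for every
# parameter that was not passed in with `docker run -e ...`.
load_defaults() {
    : "${KEEP_ORIGINAL:=false}"
    : "${MAX_PARALLEL_JOBS:=8}"
    : "${DATA_IN:=/app/data/in}"
    : "${DATA_TMP:=/app/data/tmp}"
    : "${DATA_OUT:=/app/data/out}"
}

load_defaults

# Print the effective configuration for inspection.
printf 'KEEP_ORIGINAL=%s\n' "$KEEP_ORIGINAL"
printf 'MAX_PARALLEL_JOBS=%s\n' "$MAX_PARALLEL_JOBS"
printf 'DATA_IN=%s\n' "$DATA_IN"
printf 'DATA_TMP=%s\n' "$DATA_TMP"
printf 'DATA_OUT=%s\n' "$DATA_OUT"
```

A value passed with `-e` takes precedence, since the expansion leaves variables that are already set untouched.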

Data Volumes

To start a docker container you need to specify data volumes. For this application these should normally be host bind mounts, but storing outputs on a docker container volume works fine, too. The following examples use host bind mounts.

The default paths for the daemon inside the container are /app/data/{in,tmp,out}. You can ignore tmp if you are not interested in temporary files. The in and out directories need to be mounted.

$ docker run -d \
    -e 'MAX_PARALLEL_JOBS=8' \
    -v '/host/path/to/in:/app/data/in' \
    -v '/host/path/to/out:/app/data/out' \


# Build the container
$ make build-image

# Start the built container (quickstart)
$ make start-image

# Stop the container
$ make stop-image

# Purge the build so you can start over with
# another build
$ make clean-image

# See logs for the running container
$ make image-logs

# Start a bash session on the built container for debugging.
# This command needs a prior **clean** or **stop**, just like the normal **start**.
$ make start-bash-image