Welcome to this repository, which provides a complete, standalone, automated tool to compare the performance of different OCR services.
This tool was developed to compare the performance of Amazon's, Google's and Microsoft's text detection on a variety of images, from hand-drawn characters and words to 'live scene' photographs.
The accompanying blog post can be found here
This project was developed using:
python 3.7.4
python modules version
as described in requirements.txt
CharacTER.py
released on 27/06/2019
Software versions are subject to change with new releases; to ensure the project runs smoothly without alteration, the versions above should be used. This software was last run on 14/10/2019
This tool is fully automated: it reads images from disk, passes them one by one into each supported OCR service and generates meaningful metrics from the resulting transcriptions.
The tool is operated through a command-line interface (CLI).
The tool currently supports the following OCR services:
- Textract is used for detecting document text
- Rekognition is used for detecting live scene text
- Vision is used for detecting document and live scene text
- Computer Vision is used for detecting document and live scene text
Following the instructions below will enable you to use the tool for comparing your own images.
The following need to be set up before using this tool.
- Follow the steps in this guide to create an account and setup a user
- Follow steps 2-4 in this guide to generate your account's key
- Follow the steps in this guide to create a billing activated account
- Follow the steps in this guide to enable Google Vision for a Google Cloud Project
- Follow the steps in this guide to create an account and link a cognitive service resource to it
- Create a secret file, e.g.
vi /path/to/directory/.ms/credentials.txt
- Follow the step 'Get the keys from your resource' in this guide and store the key in the secret file (replace the placeholder key value with your account's key)
{
"key": "XXXXXXXXXXXXXXX00XXX"
}
Optional: It is recommended that you store your service/access keys in a secret '.' file.
mv /path/to/saved/credentials.txt /path/to/file/.secret_file.txt
You will need the paths to these keys in later steps
To install this tool to your local machine for comparison purposes, follow the instructions below.
- Clone this repo to your local machine
git clone <HTTPS URL>/ocr_comparison_tool.git
- Move into the ocr_comparison_tool directory
cd /path/to/cloned/directory/ocr_comparison_tool/
- Optional: Create a python3 virtual environment
python3 -m venv .
then activate it
. bin/activate
- Install the required python libraries
pip3 install -r requirements.txt
To configure the OCR services for this tool, follow the steps below.
In ./ocr_settings/amazon_settings.py
change the placeholder paths to your specific secret files:
environ['AWS_SHARED_CREDENTIALS_FILE']='/path/to/your/secret/credential/.file.txt'
environ['AWS_CONFIG_FILE']='/path/to/your/secret/config/.file.txt'
In ./ocr_settings/google_settings.py
change the placeholder path to your specific secret file:
environ['GOOGLE_APPLICATION_CREDENTIALS']='/path/to/your/secret/credential/.file.json'
In ./ocr_settings/microsoft_settings.py
change the placeholder path to your specific secret file:
MICROSOFT_ACCESS_CREDENTIALS='/path/to/your/secret/credential/.file.json'
In ./ocr_settings/gateway_settings.py
change the placeholder path to your specific CharacTER.py file:
environ['CHARACTER_SCRIPT_PATH']='/path/to/script/CharacTER.py'
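Before running the tool, it can be useful to confirm that the configured paths actually exist. The following is a minimal sketch of such a pre-flight check, not part of the repository; the variable names mirror those shown in the settings snippets above (the Microsoft key is configured as a module constant rather than an environment variable, so it is not checked here).

```python
from os import environ
from pathlib import Path

# Environment variables set by the ocr_settings modules shown above.
REQUIRED_VARS = [
    'AWS_SHARED_CREDENTIALS_FILE',
    'AWS_CONFIG_FILE',
    'GOOGLE_APPLICATION_CREDENTIALS',
    'CHARACTER_SCRIPT_PATH',
]

def missing_credential_files(env=environ):
    """Return the required variables that are unset or point to missing files."""
    return [v for v in REQUIRED_VARS
            if v not in env or not Path(env[v]).is_file()]
```

Running `missing_credential_files()` and acting on a non-empty result gives a clearer error than a failed API call mid-run.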
- Images must be in .jpg or .png format
- Images must be at least 50 x 50 px
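The minimum-size requirement can be checked before submitting an image. The sketch below is a hypothetical helper (not part of the repository) that reads the dimensions from a PNG header using only the standard library; a JPEG check would need more involved parsing or an imaging library such as Pillow.

```python
import struct

MIN_SIZE = 50  # minimum width/height in pixels required by the tool

def png_dimensions(path):
    """Read width and height from a PNG file's IHDR chunk (stdlib only)."""
    with open(path, 'rb') as f:
        header = f.read(24)
    if header[:8] != b'\x89PNG\r\n\x1a\n':
        raise ValueError('not a PNG file')
    # Width and height are big-endian uint32s at bytes 16-24.
    width, height = struct.unpack('>II', header[16:24])
    return width, height

def meets_minimum(path):
    """True if the PNG is at least MIN_SIZE x MIN_SIZE pixels."""
    w, h = png_dimensions(path)
    return w >= MIN_SIZE and h >= MIN_SIZE
```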
This tool supports a variety of ways to process images and their transcripts:
- run using a directory
- run using a single image (transcript auto-generated)
- run using a single image (transcript provided)
- define images' properties filename
- define type of OCR transcript
python3 /path/to/ocr_comparison_tool/cmd.py --dir /path/to/entry_dir
Note: For this option, entry_dir must adhere to the following structure:
entry_dir
├── props.csv # properties for images
├── ogl/ # original transcripts*
├── res/ # apis' transcripts*
├── met/ # CharacTER metric scores*
└── imgs/ # images to be transcribed
├── img1.jpg
├── img2.png
├── .
├── .
├── .
└── imgn.jpg
The directories marked * (ogl, res and met) are optional, as they are generated by the tool.
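The required part of the layout can be prepared with a few lines of Python. This is a hypothetical convenience helper, not part of the repository; it creates only props.csv and imgs/, since the starred directories are generated by the tool.

```python
from pathlib import Path

def make_entry_dir(root):
    """Create the minimal entry_dir structure expected by the --dir option."""
    root = Path(root)
    (root / 'imgs').mkdir(parents=True, exist_ok=True)  # images to be transcribed
    (root / 'props.csv').touch()                        # properties for images
    return root
```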
python3 /path/to/ocr_comparison_tool/cmd.py --img /path/to/image.jpg
Note: This command auto-generates the original transcript and therefore assumes that props.csv is located in the current working directory.
python3 /path/to/ocr_comparison_tool/cmd.py --ogl /path/to/transcript.txt --img /path/to/image.jpg
python3 /path/to/ocr_comparison_tool/cmd.py --prp properties.csv
Note: The properties file must be located in the current working directory.
python3 /path/to/ocr_comparison_tool/cmd.py --med [image/document/both]
Note: Changing the media type invokes only the models for that type.
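For reference, the documented flags could be parsed with argparse along the following lines. This is a hedged sketch based only on the options listed above; the actual implementation in cmd.py may differ.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented cmd.py options (illustrative only)."""
    parser = argparse.ArgumentParser(description='Compare OCR services on images.')
    parser.add_argument('--dir', help='entry directory containing props.csv and imgs/')
    parser.add_argument('--img', help='path to a single image')
    parser.add_argument('--ogl', help='original transcript to use with --img')
    parser.add_argument('--prp', default='props.csv',
                        help='properties filename in the current working directory')
    parser.add_argument('--med', choices=['image', 'document', 'both'], default='both',
                        help='media type; invokes only the models for that type')
    return parser
```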
- Applied Innovation - Kainos