JAIJANYANI/Language-Detection-in-Image

Text-Language-Detection-in-Image

Detects and recognizes text and font language in an image.

Description

This project performs its analysis using the Tesseract OCR engine.

The project consists of the following steps:

1.) Connected component analysis, in which the outlines of the components are stored as blobs
2.) Blobs are organized into text lines, which are then broken into words
3.) Every word is recognized in a two-pass process
4.) A final phase resolves fuzzy spaces and finalizes the text
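
The steps above run inside Tesseract itself, so calling code mostly sees the final words and their confidences. The following is a minimal sketch of reading those results through tesserocr, not the code in this repository: the function names `mean_confidence` and `recognize_words` are hypothetical, and averaging per-word confidences is only an assumption about how a score could be derived. It has no error handling for blank or unreadable images.

```python
def mean_confidence(confidences):
    """Average a sequence of per-word confidences (0-100); 0.0 if empty."""
    confidences = list(confidences)
    if not confidences:
        return 0.0
    return float(sum(confidences)) / len(confidences)


def recognize_words(image_path, lang="eng"):
    """Run Tesseract on one image and return a list of (word, confidence) pairs."""
    # Imported lazily so mean_confidence() works without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    words = []
    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(image_path)
        api.Recognize()  # run the recognition pipeline on the image
        for item in iterate_level(api.GetIterator(), RIL.WORD):
            words.append((item.GetUTF8Text(RIL.WORD), item.Confidence(RIL.WORD)))
    return words
```

With tesserocr installed, `mean_confidence(conf for _, conf in recognize_words("e.jpg"))` would give a single score for the image under the chosen language.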

Prerequisites

Software

  • libtesseract (>=3.04)
  • libleptonica (>=1.71)
  • Cython
  • Pillow
  • tesserocr
  • Python 2.7.0 (Anaconda 4.3.0, 64-bit)

Tested on Ubuntu 16.04 LTS (xenial, amd64 image built on 2017-09-19) with an 8-core CPU.

Installation

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install python-dev python-pip
sudo apt-get install tesseract-ocr-all libtesseract-dev libleptonica-dev
pip install Cython
pip install Pillow
pip install tesserocr
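
After installing, a quick check that tesserocr can find the Tesseract library and its language data may save time later. This is a small sketch; the helper name `check_tesserocr` is hypothetical, while `tesserocr.tesseract_version()` and `tesserocr.get_languages()` are the calls the tesserocr README shows for this purpose.

```python
def check_tesserocr():
    """Report whether tesserocr imports and which languages it can see."""
    try:
        import tesserocr
    except ImportError as exc:
        return {"installed": False, "error": str(exc)}
    # get_languages() returns the tessdata path and the list of language codes.
    tessdata_path, languages = tesserocr.get_languages()
    return {
        "installed": True,
        "version": tesserocr.tesseract_version(),
        "tessdata_path": tessdata_path,
        "languages": languages,
    }


print(check_tesserocr())
```

The examples in this README need English, Hindi and Spanish, so the reported language list should include `eng`, `hin` and `spa`.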

Running

  • Clone the repository and run this command from the root directory:
python ocr_itt.py -i <image_path.jpg>
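
The source of `ocr_itt.py` is not reproduced here, so the following is only a sketch of how a script with the same `-i` option could be structured: it runs OCR once per candidate language and reports the language with the highest confidence. The names `LANGUAGE_NAMES`, `pick_language`, `score_languages` and `main` are hypothetical, and using `MeanTextConf()` as the score is an assumption; the actual script may compute its confidence differently.

```python
import argparse

# Hypothetical mapping from Tesseract language codes to display names.
LANGUAGE_NAMES = {"eng": "English", "hin": "Hindi", "spa": "Spanish"}


def pick_language(scores):
    """Return the (code, score) pair with the highest score."""
    if not scores:
        raise ValueError("no language scores to compare")
    return max(scores.items(), key=lambda pair: pair[1])


def score_languages(image_path, codes):
    """OCR the image once per language code and collect a confidence for each."""
    from tesserocr import PyTessBaseAPI  # lazy import: requires tesserocr

    scores = {}
    for code in codes:
        with PyTessBaseAPI(lang=code) as api:
            api.SetImageFile(image_path)
            api.GetUTF8Text()  # triggers recognition
            scores[code] = api.MeanTextConf()  # assumed scoring method
    return scores


def main(argv=None):
    parser = argparse.ArgumentParser(description="Guess the language of text in an image")
    parser.add_argument("-i", dest="image", required=True, help="path to the input image")
    args = parser.parse_args(argv)
    code, score = pick_language(score_languages(args.image, sorted(LANGUAGE_NAMES)))
    print(LANGUAGE_NAMES[code])
    print("Confidence score is %s" % score)
```

Calling `main()` from an `if __name__ == "__main__":` guard would turn this into a command-line script. Running every language separately is simple but slow; it trades speed for a directly comparable score per language.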

Input 1

$ python ocr_itt.py -i e.jpg

Output

English
Confidence score is 78.6614583333

Input 2

$ python ocr_itt.py -i h.jpg

Output

Hindi
Confidence score is 84.2118644068

Input 3

$ python ocr_itt.py -i s.jpg

Output

Spanish
Confidence score is 69.7443609023

Author

Jai Janyani
