
Text extraction using deep learning and computer vision techniques.


angadbajwa23/Text_Extraction


EXTRACT

Introduction

EXTRACT is an optical character recognition (OCR) engine that runs on various operating systems. It extracts text from an image and converts it to plain text.

The model is a very primitive form of Google's original Tesseract OCR engine.

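Module/Library Requirements:

  1. os (standard library)
  2. numpy
  3. PIL (install the Pillow package)
  4. sys (standard library)
  5. keras
  6. cropyble
  7. cv2 (install the opencv-python package)
  8. shutil (standard library)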
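Before running the script, it can help to confirm that every import name in the requirements list resolves. Below is a minimal sketch that does this. The helper name 'missing_modules' is hypothetical and not part of the repository, and it checks only whether each module can be found, not which version is installed.

```python
import importlib.util

# Import names from the requirements list (these are not always pip package names)
REQUIRED = ["os", "numpy", "PIL", "sys", "keras", "cropyble", "cv2", "shutil"]


def missing_modules(names):
    """Return the import names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```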

Features

a) Extracts text from an input image.
b) Recognizes lowercase letters, uppercase letters, numbers and special characters.
c) Saves the output to output.txt so that it can be searched.
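Feature (c) states only that the result is written to output.txt. Below is a minimal sketch of that step plus a simple line search over the saved file. It assumes the extracted text is already available as a Python string. 'save_output' and 'search_output' are hypothetical helpers, not functions taken from tesseract.py.

```python
from pathlib import Path


def save_output(text, path="output.txt"):
    """Write the extracted text to a plain-text file (overwrites any previous run)."""
    Path(path).write_text(text, encoding="utf-8")


def search_output(term, path="output.txt", case_sensitive=False):
    """Return (line_number, line) pairs from the saved file that contain `term`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    needle = term if case_sensitive else term.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        haystack = line if case_sensitive else line.lower()
        if needle in haystack:
            hits.append((number, line))
    return hits
```

For example, after save_output("Hello World\nInvoice 42\n"), calling search_output("invoice") returns [(2, "Invoice 42")].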

How to Run the Script:

NOTE: The trained model is not provided, so on the very first run, run the script as it is; this trains the model. Once the model is trained, COMMENT OUT 'Train_Model()' and then run the script for further use.

Run the script from your terminal: 'python3 tesseract.py'
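Commenting out 'Train_Model()' by hand works, but a common alternative is to train only when no saved model exists yet. Below is a minimal sketch of that pattern. It assumes the script saves its trained model to a file. The 'get_model' helper and the "model.h5" path are hypothetical placeholders, since the repository's actual file name is not stated. The training and loading steps are passed in as callables so that the logic can be shown without Keras.

```python
import os


def get_model(model_path, train_fn, load_fn):
    """Train on the first run only; afterwards reuse the saved model.

    model_path: where the trained model is stored (hypothetical, e.g. "model.h5")
    train_fn:   trains the model and saves it to model_path (e.g. the script's Train_Model)
    load_fn:    loads a model from model_path (e.g. keras.models.load_model)
    """
    if not os.path.exists(model_path):
        # First run: no saved model yet, so train (train_fn is expected to save it)
        train_fn()
    return load_fn(model_path)
```

With Keras installed, this might be called as get_model("model.h5", Train_Model, keras.models.load_model). That call assumes Train_Model takes no arguments and saves to that same path.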

[Figure: Input Image | Output]

Contributors

  • Akarsh Malik
  • Angad Ripudaman Singh Bajwa

Future Work

  1. To add characters of your own, make sure to add them to the training and test datasets.
  2. Change the number of output units of the softmax layer in the Train_Model function to the total number of trained characters.
  3. Retrain the model.
  4. Test the model on your image.

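Steps 1 and 2 have to stay in sync, because the softmax layer needs exactly one output unit per character class. Below is a minimal sketch that derives both the class count and the label mapping from a single list of characters. It assumes each training sample is labeled by the character itself. 'build_label_map' and the example character set are hypothetical and not taken from the repository.

```python
import string


def build_label_map(characters):
    """Map each distinct character to a class index, preserving first-seen order."""
    unique = list(dict.fromkeys(characters))  # drop duplicates, keep order
    char_to_index = {ch: i for i, ch in enumerate(unique)}
    index_to_char = {i: ch for ch, i in char_to_index.items()}
    return char_to_index, index_to_char


# Hypothetical character set: lowercase, uppercase, digits and a few symbols.
# To add characters of your own, append them here AND to the train/test data.
CHARSET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "!?.,"

char_to_index, index_to_char = build_label_map(CHARSET)
num_classes = len(char_to_index)  # use this as the softmax layer's unit count

# In Keras this would be something like: Dense(num_classes, activation="softmax")
```

Keeping the character list in one place means the softmax size and the index-to-character decoding cannot drift apart when new characters are added.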