Automated System for Solving CAPTCHAs

CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are a popular way of preventing bots from logging in to systems by exhaustively searching the password space. In its traditional form, a CAPTCHA is an image that contains a few characters, sometimes with some obfuscation added. The challenge is to identify which characters appear and in what order. In this project, we aim to crack these challenges.

I have developed a program that automatically solves traditional CAPTCHA challenges by identifying the characters in obfuscated images using image processing and machine learning.

Method for identifying the characters present in the image:

Initial image

Take the colour that occurs most often among the corner pixels as the background colour of the image.
Change the background to white.
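The background step can be sketched with NumPy as below. This is a minimal illustration, not the repository's code: the function name `whiten_background` is hypothetical, and it assumes an RGB `uint8` image whose background is one flat colour, so only pixels that exactly match the most frequent corner colour are turned white.

```python
import numpy as np
from collections import Counter


def whiten_background(img: np.ndarray) -> np.ndarray:
    """Replace the background colour of an RGB image (H x W x 3, uint8) with white.

    Hypothetical helper: the background is assumed to be the most frequent
    colour among the four corner pixels, and to be a single flat colour.
    """
    h, w = img.shape[:2]
    corners = [img[0, 0], img[0, w - 1], img[h - 1, 0], img[h - 1, w - 1]]
    # Count each corner colour as a hashable tuple and keep the most common one.
    background, _ = Counter(tuple(int(v) for v in c) for c in corners).most_common(1)[0]
    out = img.copy()
    # Boolean mask of pixels whose every channel equals the background colour.
    mask = np.all(img == np.array(background, dtype=img.dtype), axis=-1)
    out[mask] = 255
    return out
```

Pixels that are close to, but not exactly, the background colour (for example anti-aliased edges) are left untouched by this sketch.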

Dilate the image to remove the stray lines.
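Dilation replaces each pixel with the maximum over its neighbourhood. On a white (255) background this grows the background, so thin dark stray lines disappear while thicker character strokes survive, slightly thinned. The sketch below uses NumPy's `sliding_window_view`; OpenCV's `cv2.dilate` computes the same neighbourhood maximum. The 3x3 kernel is an assumption, since the README does not state the kernel size, and the function name `dilate` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dilate(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Grey-level dilation with a k x k square kernel (k odd, assumed 3).

    Works on 2D grayscale or H x W x C colour images; each output pixel is the
    per-channel maximum over its k x k neighbourhood.
    """
    r = k // 2
    # Pad only the two spatial axes, repeating edge pixels.
    pad = [(r, r), (r, r)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    # Shape (H, W, [C,] k, k): one k x k window per output pixel.
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))
    return windows.max(axis=(-2, -1))
```

Whether a 3x3 kernel is enough depends on how thick the stray lines are in the dataset; a line at least as thick as the kernel will survive.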

Convert the image to grayscale
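A grayscale conversion can be written as a weighted sum of the colour channels. The sketch below assumes RGB channel order and the ITU-R BT.601 luma weights, which are also what OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` applies; note that `cv2.imread` returns images in BGR order, so the weights must match the channel order actually in use. The function name is hypothetical.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB uint8 image to H x W grayscale (BT.601 luma)."""
    weights = np.array([0.299, 0.587, 0.114])  # R, G, B weights
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```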

Segment the image into 3 characters:
Iterate over the columns of the image.
Count the number of non-white pixels in each column.
These counts give the start and end coordinates of each character.
We use a window size of 30 pixels to detect characters, which ignores any noise (small stray lines) remaining after dilation.
Get the bounding boxes of the three characters.
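The column-projection steps above can be sketched as follows. The README does not spell out how the 30-pixel window is applied; this sketch assumes it means that a run of consecutive columns containing non-white pixels must be at least 30 columns wide to count as a character, and narrower runs are discarded as noise. The names `segment_columns` and `bounding_box` are hypothetical.

```python
import numpy as np


def segment_columns(gray: np.ndarray, min_width: int = 30, white: int = 255):
    """Return (start, end) column ranges (end exclusive) of character candidates.

    A column is "occupied" if it contains at least one non-white pixel.
    Runs of occupied columns narrower than min_width are treated as noise.
    """
    counts = np.count_nonzero(gray < white, axis=0)  # non-white pixels per column
    occupied = counts > 0
    runs, start = [], None
    for x, occ in enumerate(occupied):
        if occ and start is None:
            start = x  # a run of ink begins
        elif not occ and start is not None:
            if x - start >= min_width:  # keep only wide-enough runs
                runs.append((start, x))
            start = None
    if start is not None and len(occupied) - start >= min_width:
        runs.append((start, len(occupied)))  # run touching the right edge
    return runs


def bounding_box(gray: np.ndarray, cols, white: int = 255):
    """Tight (top, bottom, left, right) box of non-white pixels in a column range."""
    left, right = cols
    rows = np.flatnonzero((gray[:, left:right] < white).any(axis=1))
    return int(rows[0]), int(rows[-1]) + 1, left, right
```

If the result does not contain exactly three ranges, the image is treated as a segmentation failure.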

Out of 2000 images, this method failed to segment 37 into three characters.
For these images, we instead divide the image into three equal segments of size 150x150, leaving a margin of 15 pixels at the beginning and 10 pixels before each of the following two segments.
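The fallback split reduces to simple arithmetic on column offsets. This sketch reads "10 pixels in the following two" as a 10-pixel gap before the second and third segments, which is an interpretation rather than something the README confirms; the name `equal_split` is hypothetical.

```python
def equal_split(n: int = 3, width: int = 150, first_margin: int = 15, gap: int = 10):
    """Column ranges (start, end exclusive) for a fixed equal-width split."""
    ranges, x = [], first_margin
    for _ in range(n):
        ranges.append((x, x + width))
        x += width + gap  # skip the segment plus the gap before the next one
    return ranges


print(equal_split())  # [(15, 165), (175, 325), (335, 485)]
```

Under this reading the three segments cover columns 15 to 485, and each crop `gray[:, start:end]` is 150x150 if the image is 150 pixels tall.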
Extract each character from the image using the bounding boxes obtained above.
Resize each character image from 150x150 to 30x30 pixels.

Flatten the 30x30 image into a 1D array of 900 values.
This is the feature vector of one character.
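Resizing and flattening can be sketched together. Because 150 is an exact multiple of 30, the downsampling below simply averages each 5x5 block of pixels; this block-averaging choice is an assumption, and a library call such as `cv2.resize(crop, (30, 30), interpolation=cv2.INTER_AREA)` is a common alternative. The function name is hypothetical, and the input is assumed to be an already-extracted square grayscale crop.

```python
import numpy as np


def to_feature_vector(char_img: np.ndarray, size: int = 30) -> np.ndarray:
    """Downsample a square grayscale crop (e.g. 150x150) to size x size and flatten.

    Uses block averaging, so the crop side must be a multiple of size.
    """
    h, w = char_img.shape
    if h != w or h % size:
        raise ValueError("expected a square crop whose side is a multiple of size")
    f = h // size  # 150 // 30 = 5, so each output pixel averages a 5x5 block
    small = char_img.reshape(size, f, size, f).mean(axis=(1, 3))
    return small.reshape(-1)  # 1D feature vector of length size * size
```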

Character Label to Numeric Label

We use a dictionary to encode each character as a numeric label from 0 to 23: {'ALPHA' : 0, 'BETA' : 1, 'CHI' : 2, 'DELTA' : 3, 'EPSILON': 4, 'ETA' : 5, 'GAMMA' : 6, 'IOTA' : 7, 'KAPPA' : 8, 'LAMDA': 9, 'MU' :10, 'NU' : 11, 'OMEGA' : 12, 'OMICRON':13, 'PHI' : 14, 'PI' : 15, 'PSI' : 16, 'RHO' : 17, 'SIGMA' : 18, 'TAU' : 19, 'THETA' : 20, 'UPSILON' : 21, 'XI' : 22, 'ZETA': 23}

Numeric Label to Character Label

A second dictionary decodes each numeric label back to its character: {0 : 'ALPHA', 1: 'BETA', 2: 'CHI', 3: 'DELTA', 4: 'EPSILON', 5: 'ETA', 6: 'GAMMA', 7: 'IOTA', 8: 'KAPPA', 9: 'LAMDA', 10: 'MU', 11: 'NU', 12: 'OMEGA', 13: 'OMICRON', 14: 'PHI', 15: 'PI', 16: 'PSI', 17: 'RHO', 18: 'SIGMA', 19: 'TAU', 20: 'THETA', 21: 'UPSILON', 22: 'XI', 23: 'ZETA'}
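Both dictionaries can be derived from a single list, which guarantees that they stay exact inverses of each other. The names are listed in alphabetical order, which reproduces the numbering above (including the spelling `LAMDA`). The variable names are hypothetical.

```python
GREEK_LETTERS = [
    "ALPHA", "BETA", "CHI", "DELTA", "EPSILON", "ETA", "GAMMA", "IOTA",
    "KAPPA", "LAMDA", "MU", "NU", "OMEGA", "OMICRON", "PHI", "PI",
    "PSI", "RHO", "SIGMA", "TAU", "THETA", "UPSILON", "XI", "ZETA",
]

# Encoder: character label -> numeric label (0..23).
LABEL_TO_ID = {name: i for i, name in enumerate(GREEK_LETTERS)}
# Decoder: the exact inverse mapping, numeric label -> character label.
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}
```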

Training

For training, we use logistic regression with 5000 iterations.
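A minimal training sketch, assuming scikit-learn's `LogisticRegression` with `max_iter=5000` (the README names logistic regression and the iteration count but not the library; the repository's `train.ipynb` may differ). `X` is assumed to hold one 900-value feature vector per row and `y` the numeric labels 0 to 23. The function name `train_classifier` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_classifier(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    """Fit logistic regression on (n_samples, n_features) features and integer labels."""
    # max_iter=5000 lets the solver run for up to 5000 iterations to converge.
    clf = LogisticRegression(max_iter=5000)
    clf.fit(X, y)
    return clf
```

The fitted model can then be persisted (for example with `pickle`) and reloaded at prediction time; how the repository's `model.sav` was written is not stated in the README.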

saqeeb360/Automated-System-for-Solving-CAPTCHAs

Folders and files

Latest commit

History

Repository files navigation

Automated System for Solving CAPTCHAs

So I haved developed a computer program to automatically solve traditional CAPTCHA challenges by identifying characters in obfuscated images using image processing and machine learning.

Method to find out what character are present in the image:

Character Label to Numeric Label

Numeric Label to Character Label

Training

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages