-
First step is to convert your PDF into images (e.g. jpeg, PNG files) using python or softwares such as Adobe Acrobat then save all your images to one folder .
-
Download a free open-source tool called ComicEnhancerPro_eng (for download link, see the creator's blog site https://www.cnblogs.com/stronghorse/p/14594337.html)
-
Then try and test various enhancement features of the tool including USM sharpening, Gamma, JPG QLT, bold, auto level. Depending on the quality of the scan, you may need to adjust the features accordingly. Overdone can also hurt the quality so you should aim for a result that is comfortable for human eyes which will lead to high quality OCR results.
-
Click File, choose set DPI, then choose folder to set all images into DPI 300.
-
Then after correctly adjusting the level of enhancement you need (step 4), click File then batch process. Browse the folder that contains all the images, click process all and agree to overwriting the prexisting image files. This will apply all the enhancement features (step 4) onto every image.
-
Now download a free open-source tool called "Image to pdf or xps". Import all your processed images into it to convert it into one single PDF. Choose location and name. This software will ensure that there will be no damage to image enhancement in the process of converting Image to PDF. This step is critical to ensuring good OCR results.
-
You now have a optimized PDF that will have higher OCR accuracy than before.
-
Notifications
You must be signed in to change notification settings - Fork 0
jzou1995/Optimizing-PDF-OCR-Accuracy-through-Image-Enhancement
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
This is a step by step guide on improving OCR accuracy results through image enhancement tools
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published