Language-agnostic OCR #640
pavel-denisov-fraunhofer
started this conversation in
Ideas
Replies: 1 comment
-
This is a good idea. I think we may need to revisit the design for when the "auto" language mode is used, but it would definitely be a nice contribution.
-
I implemented a modification for the `TesseractOcrModel` to handle cases where the document language is not known in advance. It uses Tesseract's script detection to identify the script, and then runs the appropriate script OCR model (e.g. "Latin" for English or German). Would you be interested in this feature in Docling? If so, I could prepare a PR.
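
For illustration, here is a minimal sketch of the detect-then-recognize flow described above. It is not the actual `TesseractOcrModel` modification. It assumes the `tesseract` CLI is on the PATH with the `osd` data and script models (e.g. `script/Latin`) installed; the function names, the confidence threshold, and the `eng` fallback are hypothetical choices made for this example.

```python
import re
import subprocess

# Hypothetical values for this sketch; tune them for your documents.
MIN_SCRIPT_CONF = 1.0
FALLBACK_LANG = "eng"


def parse_osd_script(osd_text):
    """Extract (script, confidence) from Tesseract's OSD text output."""
    script = re.search(r"^Script:\s*(\S+)", osd_text, re.MULTILINE)
    conf = re.search(r"^Script confidence:\s*([\d.]+)", osd_text, re.MULTILINE)
    if not script:
        return None, 0.0
    return script.group(1), float(conf.group(1)) if conf else 0.0


def choose_lang(script, conf):
    """Map a detected script to a script-level model name such as 'script/Latin'.

    Assumption: the detected script name matches a model file name. This may
    not hold for every script, so a real implementation needs a lookup table.
    """
    if script is None or conf < MIN_SCRIPT_CONF:
        return FALLBACK_LANG
    return f"script/{script}"


def ocr_unknown_language(image_path):
    """Run OSD (--psm 0) to detect the script, then OCR with the chosen model."""
    osd = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", "0"],
        capture_output=True, text=True, check=True,
    ).stdout
    lang = choose_lang(*parse_osd_script(osd))
    return subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `ocr_unknown_language("page.png")` would then return the recognized text without the caller naming a language. Falling back to a fixed language when detection confidence is low is one possible policy; how this should interact with an "auto" language mode is a separate design question.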