Feat: Find bounding box for each section in the image #44
…g_box param is true
…ong with it's bounding boxes
…ding_box param is set to True
@tylermaran Please have a look at this PR.
Hi @tylermaran! I wanted to check in on my PR #44. If you have any feedback, I’d love to hear it. Just making sure it hasn’t gotten lost in the shuffle! I’m also planning to work on a Node version for this PR, so any input would be super helpful.
Hey @getwithashish! Sorry I sat on this one for so long. But starting to really look into bounding boxes now and will be testing out your PR. i.e.
I think this method gives a couple improvements:
PyTesseract is very old and much worse at OCR than GPT (try it with handwritten notes, for example), so this PR would be a massive downgrade. I am not sure it is even a good idea for finding bounding boxes. I'd suggest looking into topics such as "Layout Detection" or "Layout Analysis". Here is a relatively recent benchmark: https://github.com/opendatalab/OmniDocBench?tab=readme-ov-file#layout-detection
DocLayout-YOLO does not seem too bad (the license might be problematic), but new models are being released every week, so I'd suggest abstracting it somehow.
Summary
This pull request adds a feature that locates the bounding box of each section within an image, making the generated markdown traceable back to its source. Users can toggle this feature to obtain bounding box information for any section of the generated markdown.
Why
Previously, there was no way to trace which region of the image a given piece of generated markdown came from, which limited the interpretability of the output. This feature closes that gap by providing bounding box coordinates for each markdown section.
Changes
- … bounding_box param is set to True (pdf.py)
- … bounding_box param is set to True (modellitellm.py)
- … Section type which will include all the identified sections of a page, along with their corresponding bounding boxes (types.py)
- … Page model to include sections and their bounding boxes (zerox.py)
- … pyproject.toml (pyproject.toml)
Functionality
Bounding Box De-Normalization
Bounding boxes are normalized (values between 0 and 1). To de-normalize, multiply the normalized values by the image's dimensions (width, height):
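As a minimal sketch of that scaling step: the snippet below assumes each box is a normalized (left, top, width, height) tuple, which may not match the field order this PR actually returns. The names `PixelBox` and `denormalize_box` are hypothetical and exist only for illustration.

```python
from typing import NamedTuple, Tuple


class PixelBox(NamedTuple):
    """A bounding box in pixel units (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def denormalize_box(
    box: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> PixelBox:
    """Scale a normalized box (values in [0, 1]) to pixel coordinates.

    Assumes the box is ordered (left, top, width, height); adjust the
    unpacking if the real output uses a different layout.
    """
    left, top, width, height = box
    return PixelBox(
        left=round(left * image_width),     # horizontal values scale by width
        top=round(top * image_height),      # vertical values scale by height
        width=round(width * image_width),
        height=round(height * image_height),
    )


if __name__ == "__main__":
    # A box covering the middle band of a 1000x800 image.
    print(denormalize_box((0.1, 0.2, 0.5, 0.25), 1000, 800))
    # PixelBox(left=100, top=160, width=500, height=200)
```

Horizontal values are multiplied by the image width and vertical values by the image height. If the boxes are instead stored as corner coordinates (x1, y1, x2, y2), the same rule applies per axis.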
Usage
Output
Generated Markdown
Screenshots
Image plotted with bounding boxes
Performance Impact
This performance impact is expected, considering the accuracy provided by the bounding box detection.