Incorrect Reading Order in Single-page Image-Text Layouts #570

Open
Bariskau opened this issue Dec 11, 2024 · 7 comments
Assignees
Labels
bug (Something isn't working), pdf (PDF issue, except docling-parse)

Comments

@Bariskau

Bug

The page reading order is wrong. In single-page documents with images on the left and text on the right, the elements are not read in the expected order. As a result, the text that follows each image in the converted Markdown does not belong to that image.

Steps to reproduce

  1. Upload a single-page document
  2. Use a page layout with an image on the left and text on the right
  3. Convert the document to Markdown format
  4. Check the output
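The steps above can be scripted. This is a minimal sketch, assuming Docling 2.x's `DocumentConverter` API and a local copy of the sample PDF attached later in this thread; `marker_order` is a hypothetical helper (not part of Docling), and the marker strings passed to it are placeholders to be replaced with text from your own document.

```python
from pathlib import Path


def convert_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to Markdown with Docling (imported lazily, so the
    helper below can be used without Docling installed)."""
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(Path(pdf_path))
    return result.document.export_to_markdown()


def marker_order(markdown: str, markers: list[str]) -> list[str]:
    """Hypothetical helper: return the markers that occur in the text,
    sorted by the position of their first occurrence."""
    found = [(markdown.find(m), m) for m in markers if m in markdown]
    return [m for _, m in sorted(found)]


# Example usage (assumes sample-cpu.pdf is in the working directory):
# md = convert_to_markdown("sample-cpu.pdf")
# print(md)
# print(marker_order(md, ["<!-- image -->", "## Some heading text"]))
```

Comparing the list returned by `marker_order` with the expected sequence makes the ordering problem visible without reading the whole Markdown output.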

Expected Behavior:

Reading order should be: Page Title => Image 1 => Section-Header 1 => List Items => Image 2

Actual Behavior:

Reading order is incorrect: Page Title => Image 1 => Image 2 => Section-Header 1

Docling version

2.10.0

Python version

3.10

Sample layout

sample-layout
@Bariskau Bariskau added the bug Something isn't working label Dec 11, 2024
@PeterStaar-IBM PeterStaar-IBM self-assigned this Dec 13, 2024
@cau-git
Contributor

cau-git commented Dec 18, 2024

@Bariskau what input format were you using? Is this a native PowerPoint file, a PDF, or something else? If you provide the source file, we can verify this more easily.

@Bariskau
Author

sample-cpu.pdf
@cau-git I am using PDF format. I shared a sample layout as PDF. Thank you.

@mkhalid12

I am having the same issue. Is there any solution to this ordering problem?

@Bariskau
Author

Bariskau commented Jan 4, 2025

LayoutReader (LayoutLM) ordering works well with Docling. However, Docling has limitations in obtaining line height and width values. Because of this limitation, splitting the layout bounding boxes into random smaller bounding boxes and then ordering them with the model generally gives good results.

However, there are two significant issues with this approach:

  1. LayoutLMv3's license is not suitable for commercial use.
  2. Using a multi-modal pre-trained model to develop a model that only uses bounding box data is not an optimal approach.
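
The box-splitting workaround described above can be sketched as follows. This is only an illustration under stated assumptions: it is not LayoutReader's or Docling's API, the `Box` type and all function names are hypothetical, coordinates are assumed to use a top-left origin, and fixed-height strips are swapped in for the random sub-boxes mentioned above so the result is deterministic. The ordering model itself is left out; `collapse_order` only maps whatever order a model returns for the strips back to the original layout boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box, top-left origin."""
    left: float
    top: float
    right: float
    bottom: float


def split_into_strips(box: Box, line_height: float) -> list[Box]:
    """Cut one layout box into horizontal strips of at most line_height."""
    if line_height <= 0:
        raise ValueError("line_height must be positive")
    strips = []
    y = box.top
    while y < box.bottom:
        strips.append(Box(box.left, y, box.right, min(y + line_height, box.bottom)))
        y += line_height
    return strips


def split_layout(boxes: list[Box], line_height: float) -> tuple[list[Box], list[int]]:
    """Split every layout box; parents[i] is the index of the box strip i came from."""
    strips, parents = [], []
    for index, box in enumerate(boxes):
        for strip in split_into_strips(box, line_height):
            strips.append(strip)
            parents.append(index)
    return strips, parents


def collapse_order(strip_order: list[int], parents: list[int]) -> list[int]:
    """Turn an ordering of strips into an ordering of the original boxes,
    keeping each box at the position where its first strip appears."""
    seen = []
    for strip_index in strip_order:
        parent = parents[strip_index]
        if parent not in seen:
            seen.append(parent)
    return seen
```

The `line_height` value is a guess the caller has to supply, which reflects the limitation noted above: without reliable line metrics, the strips only approximate real text lines.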

@dolfim-ibm dolfim-ibm added the pdf PDF issue (except docling-parse) label Jan 30, 2025
@cau-git
Contributor

cau-git commented Jan 31, 2025

@Bariskau @mkhalid12 a revised reading order model is currently under development. We will post updates when we have them ready.

@PeterStaar-IBM
Contributor

@Bariskau @mkhalid12 You can track this PR: #811

@mkhalid12

@cau-git this is great news, looking forward to it.


5 participants