Doesn't parse the table, treats it as an image #590

Open
Zilong01 opened this issue Dec 13, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@Zilong01

Thank you for your work.
While using docling, I found that it can save tables as images like this:

from docling_core.types.doc import TableItem

# conv_res and output_dir come from the surrounding conversion code
table_counter = 0
for element, _level in conv_res.document.iterate_items():
    if isinstance(element, TableItem):
        table_counter += 1
        element_image_filename = (
            output_dir / "tables" / f"table-{table_counter}.png"
        )
        element_image_filename.parent.mkdir(parents=True, exist_ok=True)
        with element_image_filename.open("wb") as fp:
            element.get_image(conv_res.document).save(fp, "PNG")

In my task, many tables are too complex to be recognized correctly, so I want them treated as images without any parsing. I set:

    pipeline_options.do_ocr = False
    pipeline_options.do_table_structure = False

But with these settings the table is not inserted into the resulting Markdown at all, even though it is still detected. Since the table region is recognized, I think it could be inserted as an image at its original position in the text. Is there a way to achieve this now? If not, could you give me a pointer on what to modify? Thanks!

@Zilong01 Zilong01 added the enhancement New feature or request label Dec 13, 2024
@alexshmmy

alexshmmy commented Dec 13, 2024

@Zilong01 Not sure I understood your question. Tables are parsed properly and saved to the .md file, regardless of whether you also save them as .png. For example, this basic code parses them properly without saving the tables as .png:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_core.types.doc import ImageRefMode, PictureItem
from pathlib import Path

IMAGE_RESOLUTION_SCALE = 2.0

def pdf_to_md(input_doc_path, output_dir):
    pipeline_options = PdfPipelineOptions()
    pipeline_options.images_scale = IMAGE_RESOLUTION_SCALE
    pipeline_options.generate_picture_images = True
    pipeline_options.generate_page_images = True
    pipeline_options.do_ocr = True
    pipeline_options.do_table_structure = True
    pipeline_options.table_structure_options.do_cell_matching = True

    doc_converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )

    conv_res = doc_converter.convert(input_doc_path)
    doc_filename = conv_res.input.file.stem

    table_counter = 0
    picture_counter = 0
    for element, _level in conv_res.document.iterate_items():
        if isinstance(element, PictureItem):
            picture_counter += 1
            element_image_filename = (
                output_dir / f"{doc_filename}-picture-{picture_counter}.png"
            )
            with element_image_filename.open("wb") as fp:
                element.get_image(conv_res.document).save(fp, "PNG")

    # Save markdown with embedded pictures
    md_filename = output_dir / f"{doc_filename}-with-images.md"
    conv_res.document.save_as_markdown(md_filename, image_mode=ImageRefMode.EMBEDDED)

if __name__ == "__main__":
    pdf_file_path = "https://arxiv.org/pdf/2206.01062"
    output_dir = Path("./output")
    output_dir.mkdir(parents=True, exist_ok=True)
    pdf_to_md(pdf_file_path, output_dir)

The table comes out perfectly in my .md output file:
[Screenshot 2024-12-13 at 13 40 10]

@Zilong01
Author

@alexshmmy
Yes, the table is parsed very well.
But I don't want the table parsed into text. I want it turned into an image and inserted into the Markdown, just like a regular picture.
That way I can have a multimodal LLM (MLLM) interpret it and avoid parsing errors.
Like this:

![table-1](Image location of screenshot of this table)

@alexshmmy

alexshmmy commented Dec 13, 2024

@Zilong01 Alright, I understand now. Since there is a way to save the tables as images in the order they appear in the text, wouldn't it be fairly easy to inject the table images into the output .md file with a simple parser? You only need to know where to inject them, i.e., the positions where the converter places the tables (the code above puts the text tables in the correct positions). The tables would then show up as images, as you want. I think you could do that with the help of ChatGPT and post the code here. Otherwise, one of the maintainers can say whether such an extension is planned.
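
A minimal sketch of such an injection step, in plain Python with no docling calls. It rests on two assumptions that are not confirmed in this thread: the Markdown export writes each table as a block of consecutive lines starting with `|` (so `do_table_structure` has to stay enabled), and the k-th such block corresponds to the k-th `TableItem` saved by the image loop. The function name `replace_tables_with_images` and the `image_paths` argument are hypothetical.

```python
def replace_tables_with_images(markdown: str, image_paths: list[str]) -> str:
    """Swap each pipe-table block for an image link, in document order.

    Assumption: a table is a run of consecutive lines whose first
    non-blank character is '|'. Blocks without a matching image path
    are left untouched.
    """
    out: list[str] = []
    block: list[str] = []  # rows of the table currently being read
    used = 0               # number of image paths consumed so far

    def flush() -> None:
        nonlocal used
        if not block:
            return
        if used < len(image_paths):
            out.append(f"![table-{used + 1}]({image_paths[used]})")
            used += 1
        else:
            out.extend(block)  # no image left: keep the text table
        block.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):
            block.append(line)
        else:
            flush()
            out.append(line)
    flush()  # handle a table that ends the document
    return "\n".join(out)


if __name__ == "__main__":
    md = "Intro\n\n| a | b |\n|---|---|\n| 1 | 2 |\n\nOutro"
    print(replace_tables_with_images(md, ["tables/table-1.png"]))
```

The input could be the contents of the .md file written by the code above, with `image_paths` listing the saved table PNGs in the same order. Two limitations: pipe-prefixed lines inside a code block would also be replaced, and a trailing newline in the input is dropped.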
