Doesn't parse the table, treats it as an image #590

Zilong01 · 2024-12-13T08:41:24Z

Thank you all for your work.
While using docling, I found that it can save tables as images like this:

from docling_core.types.doc import TableItem

# conv_res and output_dir come from the surrounding conversion code
table_counter = 0
for element, _level in conv_res.document.iterate_items():
    if isinstance(element, TableItem):
        table_counter += 1
        element_image_filename = (
            output_dir / "tables" / f"table-{table_counter}.png"
        )
        element_image_filename.parent.mkdir(parents=True, exist_ok=True)
        with element_image_filename.open("wb") as fp:
            element.get_image(conv_res.document).save(fp, "PNG")

In my task the tables are complex and many of them are not recognized correctly, so I want them to be treated as images without parsing. To do that, I set:

    pipeline_options.do_ocr = False
    pipeline_options.do_table_structure = False

With these settings the table is still detected, but it is not inserted into the resulting Markdown. Since the table is detected, I think it could be inserted as an image at its original position in the text. Is there a way to achieve this? Or could you give me some pointers on what to modify? Thanks!

alexshmmy · 2024-12-13T12:46:59Z

@Zilong01 I'm not sure I understood your question. Tables are properly parsed and saved to the .md file regardless of whether you also save them as .png. For example, this basic code does not save the tables as .png but still parses them properly:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_core.types.doc import ImageRefMode, PictureItem
from pathlib import Path

IMAGE_RESOLUTION_SCALE = 2.0

def pdf_to_md(input_doc_path, output_dir):
    pipeline_options = PdfPipelineOptions()
    pipeline_options.images_scale = IMAGE_RESOLUTION_SCALE
    pipeline_options.generate_picture_images = True
    pipeline_options.generate_page_images = True
    pipeline_options.do_ocr = True
    pipeline_options.do_table_structure = True
    pipeline_options.table_structure_options.do_cell_matching = True

    doc_converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )

    conv_res = doc_converter.convert(input_doc_path)
    doc_filename = conv_res.input.file.stem

    table_counter = 0
    picture_counter = 0
    for element, _level in conv_res.document.iterate_items():
        if isinstance(element, PictureItem):
            picture_counter += 1
            element_image_filename = (
                output_dir / f"{doc_filename}-picture-{picture_counter}.png"
            )
            with element_image_filename.open("wb") as fp:
                element.get_image(conv_res.document).save(fp, "PNG")

    # Save markdown with embedded pictures
    md_filename = output_dir / f"{doc_filename}-with-images.md"
    conv_res.document.save_as_markdown(md_filename, image_mode=ImageRefMode.EMBEDDED)

if __name__ == "__main__":
    pdf_file_path = "https://arxiv.org/pdf/2206.01062"
    output_dir = Path(f"./outpu")
    output_dir.mkdir(parents=True, exist_ok=True)
    pdf_to_md(pdf_file_path, output_dir)

The table comes out perfectly in my .md output file.

Zilong01 · 2024-12-13T13:07:23Z

@alexshmmy
Yes, the table is parsed very well.
But I don't want the table parsed into text. I want to turn it into an image and insert it into the Markdown, just like a regular image.
That way I can use an MLLM to interpret it and avoid parsing errors.
Like this:

![table-1](Image location of screenshot of this table)

alexshmmy · 2024-12-13T13:48:16Z

@Zilong01 Alright, I understand now. Since there is a way to save the tables as images in the order they appear in the text, wouldn't it be fairly easy to write a simple parser that injects the table images into the output .md file? You only need to determine where exactly to inject them, i.e., the positions where the parser would put the tables (the code above puts the text tables in the correct positions). The tables will then be shown as images, as you want. I think you can do that with the help of ChatGPT and post the code here. Otherwise, one of the maintainers can say whether such an extension is planned.
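
The injection idea above could be sketched roughly as follows. This is only an illustration, not part of docling: `replace_tables_with_images` is a hypothetical helper, and it assumes that the exported Markdown writes each table as a contiguous block of pipe-delimited lines with a `|---|` separator row, in the same order in which the table images were saved. It also assumes table structure parsing is left enabled, so that the text tables exist in the Markdown to mark the positions.

```python
import re

# A Markdown table row: a line that starts and ends with "|".
TABLE_LINE = re.compile(r"^\s*\|.*\|\s*$")
# The header separator row, e.g. "|---|:---:|".
SEPARATOR_LINE = re.compile(r"^\s*\|[\s:|-]+\|\s*$")


def replace_tables_with_images(markdown: str, image_paths: list[str]) -> str:
    """Swap each pipe table for an image link, in document order.

    Hypothetical helper: assumes the n-th table in the Markdown
    corresponds to the n-th entry of image_paths.
    """
    out = []
    block = []  # lines of the table currently being collected
    used = 0    # number of image paths consumed so far

    def flush():
        nonlocal used
        if not block:
            return
        is_table = len(block) >= 2 and SEPARATOR_LINE.match(block[1])
        if is_table and used < len(image_paths):
            out.append(f"![table-{used + 1}]({image_paths[used]})")
            used += 1
        else:
            # Not a real table, or no image left: keep the original lines.
            out.extend(block)
        block.clear()

    for line in markdown.splitlines():
        if TABLE_LINE.match(line):
            block.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)
```

To use it, one would read the saved .md file, pass its text together with the list of saved table PNG paths (relative to the .md file), and write the result back. If there are fewer images than tables, the remaining tables are left as text.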

Zilong01 added the "enhancement" (New feature or request) label on Dec 13, 2024
