Doesn't parse the table, treats it as an image #590
Comments
@Zilong01 I'm not sure I understood your question. Tables are parsed and saved to the .md file regardless of whether you also choose to save them as .png. For example, this basic code does not save the tables as .png but still parses them properly:

from pathlib import Path

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling_core.types.doc import ImageRefMode, PictureItem

IMAGE_RESOLUTION_SCALE = 2.0


def pdf_to_md(input_doc_path, output_dir):
    # Configure the PDF pipeline: OCR plus table structure recognition
    pipeline_options = PdfPipelineOptions()
    pipeline_options.images_scale = IMAGE_RESOLUTION_SCALE
    pipeline_options.generate_picture_images = True
    pipeline_options.generate_page_images = True
    pipeline_options.do_ocr = True
    pipeline_options.do_table_structure = True
    pipeline_options.table_structure_options.do_cell_matching = True

    doc_converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )

    conv_res = doc_converter.convert(input_doc_path)
    doc_filename = conv_res.input.file.stem

    # Save every picture as a separate PNG file
    picture_counter = 0
    for element, _level in conv_res.document.iterate_items():
        if isinstance(element, PictureItem):
            picture_counter += 1
            element_image_filename = (
                output_dir / f"{doc_filename}-picture-{picture_counter}.png"
            )
            with element_image_filename.open("wb") as fp:
                element.get_image(conv_res.document).save(fp, "PNG")

    # Save markdown with embedded pictures
    md_filename = output_dir / f"{doc_filename}-with-images.md"
    conv_res.document.save_as_markdown(md_filename, image_mode=ImageRefMode.EMBEDDED)


if __name__ == "__main__":
    pdf_file_path = "https://arxiv.org/pdf/2206.01062"
    output_dir = Path("./outpu")
    output_dir.mkdir(parents=True, exist_ok=True)
    pdf_to_md(pdf_file_path, output_dir)
@alexshmmy ![table-1](Image location of screenshot of this table)
@Zilong01 Alright, now I understand. Since there is a way to save the tables as images, in the order they appear in the text, wouldn't it be fairly easy to write a simple parser that injects the table images into the text of the output .md file? You only need to work out where exactly to inject them, i.e., the positions where the parser would otherwise put the tables (the code above puts the text tables in the correct positions). The tables will then be shown as images, as you want. I think you could do that with some help from ChatGPT and post the code here. Alternatively, one of the maintainers can say whether such an extension is planned.
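To make the idea concrete, here is a minimal sketch of such a post-processing step. It does not use docling at all and only works on the exported Markdown text. It rests on two assumptions that are not confirmed in this thread: that the parsed tables appear in the .md file as pipe tables (consecutive lines starting with `|`), and that the table images were already saved in document order. The function name `replace_tables_with_images` and the file names in the usage note are hypothetical.

```python
from typing import List


def replace_tables_with_images(markdown: str, image_paths: List[str]) -> str:
    """Replace each pipe-table block in `markdown` with an image reference.

    Assumption: a table is a run of consecutive lines starting with "|".
    The n-th table block is replaced by the n-th entry of `image_paths`.
    Tables without a matching image are left untouched.
    Note: lines starting with "|" inside code fences would also be treated
    as tables; this sketch does not handle that case.
    """
    out_lines = []
    table_index = 0      # number of table blocks seen so far
    in_table = False     # True while we are inside a table block
    keep_table = False   # True if the current table has no image to use

    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):
            if not in_table:
                # First line of a new table block
                in_table = True
                keep_table = table_index >= len(image_paths)
                if not keep_table:
                    path = image_paths[table_index]
                    out_lines.append(f"![table-{table_index + 1}]({path})")
                table_index += 1
            if keep_table:
                out_lines.append(line)
            continue
        in_table = False
        out_lines.append(line)

    return "\n".join(out_lines)
```

You would read the generated .md file, pass its text together with the list of saved table images (for example `["doc-table-1.png", "doc-table-2.png"]`, hypothetical names), and write the result back. Keeping unmatched tables as text means a missing image never silently drops a table.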
Thank you guys for your work.
While using it I found out that docling can save tables as images by doing the following
In my task, the tables are complex and many of them are not recognized correctly, so I wanted them to be treated as images without parsing, so I set up the
However, this causes the table to be left out of the resulting Markdown, even though it is still recognized. Since the table is recognized, I think it could be inserted as an image at its original position in the text. Is there a way to achieve this? Or could you give me an idea of what to modify? Thanks!