docs: Add example for inspection of picture content (DS4SD#624)
* chore: Add example for inspection of picture content

Signed-off-by: Christoph Auer <[email protected]>

* fix: Test case re-generation

Signed-off-by: Christoph Auer <[email protected]>

* fix: Test case re-generation only on CPU

Signed-off-by: Christoph Auer <[email protected]>

* fix: Add missing GT files

Signed-off-by: Christoph Auer <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Václav Vančura <[email protected]>
cau-git authored and vancura committed Feb 6, 2025
1 parent 4214bee commit 48ba424
Showing 34 changed files with 171 additions and 23 deletions.
29 changes: 29 additions & 0 deletions docs/examples/inspect_picture_content.py
@@ -0,0 +1,29 @@
from docling_core.types.doc import TextItem

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

source = "tests/data/amt_handbook_sample.pdf"

pipeline_options = PdfPipelineOptions()
pipeline_options.images_scale = 2
pipeline_options.generate_page_images = True

doc_converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

result = doc_converter.convert(source)

doc = result.document

for picture in doc.pictures:
    # picture.get_image(doc).show() # display the picture
    print(picture.caption_text(doc), " contains these elements:")

    for item, level in doc.iterate_items(root=picture, traverse_pictures=True):
        if isinstance(item, TextItem):
            print(item.text)

    print("\n")
Binary file added tests/data/amt_handbook_sample.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2203.01017v2.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2203.01017v2.pages.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2206.01062.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2206.01062.pages.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2305.03393v1-pg9.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2305.03393v1.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/2305.03393v1.pages.json

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions tests/data/groundtruth/docling_v1/amt_handbook_sample.doctags.txt
@@ -0,0 +1,25 @@
<document>
<paragraph><location><page_1><loc_12><loc_88><loc_53><loc_94></location>pulleys, provided the inner race of the bearing is clamped to the supporting structure by the nut and bolt. Plates must be attached to the structure in a positive manner to eliminate rotation or misalignment when tightening the bolts or screws.</paragraph>
<paragraph><location><page_1><loc_12><loc_77><loc_53><loc_86></location>The two general types of self-locking nuts currently in use are the all-metal type and the fiber lock type. For the sake of simplicity, only three typical kinds of self-locking nuts are considered in this handbook: the Boots self-locking and the stainless steel self-locking nuts, representing the all-metal types; and the elastic stop nut, representing the fiber insert type.</paragraph>
<subtitle-level-1><location><page_1><loc_12><loc_73><loc_28><loc_75></location>Boots Self-Locking Nut</subtitle-level-1>
<paragraph><location><page_1><loc_12><loc_64><loc_54><loc_73></location>The Boots self-locking nut is of one piece, all-metal construction designed to hold tight despite severe vibration. Note in Figure 7-26 that it has two sections and is essentially two nuts in one: a locking nut and a load-carrying nut. The two sections are connected with a spring, which is an integral part of the nut.</paragraph>
<paragraph><location><page_1><loc_12><loc_52><loc_53><loc_62></location>The spring keeps the locking and load-carrying sections such a distance apart that the two sets of threads are out of phase or spaced so that a bolt, which has been screwed through the load-carrying section, must push the locking section outward against the force of the spring to engage the threads of the locking section properly.</paragraph>
<paragraph><location><page_1><loc_12><loc_38><loc_54><loc_50></location>The spring, through the medium of the locking section, exerts a constant locking force on the bolt in the same direction as a force that would tighten the nut. In this nut, the load-carrying section has the thread strength of a standard nut of comparable size, while the locking section presses against the threads of the bolt and locks the nut firmly in position. Only a wrench applied to the nut loosens it. The nut can be removed and reused without impairing its efficiency.</paragraph>
<paragraph><location><page_1><loc_12><loc_33><loc_53><loc_36></location>Boots self-locking nuts are made with three different spring styles and in various shapes and sizes. The wing type that is</paragraph>
<caption><location><page_1><loc_12><loc_8><loc_31><loc_9></location>Figure 7-26. Self-locking nuts.</caption>
<figure>
<location><page_1><loc_12><loc_10><loc_52><loc_31></location>
<caption>Figure 7-26. Self-locking nuts.</caption>
</figure>
<paragraph><location><page_1><loc_54><loc_85><loc_95><loc_94></location>the most common ranges in size for No. 6 up to 1 / 4 inch, the Rol-top ranges from 1 / 4 inch to 1 / 6 inch, and the bellows type ranges in size from No. 8 up to 3 / 8 inch. Wing-type nuts are made of anodized aluminum alloy, cadmium-plated carbon steel, or stainless steel. The Rol-top nut is cadmium-plated steel, and the bellows type is made of aluminum alloy only.</paragraph>
<paragraph><location><page_1><loc_54><loc_83><loc_55><loc_85></location>.</paragraph>
<subtitle-level-1><location><page_1><loc_54><loc_82><loc_76><loc_83></location>Stainless Steel Self-Locking Nut</subtitle-level-1>
<paragraph><location><page_1><loc_54><loc_54><loc_96><loc_81></location>The stainless steel self-locking nut may be spun on and off by hand as its locking action takes places only when the nut is seated against a solid surface and tightened. The nut consists of two parts: a case with a beveled locking shoulder and key and a thread insert with a locking shoulder and slotted keyway. Until the nut is tightened, it spins on the bolt easily, because the threaded insert is the proper size for the bolt. However, when the nut is seated against a solid surface and tightened, the locking shoulder of the insert is pulled downward and wedged against the locking shoulder of the case. This action compresses the threaded insert and causes it to clench the bolt tightly. The cross-sectional view in Figure 7-27 shows how the key of the case fits into the slotted keyway of the insert so that when the case is turned, the threaded insert is turned with it. Note that the slot is wider than the key. This permits the slot to be narrowed and the insert to be compressed when the nut is tightened.</paragraph>
<subtitle-level-1><location><page_1><loc_54><loc_51><loc_65><loc_52></location>Elastic Stop Nut</subtitle-level-1>
<paragraph><location><page_1><loc_54><loc_47><loc_93><loc_50></location>The elastic stop nut is a standard nut with the height increased to accommodate a fiber locking collar. This</paragraph>
<caption><location><page_1><loc_54><loc_8><loc_81><loc_10></location>Figure 7-27. Stainless steel self-locking nut.</caption>
<figure>
<location><page_1><loc_54><loc_11><loc_94><loc_46></location>
<caption>Figure 7-27. Stainless steel self-locking nut.</caption>
</figure>
</document>
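Each `<location>` in this doctags groundtruth carries a page token and four `<loc_N>` values. Comparing them with the JSON groundtruth for the same file (page size 594 × 774 points, bounding boxes with a bottom-left origin), the values appear to be left, bottom, right, top scaled to a 0 to 100 grid and rounded to the nearest integer. That reading is an inference from these two files, not a documented spec. A small sketch that parses the tokens and checks the inference against the first paragraph; `parse_location` and `bbox_to_loc` are hypothetical helpers, not docling functions.

```python
import re
from typing import Tuple

# Assumed reading of the groundtruth: <loc_l><loc_b><loc_r><loc_t> on a 0..100 grid
# (inferred by comparing with the JSON bboxes, not a documented spec).
LOCATION_RE = re.compile(
    r"<location><page_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)></location>"
)


def parse_location(tag_line: str) -> Tuple[int, Tuple[int, ...]]:
    """Return (page number, (l, b, r, t)) from the first <location> in a doctags line."""
    match = LOCATION_RE.search(tag_line)
    if match is None:
        raise ValueError("no <location> element found")
    page, *coords = (int(g) for g in match.groups())
    return page, tuple(coords)


def bbox_to_loc(
    bbox: Tuple[float, float, float, float], width: float, height: float
) -> Tuple[int, int, int, int]:
    """Scale a page-space (l, b, r, t) box to the assumed 0..100 grid."""
    l, b, r, t = bbox
    return (
        round(100 * l / width),
        round(100 * b / height),
        round(100 * r / width),
        round(100 * t / height),
    )


line = (
    "<paragraph><location><page_1><loc_12><loc_88><loc_53><loc_94></location>"
    "pulleys, provided the inner race ...</paragraph>"
)
print(parse_location(line))  # (1, (12, 88, 53, 94))

# First paragraph bbox and page size copied from the JSON groundtruth.
first_bbox = (71.99212646484375, 681.3463745117188, 314.11212158203125, 730.3163452148438)
print(bbox_to_loc(first_bbox, 594.0, 774.0))  # (12, 88, 53, 94)
```

The same scaling also reproduces the figure and caption locations in this file, which is why rounding (rather than truncation) is assumed.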
1 change: 1 addition & 0 deletions tests/data/groundtruth/docling_v1/amt_handbook_sample.json
@@ -0,0 +1 @@
{"_name": "", "type": "pdf-document", "description": {"title": null, "abstract": null, "authors": null, "affiliations": null, "subjects": null, "keywords": null, "publication_date": null, "languages": null, "license": null, "publishers": null, "url_refs": null, "references": null, "publication": null, "reference_count": null, "citation_count": null, "citation_date": null, "advanced": null, "analytics": null, "logs": [], "collection": null, "acquisition": null}, "file-info": {"filename": "amt_handbook_sample.pdf", "filename-prov": null, "document-hash": "4ba7cdbd9ce8155d692d8f477f88bb3ec1acc2a463cf1e0209d1e624e58ebce9", "#-pages": 1, "collection-name": null, "description": null, "page-hashes": [{"hash": "f31706a847734c62e1e41f9f792c756283d1d4955552c1cc7f5e23c351bdd7cb", "model": "default", "page": 1}]}, "main-text": [{"prov": [{"bbox": [71.99212646484375, 681.3463745117188, 314.11212158203125, 730.3163452148438], "page": 1, "span": [0, 244], "__ref_s3_data": null}], "text": "pulleys, provided the inner race of the bearing is clamped to the supporting structure by the nut and bolt. Plates must be attached to the structure in a positive manner to eliminate rotation or misalignment when tightening the bolts or screws.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [71.99230194091797, 593.8463745117188, 313.15460205078125, 667.8163452148438], "page": 1, "span": [0, 376], "__ref_s3_data": null}], "text": "The two general types of self-locking nuts currently in use are the all-metal type and the fiber lock type. 
For the sake of simplicity, only three typical kinds of self-locking nuts are considered in this handbook: the Boots self-locking and the stainless steel self-locking nuts, representing the all-metal types; and the elastic stop nut, representing the fiber insert type.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [71.99230194091797, 568.8463745117188, 167.27230834960938, 580.1864013671875], "page": 1, "span": [0, 22], "__ref_s3_data": null}], "text": "Boots Self-Locking Nut", "type": "subtitle-level-1", "payload": null, "name": "Section-header", "font": null}, {"prov": [{"bbox": [71.99229431152344, 491.84637451171875, 318.49224853515625, 565.8163452148438], "page": 1, "span": [0, 319], "__ref_s3_data": null}], "text": "The Boots self-locking nut is of one piece, all-metal construction designed to hold tight despite severe vibration. Note in Figure 7-26 that it has two sections and is essentially two nuts in one: a locking nut and a load-carrying nut. 
The two sections are connected with a spring, which is an integral part of the nut.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [71.99229431152344, 404.34637451171875, 316.65728759765625, 478.3163757324219], "page": 1, "span": [0, 332], "__ref_s3_data": null}], "text": "The spring keeps the locking and load-carrying sections such a distance apart that the two sets of threads are out of phase or spaced so that a bolt, which has been screwed through the load-carrying section, must push the locking section outward against the force of the spring to engage the threads of the locking section properly.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [71.99229431152344, 291.84637451171875, 318.8122863769531, 390.8163757324219], "page": 1, "span": [0, 477], "__ref_s3_data": null}], "text": "The spring, through the medium of the locking section, exerts a constant locking force on the bolt in the same direction as a force that would tighten the nut. In this nut, the load-carrying section has the thread strength of a standard nut of comparable size, while the locking section presses against the threads of the bolt and locks the nut firmly in position. Only a wrench applied to the nut loosens it. The nut can be removed and reused without impairing its efficiency.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [71.99229431152344, 254.34637451171875, 313.91229248046875, 278.3163757324219], "page": 1, "span": [0, 122], "__ref_s3_data": null}], "text": "Boots self-locking nuts are made with three different spring styles and in various shapes and sizes. The wing type that is", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [72.0, 60.99040222167969, 184.14828491210938, 71.80239868164062], "page": 1, "span": [0, 31], "__ref_s3_data": null}], "text": "Figure 7-26. 
Self-locking nuts.", "type": "caption", "payload": null, "name": "Caption", "font": null}, {"name": "Picture", "type": "figure", "$ref": "#/figures/0"}, {"prov": [{"bbox": [320.9923095703125, 656.3463745117188, 561.808349609375, 730.3163452148438], "page": 1, "span": [0, 368], "__ref_s3_data": null}], "text": "the most common ranges in size for No. 6 up to 1 / 4 inch, the Rol-top ranges from 1 / 4 inch to 1 / 6 inch, and the bellows type ranges in size from No. 8 up to 3 / 8 inch. Wing-type nuts are made of anodized aluminum alloy, cadmium-plated carbon steel, or stainless steel. The Rol-top nut is cadmium-plated steel, and the bellows type is made of aluminum alloy only.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [320.99542236328125, 643.8463745117188, 325.99542236328125, 655.3163452148438], "page": 1, "span": [0, 1], "__ref_s3_data": null}], "text": ".", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [320.99542236328125, 631.3463745117188, 450.99542236328125, 642.6864013671875], "page": 1, "span": [0, 32], "__ref_s3_data": null}], "text": "Stainless Steel Self-Locking Nut", "type": "subtitle-level-1", "payload": null, "name": "Section-header", "font": null}, {"prov": [{"bbox": [320.99542236328125, 416.84637451171875, 568.00439453125, 628.3163452148438], "page": 1, "span": [0, 1015], "__ref_s3_data": null}], "text": "The stainless steel self-locking nut may be spun on and off by hand as its locking action takes places only when the nut is seated against a solid surface and tightened. The nut consists of two parts: a case with a beveled locking shoulder and key and a thread insert with a locking shoulder and slotted keyway. Until the nut is tightened, it spins on the bolt easily, because the threaded insert is the proper size for the bolt. 
However, when the nut is seated against a solid surface and tightened, the locking shoulder of the insert is pulled downward and wedged against the locking shoulder of the case. This action compresses the threaded insert and causes it to clench the bolt tightly. The cross-sectional view in Figure 7-27 shows how the key of the case fits into the slotted keyway of the insert so that when the case is turned, the threaded insert is turned with it. Note that the slot is wider than the key. This permits the slot to be narrowed and the insert to be compressed when the nut is tightened.", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [320.99542236328125, 391.84637451171875, 388.50543212890625, 403.1863708496094], "page": 1, "span": [0, 16], "__ref_s3_data": null}], "text": "Elastic Stop Nut", "type": "subtitle-level-1", "payload": null, "name": "Section-header", "font": null}, {"prov": [{"bbox": [320.99542236328125, 364.84637451171875, 552.351318359375, 388.8163757324219], "page": 1, "span": [0, 108], "__ref_s3_data": null}], "text": "The elastic stop nut is a standard nut with the height increased to accommodate a fiber locking collar. This", "type": "paragraph", "payload": null, "name": "Text", "font": null}, {"prov": [{"bbox": [321.0, 63.01040267944336, 481.6493225097656, 73.82240295410156], "page": 1, "span": [0, 46], "__ref_s3_data": null}], "text": "Figure 7-27. 
Stainless steel self-locking nut.", "type": "caption", "payload": null, "name": "Caption", "font": null}, {"name": "Picture", "type": "figure", "$ref": "#/figures/1"}, {"prov": [{"bbox": [537.9854125976562, 33.70970153808594, 560.775390625, 46.01969909667969], "page": 1, "span": [0, 4], "__ref_s3_data": null}], "text": "7-45", "type": "page-footer", "payload": null, "name": "Page-footer", "font": null}], "figures": [{"prov": [{"bbox": [70.59269714355469, 79.6090087890625, 309.863037109375, 242.77777099609375], "page": 1, "span": [0, 31], "__ref_s3_data": null}], "text": "Figure 7-26. Self-locking nuts.", "type": "figure", "payload": null, "bounding-box": null}, {"prov": [{"bbox": [320.4467468261719, 81.689208984375, 558.8576049804688, 352.359375], "page": 1, "span": [0, 46], "__ref_s3_data": null}], "text": "Figure 7-27. Stainless steel self-locking nut.", "type": "figure", "payload": null, "bounding-box": null}], "tables": [], "bitmaps": null, "equations": [], "footnotes": [], "page-dimensions": [{"height": 774.0, "page": 1, "width": 594.0}], "page-footers": [], "page-headers": [], "_s3_data": null, "identifiers": null}
31 changes: 31 additions & 0 deletions tests/data/groundtruth/docling_v1/amt_handbook_sample.md
@@ -0,0 +1,31 @@
pulleys, provided the inner race of the bearing is clamped to the supporting structure by the nut and bolt. Plates must be attached to the structure in a positive manner to eliminate rotation or misalignment when tightening the bolts or screws.

The two general types of self-locking nuts currently in use are the all-metal type and the fiber lock type. For the sake of simplicity, only three typical kinds of self-locking nuts are considered in this handbook: the Boots self-locking and the stainless steel self-locking nuts, representing the all-metal types; and the elastic stop nut, representing the fiber insert type.

## Boots Self-Locking Nut

The Boots self-locking nut is of one piece, all-metal construction designed to hold tight despite severe vibration. Note in Figure 7-26 that it has two sections and is essentially two nuts in one: a locking nut and a load-carrying nut. The two sections are connected with a spring, which is an integral part of the nut.

The spring keeps the locking and load-carrying sections such a distance apart that the two sets of threads are out of phase or spaced so that a bolt, which has been screwed through the load-carrying section, must push the locking section outward against the force of the spring to engage the threads of the locking section properly.

The spring, through the medium of the locking section, exerts a constant locking force on the bolt in the same direction as a force that would tighten the nut. In this nut, the load-carrying section has the thread strength of a standard nut of comparable size, while the locking section presses against the threads of the bolt and locks the nut firmly in position. Only a wrench applied to the nut loosens it. The nut can be removed and reused without impairing its efficiency.

Boots self-locking nuts are made with three different spring styles and in various shapes and sizes. The wing type that is

Figure 7-26. Self-locking nuts.
<!-- image -->

the most common ranges in size for No. 6 up to 1 / 4 inch, the Rol-top ranges from 1 / 4 inch to 1 / 6 inch, and the bellows type ranges in size from No. 8 up to 3 / 8 inch. Wing-type nuts are made of anodized aluminum alloy, cadmium-plated carbon steel, or stainless steel. The Rol-top nut is cadmium-plated steel, and the bellows type is made of aluminum alloy only.

.

## Stainless Steel Self-Locking Nut

The stainless steel self-locking nut may be spun on and off by hand as its locking action takes places only when the nut is seated against a solid surface and tightened. The nut consists of two parts: a case with a beveled locking shoulder and key and a thread insert with a locking shoulder and slotted keyway. Until the nut is tightened, it spins on the bolt easily, because the threaded insert is the proper size for the bolt. However, when the nut is seated against a solid surface and tightened, the locking shoulder of the insert is pulled downward and wedged against the locking shoulder of the case. This action compresses the threaded insert and causes it to clench the bolt tightly. The cross-sectional view in Figure 7-27 shows how the key of the case fits into the slotted keyway of the insert so that when the case is turned, the threaded insert is turned with it. Note that the slot is wider than the key. This permits the slot to be narrowed and the insert to be compressed when the nut is tightened.

## Elastic Stop Nut

The elastic stop nut is a standard nut with the height increased to accommodate a fiber locking collar. This

Figure 7-27. Stainless steel self-locking nut.
<!-- image -->

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v1/redp5110_sampled.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2203.01017v2.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2203.01017v2.pages.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2206.01062.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2206.01062.pages.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2305.03393v1-pg9.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2305.03393v1.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/data/groundtruth/docling_v2/2305.03393v1.pages.json

Large diffs are not rendered by default.
