Placeholder elements in Powerpoint files have no size #584

Open
maciejwie opened this issue Dec 12, 2024 · 0 comments
Labels: bug (Something isn't working), pptx (issue related to pptx backend)


Bug

Some shapes with shape_type=MSO_SHAPE_TYPE.PLACEHOLDER in certain PPTX files contain text and report has_text_frame=True, yet have no bounding-box coordinates: left, top, width, and height are all None. generate_prov() does not handle this case and fails with an unhandled TypeError when it adds the coordinates together.
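To check whether a given file contains such shapes, something like the following might help. It is only a diagnostic sketch: `missing_geometry`, `find_unsized_shapes`, and `scan_pptx` are hypothetical helper names, not part of docling or python-pptx, and the stand-in objects at the bottom merely imitate the attributes described above. `scan_pptx` assumes python-pptx is installed.

```python
from types import SimpleNamespace

GEOMETRY_ATTRS = ("left", "top", "width", "height")


def missing_geometry(shape):
    """Return the names of the geometry attributes that are None on `shape`."""
    return [name for name in GEOMETRY_ATTRS if getattr(shape, name, None) is None]


def find_unsized_shapes(shapes):
    """Yield (shape, missing_attrs) for every shape lacking some geometry."""
    for shape in shapes:
        missing = missing_geometry(shape)
        if missing:
            yield shape, missing


def scan_pptx(path):
    """Print unsized shapes slide by slide (hypothetical helper; needs python-pptx)."""
    # Imported lazily so the helpers above work without python-pptx.
    from pptx import Presentation

    for slide_ind, slide in enumerate(Presentation(path).slides):
        for shape, missing in find_unsized_shapes(slide.shapes):
            print(slide_ind, shape.name, shape.has_text_frame, missing)


# Stand-in objects that mimic the attributes described in this report.
sized = SimpleNamespace(name="Title 1", left=0, top=0, width=100, height=50)
unsized = SimpleNamespace(
    name="Content Placeholder 2", left=None, top=None, width=None, height=None
)
unsized_names = [shape.name for shape, _ in find_unsized_shapes([sized, unsized])]
```

Running `scan_pptx('placeholder.pptx')` against the sample file should list the affected placeholders, assuming the file behaves as described here.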

Steps to reproduce

Sample file: placeholder.pptx

from docling.document_converter import DocumentConverter
doc_converter = DocumentConverter()
doc_converter.convert('placeholder.pptx')

Results in:

File ~/dev/docling-test/.venv/lib/python3.12/site-packages/docling/backend/mspowerpoint_backend.py:106, in MsPowerpointDocumentBackend.generate_prov(self, shape, slide_ind, text)
    104 width = shape.width
    105 height = shape.height
--> 106 shape_bbox = [left, top, left + width, top + height]
    107 shape_bbox = BoundingBox.from_tuple(shape_bbox, origin=CoordOrigin.BOTTOMLEFT)
    108 # prov = [{"bbox": shape_bbox, "page": parent_slide, "span": [0, len(text)]}]

TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'
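One possible way to avoid the TypeError would be to check the coordinates before doing the arithmetic. The sketch below is not docling's actual fix: `safe_shape_bbox` is a hypothetical helper, it returns a plain tuple rather than a docling BoundingBox, and returning None for unsized shapes is just one assumed policy (a caller could instead skip provenance or substitute a default box).

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def safe_shape_bbox(shape) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), or None if any coordinate is missing.

    Hypothetical helper: mirrors the arithmetic on line 106 of the traceback,
    but refuses to add None values together.
    """
    left, top = shape.left, shape.top
    width, height = shape.width, shape.height
    if any(value is None for value in (left, top, width, height)):
        return None  # unsized shape: let the caller decide how to handle it
    return (left, top, left + width, top + height)


# Stand-in shapes: one with geometry, one like the placeholders in this report.
sized = SimpleNamespace(left=10, top=20, width=30, height=40)
unsized = SimpleNamespace(left=None, top=None, width=None, height=None)

sized_bbox = safe_shape_bbox(sized)
unsized_bbox = safe_shape_bbox(unsized)
```

With a guard like this, the text of the placeholder could still be extracted while its provenance is left without a bounding box.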

Docling version

Latest: 2.11.0.

Python version

Python 3.12.0

@maciejwie maciejwie added the bug Something isn't working label Dec 12, 2024
@cau-git cau-git added the pptx issue related to pptx backend label Dec 16, 2024