bug/With auto.partition table extraction not enabled in pdf #3844

Open
Akashtyagi opened this issue Dec 18, 2024 · 0 comments
Labels
bug Something isn't working


Akashtyagi commented Dec 18, 2024

Describe the bug
Since pdf_infer_table_structure is deprecated, a plain partition() call should extract table structure from PDFs, but it doesn't.

Instead, skip_infer_table_types includes "pdf", so table inference for PDFs only runs when skip_infer_table_types=["jpg", "png", "heic"] is passed explicitly.

Why is "pdf" present in skip_infer_table_types?
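For reference, the workaround described above can be sketched as follows. This is only an illustration of the explicit override mentioned in this report, not a proposed fix; the helper name partition_with_pdf_tables and the constant SKIP_TYPES_WITHOUT_PDF are made up for this example.

```python
# Skip list from the workaround above: the same image types, with "pdf" left out.
SKIP_TYPES_WITHOUT_PDF = ["jpg", "png", "heic"]


def partition_with_pdf_tables(filename):
    """Partition a file while allowing table inference for PDFs (hypothetical helper)."""
    # Imported lazily so the sketch can be loaded without unstructured installed.
    from unstructured.partition.auto import partition

    return partition(
        filename=filename,
        strategy="hi_res",
        skip_infer_table_types=SKIP_TYPES_WITHOUT_PDF,
    )
```

Calling partition_with_pdf_tables("test/table.pdf") is the call that currently has to be written by hand to get table structure for a PDF.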

To Reproduce
Provide a code snippet that reproduces the issue.

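from unstructured.partition.auto import partition

# Partition the PDF with the hi_res strategy; no skip_infer_table_types override is passed.
elements = partition(filename="test/table.pdf", strategy="hi_res", chunking_strategy="by_title")
# Keep only Table elements that carry an HTML rendering of the table.
tables = [el for el in elements if el.category == "Table" and el.metadata.text_as_html]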

Expected behavior
Ideally, just using auto.partition() should infer table structure for PDFs too, as it does for other file types.
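One way to check whether table structure was inferred is to count Table elements with and without metadata.text_as_html. The sketch below is an assumption-laden illustration: split_tables_by_html is a hypothetical helper, and the sample data uses SimpleNamespace stand-ins rather than real unstructured elements so that it runs without the library or a PDF.

```python
from types import SimpleNamespace


def split_tables_by_html(elements):
    """Return (with_html, without_html) lists of Table elements (hypothetical helper)."""
    with_html, without_html = [], []
    for el in elements:
        if el.category != "Table":
            continue
        # text_as_html is only populated when table structure was inferred.
        html = getattr(el.metadata, "text_as_html", None)
        (with_html if html else without_html).append(el)
    return with_html, without_html


# Stand-in objects that mimic only the attributes used above.
sample = [
    SimpleNamespace(category="Table", metadata=SimpleNamespace(text_as_html="<table></table>")),
    SimpleNamespace(category="Table", metadata=SimpleNamespace(text_as_html=None)),
    SimpleNamespace(category="NarrativeText", metadata=SimpleNamespace(text_as_html=None)),
]
with_html, without_html = split_tables_by_html(sample)
print(len(with_html), len(without_html))
```

With the behavior reported here, Table elements from a PDF would be expected to land in the second list unless skip_infer_table_types is overridden.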

Screenshots
Screenshot 2024-12-18 at 5 09 57 PM

Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.

Additional context
Add any other context about the problem here.

Akashtyagi added the bug label Dec 18, 2024