Describe the bug
pdf_infer_table_structure is deprecated, so calling partition() with only a strategy should still infer table structure for PDFs, but it doesn't.
skip_infer_table_types lists "pdf" by default, so table structure is only inferred when you explicitly pass skip_infer_table_types=["jpg", "png", "heic"].
Why is "pdf" included in skip_infer_table_types?
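For reference, the workaround described above can be sketched roughly as follows. This is a minimal sketch, assuming the installed unstructured version still accepts skip_infer_table_types on partition(); the helper names partition_pdf_with_tables and tables_with_html are hypothetical and only used for illustration, and the stand-in elements at the end are not real unstructured objects.

```python
from types import SimpleNamespace

# File types for which table inference is still skipped. "pdf" is left out
# on purpose so that table structure is inferred for PDFs.
SKIP_TYPES_WITHOUT_PDF = ["jpg", "png", "heic"]


def partition_pdf_with_tables(filename):
    """Hypothetical helper: same call as the repro, plus the explicit override."""
    # Imported lazily so the filter below can be tried without unstructured installed.
    from unstructured.partition.auto import partition

    return partition(
        filename=filename,
        strategy="hi_res",
        chunking_strategy="by_title",
        skip_infer_table_types=SKIP_TYPES_WITHOUT_PDF,
    )


def tables_with_html(elements):
    """Hypothetical helper: keep Table elements that carry an HTML rendering."""
    return [
        el
        for el in elements
        if el.category == "Table" and getattr(el.metadata, "text_as_html", None)
    ]


# Stand-in elements (assumed shape: .category and .metadata.text_as_html)
# to show what the filter keeps.
fake = [
    SimpleNamespace(category="Table", metadata=SimpleNamespace(text_as_html="<table></table>")),
    SimpleNamespace(category="Table", metadata=SimpleNamespace(text_as_html=None)),
    SimpleNamespace(category="NarrativeText", metadata=SimpleNamespace(text_as_html=None)),
]
print(len(tables_with_html(fake)))  # prints 1
```

With the override in place, tables_with_html(partition_pdf_with_tables("test/table.pdf")) is what I would expect the plain partition() call to return without any extra argument.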
To Reproduce
Provide a code snippet that reproduces the issue.
from unstructured.partition.auto import partition
elements = partition(filename="test/table.pdf", strategy='hi_res', chunking_strategy='by_title')
tables = [el for el in elements if el.category == "Table" and el.metadata.text_as_html]
Expected behavior
Ideally, just calling auto.partition() should infer table structure for PDFs too, as it does for other file types.
Screenshots
Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.
Additional context
Add any other context about the problem here.