Usage with prodigy's pdf.layout.fetch recipe exits uncleanly #21

Open
thatbudakguy opened this issue Dec 17, 2024 · 1 comment

thatbudakguy commented Dec 17, 2024

Filing this here since https://github.com/explosion/prodigy-pdf doesn't have issues enabled.

Running the pdf.layout.fetch recipe seems to always exit with:

ℹ Creating preprocessed PDFs
✔ Saved fetched data to local file
assets/my_file.jsonl
-> Cannot close object, library is destroyed. This may cause a memory leak!
zsh: bus error python -m prodigy pdf.layout.fetch assets/my_file.jsonl

The output file is written, but the memory-leak warning and the bus error are concerning. I see similar output if I press CTRL+C during docling parsing; maybe the prodigy recipe is causing the parser to exit prematurely?

You can reproduce this with this PDF by putting it in a directory called pdf_test and running something like:

python -m prodigy pdf.layout.fetch assets/test_output.jsonl blank:en pdf_test
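To check whether the process really dies from a signal rather than exiting normally, a small wrapper can run the recipe and decode the return code. This is only a sketch using the Python standard library; `describe_exit` and `run_and_describe` are hypothetical helper names, not part of prodigy, spacy_layout, or docling.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"unknown signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd: list[str]) -> str:
    """Run a command, let its output pass through, and describe how it ended."""
    completed = subprocess.run(cmd)
    return describe_exit(completed.returncode)


# Hypothetical usage with the reproduction command from this issue:
# print(run_and_describe([
#     sys.executable, "-m", "prodigy", "pdf.layout.fetch",
#     "assets/test_output.jsonl", "blank:en", "pdf_test",
# ]))
```

If the description names a signal, the crash is happening at interpreter shutdown or in native code rather than in the recipe's own Python logic, which may help narrow down where cleanup goes wrong.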

Software versions:

  • spaCy 3.7.5
  • prodigy 1.17.2
  • spacy_layout 0.0.9
  • prodigy_pdf 0.4.0
  • docling 2.13.0

Hardware: Apple M3 Pro, macOS 14.7.1

thatbudakguy (author) commented

(I can move this to the prodigy forum if desired; just realized that's why issues are not enabled there!)
