read_pdf fails on specific pdf locally, not through hosted api #63
Comments
Hey @Ianpwest, did you manage to solve this?
@wolfassi123 No, there were also some other parsing issues with different character sets. The library is promising but seemingly under-supported. No movement on my tickets.
Hello @Ianpwest, @wolfassi123, I have fixed the issue, and it seems to be working with the sample PDF provided here. Can you pull the main branch of nlm-ingestor and verify?
Switching to the Docker image for nlm-ingestor in this comment worked for me.
If you look closely, you'll see that the " and the [ are swapped in the local call, so the renderFormat value sent to the server ends in a stray quote (all%22). The error handling for a nonexistent renderFormat could be better.
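A quick client-side check can catch a malformed `renderFormat` before the request is sent. The following is only a sketch: `check_render_format` is a hypothetical helper (not part of llmsherpa), and the allowed set contains only `all` because that is the one value shown in this thread; extend it with whatever values your server actually accepts.

```python
from urllib.parse import urlsplit, parse_qs


def check_render_format(api_url, allowed=frozenset({"all"})):
    """Return the renderFormat value in api_url, or raise ValueError.

    `allowed` is an assumption: only "all" appears in this thread.
    """
    # parse_qs percent-decodes values, so "all%22" becomes 'all"'.
    query = parse_qs(urlsplit(api_url).query)
    values = query.get("renderFormat", [])
    if len(values) != 1 or values[0] not in allowed:
        raise ValueError(
            f"unexpected renderFormat {values!r} in {api_url!r}; "
            f"expected one of {sorted(allowed)}"
        )
    return values[0]
```

Calling it on the URL before constructing `LayoutPDFReader` would turn a stray trailing quote into an explicit `ValueError` rather than a failure later in parsing.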
PDF in question:
JTR.pdf
This api call works great
from llmsherpa.readers import LayoutPDFReader
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "JTR.pdf"
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)
This local call fails
llmsherpa_api_url = "[http://localhost:5010/api/parseDocument?renderFormat=all"](http://localhost:5010/api/parseDocument?renderFormat=all%22)
pdf_url = "JTR.pdf"
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)
The local version is running from the latest Docker build. Other PDFs work fine. Is there a way to get a better error message? Currently receiving: KeyError: 'return_dict'
I noticed there are other issues open around this error but did not find any matching this case where it works on one and not the other.
I appreciate your time and any insight. Thanks!
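On the question of a better error message: the `KeyError: 'return_dict'` suggests the client indexes the decoded server response without first checking that the key exists. A minimal sketch of a more descriptive check follows. It assumes you have obtained the raw response body yourself (for example by posting the file to the local URL with an HTTP client, which is not shown here); `extract_return_dict` is a hypothetical helper, not an llmsherpa function.

```python
import json


def extract_return_dict(raw_body):
    """Decode a parser response body and return its 'return_dict' entry.

    Raises RuntimeError with a snippet of what the server actually sent
    instead of a bare KeyError. Assumes the body is expected to be a JSON
    object with a 'return_dict' key, as the traceback in this issue implies.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"server did not return JSON: {raw_body[:200]!r}"
        ) from exc
    if not isinstance(payload, dict) or "return_dict" not in payload:
        raise RuntimeError(
            f"response has no 'return_dict' key; server sent: {str(payload)[:200]}"
        )
    return payload["return_dict"]
```

With a check like this, a server-side failure surfaces whatever status or message the server included in its reply, which is usually enough to tell a bad request parameter apart from a parsing failure on a specific PDF.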