Add async support #109

Open: wants to merge 3 commits into base: main



hellerphilipp

This PR adds asynchronous APIs to LayoutPDFReader and refactors its existing HTTP handling to improve performance and flexibility. Key changes:

  • Unit Tests: Added tests for read_pdf (from a URL) to ensure my changes don't break existing behavior.
  • Refactor to httpx: Replaced urllib3 with httpx in _download_pdf and _parse_pdf to support asynchronous operations.
  • Async Alternatives: Added _download_pdf_async, _parse_pdf_async, and read_pdf_async, which allow non-blocking HTTP requests and non-blocking local file reads.
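
For readers skimming the description, here is a minimal sketch of how an async entry point such as read_pdf_async might choose between a URL download and a local file read. It is not the code in this PR: `read_pdf_bytes_async`, `is_url`, and the injected `fetch` coroutine are hypothetical names, the downloader is passed in so the sketch runs without network access, and the standard library's `asyncio.to_thread` stands in for the aiofiles-based file reading described above.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable
from urllib.parse import urlparse

# Hypothetical type for an injected async downloader (in the PR this role
# is played by an httpx-based helper; here it is only an assumption).
Fetch = Callable[[str], Awaitable[bytes]]


def is_url(path_or_url: str) -> bool:
    """Return True when the argument looks like an http(s) URL."""
    return urlparse(path_or_url).scheme in ("http", "https")


async def read_pdf_bytes_async(path_or_url: str, fetch: Fetch) -> bytes:
    """Return the raw PDF bytes without blocking the event loop."""
    if is_url(path_or_url):
        # Remote source: await the injected downloader.
        return await fetch(path_or_url)
    # Local source: run the blocking read in a worker thread
    # (a stand-in for aiofiles, requires Python 3.9+).
    return await asyncio.to_thread(Path(path_or_url).read_bytes)
```

Injecting the downloader keeps the dispatch logic testable offline; a real caller would pass a coroutine that performs the HTTP request.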

This PR resolves issue #44.

- Implement test_read_pdf_with_url in test_file_reader.py to verify that LayoutPDFReader can read and parse a PDF from a web URL
- Ensure the read_pdf method interacts with the actual API endpoint and returns a valid Document object
- Update the _download_pdf and _parse_pdf methods to use httpx for HTTP requests, replacing urllib3 so that asynchronous operations are easier to add later
- Maintain existing functionality for PDF downloading and parsing, preserving API interactions and response handling
- Introduce async methods _download_pdf_async and _parse_pdf_async for non-blocking HTTP requests using httpx
- Add read_pdf_async alongside read_pdf, supporting asynchronous local file reading with aiofiles and URL downloads
- Add a unit test for read_pdf_async (with a PDF from a URL)
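
The tests listed above call the actual API endpoint. As a complement, an offline async test could look roughly like the sketch below. Everything in it is an assumption for illustration: `FakeReader` is a hypothetical stand-in rather than LayoutPDFReader, and the download step is replaced with `unittest.mock.AsyncMock` so nothing touches the network.

```python
import unittest
from unittest.mock import AsyncMock


class FakeReader:
    """Hypothetical stand-in for a reader that exposes read_pdf_async."""

    def __init__(self, download):
        # `download` is an injected coroutine function: url -> bytes.
        self._download = download

    async def read_pdf_async(self, url: str) -> dict:
        data = await self._download(url)
        # A real reader would parse the PDF; the sketch only records its size.
        return {"source": url, "num_bytes": len(data)}


class ReadPdfAsyncTest(unittest.IsolatedAsyncioTestCase):
    async def test_read_pdf_async_with_url(self):
        url = "https://example.com/sample.pdf"  # placeholder URL
        download = AsyncMock(return_value=b"%PDF-1.7 stub")
        reader = FakeReader(download)

        doc = await reader.read_pdf_async(url)

        # The downloader was awaited exactly once with the requested URL.
        download.assert_awaited_once_with(url)
        self.assertEqual(doc["source"], url)
        self.assertEqual(doc["num_bytes"], len(b"%PDF-1.7 stub"))
```

`unittest.IsolatedAsyncioTestCase` runs each `async def` test on its own event loop, so no extra test plugin is needed for a sketch like this.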

Addresses issue nlmatics#44
@hellerphilipp hellerphilipp mentioned this pull request Nov 10, 2024

hellerphilipp commented Nov 10, 2024

Hi @ansukla, I hope this is helpful and implemented in the spirit of the project. As this is my first open-source contribution, I welcome any feedback!

Best,
Phil
