File widget do not load all file content #6068

Open
flefevre opened this issue Feb 1, 2025 · 2 comments
Assignees
Labels
bug Something isn't working


flefevre commented Feb 1, 2025

Bug Description

If you create a workflow with a file input and a data output and load a PDF with more than 8 pages, you will see that the file output content is truncated at 7 or 8 pages.

Reproduction

Steps
1. Create a flow.
2. Add a file loader.
3. Load a PDF with more than 8 pages.
4. Execute the flow.
5. The output is truncated at 8 pages.

Expected behavior

All pages of the PDF should be loaded.

Who can help?

No response

Operating System

Ubuntu

Langflow Version

1.1.2

Python Version

None

Screenshot

No response

Flow File

No response

flefevre added the bug label Feb 1, 2025

dosubot bot commented Feb 1, 2025

The issue you're experiencing with the file widget truncating PDF content at 7 or 8 pages might be related to the PDF parsing method used in Langflow. The library pypdf is used for PDF parsing, specifically the PdfReader class, which extracts text from PDF files [1]. There might be a limitation or bug in how the parse_pdf_to_text function processes the pages, or there could be a configuration setting affecting this behavior.

To address this, you might want to check if there are any updates or patches available for the pypdf library or Langflow itself that address this issue. Additionally, reviewing the implementation of the parse_pdf_to_text function could provide insights into any potential limitations or areas for improvement.
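One way to narrow this down is to check whether pypdf itself returns text for every page, and then find the first page whose text is missing from what the flow produced. The sketch below is only an illustration: the helper names `extract_pages`, `normalize`, and `first_missing_page` are hypothetical, and it does not reproduce Langflow's `parse_pdf_to_text`. It assumes pypdf is installed and that you have copied the File component's output into a string.

```python
import re


def normalize(text: str) -> str:
    """Collapse all whitespace runs so layout differences do not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def extract_pages(pdf_path: str) -> list[str]:
    """Return the extracted text of every page, using pypdf directly."""
    # Imported here so the comparison helper below works without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty string for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def first_missing_page(page_texts, loaded_text, probe_chars=80):
    """Return the 1-based number of the first page whose leading text is
    absent from loaded_text, or None if every non-empty page is present."""
    haystack = normalize(loaded_text)
    for number, text in enumerate(page_texts, start=1):
        probe = normalize(text)[:probe_chars]
        if probe and probe not in haystack:
            return number
    return None
```

For example, `first_missing_page(extract_pages("document.pdf"), flow_output)` (both the file name and `flow_output` are placeholders) returns the page at which the content stops appearing. If pypdf extracts all pages but a page number is returned, the truncation likely happens after parsing rather than inside pypdf.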



flefevre commented Feb 2, 2025

BasicFile.json

This is the simplest file widget component, trying to load a basic PDF file: https://cyber.gouv.fr/sites/default/files/document/Recommandations_de_s%C3%A9curit%C3%A9_pour_un_syst%C3%A8me_d_IA_g%C3%A9n%C3%A9rative.pdf

Only the first pages are processed.
