To improve usability and handle a broader range of document types, this issue proposes integrating OCR (Optical Character Recognition). OCR would let the system process PDF files and images that contain no embedded text, such as scanned pages or other image-only content.
Use Case:
A user uploads a PDF file with a complex structure, for example a scanned document whose pages are images rather than text. Currently, the system extracts 0 characters from such files, which causes errors or failures in later processing steps such as vector index construction. With OCR, the system could extract meaningful text from these image-based files so that processing completes normally.
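A minimal sketch of the requested fallback, assuming each page can be passed to two pluggable extractors: `extract_embedded`, which reads the PDF's text layer, and `run_ocr`, which would wrap an OCR engine such as Tesseract. All names here (`extract_document_text`, `ensure_indexable`, `PageResult`, the `min_chars` threshold) are hypothetical and not part of the existing system. The idea is to OCR only the pages that yield (almost) no embedded text, and to fail with a clear message before an empty document reaches vector index construction.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical extractor signature: takes raw page data, returns its text.
TextExtractor = Callable[[bytes], str]


@dataclass
class PageResult:
    page_number: int
    text: str
    used_ocr: bool


def extract_document_text(
    pages: List[bytes],
    extract_embedded: TextExtractor,
    run_ocr: TextExtractor,
    min_chars: int = 1,
) -> List[PageResult]:
    """Extract text page by page, falling back to OCR when a page has
    fewer than `min_chars` characters of embedded text (e.g. a scan)."""
    results = []
    for number, page in enumerate(pages, start=1):
        text = extract_embedded(page).strip()
        used_ocr = False
        if len(text) < min_chars:
            # No usable text layer: treat the page as an image and OCR it.
            text = run_ocr(page).strip()
            used_ocr = True
        results.append(PageResult(number, text, used_ocr))
    return results


def ensure_indexable(results: List[PageResult]) -> str:
    """Join non-empty page texts; raise instead of returning an empty
    document that would break vector index construction."""
    full_text = "\n".join(r.text for r in results if r.text)
    if not full_text:
        raise ValueError("no text extracted from document, even after OCR")
    return full_text
```

Routing per page rather than per file means a mixed PDF (some typed pages, some scanned) only pays the OCR cost on the pages that need it, and keeps the embedded text, which is usually more reliable than OCR output, wherever it exists.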
IANTHEREAL changed the title from "Support ingesting PDF files that contains picture" to "Add OCR Support for PDF and Image Files to Enhance System Usability" on Jan 16, 2025.