[16.0] pdf content indexing: PyMupdf + Tesseract #431

Closed
wants to merge 17 commits into from

Commits on Sep 7, 2023

  1. [ADD] document_ocr

    hbrunn authored and len-foss committed Sep 7, 2023
    8d69a35
  2. [FIX] CI

    [ADD] tests for attachments_to_filesystem
    hbrunn authored and len-foss committed Sep 7, 2023
    d3a37e5
  3. 5ee5b3f
  4. 8ac1ca8
  5. [FIX] use png for pillow interchange

    hbrunn authored and len-foss committed Sep 7, 2023
    c10e84e
  6. a00735b
  7. f1f13f1
  8. 5edbe16
  9. 73cff87
  10. [IMP] attachment_indexation_ocr: convert pdf with fitz

    This is more performant and makes it easy to split the document into
    pages, which avoids errors from Tesseract's maximum image size.
    len-foss committed Sep 7, 2023
    6196f30
  11. ecadc83
  12. f31245a

Commits on Sep 8, 2023

  1. 7f394be

Commits on Feb 15, 2024

  1. 8d1125a
  2. 1125d29
  3. 8e98036
  4. 834573b
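
Commit 6196f30 describes converting the PDF with fitz (PyMuPDF) page by page so that no single image exceeds Tesseract's size limit. The following is a minimal sketch of that idea, not the code from this pull request. The names `capped_zoom`, `ocr_pdf` and `TESSERACT_MAX_SIDE` are hypothetical, the 32767 pixel limit is an assumption about Tesseract's behaviour, and `ocr_pdf` assumes PyMuPDF, pytesseract and Pillow are installed. PNG is used as the interchange format between the renderer and Pillow, as in commit c10e84e.

```python
import io

# Assumption: Tesseract rejects images whose width or height exceeds this.
TESSERACT_MAX_SIDE = 32767
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def capped_zoom(width_pt, height_pt, dpi=300, max_side=TESSERACT_MAX_SIDE):
    """Return the zoom factor for the target dpi, reduced if necessary
    so that the longest rendered side stays within max_side pixels."""
    zoom = dpi / POINTS_PER_INCH
    longest = max(width_pt, height_pt) * zoom
    if longest > max_side:
        zoom *= max_side / longest
    return zoom


def ocr_pdf(data, dpi=300, lang="eng"):
    """OCR a PDF given as bytes, one page at a time (hypothetical helper)."""
    # Imported lazily so capped_zoom stays usable without these packages.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    texts = []
    with fitz.open(stream=data, filetype="pdf") as doc:
        for page in doc:
            zoom = capped_zoom(page.rect.width, page.rect.height, dpi)
            # Render only this page, scaled by the capped zoom factor.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            # Hand the page to Pillow as PNG bytes.
            image = Image.open(io.BytesIO(pix.tobytes("png")))
            texts.append(pytesseract.image_to_string(image, lang=lang))
    return "\n".join(texts)
```

Rendering each page separately keeps memory use bounded and means an unusually tall page only lowers the resolution of that page rather than failing the whole document.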