Detect duplicated pages by visually comparing them #3

Open
stefnotch opened this issue Nov 30, 2022 · 2 comments

Comments

@stefnotch
Collaborator

Someone finally sent me some PDFs with duplicated pages where the page metadata got lost.

Here, the best way of identifying duplicates would probably be:

  • Comparing text (the easy one, methinks)
  • Comparing the visual output, preferably by checking whether a lot of pixels have either become darker (usual slides: white background, dark foreground) or lighter (dark-theme slides). This is rather slow. (Use a library like https://github.com/mapbox/pixelmatch )
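
A rough sketch of both checks, to make the idea concrete. It assumes each page has already been extracted to a text string and rendered to an RGBA pixel buffer of identical size (for example the `data` of a canvas `ImageData`). All function names, the brightness tolerance, and the 95% share threshold are made up for illustration and are not part of this project or of pixelmatch. The pixel check treats two pages as a likely duplicate when nearly all changed pixels moved in the same direction, either darker or lighter.

```typescript
// Hypothetical helpers, not an existing API. Buffers are RGBA, 4 bytes per pixel.
interface ChangeCounts {
  darker: number;
  lighter: number;
  unchanged: number;
}

// Text check: pages match if their text is equal after collapsing whitespace.
function sameText(a: string, b: string): boolean {
  const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();
  return normalize(a) === normalize(b);
}

// Approximate brightness (Rec. 601 luma) of the pixel starting at byte offset i.
function luma(data: Uint8ClampedArray, i: number): number {
  return 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
}

// Count how many pixels became darker, lighter, or stayed within the tolerance.
function countChanges(
  prev: Uint8ClampedArray,
  next: Uint8ClampedArray,
  tolerance: number = 16 // assumed value, on a 0..255 brightness scale
): ChangeCounts {
  if (prev.length !== next.length) {
    throw new Error("Pages must be rendered at the same size");
  }
  const counts: ChangeCounts = { darker: 0, lighter: 0, unchanged: 0 };
  for (let i = 0; i < prev.length; i += 4) {
    const delta = luma(next, i) - luma(prev, i);
    if (delta < -tolerance) counts.darker++;
    else if (delta > tolerance) counts.lighter++;
    else counts.unchanged++;
  }
  return counts;
}

// True when the changed pixels are (almost) all darker or (almost) all lighter.
function isOneDirectionalChange(
  counts: ChangeCounts,
  minShare: number = 0.95 // assumed threshold
): boolean {
  const changed = counts.darker + counts.lighter;
  if (changed === 0) return true; // pixel-identical pages
  return Math.max(counts.darker, counts.lighter) / changed >= minShare;
}
```

The text check is cheap, so it could run first, with the pixel comparison only as a fallback for pages whose text differs or is missing. A library like pixelmatch could replace the hand-rolled loop if plain difference counts are enough, but as far as I can tell the darker/lighter split would still need a pass like the one above.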