Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add normalizedTextContents function #19027

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ryzokuken
Copy link
Collaborator

Move the code to concatenate and normalize the contents of a page into its own function in pdf_find_controller so that it could be reused elsewhere.

This is a small part of the ongoing work to address #3172, which I decided to break off and post already as a PR so that we could start iterating on it in parts and be more agile.

There's two open questions with this patch, beyond the common "what can we name this function":

  • Where should this new refactored function live? It's currently still in pdf_find_controller but that might not be ideal and we might want to move stuff to a more central location (this would incur a bigger diff unfortunately).
  • Do we need to normalize the text for our upcoming use case as well? While the concat logic is definitely required the normalization might not be entirely necessary especially with the v flag.

Move the code to concatenate and normalize the contents of a page into
its own function in pdf_find_controller so that it could be reused
elsewhere.
@Snuffleupagus
Copy link
Collaborator

This is a small part of the ongoing work to address #3172, which I decided to break off and post already as a PR so that we could start iterating on it in parts and be more agile.

Given that we don't really know what the final solution will look like yet, it's not entirely clear (at least to me) that a bunch of separate PRs is a good idea since it might lead to a lot of code churn (and changed git blame) that may or may not ultimately be necessary for the final patch(es)?

@marco-c
Copy link
Contributor

marco-c commented Nov 12, 2024

@Snuffleupagus me and @calixteman suggested @ryzokuken to break this down into multiple PRs to make reviewing easier. Another option could be to create the larger PR first (maybe as a draft, it doesn't have to be perfect) to show the full thing, then break it down into multiple PRs.

@Snuffleupagus
Copy link
Collaborator

Another option could be to create the larger PR first (maybe as a draft, it doesn't have to be perfect) to show the full thing, then break it down into multiple PRs.

That sounds better, at least to me, since that could help us avoid making "unnecessary" changes along the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants