Introduce tests for redaction to make changes easier #977
Adding tests in preparation to fix #316
In preparation for fixing #316 ('PDFs get way too large when redacting'), I propose adding tests for the redaction logic.
This makes it possible to execute that logic without running the full application, which shortens the feedback time.
Step 1:
A first characterization test, testing the redaction behavior with empty instructions.
Step 2:
A second characterization test, testing the redaction of the right half of the PDF page.
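As a rough illustration of what these two characterization tests pin down, here is a minimal sketch. It does not use the project's real redaction function: `apply_redactions`, `blank_page`, and the pixel-grid page model are hypothetical stand-ins, and the real tests would call the actual function under test on a PDF.

```python
import unittest

WHITE, BLACK = 255, 0


def apply_redactions(page, rects):
    """Hypothetical stand-in for the function under test.

    `page` is a list of pixel rows; each rect is (left, top, width, height)
    in pixels. Returns a new page with every rect painted black.
    """
    result = [row[:] for row in page]
    for left, top, width, height in rects:
        for y in range(top, min(top + height, len(result))):
            for x in range(left, min(left + width, len(result[y]))):
                result[y][x] = BLACK
    return result


def blank_page(width=4, height=2):
    """Build an all-white page of the given size (hypothetical helper)."""
    return [[WHITE] * width for _ in range(height)]


class RedactionCharacterizationTest(unittest.TestCase):
    def test_empty_instructions_leave_page_unchanged(self):
        page = blank_page()
        self.assertEqual(apply_redactions(page, []), page)

    def test_right_half_is_blacked_out(self):
        page = blank_page(width=4, height=2)
        redacted = apply_redactions(page, [(2, 0, 2, 2)])
        for row in redacted:
            self.assertEqual(row, [WHITE, WHITE, BLACK, BLACK])
```

The point of both tests is to record what the code does today, so that a later change to the redaction approach can be checked against the same expectations.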
Please tell me if you approve this approach.
I made a small refactoring that helped me get the function under test.
The current approach
Right now, for every page with redactions:
This is quite a resource-hungry approach, in both CPU time and file size.
Thoughts on how to fix the file size problem:
An alternative approach must reliably remove any redacted content from the PDF and place black rectangles where the content was.
Placing the rectangles is the easy part:
Canvas.drawImage states this:
Since `drawImage()` can handle scaling, every rectangle could be the same 1x1px black image.
The hard part is to remove the content.
It must not only be covered, but completely removed from the PDF.
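The rectangle placement (the easy part) could be sketched roughly as follows. This is a sketch under assumptions, not the project's code: `Rect`, `draw_redaction_rects`, `RecordingCanvas`, and the top-left coordinate convention for redaction boxes are hypothetical. The only ReportLab call it relies on is `Canvas.drawImage(image, x, y, width=..., height=...)`, and `canvas` can be any object that provides it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical redaction box in points, origin at the page's top-left."""
    left: float
    top: float
    width: float
    height: float


def draw_redaction_rects(canvas, black_pixel, rects, page_height):
    """Stretch one shared 1x1 black image over every redaction box.

    `canvas` is expected to behave like reportlab.pdfgen.canvas.Canvas;
    `black_pixel` is whatever drawImage accepts (e.g. a path to a PNG).
    """
    for rect in rects:
        # PDF coordinates start at the bottom-left, so flip the y axis.
        y = page_height - rect.top - rect.height
        canvas.drawImage(black_pixel, rect.left, y,
                         width=rect.width, height=rect.height)


class RecordingCanvas:
    """Test double that records drawImage calls instead of writing a PDF."""

    def __init__(self):
        self.calls = []

    def drawImage(self, image, x, y, width=None, height=None):
        self.calls.append((image, x, y, width, height))
```

With ReportLab installed, passing a real `Canvas` and the path to a 1x1 black PNG should work the same way. This covers only the drawing; it does nothing to remove the underlying content.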
PS: It seems the pipeline can't execute `pdf_utils.get_image_from_pdf_page()` because it depends on `pdftoppm`. Well, not my focus right now. :)
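One possible way to keep the pipeline green in the meantime would be to skip the affected tests when `pdftoppm` is not on the PATH. A minimal sketch using only the standard library; `has_pdftoppm`, `requires_pdftoppm`, and the test class are hypothetical names, not existing project code:

```python
import shutil
import unittest


def has_pdftoppm():
    """Return True if the pdftoppm executable can be found on the PATH."""
    return shutil.which("pdftoppm") is not None


# Reusable decorator: tests marked with it are reported as skipped, not failed.
requires_pdftoppm = unittest.skipUnless(
    has_pdftoppm(), "pdftoppm is not installed"
)


class PdfImageTest(unittest.TestCase):
    @requires_pdftoppm
    def test_render_first_page(self):
        # Placeholder: the real test would call
        # pdf_utils.get_image_from_pdf_page() here.
        pass
```

The alternative is to install `pdftoppm` in the pipeline image, which would let these tests actually run there.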