Allow conversion without image #26
- add convert_file_without_image function, which can convert AWS-Textract-JSON to PAGE-XML without reading the original input image; this function takes the image’s dimensions as inputs instead of reading them from the image
- add optional “image-width” and “image-height” options to the CLI command, which trigger the use of the new function
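The option handling described above could be sketched roughly as follows. This is only an illustration using `argparse`: the flag spellings `--image-width` and `--image-height`, the positional arguments, and the helper names `build_parser` and `select_mode` are assumptions, not the project's actual CLI code.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real CLI may use a different framework and argument layout.
    parser = argparse.ArgumentParser(description="Sketch of the CLI option handling")
    parser.add_argument("json_file")
    parser.add_argument("image_file", nargs="?")
    parser.add_argument("--image-width", type=int, default=None)
    parser.add_argument("--image-height", type=int, default=None)
    return parser


def select_mode(args):
    """Decide which conversion path to take (hypothetical helper)."""
    has_width = args.image_width is not None
    has_height = args.image_height is not None
    if has_width != has_height:
        # One dimension alone is not enough to scale both axes.
        raise ValueError("--image-width and --image-height must be given together")
    return "without_image" if has_width else "with_image"
```

Requiring both dimensions together is a design choice of this sketch; whether the PR enforces it the same way is not stated here.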
Many thanks @joewiz, much appreciated!
Your implementation is already near perfect IMO – see below for some cosmetic requests. (If you don't want to work them in, let me know and I'll do it myself.)
Strange! Sounds like your |
Co-authored-by: Robert Sachunsky <[email protected]>
@bertsky Thank you for the review and suggestions! I've incorporated all of them. I would be happy to clean up the PR with a rebase, or the PR could be squashed - either way works for me.
Excellent, many thanks! I have pushed some fixups and extended the tests. Will merge as soon as CI is happy.
As proposed in #25, this PR adds a `convert_file_without_image` function, which can convert AWS-Textract-JSON to PAGE-XML without needing to read the original input image. This function takes the image’s dimensions as inputs, instead of reading these from the image.

The PR also adds “image-width” and “image-height” options to the CLI command, which trigger the use of the new function if they are supplied.
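For readers wondering why the dimensions alone are sufficient: Textract reports polygon vertices as ratios of the page width and height, while PAGE-XML coordinates are in pixels, so the conversion only needs the two scale factors. A minimal sketch of that scaling, assuming a hypothetical helper `polygon_to_points` (this is not the code in this PR):

```python
def polygon_to_points(polygon, image_width, image_height):
    """Scale a Textract polygon (ratios in 0..1) to a PAGE-XML points string.

    Hypothetical helper for illustration only; not part of the library.
    """
    points = []
    for vertex in polygon:
        # Textract stores X and Y as fractions of the page width and height.
        x = round(vertex["X"] * image_width)
        y = round(vertex["Y"] * image_height)
        points.append(f"{x},{y}")
    # PAGE-XML expects space-separated "x,y" pairs.
    return " ".join(points)


# Made-up polygon and dimensions, purely for demonstration.
polygon = [
    {"X": 0.1, "Y": 0.2},
    {"X": 0.5, "Y": 0.2},
    {"X": 0.5, "Y": 0.4},
    {"X": 0.1, "Y": 0.4},
]
print(polygon_to_points(polygon, image_width=2000, image_height=3000))
```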
I lack Python skills and cobbled this together, so pardon the lack of tests covering the new functionality.
I did run `make test-api`, which passed. (But `make test-cli` failed with the same error as before my changes: "No rule to make target `OUT`, needed by `test-cli`. Stop.")

I performed the following tests manually, confirming that the results were as expected:
Test 1: Using original features
The following command uses the original features of the utility:
... returned a 66k PAGE-XML file, with the following excerpt on line 8 referencing the input image file:
Test 2: Using the new features via CLI
The following command uses the new features added in this PR:
... returned an identical file, except for the following excerpt on line 8, corresponding to the one above:
I'd greatly appreciate any feedback!