Support conversion from and to Textract JSON #122

Open
scottschreckengaust opened this issue Jan 30, 2020 · 4 comments
Labels
enhancement Any enhancement on the software itself (excluding new transformations)

Comments

@scottschreckengaust

Amazon Textract returns its analysis results in a JSON format.

https://docs.aws.amazon.com/textract/latest/dg/textract-dg.pdf

Specifically, see the three types of analysis described at https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html, which cover the categories:

  1. text,
  2. forms, and
  3. tables
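
For orientation, here is a minimal sketch of how these three categories show up in a Textract response: each result is an entry in the top-level `Blocks` list, and its `BlockType` indicates the kind of analysis it belongs to (`LINE`/`WORD` for text, `KEY_VALUE_SET` for forms, `TABLE`/`CELL` for tables). The helper name `group_blocks` and the trimmed sample response are made up for illustration only.

```python
from collections import defaultdict

# Textract BlockType values mapped to the three categories listed above.
CATEGORY_BY_BLOCK_TYPE = {
    "LINE": "text",
    "WORD": "text",
    "KEY_VALUE_SET": "forms",
    "TABLE": "tables",
    "CELL": "tables",
}


def group_blocks(response):
    """Group the blocks of a Textract response dict by category.

    Hypothetical helper for illustration, not part of any converter.
    """
    groups = defaultdict(list)
    for block in response.get("Blocks", []):
        category = CATEGORY_BY_BLOCK_TYPE.get(block.get("BlockType"))
        if category is not None:  # PAGE and other block types are skipped
            groups[category].append(block)
    return dict(groups)


# Hand-written, heavily trimmed sample. Real responses also carry
# Geometry, Confidence, Relationships and more for each block.
sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 42"},
        {"BlockType": "WORD", "Id": "w1", "Text": "Invoice"},
        {"BlockType": "KEY_VALUE_SET", "Id": "k1", "EntityTypes": ["KEY"]},
        {"BlockType": "TABLE", "Id": "t1"},
        {"BlockType": "CELL", "Id": "c1", "RowIndex": 1, "ColumnIndex": 1},
    ]
}

for category, blocks in group_blocks(sample).items():
    print(category, len(blocks))
```

A converter would need a target representation for each of these groups, which is why the three categories are listed separately.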
@stweil
Member

stweil commented May 5, 2023

Conversion from Textract to PAGE XML has now been added with pull request #160.
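
One detail any such conversion has to handle: Textract reports `Geometry.Polygon` points as ratios of the page width and height (between 0 and 1), whereas PAGE XML `Coords/@points` expects absolute pixel positions written as space-separated `x,y` pairs. The following is a minimal sketch of that step only. It is not taken from the converter added in #160, and the function name `polygon_to_page_points` and the sample values are assumptions for illustration.

```python
def polygon_to_page_points(polygon, width, height):
    """Turn a Textract polygon into a PAGE XML points string.

    polygon: list of {"X": float, "Y": float} with ratios of the page size.
    width, height: page image size in pixels.
    Hypothetical helper, not the API of any existing converter.
    """
    pairs = []
    for point in polygon:
        # Scale the ratio to pixels; PAGE XML points are non-negative integers.
        x = max(0, round(point["X"] * width))
        y = max(0, round(point["Y"] * height))
        pairs.append(f"{x},{y}")
    return " ".join(pairs)


# Made-up polygon for a line of text on a 2000 x 3000 pixel page image.
polygon = [
    {"X": 0.1, "Y": 0.2},
    {"X": 0.5, "Y": 0.2},
    {"X": 0.5, "Y": 0.25},
    {"X": 0.1, "Y": 0.25},
]
print(polygon_to_page_points(polygon, width=2000, height=3000))
# prints "200,600 1000,600 1000,750 200,750"
```

Note that the pixel size of the page image is not part of the Textract response, so it would have to be supplied separately in a setup like this.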

@stweil stweil added the enhancement Any enhancement on the software itself (excluding new transformations) label May 5, 2023
@bertsky
Contributor

bertsky commented Jun 6, 2023

Alas, the new converter is still incomplete, so

  • forms, and
  • tables

do not work yet. See slub/textract2page#2

@bertsky
Contributor

bertsky commented Aug 16, 2023

Update: tables work now, but the converter submodule needs to be updated here

@kba
Collaborator

kba commented Sep 6, 2023

> Update: tables work now, but the converter submodule needs to be updated here

I've updated the vendor submodules, including textract2page, in #166. The tables branch is not yet merged to master, though, and I think some files needed to run the tests properly are missing.
