Google Cloud Vision to PAGE-XML #125

Open
kba opened this issue Apr 29, 2020 · 8 comments
Collaborator

kba commented Apr 29, 2020

This was mentioned before, but @cneud just reminded me of https://github.com/PRImA-Research-Lab/cloud-vision-ocr-to-page . It should not be too hard to integrate, and it would allow using GCV results in OCR-D/Transkribus/OCR4all.

BTW: Does anyone have experience with the Azure Computer Vision API in the context of OCR? As a sign of goodwill in times of Covid-19, they are currently offering a generous free tier that includes access to the vision API. It would be interesting to compare.

stweil added this to the v1.0.0 milestone on Jun 25, 2020
Contributor

bertsky commented Nov 17, 2022

BTW the existing integration of GCV as part of the PRImA converter (transform gcv page linking to alto page) is broken: it delegates to java -jar PageConverter.jar -source-xml $INFILE instead of java -jar PageConverter.jar -source-json $INFILE:

java -jar "$JAR" -neg-coords toZero -source-xml "$INFILE" -target-xml "$OUTFILE" -convert-to LATEST 2>&1
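
To illustrate the difference, here is a minimal sketch of how a wrapper could choose the source flag by input format. The flags are the ones quoted above; the function names, the format keyword gcv and the default JAR path are assumptions for illustration only, not what the existing script does.

```shell
# Hypothetical helper: map the input format to the PageConverter source flag.
# "gcv" is an assumed format keyword; the flags are taken from the comment above.
source_flag() {
    case "$1" in
        gcv) echo "-source-json" ;;   # Google Cloud Vision output is JSON
        *)   echo "-source-xml" ;;    # XML inputs keep the existing flag
    esac
}

# Print the converter command line instead of running it (dry run).
# JAR defaults to an assumed path and can be overridden from the environment.
converter_cmd() {
    echo java -jar "${JAR:-PageConverter.jar}" -neg-coords toZero \
        "$(source_flag "$1")" "$2" -target-xml "$3" -convert-to LATEST
}

converter_cmd gcv page.json page.xml
```

Printing the command first makes it easy to check which flag ends up on the command line before anything is actually converted.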

Member

stweil commented Nov 17, 2022

Thanks. So it was broken right from the beginning (commit 7332869).

Contributor

bertsky commented Nov 17, 2022

> So it was broken right from the beginning (commit 7332869).

I'm not sure. Perhaps the PRImA converter was capable of detecting the format automatically before. But it does not look like it.

Anyway, here is a fix: #156

Member

stweil commented Nov 17, 2022

I tried it with fixed arguments, and it fails:

java -jar vendor/JPageConverter/PageConverter.jar -neg-coords toZero -source-json 1850-Baptis-EMU-0204.txt -target-xml 1850-Baptis-EMU-0204.xml -convert-to LATEST
null
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.primaresearch.dla.page.Page.getLayout()" because "page" is null
	at org.primaresearch.dla.page.converter.PageConverter.handleNegativeCoordinates(PageConverter.java:449)
	at org.primaresearch.dla.page.converter.PageConverter.run(PageConverter.java:266)
	at org.primaresearch.dla.page.converter.PageConverter.main(PageConverter.java:161)

Contributor

bertsky commented Nov 17, 2022

> I tried it with fixed arguments, and it fails:

I know. That's because in this example, the input data is incomplete. See here
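
Independent of what exactly is missing in that input, a wrapper script could surface this kind of failure instead of silently continuing. The following is only a sketch: run_checked is a hypothetical name, and it relies on the JVM exiting with a non-zero status when main throws, plus a check that the target file is non-empty.

```shell
# Hypothetical wrapper: run a conversion command and verify that it produced output.
# Usage: run_checked OUTFILE COMMAND [ARGS...]
run_checked() {
    outfile="$1"
    shift
    # Fail if the command itself reports an error (e.g. an uncaught Java exception).
    if ! "$@"; then
        echo "conversion command failed: $*" >&2
        return 1
    fi
    # Fail if the command exited cleanly but wrote nothing.
    if [ ! -s "$outfile" ]; then
        echo "conversion produced no output: $outfile" >&2
        return 1
    fi
}

# Example call (not executed here):
# run_checked "$OUTFILE" java -jar "$JAR" -neg-coords toZero \
#     -source-json "$INFILE" -target-xml "$OUTFILE" -convert-to LATEST
```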

Contributor

bertsky commented Jun 6, 2023

Since #156 we do have a working GCV converter here based on https://github.com/PRImA-Research-Lab/prima-page-converter, so there is no actual need for https://github.com/PRImA-Research-Lab/cloud-vision-ocr-to-page.

Comparing both implementations, IIUC we have:

| implementation | cloud-vision-ocr-to-page | prima-page-converter with json input |
| --- | --- | --- |
| external dependencies | GCV (Java API) | none (standalone) |
| usage | online (network API) | offline (JSON) |
| can also output ALTO | no | yes |
| yields @imageFilename | yes | no |
| yields width and height | yes | yes |
| coordinates | bbox | bbox |
| paragraphs | recursive TextRegion | recursive TextRegion |
| other region types | Image+Separator+Graphic+Table | Image+Separator+Graphic+Table |
| aggregate words to lines | yes | yes |
| confidence | yes | no |

Collaborator Author

kba commented Jun 9, 2023

Thanks for the comparison, very helpful.

> | implementation | cloud-vision-ocr-to-page | prima-page-converter with json input |
> | --- | --- | --- |
> | external dependencies | GCV (Java API) | none (standalone) |
> | usage | online (network API) | offline (JSON) |

IMHO these are the strongest reasons against the cloud-vision-ocr-to-page approach.

It's unfortunate, though, that the confidences aren't serialized the way gcv2hocr does with x_wconf for hOCR. But with development largely stalled, there is not much we can do except rewrite it ourselves.
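
For illustration only, mapping a GCV confidence to an hOCR-style x_wconf value could look like the sketch below. It assumes GCV reports confidences as fractions between 0 and 1 and that x_wconf is an integer percentage; to_wconf is a hypothetical name, and this is not taken from gcv2hocr's code.

```shell
# Hypothetical helper: convert a confidence in [0, 1] to a rounded integer percentage.
to_wconf() {
    awk -v c="$1" 'BEGIN { printf "%d\n", c * 100 + 0.5 }'
}

to_wconf 0.98   # prints 98
```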

Contributor

bertsky commented Jun 9, 2023

> It's unfortunate, though, that the confidences aren't serialized the way gcv2hocr does with x_wconf for hOCR. But with development largely stalled, there is not much we can do except rewrite it ourselves.

We can (fix it ourselves and) ship our own builds. I have successfully set up Eclipse and can compile most of the modules (e.g. libs, PageViewer, PageConverter).

(I have done that with PageViewer including validator error messages.)
