Commit

mostly CGL and error handling (#45)
* [CGL] corrected comment on CheckForAcceptRequest

* [CODING] every incoming request (even an erroneous one) now gets a requestID, which is logged.
[CODING] resource manager now additionally relies on its own count of the request queue size. This should avoid problems when someone places many requests at once

* [CODING] more error logs now contain the requestID for troubleshooting

* [CODING] investigating the code for a deadlock when multiple requests are placed at the same time. Do not use in prod

* [DEBUGGING] further investigation of a possible deadlock in addNewRequest

* [BUGFIX] further investigation of a hang bug

* [CODING] /ocr-status now returns valid JSON with status "not found" and HTTP status code 200 instead of status code 404

* [CGL] RequestID will now be shown in the logs instead of requestID

* [BUILD] removed netgo build flag from Makefile

* Makefile now builds statically linked executables

* [BUGFIX] further investigation of a hang bug

* [BUGFIX] further investigation of a hang bug, removed mutex l

* [BUGFIX] further investigation of a hang bug, removed mutex l

* [BUGFIX] further investigation of a hang bug, removed mutex l

* [BUGFIX] further investigation of a hang bug, removed mutex l

* [BUGFIX] further investigation of a hang bug, removed mutex l

* [BUGFIX] further working on fixing hang bug on many simultaneous requests

* [BUGFIX] further working on fixing hang bug on many simultaneous requests

* [BUGFIX] further working on fixing hang bug on many simultaneous requests

* [BUGFIX] corrected detection of an invalid reply_to address

* [BUGFIX] fixed a bug where a deferred request with reply_to not set was returned without a request ID, so the requester didn't know which request to ask for

* [BUGFIX] fixed a bug where deferred requests were still tracked until timeout even if the client had already downloaded them successfully

* [BUGFIX] fixed race conditions on request counter and res manager
[TODO] fix goroutine leak at  [chan send, 3 minutes] ocr_rpc_client.go:221

* [BUGFIX] fixed race conditions request counter

* [CGL] fixed comments

* [CODING] added todo for fixing leaking goroutines

* [BUGFIX] goroutines are not leaking anymore. There is now a bug if "deferred": true is set and "reply_to":"" is not set. The in-flight request queue won't be cleaned up for those requests. ocr_resultorage:72 needs to be considered

* [CODING] better logging upon shutdown signal

* [CODING] working on proper timeout cancel

* [CODING] working on proper timeout cancel

* [CODING] better logging in status handler

* [CODING] correct handling of goroutines with replyto not set and deferred is true

* [CGL] just some CGL

* [CODING] go mod tidy

* [CODING] updated dependency

* [FEATURE] if the first tiff to pdf converter in the sandwich engine fails, the second one will be used in order to process the request

* [CODING] removed unneeded logging

* [BUGFIX] proper file name on converter switch

* [CODING] flag result_optimize is less aggressive. For gs, dCompatibilityLevel is now 1.7 and dPDFSETTINGS=/prepress. This results in larger PDFs of higher quality

* [CODING] Update dependencies for security reasons
[CODING] Transition from streadway/amqp to rabbitmq/amqp091-go

* [CODING] error handling on os.Remove in preprocessor_rpc_worker.go

* [CGL, BUGFIXES] error handling on os.Remove, grammar and spelling

* [BUGFIXES] error handling on os.Remove

Co-authored-by: Artem Mil <[email protected]>
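
The `/ocr-status` change described in the message above can be illustrated with a minimal sketch. This is not the project's handler: the `statusStore` type, the `request_id` query parameter, and the reply field names are assumptions made for illustration. It only shows the idea of answering an unknown ID with HTTP 200 and a JSON body whose status is "not found", rather than with a 404.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// statusStore is a hypothetical in-memory map of request IDs to status strings.
type statusStore struct {
	mu       sync.RWMutex
	statuses map[string]string
}

// statusReply is an assumed reply shape; the real field names may differ.
type statusReply struct {
	RequestID string `json:"request_id"`
	Status    string `json:"status"`
}

// handleStatus always answers 200 with valid JSON; an unknown ID is reported
// in the body as "not found" instead of through a 404 status code.
func (s *statusStore) handleStatus(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("request_id")
	s.mu.RLock()
	status, ok := s.statuses[id]
	s.mu.RUnlock()
	if !ok {
		status = "not found"
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(statusReply{RequestID: id, Status: status})
}

// queryStatus drives the handler in-process and decodes its reply.
func queryStatus(s *statusStore, id string) (int, statusReply) {
	req := httptest.NewRequest(http.MethodGet, "/ocr-status?request_id="+id, nil)
	rec := httptest.NewRecorder()
	s.handleStatus(rec, req)
	var reply statusReply
	_ = json.Unmarshal(rec.Body.Bytes(), &reply)
	return rec.Code, reply
}

func main() {
	store := &statusStore{statuses: map[string]string{"abc": "processing"}}
	code, reply := queryStatus(store, "missing")
	fmt.Println(code, reply.Status)
}
```

With this shape a client can always parse the body and branch on the status field, at the cost of no longer being able to detect a missing request from the HTTP status code alone.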
xf0e and Artem Mil authored Mar 3, 2022
1 parent 0a284f4 commit 973a8fb
Showing 16 changed files with 423 additions and 271 deletions.
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -61,11 +61,11 @@ The ip address `10.0.2.15` will be used as the `RABBITMQ_HOST` env variable belo
* [Install docker-compose](https://docs.docker.com/compose/)
* `git clone https://github.com/tleyden/open-ocr.git`
* `cd open-ocr/docker-compose`
* Type ```./run.sh ``` (in case you don't have execute right type ```sudo chmod +x run.sh```
* Type ```./run.sh ``` (in case you don't have "execute" right type ```sudo chmod +x run.sh```
* The runner will ask you if you want to delete the images (choose y or n for each)
* The runner will ask you to choose between version 1 and 2
* Version 1 is using the ocr Tesseract 3.04. The memory usage is light. It is pretty fast and not costly in term of size (a simple aws instance with 1GB of ram and 8GB of storage is sufficiant). Result are acceptable
* Version 2 is using the ocr Tesseract 4.00. The memory usage is light. It is less fast than tesseract 3 and more costly in term of size (an simple aws instance with 1GB of ram is sufficient but with an EBS of 16GB of storage). Result are really better compared to version 3.04.
* Version 1 is using the ocr Tesseract 3.04. The memory usage is light. It is pretty fast and not costly in terms of size (a simple aws instance with 1GB of ram and 8GB of storage is sufficiant). Result are acceptable
* Version 2 is using the ocr Tesseract 4.00. The memory usage is light. It is less fast than tesseract 3 and more costly in terms of size (a simple aws instance with 1GB of ram is sufficient but with an EBS of 16GB of storage). Result are really better compared to version 3.04.
* To see a comparative you can have a look to the [official page of tesseract](https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance)


@@ -179,7 +179,7 @@ $ curl -X POST -H "Content-Type: application/json" -d '{"img_base64":"<YOUR BASE

* Uploading the image content via `multipart/related`, rather than passing an image URL. (example client code provided in the [Go REST client](http://github.com/tleyden/open-ocr-client))
* Tesseract config vars (eg, equivalent of -c arguments when using Tesseract via the command line) and Page Seg Mode
* Ability to use an image pre-processing chain, eg [Stroke Width Transform](https://github.com/tleyden/open-ocr/wiki/Stroke-Width-Transform).
* Ability to use an image pre-processing chain, e.g. [Stroke Width Transform](https://github.com/tleyden/open-ocr/wiki/Stroke-Width-Transform).
* Non-English languages

See the [REST API docs](http://docs.openocr.apiary.io/) and the [Go REST client](http://github.com/tleyden/open-ocr-client) for details.
14 changes: 12 additions & 2 deletions convert-pdf.go
@@ -27,14 +27,24 @@ func (c ConvertPdf) preprocess(ocrRequest *OcrRequest) error {
if err != nil {
return err
}
defer os.Remove(tmpFileNameInput)
defer func(name string) {
err := os.Remove(name)
if err != nil {
log.Warn().Err(err).Str("component", "PREPROCESSOR_WORKER").Msg(name + " could not be removed")
}
}(tmpFileNameInput)

tmpFileNameOutput, err := createTempFileName("")
tmpFileNameOutput = fmt.Sprintf("%s.tif", tmpFileNameOutput)
if err != nil {
return err
}
defer os.Remove(tmpFileNameOutput)
defer func(name string) {
err := os.Remove(name)
if err != nil {
log.Warn().Err(err).Str("component", "PREPROCESSOR_WORKER").Msg(name + " could not be removed")
}
}(tmpFileNameOutput)

err = saveBytesToFileName(ocrRequest.ImgBytes, tmpFileNameInput)
if err != nil {
4 changes: 2 additions & 2 deletions docs/idea_post_ocr_base64.http
@@ -227,7 +227,7 @@ client.test("Request executed successfully", function() {
});

client.test("Response content-type is json", function() {
var type = response.contentType.mimeType;
const type = response.contentType.mimeType;
client.assert(type === "application/json", "Expected 'application/json' but received '" + type + "'");
});
%}
@@ -252,7 +252,7 @@ client.test("Request executed successfully", function() {
});

client.test("Response content-type is json", function() {
var type = response.contentType.mimeType;
const type = response.contentType.mimeType;
client.assert(type === "application/json", "Expected 'application/json' but received '" + type + "'");
});
%}
25 changes: 17 additions & 8 deletions go.mod
@@ -1,14 +1,23 @@
module github.com/xf0e/open-ocr

go 1.15
go 1.17

require (
github.com/couchbaselabs/go.assert v0.0.0-20130325201400-cfb33e3a0dac
github.com/prometheus/client_golang v1.10.0
github.com/prometheus/common v0.25.0 // indirect
github.com/rs/zerolog v1.20.0
github.com/segmentio/ksuid v1.0.3
github.com/streadway/amqp v1.0.0
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
google.golang.org/protobuf v1.25.0 // indirect
github.com/prometheus/client_golang v1.12.1
github.com/rabbitmq/amqp091-go v1.3.0
github.com/rs/zerolog v1.26.0
github.com/segmentio/ksuid v1.0.4
)

require (
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.1.2 // indirect
github.com/golang/protobuf v1.5.2 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.1 // indirect
github.com/prometheus/client_model v0.2.0 // indirect
github.com/prometheus/common v0.32.1 // indirect
github.com/prometheus/procfs v0.7.3 // indirect
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 // indirect
google.golang.org/protobuf v1.26.0 // indirect
)