Proper shutdown for gcp-direct in case no documents are to process #10

Open
johann-petrak opened this issue Apr 28, 2022 · 1 comment

@johann-petrak
Contributor

See https://github.com/GateNLP/gateplugin-Elasticsearch/issues/3

@ianroberts
Member

For reference, since the linked issue is in a private repo: if a batch involves any components (input handlers, output handlers, or PRs) that start non-daemon threads at init time, and the batch has no documents that need processing (e.g. an existing report file shows all the available document IDs as already processed successfully), then the GCP Java process will hang forever. In that specific scenario, the matching "close" methods on the input/output handlers and Factory.deleteResource on the application, which would typically terminate the threads started at init, are never called.

Normally the initialisation happens when building the Batch object in BatchRunner, but the shutdown is triggered via PooledDocumentProcessor after parallel processing is complete. In the "no documents to process" case, no PooledDocumentProcessor is ever created, so the shutdown step never runs.
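
To illustrate the failure mode and one possible way around it, here is a minimal, self-contained sketch. It does not use the real GCP classes: `ShutdownSketch`, `ThreadedHandler`, and `runBatch` are hypothetical names invented for this example, and the `try`/`finally` structure is only an assumption about how cleanup could be made unconditional, not a description of the actual fix.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShutdownSketch {

    /** Hypothetical stand-in for a handler or PR that starts a non-daemon thread at init time. */
    static class ThreadedHandler implements AutoCloseable {
        // The default thread factory used here creates non-daemon threads,
        // so a live worker thread keeps the JVM running until shutdown() is called.
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        void init() {
            // Submitting a task forces the worker thread to be created.
            worker.submit(() -> { });
        }

        @Override
        public void close() {
            // Lets the worker thread terminate once queued tasks have finished.
            worker.shutdown();
        }

        boolean isClosed() {
            return worker.isShutdown();
        }
    }

    /**
     * Hypothetical batch runner: cleanup sits in a finally block, so it runs
     * whether or not there is anything left to process.
     */
    static int runBatch(ThreadedHandler handler, List<String> unprocessedIds) {
        handler.init();
        try {
            int processed = 0;
            for (String id : unprocessedIds) {
                processed++; // placeholder for real document processing
            }
            return processed;
        } finally {
            handler.close();
        }
    }

    public static void main(String[] args) {
        ThreadedHandler handler = new ThreadedHandler();
        // Empty list models "all document IDs already processed successfully".
        int n = runBatch(handler, List.of());
        System.out.println("processed=" + n + " closed=" + handler.isClosed());
    }
}
```

If the `handler.close()` call were reachable only from the per-document processing path, running `main` with an empty list would leave the executor's worker thread alive and the JVM would not exit, which mirrors the hang described above.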
