-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some bulk requests are retried for ever even when we are trying to stop a pipeline #1116
Comments
The problem is not limited to just conditioning the retry attempt on We found that def shutdown_workers
@shutdownRequested.set(true)
@worker_threads.each do |t|
@logger.debug("Shutdown waiting for worker thread" , default_logging_keys(:thread => t.inspect))
t.join
end
filters.each(&:do_close)
outputs.each(&:do_close)
end The logic is as follows:
So the condition to stop retrying can only be fulfilled when the plugin has already stopped retrying leading to a deadlock. def shutting_down?
@stopping.true? || (execution_context.pipeline.shutdown_requested? && !execution_context.pipeline.worker_threads_draining?)
end Where:
|
Logstash information:
JVM (e.g.
java -version
):Bundled
OS version (
uname -a
if on a Unix-like system):N/A
Description of the problem including expected versus actual behavior:
At Elastic cloud we are deploying some Logstash instances as managed services, that is, their configurations get updated automatically by an external service reacting to some cloud configuration changes.
Once scenario we contemplate is the possibility of evicting certain pipelines when their target ES cluster become unhealthy, unresponsive or unavailable. That eviction action isn't much more than a dynamic reconfiguration of the pipelines which boils down to stopping the affected pipeline.
For some of the problems we do tests this works nicely as the for-ever-retry loop ares stopped when the pipeline is stopping but, for others, as bad authentication response code (
401
) the pipeline never stops as it gets trapped in the loop of retries for ever.By reading the plugin source code we found that both connectivity errors and unexpected errors have a retry safe-guard stopping the retry loop when the pool/pipeline is stopping. e.g: Last line in the following snippet.
logstash-output-elasticsearch/lib/logstash/plugin_mixins/elasticsearch/common.rb
Lines 313 to 325 in 4507240
However, other response codes (like
401
) don't:logstash-output-elasticsearch/lib/logstash/plugin_mixins/elasticsearch/common.rb
Lines 336 to 353 in 4507240
Our reasoning is that this latter branch should have the same safeguard because in this use case as well as others it might be that the pipeline was configured with parameters that are not longer valid (e.g: expired API keys) being the best solution to the management problem to just reconfigure the pipeline without restarting the whole LS instance.
We are filling a PR replacing
retry
withretry unless @stopping.true?
in the branch dealing with known retry-able events.Steps to reproduce:
Please include a minimal but complete recreation of the problem,
including (e.g.) pipeline definition(s), settings, locale, etc. The easier
you make for us to reproduce it, the more likely that somebody will take the
time to look at it.
Provide logs (if relevant):
The text was updated successfully, but these errors were encountered: