Thread Tracker #336
This seems like a very big ask. In general, Failsafe doesn't have access to the innards of the scheduler being used. Even if it did, you wouldn't want to saddle all Failsafe users with the performance hit of keeping track of all those threads. Instead, you'd want to be able to opt in. I guess one could provide an instrumented scheduler/thread pool that knew how to look into the running states of its threads, but that would inevitably slow down and distort the operation of the pool. This might be acceptable in some contexts, but it would be very hard to provide guidance on how and when to use this. |
Why would Failsafe need access to the scheduler? The thread tracker can capture
The Since the thread tracker is a policy, the programmer has to create a thread tracker policy and then use it. The programmer will have to be informed of the scalability problem if too many threads hit the policy too quickly. However, I suspect the lock contention to be low since the tracking part will be something like.
As you can see the critical region of the |
So the goal for this idea is to be able to identify threads that appear to be blocked and to dump the call stacks of Failsafe threads? Presumably, we'd also want a way to indicate which threads are associated with which Failsafe policies or tasks, such as a Timeout thread, a retry, etc.?
Definitely. Unless the tracking is very light, it should be optional. |
Perhaps, I am asking for
|
In describing this ability in the docs, what would you say it's used for? Maybe you could talk more about your use case. It's worth noting that Failsafe uses threads for a few different things. Are you just interested in tracking threads that are used for executing user-provided suppliers/runnables, or other threads that Failsafe uses internally also, such as for waiting on a retry, timeout, or rate limiter permit? |
I am interested in tracking threads that are used for executing user-provided suppliers and runnables. I have a program that launches a bunch of async tasks as a group. The entire group must complete before moving on. I use |
If |
I am looking at On the line before On the line before Add to Does this need an opt-in configuration? Yes. The result of The problem is that calling |
Grabbing the thread refs is not a challenge: DelegatingScheduler, which is used to schedule executions against an ExecutorService/ForkJoinPool, already does this to add interruption support to ForkJoinPool threads. The biggest challenge for me is making a feature like this fit nicely into the API, since the use case might evolve over time as it's better understood, leading to potential breaking changes in the API. Right now Failsafe only allows a single execution at a time, but it may allow concurrent executions under certain scenarios in the future (via #231 and #291). Assuming the status quo remains, an API would probably only return a single thread, ex: Some possible places that
I somewhat like option 2 better since it implies that an execution thread is tied to an individual execution attempt or moment in time (such as whenever a timeout happens to trigger), and can be made immutable in that event. There are caveats to exposing the execution thread since by the time a user fetches it via
For async executions, the thread that sets up a timer might not be the same thread that later runs a Supplier/Runnable. We intentionally don't use a separate thread for the Supplier/Runnable until just before they're executed, in case the execution never happens, such as when a CircuitBreaker or RateLimiter prevents the executions. |
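As a rough illustration of the point above (the thread reference is only known once the Supplier actually starts running on a pool thread), here is a minimal sketch. `ThreadCapturingTask` and all of its methods are hypothetical names made up for this example; this is not the code of Failsafe's DelegatingScheduler, only the general pattern of recording `Thread.currentThread()` from inside the task.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical wrapper for illustration; not part of Failsafe. */
final class ThreadCapturingTask<T> {
    // Holds the thread running the supplier, or null when nothing is running.
    private final AtomicReference<Thread> executionThread = new AtomicReference<>();
    private final Supplier<T> supplier;

    ThreadCapturingTask(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    /** Submits the supplier; the thread is recorded only when the executor runs it. */
    CompletableFuture<T> submit(Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            executionThread.set(Thread.currentThread());
            try {
                return supplier.get();
            } finally {
                // Clear the reference so a finished execution is not reported as running.
                executionThread.set(null);
            }
        }, executor);
    }

    /** Returns the thread currently running the supplier, or null if not started or already finished. */
    Thread getExecutionThread() {
        return executionThread.get();
    }

    /** Interrupts the execution thread if one is recorded; returns whether an interrupt was sent. */
    boolean interrupt() {
        Thread t = executionThread.get();
        if (t == null) {
            return false;
        }
        t.interrupt();
        return true;
    }
}
```

Note that the value returned by `getExecutionThread()` is only a snapshot: the execution may finish, and the pool may reuse the thread for something else, between the read and any later use of the reference. That is the same caveat about exposing the execution thread that is raised above.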
To accommodate async, we can change the solution for both async and sync.
Scenario 1: The Execution Thread Has Not Started
Let's say the execution thread has not started because its start is delayed (e.g., by a RateLimiter), but the cancel thread has managed to start.
Scenario 2: The Execution Thread Is Running
Let's say the execution thread started the operation but has not finished, and the cancel thread starts running.
Scenario 3: The Execution Thread Finished
Let's say the execution thread has finished the operation, and the cancel thread then starts running.
Because of |
Please create a thread tracker policy. It tracks what threads are currently executing inside the policy. Please add the following methods to the policy.
- getThreads() returns a snapshot of the current threads executing inside the policy
- isExecuting(Thread thread) returns true if the given thread is currently executing inside the policy
- getCallStacks() returns a String of the call stacks of all the threads currently executing inside the policy

Let's say a master thread executes inside a timeout policy. While doing so, it creates a bunch of child tasks to execute using a thread pool. The master thread then waits for the child tasks to finish. If the master thread times out, the thread tracker policy can be used to log the call stacks of the threads executing the child tasks. Hence, one can diagnose where the child tasks hung.
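To make the request concrete, the three methods could look roughly like the sketch below. This is only an illustration under stated assumptions: `ThreadTracker` is a standalone, hypothetical class rather than an actual Failsafe policy, and the `get(Supplier)` entry point is an assumed stand-in for however a real policy would wrap an execution. It uses a concurrent set so that registering and unregistering a thread does not hold a shared lock while the user code runs.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical tracker for illustration; not part of Failsafe's API. */
final class ThreadTracker {
    // Concurrent set of the threads currently executing inside the tracker.
    private final Set<Thread> threads = ConcurrentHashMap.newKeySet();

    /** Runs the supplier while the calling thread is registered with this tracker. */
    <T> T get(Supplier<T> supplier) {
        Thread current = Thread.currentThread();
        boolean added = threads.add(current); // false on a re-entrant call
        try {
            return supplier.get();
        } finally {
            if (added) {
                threads.remove(current);
            }
        }
    }

    /** Returns an unmodifiable snapshot of the threads currently executing. */
    Set<Thread> getThreads() {
        return Collections.unmodifiableSet(new HashSet<>(threads));
    }

    /** Returns true if the given thread is currently executing inside the tracker. */
    boolean isExecuting(Thread thread) {
        return threads.contains(thread);
    }

    /** Returns the call stacks of all tracked threads as one String. */
    String getCallStacks() {
        StringBuilder sb = new StringBuilder();
        for (Thread t : threads) {
            sb.append(t.getName()).append('\n');
            for (StackTraceElement e : t.getStackTrace()) {
                sb.append("\tat ").append(e).append('\n');
            }
        }
        return sb.toString();
    }
}
```

In the master/child scenario above, each child task would be run through `tracker.get(...)`, and the master thread would log `tracker.getCallStacks()` when its timeout fires. The stacks are collected one thread at a time, so the dump is not an atomic snapshot of all threads.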