-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add blocking_fifo pool type #46
Conversation
- the pop function returns null when no units are available; there is no need to query size in advance when only one pool is present
src/sched/basic.c
Outdated
@@ -112,8 +112,7 @@ static void sched_run(ABT_sched sched) | |||
for (i = 0; i < num_pools; i++) { | |||
ABT_pool pool = pools[i]; | |||
ABTI_pool *p_pool = ABTI_pool_get_ptr(pool); | |||
size_t size = ABTI_pool_get_size(p_pool); | |||
if (size > 0) { | |||
if (num_pools == 1 || ABTI_pool_get_size(p_pool) > 0 ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we completely get rid of this branch? get_size > 0
and the beginning of pop
do the same thing; they check that num_units > 0
. So might as well save on the redundant branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the new pool added by this PR (called blocking_fifo in this PR, but possibly to be renamed re: offline discussion) can block on the pop() function. If you have more than one pool associated with a scheduler then the other pool could be starved if you are using that pool.
I could rework it so that the existing schedulers have the optimization that you describe, and then add a new scheduler that is aware of the blocking behavior and responds accordingly. We already have to start a new scheduler to swap pools anyway, so it could just as well be a special scheduler intended for use with the blocking_fifo pool.
That's a little bit of code duplication, but it would let us cut down the code path for the default schedulers.
@@ -95,8 +95,7 @@ static void sched_run(ABT_sched sched) | |||
for (i = 0; i < num_pools; i++) { | |||
ABT_pool pool = p_pools[i]; | |||
ABTI_pool *p_pool = ABTI_pool_get_ptr(pool); | |||
size_t size = ABTI_pool_get_size(p_pool); | |||
if (size > 0) { | |||
if (num_pools == 1 || ABTI_pool_get_size(p_pool) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
@@ -82,8 +82,7 @@ static void sched_run(ABT_sched sched) | |||
/* Execute one work unit from the scheduler's pool */ | |||
ABT_pool pool = p_pools[0]; | |||
ABTI_pool *p_pool = ABTI_pool_get_ptr(pool); | |||
size_t size = ABTI_pool_get_size(p_pool); | |||
if (size > 0) { | |||
if (num_pools == 1 || ABTI_pool_get_size(p_pool) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above with a slight variation.
We pop
a unit, and if that unit is NULL
then work-steal.
/* hold additional pthread mutex and signal anyone who might be blocking | ||
* on pop() | ||
*/ | ||
pthread_mutex_lock(&p_data->blocking_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think acquiring the lock here is necessary. The pool push operation should be thread safe and cond_signal
does not need a lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree cond_signal() itself doesn't need a lock, but the lock in this specific example is to prevent a race between the waiter checking for the pool being empty and this function adding something to the pool. Without the mutex the you could theoretically get this order of operations, with thread a calling pop() and thread b calling push():
thread a: lock
thread a: check if pool is empty
thread b: insert into pool queue
thread b: cond_signal
thread a: cond_wait
Thread a will block in this case even though there is work in the pool when it hits the cond_wait() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the timeout != infinite avoids deadlocks in this PR, so I think it's still correct. Performance wise, my suggestion allows push() to be on a fast path and pop() on empty (including false positives) on a slow path. But I agree that if we change the semantic of the pool to include infinite wait on pop() (which I think is a likely API extension in the future), then keeping the lock for correctness is a must
Closing this PR. Will generate some new PRs with a refined design based on offline discussions. |
This PR adds a new kind of pool called ABT_POOL_BLOCKING_FIFO. It reuses existing ABT_POOL_FIFO functionality but layers atop it the ability to have the pop() function block briefly if the pool is empty. This prevents schedulers from busy spinning when no work units are available, which is helpful for use cases like persistent services that would like to idle gracefully.
This PR also includes:
Corresponding code that uses (and tests) this PR is in a development branch of Margo: https://xgitlab.cels.anl.gov/sds/margo/tree/dev-blocking-pool/src
For Margo, this change avoids the problem described in #26, eliminates the need for #25, eliminates our dependency on the abt-snoozer https://xgitlab.cels.anl.gov/sds/abt-snoozer, and eliminates our libev dependency.