add blocking_fifo pool type #46

carns · 2018-03-06T18:31:57Z

This PR adds a new kind of pool called ABT_POOL_BLOCKING_FIFO. It reuses existing ABT_POOL_FIFO functionality but layers atop it the ability to have the pop() function block briefly if the pool is empty. This prevents schedulers from busy spinning when no work units are available, which is helpful for use cases like persistent services that would like to idle gracefully.
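To illustrate the blocking behavior described above, here is a minimal sketch of a FIFO whose pop waits briefly when the queue is empty. It is not the code from this PR: `demo_pool`, `demo_pool_push`, `demo_pool_pop_timed`, the fixed capacity, and the millisecond timeout parameter are all hypothetical names chosen for illustration, and only standard pthread calls are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define QCAP 16 /* illustrative fixed capacity */

/* Hypothetical stand-in for a FIFO pool; not the Argobots pool layout. */
typedef struct {
    void *items[QCAP];
    size_t head, count;
    pthread_mutex_t mutex;
    pthread_cond_t nonempty;
} demo_pool;

void demo_pool_init(demo_pool *p)
{
    p->head = p->count = 0;
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->nonempty, NULL);
}

/* Append a unit and wake one waiter. Returns 0 on success, -1 if full. */
int demo_pool_push(demo_pool *p, void *unit)
{
    int ret = -1;
    pthread_mutex_lock(&p->mutex);
    if (p->count < QCAP) {
        p->items[(p->head + p->count) % QCAP] = unit;
        p->count++;
        ret = 0;
        pthread_cond_signal(&p->nonempty);
    }
    pthread_mutex_unlock(&p->mutex);
    return ret;
}

/* Pop the oldest unit. If the pool is empty, wait up to timeout_ms for a
 * push, then give up and return NULL so the caller can do other work. */
void *demo_pool_pop_timed(demo_pool *p, long timeout_ms)
{
    struct timespec deadline;
    void *unit = NULL;
    int rc = 0;

    /* Condition variables use CLOCK_REALTIME deadlines by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&p->mutex);
    /* Loop to tolerate spurious wakeups; stop on timeout or any error. */
    while (p->count == 0 && rc == 0)
        rc = pthread_cond_timedwait(&p->nonempty, &p->mutex, &deadline);
    if (p->count > 0) {
        unit = p->items[p->head];
        p->head = (p->head + 1) % QCAP;
        p->count--;
    }
    pthread_mutex_unlock(&p->mutex);
    return unit;
}
```

A scheduler loop built on this would call `demo_pool_pop_timed()` with a short timeout and treat a NULL return as "nothing to run yet", which lets the thread sleep in the kernel instead of spinning.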

This PR also includes:

- a minor general optimization to the sched_run function of all existing schedulers to avoid unnecessary pool size() calls in the scheduler loop for the single pool case
- a new test program for the blocking_fifo pool in test/basic/pool_blocking_fifo.c
- a new field in the pool struct that can be used internally to inspect a pool's kind
- modifications to the scheduler loop setup in the basic scheduler to optimize its behavior when the blocking fifo pool is detected (changing event polling behavior, sched_sleep behavior, and avoiding configurations that could lead to deadlock)

Corresponding code that uses (and tests) this PR is in a development branch of Margo: https://xgitlab.cels.anl.gov/sds/margo/tree/dev-blocking-pool/src

For Margo, this change avoids the problem described in #26, eliminates the need for #25, eliminates our dependency on the abt-snoozer https://xgitlab.cels.anl.gov/sds/abt-snoozer, and eliminates our libev dependency.

halimamer · 2018-03-14T22:13:49Z

src/sched/basic.c

@@ -112,8 +112,7 @@ static void sched_run(ABT_sched sched)
        for (i = 0; i < num_pools; i++) {
            ABT_pool pool = pools[i];
            ABTI_pool *p_pool = ABTI_pool_get_ptr(pool);
-            size_t size = ABTI_pool_get_size(p_pool);
-            if (size > 0) {
+            if (num_pools == 1 || ABTI_pool_get_size(p_pool) > 0 ) {


How about we get rid of this branch completely? get_size > 0 and the beginning of pop do the same thing: both check that num_units > 0. We might as well save the redundant branch.

carns · 2018-03-15

The problem is that the new pool added by this PR (called blocking_fifo in this PR, but possibly to be renamed re: offline discussion) can block on the pop() function. If more than one pool is associated with a scheduler, the other pools could be starved while the scheduler blocks in pop() on this one.

I could rework it so that the existing schedulers have the optimization that you describe, and then add a new scheduler that is aware of the blocking behavior and responds accordingly. We already have to start a new scheduler to swap pools anyway, so it could just as well be a special scheduler intended for use with the blocking_fifo pool.

That's a little bit of code duplication, but it would let us cut down the code path for the default schedulers.
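The trade-off in this thread can be sketched with a toy scheduler pass. This is a hedged illustration, not the Argobots code: `toy_pool`, `toy_pool_size`, `toy_pool_pop`, and `sched_pick` are hypothetical, units are plain integers with 0 standing in for NULL, and the toy pool hands out units in stack order because ordering is irrelevant to the point. Only the condition `num_pools == 1 || size > 0` is taken from the diff.

```c
#include <stddef.h>

/* Hypothetical minimal pool of integer unit IDs; 0 means "no unit".
 * pop_calls counts pop() invocations so the effect is observable. */
typedef struct {
    int units[8];
    size_t num_units;
    int pop_calls;
} toy_pool;

size_t toy_pool_size(const toy_pool *p)
{
    return p->num_units;
}

/* Non-blocking pop: returns 0 when the pool is empty. */
int toy_pool_pop(toy_pool *p)
{
    p->pop_calls++;
    if (p->num_units == 0)
        return 0;
    return p->units[--p->num_units];
}

/* One pass over the pools using the condition from the diff. With a single
 * pool, pop() is called directly and may return 0. With several pools,
 * empty pools are skipped, so a pop() that can block is never entered
 * while another pool might still hold work. */
int sched_pick(toy_pool *pools, size_t num_pools)
{
    for (size_t i = 0; i < num_pools; i++) {
        if (num_pools == 1 || toy_pool_size(&pools[i]) > 0) {
            int unit = toy_pool_pop(&pools[i]);
            if (unit != 0)
                return unit;
        }
    }
    return 0;
}
```

Dropping the size check entirely, as suggested in the review, would be equivalent for non-blocking pools, since pop() already returns "no unit" on an empty pool; the check only matters once pop() may block.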

halimamer · 2018-03-14T22:14:44Z

src/sched/prio.c

@@ -95,8 +95,7 @@ static void sched_run(ABT_sched sched)
        for (i = 0; i < num_pools; i++) {
            ABT_pool pool = p_pools[i];
            ABTI_pool *p_pool = ABTI_pool_get_ptr(pool);
-            size_t size = ABTI_pool_get_size(p_pool);
-            if (size > 0) {
+            if (num_pools == 1 || ABTI_pool_get_size(p_pool) > 0) {


same comment as above

halimamer · 2018-03-14T22:16:28Z

src/sched/randws.c

@@ -82,8 +82,7 @@ static void sched_run(ABT_sched sched)
        /* Execute one work unit from the scheduler's pool */
        ABT_pool pool = p_pools[0];
        ABTI_pool *p_pool = ABTI_pool_get_ptr(pool);
-        size_t size = ABTI_pool_get_size(p_pool);
-        if (size > 0) {
+        if (num_pools == 1 || ABTI_pool_get_size(p_pool) > 0) {


Same comment as above, with a slight variation: we pop a unit, and if that unit is NULL we work-steal.
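A rough sketch of the "pop first, steal on NULL" flow suggested here follows. The names `ws_pool`, `ws_pool_pop`, and `ws_next_unit` are hypothetical, units are integers with 0 standing in for NULL, and the victim index is passed in by the caller rather than drawn from a random number generator so the example stays deterministic. How randws actually selects a victim is not shown in this PR and is an assumption here.

```c
#include <stddef.h>

/* Hypothetical pool of integer unit IDs; 0 means "no unit". */
typedef struct {
    int units[8];
    size_t num_units;
} ws_pool;

/* Non-blocking pop: returns 0 when the pool is empty. */
int ws_pool_pop(ws_pool *p)
{
    return p->num_units ? p->units[--p->num_units] : 0;
}

/* Pop from the scheduler's own pool (index 0) without checking its size.
 * If that yields nothing and other pools exist, try to steal from one of
 * them. `victim` selects among pools 1..num_pools-1. */
int ws_next_unit(ws_pool *pools, size_t num_pools, size_t victim)
{
    int unit = ws_pool_pop(&pools[0]);
    if (unit == 0 && num_pools > 1) {
        size_t target = 1 + victim % (num_pools - 1);
        unit = ws_pool_pop(&pools[target]);
    }
    return unit;
}
```

Compared with checking the size first, this saves one branch on the common path where the local pool has work.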

halimamer · 2018-03-14T23:07:09Z

src/pool/fifo.c

+    /* hold additional pthread mutex and signal anyone who might be blocking
+     * on pop()
+     */
+    pthread_mutex_lock(&p_data->blocking_mutex);


I don't think acquiring the lock here is necessary. The pool push operation should be thread safe and cond_signal does not need a lock

carns · 2018-03-15

I agree cond_signal() itself doesn't need a lock, but the lock in this specific example is there to prevent a race between the waiter checking whether the pool is empty and this function adding something to the pool. Without the mutex you could theoretically get this order of operations, with thread a calling pop() and thread b calling push():

thread a: lock
thread a: check if pool is empty
thread b: insert into pool queue
thread b: cond_signal
thread a: cond_wait

Thread a will block in this case even though there is work in the pool when it hits the cond_wait() call.

halimamer · 2018-03-15

Because the timeout is not infinite, deadlocks are avoided in this PR, so I think it's still correct without the lock. Performance-wise, my suggestion keeps push() on a fast path and puts pop() on an empty pool (including false positives) on a slow path. But I agree that if we change the semantics of the pool to include an infinite wait on pop() (which I think is a likely API extension in the future), then keeping the lock is a must for correctness.
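The alternative discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration, not code from the PR: `fast_pool` and its functions are hypothetical, and an atomic counter stands in for a thread-safe queue. The field names `blocking_mutex` and `blocking_cond` echo the diff, but their use here is only a guess at the design.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical pool: an atomic counter stands in for a thread-safe queue. */
typedef struct {
    atomic_int num_units;
    pthread_mutex_t blocking_mutex;
    pthread_cond_t blocking_cond;
} fast_pool;

void fast_pool_init(fast_pool *p)
{
    atomic_init(&p->num_units, 0);
    pthread_mutex_init(&p->blocking_mutex, NULL);
    pthread_cond_init(&p->blocking_cond, NULL);
}

/* Fast path: publish the unit, then signal without taking blocking_mutex. */
void fast_pool_push(fast_pool *p)
{
    atomic_fetch_add(&p->num_units, 1);
    pthread_cond_signal(&p->blocking_cond);
}

/* Claim one unit if available. Returns 1 on success, 0 if empty. */
static int try_take(fast_pool *p)
{
    int n = atomic_load(&p->num_units);
    while (n > 0) {
        /* On failure, n is reloaded with the current value and rechecked. */
        if (atomic_compare_exchange_weak(&p->num_units, &n, n - 1))
            return 1;
    }
    return 0;
}

/* Slow path only when empty: wait at most timeout_ms, then retry once. */
int fast_pool_pop(fast_pool *p, long timeout_ms)
{
    struct timespec deadline;

    if (try_take(p))
        return 1;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&p->blocking_mutex);
    /* A push may land between try_take() and this wait. Its signal is then
     * missed, but the wait still ends at the deadline, so the cost is
     * bounded latency rather than a hang. */
    pthread_cond_timedwait(&p->blocking_cond, &p->blocking_mutex, &deadline);
    pthread_mutex_unlock(&p->blocking_mutex);
    return try_take(p);
}
```

With an infinite wait in place of the timed wait, the missed signal would leave the waiter blocked indefinitely, which is why the lock in push() becomes necessary under that semantics.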

carns · 2018-03-16T01:38:18Z

Closing this PR. Will generate some new PRs with a refined design based on offline discussions.

carns added 7 commits March 5, 2018 21:06

skip redundant pool size check and pop

24355f1

- the pop function returns null when no units are available; there is no need to query size in advance when only one pool is present

clean up superfluous definition

96b7c48

implement BLOCKING_FIFO pool

4c235b2

bug fix

f6e31bc

test program for blocking_fifo pool

bed2316

bug fix

7cab4b5

tune basic scheduler for use with blocking fifo

c473210

halimamer reviewed Mar 14, 2018

carns closed this Mar 16, 2018

carns mentioned this pull request Mar 18, 2018

implement ABT_SCHED_BASIC_WAIT #49

Closed

carns commented Mar 6, 2018

halimamer Mar 14, 2018

carns Mar 15, 2018

halimamer Mar 14, 2018

halimamer Mar 14, 2018

halimamer Mar 14, 2018

carns Mar 15, 2018

halimamer Mar 15, 2018

carns commented Mar 16, 2018
