
Providing an ABT_eventual_wait_any #167

Open
mdorier opened this issue Apr 3, 2020 · 6 comments

Comments

@mdorier
Contributor

mdorier commented Apr 3, 2020

It would be useful to have an ABT_eventual_wait_any function that takes an array of eventuals and blocks until one has been set (kind of the ABT equivalent of MPI_Waitany).

A possible prototype:

int ABT_eventual_wait_any(size_t count, ABT_eventual* eventuals, size_t* index, void **value);

where count is the number of eventuals in the array; on return, the function sets index to the index of the eventual that completed.

@shintaro-iwasaki
Collaborator

Thanks for your suggestion! In general, adding ABT_eventual_wait_any() raises a concern about API completeness:

  1. Adding the other kinds of waiting functions (test_any, test_some, wait_all, test_all, ...) would clutter the API despite their limited use cases.
  2. Adding such variants for the other synchronization objects (future, rwlock, mutex, join, free, ...) would compound the problem.

Maybe only ABT_{eventual/future}_wait_{any/many(=all)}() and ABT_{thread/task}_{free/join}_{any/many}() make sense, but that is still a lot of functions. If there is a specific reason why a few of them would be more useful and meaningful than the others, we can work on those sooner.


I would like to confirm two points:

1. Functionality

Unlike MPI_Wait(), which succeeds only once per MPI_Request, ABT_eventual_wait() can succeed multiple times on the same eventual, so you would need to overwrite completed entries with NULL when calling ABT_eventual_wait_any() repeatedly:

ABT_eventual eventuals[1000];
/* ... create ULTs that will each eventually set eventuals[i] ... */

for (int i = 0; i < 1000; i++) {
    size_t index;
    ABT_eventual_wait_any(1000, eventuals, &index, NULL);
    /* ... handle the completed operation ... */
    eventuals[index] = ABT_EVENTUAL_NULL; /* <- needed so the same eventual is not returned again */
}

Does it fit your use case?

2. Performance

We can achieve almost the same behavior by repeatedly calling ABT_eventual_test() on each eventual and calling ABT_thread_yield() at the end of each pass.
Of course, there is room for optimization in some cases (i.e., ABT_eventual_wait_any() can be faster than iterating ABT_eventual_test()), but it depends on how your program waits on eventuals.
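Concretely, the busy-wait emulation might look like the following sketch; the function name eventual_wait_any_busy is hypothetical, and only the documented ABT_eventual_test() and ABT_thread_yield() calls are assumed:

```c
#include <abt.h>
#include <stddef.h>

/* Hypothetical busy-wait emulation of the proposed ABT_eventual_wait_any():
 * test every eventual in turn, yielding between passes so that other ULTs
 * in the pool (including the one that will set an eventual) can run. */
int eventual_wait_any_busy(size_t count, ABT_eventual *eventuals,
                           size_t *index, void **value)
{
    for (;;) {
        for (size_t i = 0; i < count; i++) {
            ABT_bool is_ready;
            int ret = ABT_eventual_test(eventuals[i], value, &is_ready);
            if (ret != ABT_SUCCESS)
                return ret;
            if (is_ready == ABT_TRUE) {
                *index = i;
                return ABT_SUCCESS;
            }
        }
        ABT_thread_yield(); /* give other ULTs a chance to make progress */
    }
}
```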

In any case, this performance optimization requires significant effort compared with adding a mere utility function. Is your use case for ABT_eventual_wait_any() performance critical, or do you need it only for convenience?

@mdorier
Contributor Author

mdorier commented Apr 4, 2020

Regarding the other functions: they can all be implemented quite easily with the existing ones. For example, ABT_eventual_wait_all is simply a loop of ABT_eventual_wait; the same goes for test_some, test_all, etc. ABT_thread_join_any may be in the same situation as ABT_eventual_wait_any, but it could actually be implemented if ABT_eventual_wait_any were available (technically, each could be implemented in terms of the other).
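For instance, a wait_all built from the existing primitive might be sketched as follows (eventual_wait_all is a hypothetical name):

```c
#include <abt.h>
#include <stddef.h>

/* Hypothetical ABT_eventual_wait_all() built from the existing API:
 * waiting on each eventual in turn blocks until all of them are set,
 * and the order in which they complete does not matter. */
int eventual_wait_all(size_t count, ABT_eventual *eventuals)
{
    for (size_t i = 0; i < count; i++) {
        int ret = ABT_eventual_wait(eventuals[i], NULL);
        if (ret != ABT_SUCCESS)
            return ret;
    }
    return ABT_SUCCESS;
}
```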

The problem with ABT_eventual_wait_any is that it cannot be implemented at the user level without an active loop that keeps yielding.

Regarding your two points:

  1. Yes, this fits our use case.

  2. In our use case (assume for simplicity a single ES with a single pool), one ULT drives network progress while another ULT has submitted X network operations, the completion of which is tracked by X corresponding ABT_eventuals (the progress ULT sets the corresponding eventual when an operation completes). The progress ULT knows how many other ULTs in the pool could run. If other ULTs are runnable, it tries to make progress for 100ms, then times out and yields to them. If no other ULTs are runnable (e.g., there are no other ULTs at all, or they are all blocked on an eventual/mutex/rwlock), the progress ULT calls a blocking network progress function and does not yield until a network operation has actually completed. Implementing ABT_eventual_wait_any with an active progress loop means that the ULTs keep yielding to each other, instead of the progress ULT simply blocking on a socket read while the other ULT blocks in ABT_eventual_wait. Relying on a non-blocking read timing out is something we generally try to avoid.

@shintaro-iwasaki
Collaborator

Thank you for your comment!

Conceptually, the functions I listed do similar things, but their implementations are different. For example, a thread can be joined only once, by a single caller, in the case of ABT_thread_join_any(), while multiple threads can wait on different sets of eventuals multiple times in the case of ABT_eventual_wait_any(). That is why we hesitate to support many of them.

I would like to ask for more details about how ABT_eventual_wait_any() would be used in your case.

In my understanding, your current code does the following:

  1. The progress ULT sets an eventual when the corresponding operation completes in a progress function.
    • ABT_eventual_set() is used for this.
  2. There are many network operation ULTs, each of which waits on its corresponding eventual.
    • ABT_eventual_wait() is used for this (not ABT_eventual_wait_any(), in my understanding).
  3. When some operation ULTs can be run, the progress ULT calls a non-blocking progress function with a 100ms timeout and then yields.
  4. If no operation ULTs can be run, the progress ULT calls a blocking progress function to make sure the network still makes progress without busy-waiting.

I guess you want to use ABT_eventual_wait_any() to detect the condition in 3, "some operation ULTs can be run". Does a network operation ULT have another eventual for this purpose (other than the eventual in 2.), or does it reuse the same eventual with ABT_eventual_reset() plus some state management?

I think ABT_eventual_wait_any() could optimize something internally in Argobots, but the outcome of that optimization may differ from what you really want. This is why I am asking about the use case (possibly there is a better way to optimize it).

@mdorier
Contributor Author

mdorier commented Apr 6, 2020

Actually, rereading the code, I realize it is a bit different from that:

The progress ULT is on a loop that does the following:

  • Call ABT_pool_get_size to find out the number of runnable ULTs;
  • If this number is > 0, call ABT_thread_yield to give them a chance to run;
  • Call ABT_pool_get_total_size to find out the total number of ULTs in the pool (including blocked ones and including the caller);
  • If the total number is > 1, call the network progress function with a timeout of 0 (it returns immediately if there is nothing to receive from the network), then call ABT_thread_yield;
  • If the total number is 1, call the network progress function with a timeout of 100ms.
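A sketch of that loop, where network_progress() and the done flag are placeholders standing in for the real progress call and shutdown condition:

```c
#include <abt.h>
#include <stddef.h>

extern volatile int done;                     /* assumption: set at shutdown */
extern void network_progress(int timeout_ms); /* assumption: the real network progress call */

/* Sketch of the progress ULT's loop as described above. */
void progress_loop(ABT_pool pool)
{
    while (!done) {
        size_t nrunnable, ntotal;
        ABT_pool_get_size(pool, &nrunnable);    /* runnable ULTs only */
        if (nrunnable > 0)
            ABT_thread_yield();                 /* give them a chance to run */
        ABT_pool_get_total_size(pool, &ntotal); /* includes blocked ULTs and the caller */
        if (ntotal > 1) {
            network_progress(0);                /* poll; returns immediately */
            ABT_thread_yield();
        } else {
            network_progress(100);              /* block for up to 100 ms */
        }
    }
}
```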

So in fact, if we use only one ES, we are always in the case where the total number of ULTs is > 1 and the timeout is always 0, because the pool always contains at least one caller of network operations plus the progress loop.

So the real issue with ABT_eventual_wait_any is that it can only be implemented with an active loop of ABT_eventual_test, which at best wastes CPU and at worst leads to frequent context switching with other ULTs that could otherwise run uninterrupted.

@shintaro-iwasaki
Collaborator

I would like to make my concern more concrete: ABT_eventual_wait_any() is complicated to implement, and the resulting optimization might not be what you really want.

The basic design of ABT_eventual_wait_any() would be as follows:

  1. First, quickly check the readiness of every eventual in a loop. If one of them is already set, ABT_eventual_wait_any() returns.
  2. Otherwise, take the internal locks of all the eventuals in the list, register a pair of "this waiter thread and a newly allocated ABT_eventual_wait_any()-specific struct" with every eventual in the list, and then suspend the waiter thread.
    • This requires another path to handle the case where one of the eventuals is set while the list is still being marked.
    • In addition, this algorithm does not handle two threads simultaneously calling ABT_eventual_wait_any() with different sets of eventuals, so it can deadlock (e.g., one waiter takes the locks of A and then B while another tries those of B and then A). Fixing this requires an additional mechanism.
  3. When one of the eventuals is set, it looks for a waiter thread in its waiter list. If one is found, it follows the ABT_eventual_wait_any()-specific struct, takes the locks of all the eventuals in that waiter's list, and removes the waiter thread from each of them, to avoid waking the waiter up multiple times.
    • This again does not handle two threads simultaneously calling ABT_eventual_wait_any() with different sets of eventuals; that requires a sophisticated (and possibly costly) lock-checking mechanism.

The algorithm above avoids yielding in a busy loop, but it can be costly on certain execution paths (or it needs a very error-prone lock-free/wait-free algorithm). The difficulty arises because ABT_eventual_wait_any() can be called from multiple threads whose eventual lists are supplied dynamically and may differ.

As far as I can see, your usage is more restricted (e.g., only one ULT calls ABT_eventual_wait_any(), and the set of eventuals and the waiter thread are predetermined, so you could use a preallocated spinlock for atomicity). I suggest ABT_self_suspend() and ABT_thread_resume(); if you give me more details about how you would use ABT_eventual_wait_any(), I might be able to design something better (or advise on how to use ABT_self_suspend() and ABT_thread_resume()).
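A minimal sketch of such a restricted variant, assuming a single predetermined waiter ULT (single_waiter, notify_completion, and wait_any_single are hypothetical names, and the handshake is deliberately simplified; a real version needs atomics to close the race between the check and the suspend):

```c
#include <abt.h>

/* Hypothetical single-waiter "wait any" built on ABT_self_suspend() and
 * ABT_thread_resume(), for the case where exactly one ULT ever waits. */
typedef struct {
    ABT_thread   waiter;          /* the single predetermined waiter ULT */
    volatile int completed_index; /* -1 until some operation completes */
} single_waiter;

/* Called by the progress ULT when operation `i` completes. */
void notify_completion(single_waiter *w, int i)
{
    w->completed_index = i;
    ABT_thread_resume(w->waiter); /* wake the suspended waiter */
}

/* Called by the waiter ULT; blocks until some operation completes. */
int wait_any_single(single_waiter *w)
{
    while (w->completed_index < 0)
        ABT_self_suspend();       /* suspend until resumed by the progress ULT */
    return w->completed_index;
}
```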

@mdorier
Contributor Author

mdorier commented Apr 6, 2020

Thanks, that does indeed sound complicated.

My request for ABT_eventual_wait_any resulted from one of our Mochi users wanting to wait on multiple network operations at the same time (i.e., margo_wait_any in the library). I have temporarily implemented it with an active loop of ABT_eventual_test. Let's leave it at that for now; if that user comes back with performance concerns (he is aware that it is an active loop), I will let you know.
