Thank you for this nice project, we found it quite helpful in our ongoing efforts to parallelize a fixpoint algorithm in OCaml.
A quick suggestion: it might be a good idea to prominently document that Mutex and Condition do not work out of the box, as one might expect, when combined with Domainslib. This would help people new to Multicore avoid going down a potentially time-consuming rabbit hole. (Apologies if such a remark already exists somewhere; I re-checked and still did not find one.)
Details (can be skipped by people familiar with the difference in behavior)
It took us quite a while to understand why our algorithm was not terminating and sometimes throwing exceptions, and we managed to extract this example:
    open Domainslib

    let main () =
      let mutex = Mutex.create () in
      let pool = Task.setup_pool ~num_domains:2 () in
      (* Each task holds the mutex across an await, which is what goes wrong. *)
      let task () =
        for _i = 0 to 1000 do
          Mutex.lock mutex;
          let work = Task.async pool (fun () -> ()) in
          Task.await pool work;
          Mutex.unlock mutex
        done
      in
      Task.run pool (fun () ->
        let p = Task.async pool (fun () -> task ()) in
        let p1 = Task.async pool (fun () -> task ()) in
        let p2 = Task.async pool (fun () -> task ()) in
        let p3 = Task.async pool (fun () -> task ()) in
        Task.await pool p;
        Task.await pool p1;
        Task.await pool p2;
        Task.await pool p3)

    let _ = main ()
which will either crash with
    michael@michael-XPS-13-9360:~/Documents/td-parallel$ _build/default/mutexproblem.exe
    Locking thread different from unlocking thread
    Fatal error: exception Sys_error("Mutex.unlock: Operation not permitted")
or deadlock.
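For readers unfamiliar with where the `Sys_error` comes from, here is a minimal sketch that uses only the OCaml 5 standard library (no Domainslib). It relies on the documented behavior that `Mutex.unlock` raises `Sys_error` when the mutex was locked by a different thread. The function name `unlock_from_other_domain` is made up for illustration. My understanding, which is an assumption rather than something confirmed in this thread, is that a task suspended in `Task.await` can end up running its `Mutex.unlock` on a different domain than the one that ran `Mutex.lock`, which would produce the same error.

```ocaml
(* Stdlib only, OCaml 5: Mutex and Domain need no extra libraries. *)

(* Lock a mutex on the calling domain, then try to unlock it from a
   freshly spawned domain. Returns [true] if the unlock attempt raised
   [Sys_error]. The name of this function is hypothetical. *)
let unlock_from_other_domain () =
  let mutex = Mutex.create () in
  Mutex.lock mutex;
  let other =
    Domain.spawn (fun () ->
      try
        Mutex.unlock mutex;
        false
      with Sys_error _ -> true)
  in
  let raised = Domain.join other in
  (* If the other domain failed to unlock, we still own the mutex. *)
  if raised then Mutex.unlock mutex;
  raised

let () =
  if unlock_from_other_domain () then
    print_endline "unlocking from another domain raised Sys_error"
  else
    print_endline "unlocking from another domain did not raise"
```

The exact message inside `Sys_error` is platform-dependent, so the sketch only checks whether the exception was raised.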
We hit a similar problem when we tried using a condition variable to wait until a certain number of tasks had reached a certain point: with n domains, it deadlocked as soon as n tasks had reached that point.
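One possible way to avoid blocking on a condition variable, sketched here under assumptions, is to split the work into phases and let `Task.await` do the waiting, since awaiting a promise suspends only the task rather than the whole domain. This is not necessarily how the original fixpoint algorithm was structured, and not every barrier-style algorithm can be rewritten this way. The helper `run_in_phases` and its parameters `phase1` and `phase2` are hypothetical names; only `Task.setup_pool`, `Task.run`, `Task.async`, `Task.await`, and `Task.teardown_pool` come from Domainslib.

```ocaml
(* Requires the domainslib package (OCaml 5). *)
open Domainslib

(* Run [n] phase-1 tasks, wait until all of them have finished, then
   start one phase-2 task per phase-1 result. No Mutex or Condition is
   involved: the "wait until everyone reached this point" step is just
   awaiting every phase-1 promise. *)
let run_in_phases pool ~n ~phase1 ~phase2 =
  Task.run pool (fun () ->
    let first =
      List.init n (fun i -> Task.async pool (fun () -> phase1 i))
    in
    (* Acts as the barrier: returns once every phase-1 task is done. *)
    let results = List.map (Task.await pool) first in
    let second =
      List.mapi
        (fun i r -> Task.async pool (fun () -> phase2 i r))
        results
    in
    List.map (Task.await pool) second)

(* Small usage example: square each index, then add the index back. *)
let example () =
  let pool = Task.setup_pool ~num_domains:2 () in
  let out =
    run_in_phases pool ~n:8
      ~phase1:(fun i -> i * i)
      ~phase2:(fun i sq -> i + sq)
  in
  Task.teardown_pool pool;
  out
```

Because every wait goes through `Task.await`, a worker domain that reaches the barrier stays free to run other pending tasks instead of sleeping inside `Condition.wait`.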
Yes, this has been a known issue for a long time. See issue #126 here and remark here, for example.
You mentioned domain-local-await. Yes, that currently works with Domainslib and Eio. I'm currently working on Picos, which aims to provide a more comprehensive and more widely accepted solution to interoperability and replace domain-local-await and domain-local-timeout. Picos already provides replacements for the Stdlib Mutex and Condition. Unfortunately, no existing scheduler (aside from the sample schedulers in the Picos package) currently provides full compatibility with Picos. Hopefully we'll get a chance at some point to rewrite the internals of Domainslib to use Picos.
After looking into how Domainslib works, it of course becomes clear that one would have to use something akin to, e.g., https://github.com/ocaml-multicore/domain-local-await.