Multiple clients with Margo 0.2 #33
Comments
In GitLab by @carns on Nov 9, 2017, 11:09 Hi Marc, First of all, thanks for the patch! We'll have a look at this. One quick comment on how the margo initialization has changed in recent versions: you might want to have a look at the `margo_init_pools()` function as an alternative to `margo_init()`. It's a little more like the older-style margo init function, in that it assumes the caller has already configured the Argobots and Mercury resources ahead of time rather than doing it for them. This might be better for your use case? The new `margo_init()` has become more of a default/convenience mode for users who aren't interested in advanced features and would rather hide the complexity of setting up those subsystems.
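For illustration, here is a minimal sketch of that caller-managed path, assuming the `margo_init_pool(progress_pool, handler_pool, hg_context)` signature from margo releases of that era; the transport string and single-pool layout are only an example, not a definitive recipe:

```c
#include <abt.h>
#include <abt-snoozer.h>
#include <mercury.h>
#include <margo.h>

int main(void)
{
    ABT_xstream xstream;
    ABT_pool pool;

    /* the caller initializes Argobots exactly once per process... */
    ABT_init(0, NULL);
    /* ...and opts in to the snoozer scheduler on the calling ES once */
    ABT_snoozer_xstream_self_set();
    ABT_xstream_self(&xstream);
    ABT_xstream_get_main_pools(xstream, 1, &pool);

    /* the caller also owns the Mercury setup for each transport */
    hg_class_t *hg_class = HG_Init("bmi+tcp", HG_FALSE);
    hg_context_t *hg_context = HG_Context_create(hg_class);

    /* margo attaches to these resources instead of creating its own */
    margo_instance_id mid = margo_init_pool(pool, pool, hg_context);
    /* ... register and issue RPCs via mid ... */

    margo_finalize(mid);
    HG_Context_destroy(hg_context);
    HG_Finalize(hg_class);
    ABT_finalize();
    return 0;
}
```

Because Argobots is initialized by the caller, a second client on another transport only needs its own Mercury class/context and another `margo_init_pool()` call, avoiding the double-initialization problem described in this issue.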
In GitLab by @carns on Nov 9, 2017, 11:11 You might also want to track this issue on the Mercury github: there is a remaining todo item on the sm plugin in mercury, which is to make its use transparent. Once that is implemented, the idea is that you would not need to explicitly manage the sm plugin; it would get used automatically when you issued an RPC to a process (via any plugin) that happened to be located on the same node.
In GitLab by @marcvef on Nov 9, 2017, 11:36 Hi Philip, Happy to help, and thanks a ton for your very quick reply and suggestions! I agree `margo_init_pool()` looks promising, as I already had the Mercury and Argobots instances set up for the previous margo version. I will check it out further! The Mercury issue looks interesting. Using shared memory automatically when the RPC destination is located on the same node would be very convenient for me and fits my use case perfectly.
In GitLab by @shanedsnyder on Nov 9, 2017, 16:53 mentioned in commit 4537b96
In GitLab by @shanedsnyder on Nov 9, 2017, 17:05 Hi Marc, I just pushed a fix similar to your patch to the master branch of margo, in case you wanted to pull it down and try that out? The only difference is that we set the ABT snoozer scheduler regardless of whether Argobots was initialized in `margo_init()` or by the caller. I think we want to ensure this scheduler is enabled when calling this code path -- as Phil points out, you can use `margo_init_pools()` if you want more control over the Argobots behavior. Thanks again for the bug report!
In GitLab by @marcvef on Nov 10, 2017, 09:00 Hi Shane, Thanks! Yes, I will migrate to using `margo_init_pool()`. For now, though, I tried your fix with two clients:

```c
margo_instance_id mid_bmi;
margo_instance_id mid_sm;

mid_bmi = margo_init("bmi+tcp", MARGO_CLIENT_MODE, 0, 0);
//[...]
mid_sm = margo_init("na+sm", MARGO_CLIENT_MODE, 0, 0);
```

The second `margo_init()` still fails, because `ABT_snoozer_xstream_self_set()` is now called unconditionally. Although this behavior affects Margo, perhaps ABT_snoozer or Argobots should be responsible for dealing with this issue? In the meantime, Margo or ABT_snoozer could watch out for this case and present the user with a descriptive error? Perhaps with something along these lines (margo_sched_handling.patch). Hope this helps.
In GitLab by @shanedsnyder on Nov 20, 2017, 14:52 Hi Marc, First, apologies for taking a while to get back to you, but I was mostly distracted at SC last week. I see what you mean re: the error with calling `ABT_snoozer_xstream_self_set()` a second time. Ultimately, I think the semantic we want is that, when calling `margo_init()`, the abt-snoozer scheduler ends up enabled on the caller's ES no matter how many margo instances have been created -- which suggests the fix may belong in abt-snoozer rather than in margo. @carns, do you have an opinion? At any rate, thanks again for pointing out the issue and taking the time to debug it. Very helpful.
In GitLab by @carns on Nov 20, 2017, 15:00 My preference would be a fix in snoozer, as @shanedsnyder points out. It would be great if we could have this semantic somehow:

1. If the snoozer scheduler is already set on the calling ES, `ABT_snoozer_xstream_self_set()` succeeds as a no-op.
2. If the ES is still running its default scheduler, the snoozer scheduler is installed and the call succeeds.
3. If some other scheduler is already set (or queued work units prevent changing it), the call returns an error.

The first two make it idempotent at the snoozer level. The third one prevents someone from accidentally thinking they've activated snoozer when they haven't. Margo can handle that 3rd case by printing a warning and continuing on; it doesn't have to be interpreted as a critical error at that point in the stack.
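Expressed as code, that wished-for semantic might look like the sketch below. The query in the first branch is hypothetical -- as the following comments establish, the Argobots API offers no way to inspect the current scheduler's type, which is exactly the sticking point:

```c
#include <abt.h>
#include <abt-snoozer.h>

/* hypothetical query -- no such capability exists in the ABT API */
static int es_scheduler_is_snoozer(void) { return 0; }

/* hypothetical idempotent variant of ABT_snoozer_xstream_self_set() */
static int snoozer_self_set_idempotent(void)
{
    if (es_scheduler_is_snoozer())      /* case 1: already set -> no-op */
        return ABT_SUCCESS;

    int ret = ABT_snoozer_xstream_self_set();
    if (ret == ABT_SUCCESS)             /* case 2: installed fresh */
        return ABT_SUCCESS;

    return ret;                         /* case 3: a different scheduler
                                         * (or queued work) blocks us */
}
```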
In GitLab by @shanedsnyder on Nov 20, 2017, 16:54 I agree with you in principle, @carns, but I don't think we can distinguish between case 1 and case 3 above within abt-snoozer. All we can do is check whether work units prevent us from setting a new scheduler -- we can't determine whether the current scheduler is an abt-snoozer scheduler or something else. Unless you know of a way to accomplish this? Or are you advocating storing some sort of additional metadata within abt-snoozer to determine this at runtime (a list of xstreams for which abt-snoozer has been enabled, for instance)? If not, then I have to change my mind and advocate that we just leave abt-snoozer alone and integrate Marc's patch. That means the snoozer scheduler would simply not be set when Argobots was already initialized by the caller. Does that sound right?
In GitLab by @marcvef on Nov 29, 2017, 08:14 Hi Shane, Philip, My apologies that it took me a few days to get back to you. From my perspective, a warning should be fine if work units are already scheduled in the xstream. But, at the same time, you wouldn't ensure that Abt-snoozer is enabled for this code path anymore (which is the reason my patch errors out). Perhaps this can be part of the warning, so that the responsibility is given to the user? The issue with that is that it might create more questions for a user who might not know what the message means, given that initializing Argobots with Abt-snoozer is abstracted away in `margo_init()`. Also, while looking into the code, I couldn't find a way to check which scheduler is currently in use. Otherwise, Abt-snoozer wouldn't need to set the scheduler again if it were already in place.
In GitLab by @carns on Jan 2, 2018, 17:19 I agree, I don't see any way to distinguish 1 and 3 either (it seems logical that you could check the scheduler type somehow, but I don't see how to do it in the abt API). We could open up an Argobots issue on github to request that functionality, but in the meantime maybe all we can do is treat the xstream_self_set() call as a "best effort" call and not fail the whole margo initialization if that subroutine fails? That's not very satisfying, because it leaves the possibility that in some cases you may end up without the snoozer scheduler set at all, but for the use cases we care about right now it would be fine.
In GitLab by @shanedsnyder on Jan 4, 2018, 19:15 OK, I submitted an issue on Argobots' github to ask about this problem: pmodels/argobots#31 Until we hear back on that, my plan is to just integrate Marc's patch with the change that we do not fail the entire margo initialization, but instead just print a warning that is as helpful as possible. Not the greatest long-term solution, but I think it's probably better to at least allow margo to try to continue gracefully. My understanding is that the abt-snoozer scheduler is more of an optimization over the default abt schedulers that better accounts for high-latency operations (like I/O), so I would think it would mainly just affect performance if it is not used, which doesn't seem to warrant bombing out. Sound good?
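In outline, that plan might look something like the following sketch. `ABT_initialized()` is a real Argobots call; the structure and warning text here are illustrative, not the actual margo change:

```c
#include <stdio.h>
#include <abt.h>
#include <abt-snoozer.h>

/* sketch: integrate the patch, but downgrade a scheduler-set failure
 * to a warning instead of failing the whole initialization */
static int margo_abt_setup_sketch(void)
{
    /* skip ABT_init() when the caller already initialized Argobots */
    if (ABT_initialized() != ABT_SUCCESS) {
        if (ABT_init(0, NULL) != ABT_SUCCESS)
            return -1;                  /* still fatal: no Argobots at all */
    }
    if (ABT_snoozer_xstream_self_set() != ABT_SUCCESS) {
        /* best effort: without snoozer, CPU usage may be higher,
         * but correctness is unaffected */
        fprintf(stderr, "margo: warning: could not set abt-snoozer "
                        "scheduler on the calling ES; continuing with "
                        "the current scheduler\n");
    }
    return 0;
}
```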
In GitLab by @carns on Jan 5, 2018, 10:39 Sounds good. If the snoozer scheduler is not activated, then CPU usage will be higher, but it won't affect correctness at least.
In GitLab by @carns on Apr 5, 2018, 11:46 Will need to re-test this behavior if pmodels/argobots#49 is accepted in Argobots. This will change the strategy used by Margo to install schedulers.
In GitLab by @mdorier on Jun 24, 2019, 09:08 Is this issue fixed, now that we don't have ABT-snoozer anymore?
In GitLab by @carns on Jun 24, 2019, 10:24 I think we can deprecate this now and open a new ticket if we encounter a problem (if nothing else, the technical details / reproducer would be different now because of all of the changes since then).
In GitLab by @carns on Jun 24, 2019, 10:24 closed |
In GitLab by @marcvef on Nov 9, 2017, 10:55
Hi,
I am using Margo as part of a preloading library, where I require two separate Margo clients that run with different NA plugins. One runs with the NA SM (shared memory) plugin, and the other with the NA BMI plugin using TCP (for now). Both run within the caller's thread context. This worked with Margo 0.1 because users had to deal with initializing Argobots themselves.
Margo 0.2 starts Argobots transparently to the user, resulting in multiple Argobots initializations, and sets the caller ES to idle with ABT_snoozer each time (with `ABT_snoozer_xstream_self_set()`). Specifically, the latter action appears to be the culprit, as the second call of `ABT_snoozer_xstream_self_set()` (due to the second Margo client initialization) returns the error code 29 (`[../../src/stream.c:811] ABT_xstream_set_main_sched: 29`). This aborts the whole client initialization process, and `margo_init()` returns `MARGO_INSTANCE_NULL`.

I've attached a small patch (margo_0.2_multiple_clients.patch) that highlights the issue and solves it by simply checking whether Argobots is already initialized and, if so, skipping both mentioned actions. However, I am questioning whether this is the correct way of using multiple clients in one application.
Thanks!