-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to Force Worker Spreading Across Hosts #166
Comments
@JessicaLHartog +1 for I would like to propose |
@dsKarthick : nice, "multifarious", I didn't know that word! ✋ ✋ (high five). And then following it up with "nefarious". "(multi/ne)farious". I wonder what other words end with "farious"... ahh: http://wordinfo.info/unit/3613/ip:1/il:F But back on topic: yes, totally, we should have that ( |
@erikdw I have a problem. the last picture says "Failed to detect a master: Failed to parse data of unknown label 'json.info'" .please tell me how to solve the problem. |
Again, please stop posting these messages in random pull requests and issues. Just file a new issue.
|
Sometimes the approach of "let Storm do its own thing" when it comes to scheduling worker slots can lead to interesting scenarios:
(1) A topology may wish to enable running agents in the JVM which listen on predefined ports (such as JDWP debugging or JMX remote). In such a scenario if the port is statically defined somewhere, then two workers on the same host will stall the topology. For example, if we pass
-Dcom.sun.management.jmxremote.port=9111
as one of the java childopts, we will see an exception like the one below when trying to bring up the second worker:(2) Two worker processes on the same machine for the same topology can lead to inconsistent behavior. In a very resource-hungry topology predictability about how workers are scheduled across various hosts may yield better results when it comes to verifying behavior. For example, suppose a topology writes to disk a lot. Having two workers on the same host will increase the I/O on the worker, and can cause an inability to write heartbeat messages, bringing the worker offline. When these workers are rescheduled, if they are scheduled onto two new hosts, then the topology will likely not cause a crashing of the worker.
(3) Tuning the configuration of a topology for various elements like the number of required workers as well as the number of executors per topology component can be made difficult if the first time a topology is submitted two workers are scheduled on the same host, and after some tweaking of configuration options the workers are spread across more hosts (and vice versa).
To make these behaviors (and others) more predictable it would be nice if there were an option like
topology.mesos.scheduler
that can define a TopologyScheduler. The purpose of the TopologyScheduler would be to define worker spreading across hosts. Some options that may be useful are:Default
- the current "let Storm do its own thing" behaviorOneWorkerPerHost
- worker slots for the same topology may not use offers from the same hostMinimumNumberOfHosts
- all worker slots are scheduled on the same host (or on as few hosts as possible), attempting to maximize the number of worker slots on each hostAdditional options would also then be possible should a need for them arise.
This also relates to Issue 158: enhance scheduling to act more predictably
The text was updated successfully, but these errors were encountered: