Each disco node sees itself as the master #631

Open
elfeto opened this issue Oct 2, 2015 · 1 comment

elfeto commented Oct 2, 2015

Hi,

I have Disco running on 7 nodes, master included. With no Disco process running on the nodes, I start the master, and the master starts the nodes as described in the tutorial. The process on a node looks like this:

"disco 14960 0.0 0.0 290616 18636 ? Sl 10:24 0:00 /usr/lib64/erlang/erts-5.8.5/bin/beam.smp -K true -- -root /usr/lib64/erlang -progname erl -- -home /home/users/disco -- -noshell -noinput -noshell -noinput -master disco_8989_master@dtn-cn -sname disco_8989_slave@hulk -s slave slave_start disco_8989_master@dtn-cn slave_waiter_6 -connect_all false -pa /usr/lib/disco/master/ebin/ -pa /usr/lib/disco/master/deps/mochiweb/ebin -pa /usr/lib/disco/master/deps/lager/ebin -pa /usr/lib/disco/master/deps/plists/ebin -f:

There is no log on the nodes. On the master, "disco -v" shows:

[disco@dtn-cn ~]$ disco -v | grep dtn-cn
DISCO_JOB_OWNER = [email protected]
DISCO_MASTER = http://dtn-cn.mydomain.com:8989
DISCO_MASTER_HOST = dtn-cn.mydomain.com
DISCO_TEST_HOST = dtn-cn.mydomain.com
Disco master at http://dtn-cn.mydomain.com:8989

On the slave, "disco -v" shows:
[disco@hulk disco]$ disco -v | grep hulk
DISCO_JOB_OWNER = [email protected]
DISCO_MASTER = http://hulk.mydomain.com:8989
DISCO_MASTER_HOST = hulk.mydomain.com
DISCO_TEST_HOST = hulk.mydomain.com
Disco master at http://hulk.mydomain.com:8989

When I try to run a job from a node, the final output is:
disco.error.CommError: Unable to access resource (http://hulk.mydomain.com:8989/disco/job/new): couldn't connect to host (is disco master running at http://hulk.mydomain.com:8989?)

Is there a way to change the master on the nodes? What can I do?

@gilessbrown
Contributor

Outside of actually running a job on the worker, I do not think that the worker is tied to a particular master machine.

When running a job, the worker does know which master it is serving; otherwise it would not be able to save the results back to DDFS (the save_results worker parameter, implemented here:

-spec save_ddfs(jobname(), [[url()]]) -> ok.
)

If you want to specify the Disco master on the nodes so that you can run disco/ddfs commands from a node, you can use the DISCO_MASTER_HOST setting (http://disco.readthedocs.io/en/latest/lib/settings.html), for example by setting DISCO_MASTER_HOST in the shell startup file that is read when you log in to the nodes.
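
To illustrate why each node reports itself as the master, here is a minimal sketch of how a client-side setting like this can resolve. It is not Disco's actual code: resolve_master_url is a hypothetical helper, and the fallback to the local hostname is an assumption based on the "disco -v" output posted above, where each host shows its own name. DISCO_MASTER_HOST and DISCO_PORT are the setting names from the settings page.

```python
import os
import socket


def resolve_master_url(environ=None, default_port=8989):
    """Hypothetical helper: pick the master URL the way a client might.

    Assumption: when DISCO_MASTER_HOST is unset, the local hostname is
    used, which would make every node point at itself.
    """
    env = os.environ if environ is None else environ
    # An explicit setting wins; otherwise fall back to this machine's name.
    host = env.get("DISCO_MASTER_HOST") or socket.gethostname()
    port = env.get("DISCO_PORT", default_port)
    return "http://{0}:{1}".format(host, port)


# With the setting in place, a node would target the real master.
print(resolve_master_url({"DISCO_MASTER_HOST": "dtn-cn.mydomain.com"}))
# prints "http://dtn-cn.mydomain.com:8989"
```

Under that assumption, exporting DISCO_MASTER_HOST=dtn-cn.mydomain.com in the disco user's shell startup file on hulk should make "disco -v" there report the dtn-cn URL instead of the hulk one.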
