Suppress "Couldn't detect a suitable IP address" messages on cluster nodes with no internet #585

Open
rormseth opened this issue Sep 2, 2022 · 1 comment


rormseth commented Sep 2, 2022

We are using the SLURMCluster class from dask_jobqueue on a system whose compute nodes have no connection to the Internet. When starting a new cluster, we get this warning:

cluster = SLURMCluster(queue="jh", cores=4, processes=4, memory="16g", walltime="1:00:00")

/usr/local/other/python/JH.1/GEOSpyD/4.11.0_py3.9/2022-05-25/envs/ml/lib/python3.9/site-packages/distributed/utils.py:146: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to hostname: [Errno 101] Network is unreachable
  warnings.warn(

However, we know that our compute nodes do not have a connection to the Internet, and everything seems to work correctly. Is there a way to suppress this warning so that it does not confuse our users?

@jrbourbeau jrbourbeau transferred this issue from dask/distributed Sep 2, 2022
@guillaumeeb
Member

Hi @rormseth,

I'm not sure this is a dask-jobqueue specific problem. First of all, could you check whether you get this warning when using LocalCluster from distributed?

Then, I think it probably comes from your HPC cluster's specific network configuration, which must be blocking something. I'm not saying that your nodes should have a connection to the Internet; this is often not the case in HPC centers. However, I don't think distributed actually tries to connect to a remote address in the get_ip method. I think it's just a way to find the correct network interface and hostname for the Scheduler or the Workers of your Dask cluster.
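To illustrate the point about get_ip, here is a minimal sketch of the common "connect a UDP socket and read back the local address" trick, which is what the warning text suggests is happening. It is an assumption about the general technique, not a copy of the code in distributed; the function name guess_local_ip and its parameters are hypothetical.

```python
import socket


def guess_local_ip(probe_host="8.8.8.8", probe_port=80):
    """Guess the local IP the kernel would use to reach probe_host.

    Hypothetical helper for illustration only. Calling connect() on a UDP
    socket sends no packets; it only asks the kernel to pick a route and
    a source address for that destination.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        # The source address chosen by the kernel for this route.
        return sock.getsockname()[0]
    except OSError:
        # No route to the probe address (e.g. "Network is unreachable"
        # on an isolated compute node): fall back to the hostname.
        return socket.gethostname()
    finally:
        sock.close()


if __name__ == "__main__":
    print(guess_local_ip())
```

On a node with no route to the probe address, the except branch is the one that runs, which would match the "defaulting to hostname" part of the message.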

Maybe you could prevent this warning by using the interface kwarg to specify the correct network interface for your Scheduler and workers? Otherwise, you should look into the scheduler_options kwarg to specify the address or interface the scheduler should bind to. See for example #465 (comment) or #490 (comment).
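A rough sketch of both suggestions, plus a standard-library warnings filter as a fallback, might look like the following. The interface name "ib0" is only a placeholder; use whatever interface your nodes actually expose. The helper names are hypothetical, and whether setting the interface fully avoids the message on a given system is untested here.

```python
import warnings

# Start of the message shown in the traceback above; warnings filters
# match this pattern against the beginning of the warning text.
IP_WARNING_PATTERN = "Couldn't detect a suitable IP address"


def build_cluster_kwargs(interface="ib0"):
    """Keyword arguments for SLURMCluster ("ib0" is a placeholder name)."""
    return dict(
        queue="jh",
        cores=4,
        processes=4,
        memory="16g",
        walltime="1:00:00",
        # Interface for the workers (and the scheduler default).
        interface=interface,
        # Explicitly tell the scheduler which interface to bind to.
        scheduler_options={"interface": interface},
    )


def ignore_ip_detection_warning():
    """Hide only this RuntimeWarning in the current Python process."""
    warnings.filterwarnings(
        "ignore", message=IP_WARNING_PATTERN, category=RuntimeWarning
    )


def make_cluster(interface="ib0"):
    """Create the cluster; imported lazily so the sketch loads without dask."""
    from dask_jobqueue import SLURMCluster

    ignore_ip_detection_warning()
    return SLURMCluster(**build_cluster_kwargs(interface))
```

Note that the filter only affects the process that calls it (for example the notebook or script that creates the cluster); the same message could still show up in worker job logs.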
