I mitigated this issue by calling the following function after starting the worker node. Of course, it has many downsides and is not a viable long-term solution.

def kill_redundant_log_monitors():
    """Kill redundant log_monitor.py processes.

    If multiple Ray nodes are started on the same machine, there will be
    multiple log_monitor.py processes monitoring the same log dir. As a
    result, log lines are replicated and forwarded to the driver multiple
    times. See https://github.com/ray-project/ray/issues/10392.
    """
    import psutil
    import subprocess

    log_monitor_processes = []
    for proc in psutil.process_iter(["name", "cmdline"]):
        try:
            cmdline = subprocess.list2cmdline(proc.cmdline())
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
        is_log_monitor = "log_monitor.py" in cmdline
        if is_log_monitor:
            log_monitor_processes.append(proc)

    if len(log_monitor_processes) > 1:
        for proc in log_monitor_processes[1:]:
            proc.kill()
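The selection rule in the workaround above (treat the first matching `log_monitor.py` process as canonical and every later match as redundant) can be sketched with plain lists instead of live processes. The `cmdlines` values below are made-up stand-ins for scanned process command lines, so this runs without `psutil` or a Ray cluster:

```python
# Stdlib-only sketch of the workaround's selection logic: among all
# "processes" whose command line mentions log_monitor.py, keep the first
# and mark the rest as redundant. The cmdlines list is fabricated data
# for illustration only.
cmdlines = [
    "python raylet.py",
    "python log_monitor.py --logs-dir /tmp/ray/session_a",
    "python log_monitor.py --logs-dir /tmp/ray/session_a",
    "python worker.py",
]

matches = [c for c in cmdlines if "log_monitor.py" in c]
redundant = matches[1:]  # everything after the first match is redundant

print(len(redundant))  # 1
```

In the real workaround, each entry in `redundant` corresponds to a `psutil.Process` that gets killed; killing all but one monitor stops the same log directory from being forwarded to the driver more than once.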
Labels: JakkuSakura added bug and triage on Nov 8, 2024; jcotant1 adjusted the core label on Nov 8, 2024; rynewang added P2 (important issue, but not time-critical) and removed triage on Nov 12, 2024.
What happened + What you expected to happen
I encountered issue #10392 while experimenting with Ray.
That issue was closed because no reproducible example could be provided.
Versions / Dependencies
ray[all] 2.38.0
macOS
Reproduction script
RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --head
RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --address='192.168.0.196:6379'
python example.py
Output:
2024-11-08 13:54:19,817 INFO worker.py:1601 -- Connecting to existing Ray cluster at address: 192.168.0.196:6379...
2024-11-08 13:54:19,831 INFO worker.py:1777 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
(foo pid=45881) hello
(foo pid=45881) hello
Issue Severity
Low: It annoys or frustrates me.
A workaround is at: https://github.com/intel-analytics/BigDL-2.x/pull/2799/files