
[Core] Logs are duplicated if multiple nodes are running on same machine #48642

Open
JakkuSakura opened this issue Nov 8, 2024 · 1 comment · May be fixed by #48891
Assignees: kevin85421
Labels: bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), P2 (Important issue, but not time-critical)


JakkuSakura commented Nov 8, 2024

What happened + What you expected to happen

I encountered issue #10392 while experimenting with Ray: when multiple Ray nodes run on the same machine, logs forwarded to the driver are duplicated. That issue was closed because no reproducible example was provided, so a reproduction is included below.

Versions / Dependencies

ray[all] 2.38.0
macOS

Reproduction script

# example.py
import ray

@ray.remote
def foo():
    print('hello')


if __name__ == '__main__':
    ray.init()
    handle = foo.remote()
    ray.get(handle)
Start a head node and a worker node on the same machine, then run the script:

RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --head
RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 ray start --address='192.168.0.196:6379'
python example.py

Output:
2024-11-08 13:54:19,817 INFO worker.py:1601 -- Connecting to existing Ray cluster at address: 192.168.0.196:6379...
2024-11-08 13:54:19,831 INFO worker.py:1777 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
(foo pid=45881) hello
(foo pid=45881) hello
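
The duplicated line matches what the workaround's docstring below describes: one log_monitor.py process per local `ray start`, all tailing the same log directory. As a rough diagnostic sketch (my addition, not part of the original report; assumes psutil is installed), counting those processes should show one per node:

# Diagnostic sketch: count Ray log monitors running on this machine.
import subprocess

import psutil

count = 0
for proc in psutil.process_iter(["cmdline"]):
    try:
        cmdline = subprocess.list2cmdline(proc.cmdline())
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue
    if "log_monitor.py" in cmdline:
        count += 1

# With a head node and a worker node started locally, this is expected to print 2.
print(f"log_monitor.py processes: {count}")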

Issue Severity

Low: It annoys or frustrates me.

A workaround is at: https://github.com/intel-analytics/BigDL-2.x/pull/2799/files

I mitigated this issue by calling the following function after starting the worker node. Of course, it has many downsides and is not a long-term solution.

def kill_redundant_log_monitors():
    """
    Kill redundant log_monitor.py processes.

    If multiple Ray nodes are started on the same machine,
    there will be multiple ray log_monitor.py processes
    monitoring the same log dir. As a result, the logs
    will be replicated multiple times and forwarded to the driver.
    See issue https://github.com/ray-project/ray/issues/10392
    """
    import subprocess

    import psutil

    # Collect every process whose command line mentions log_monitor.py.
    log_monitor_processes = []
    for proc in psutil.process_iter(["name", "cmdline"]):
        try:
            cmdline = subprocess.list2cmdline(proc.cmdline())
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
        if "log_monitor.py" in cmdline:
            log_monitor_processes.append(proc)

    # Keep the first log monitor and kill the rest.
    if len(log_monitor_processes) > 1:
        for proc in log_monitor_processes[1:]:
            proc.kill()
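
A minimal usage sketch (my assumption about where to call it, not stated in the original report): call the function from the driver once both local nodes are up, before submitting tasks, so that only one log monitor per machine forwards logs.

# Hypothetical usage of the workaround above, assuming the driver runs on the
# same machine as both Ray nodes (as in the reproduction script).
import ray

@ray.remote
def foo():
    print('hello')

if __name__ == '__main__':
    ray.init()
    kill_redundant_log_monitors()  # leave a single log monitor on this machine
    ray.get(foo.remote())          # 'hello' should now be forwarded only once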
@JakkuSakura JakkuSakura added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 8, 2024
@jcotant1 jcotant1 added core Issues that should be addressed in Ray Core and removed core Issues that should be addressed in Ray Core labels Nov 8, 2024
@kevin85421 (Member) commented:

thank you for reporting the issue!

@kevin85421 kevin85421 self-assigned this Nov 9, 2024
@rynewang rynewang added P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 12, 2024