
[Core] Ray Core tasks tutorial does not work. Error message: Error Type: WORKER_DIED #51214

Open
jankinf opened this issue Mar 10, 2025 · 0 comments
Labels
docs (An issue or change related to documentation), triage (Needs triage, e.g. priority, bug/not-bug, and owning component)


jankinf commented Mar 10, 2025

Description

Issue with Ray Remote Functions Tutorial Example

1. Severity of the issue: (select one)
[x] Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.42.0
  • Python version: 3.9.21
  • OS: Ubuntu 22.04
  • Cloud/Infrastructure: N/A
  • Other libs/tools (if relevant): N/A

3. What happened vs. what you expected:

  • Expected: All remote functions in the official Ray tasks tutorial should execute successfully when submitted as a Ray job
  • Actual: Only tasks whose results are retrieved with ray.get() complete successfully; tasks submitted without a ray.get() call are marked as failed in the dashboard

Problem Description

I'm following the official Ray tutorial for remote functions (https://docs.ray.io/en/releases-2.42.0/ray-core/tasks.html), but the example doesn't work as expected when submitted as a Ray job. The dashboard shows that my_function runs successfully, but all four slow_function tasks fail with the following error:

Error Type: WORKER_DIED

Job finishes (1d000000) as driver exits. Marking all non-terminal tasks as failed.


Here's the tutorial code I'm running:

import ray
import time


# A regular Python function.
def normal_function():
    return 1


# By adding the `@ray.remote` decorator, a regular Python function
# becomes a Ray remote function.
@ray.remote
def my_function():
    return 1


# To invoke this remote function, use the `remote` method.
# This will immediately return an object ref (a future) and then create
# a task that will be executed on a worker process.
obj_ref = my_function.remote()

# The result can be retrieved with ``ray.get``.
assert ray.get(obj_ref) == 1


@ray.remote
def slow_function():
    time.sleep(10)
    return 1


# Ray tasks are executed in parallel.
# All computation is performed in the background, driven by Ray's internal event loop.
for _ in range(4):
    # This doesn't block.
    slow_function.remote()

I'm running it with:

RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING=1 RAY_ADDRESS='http://xxx.xxx.xxx.xxx:8265' ray job submit --no-wait --working-dir . -- python ray_tutor/tasks.py

Important observation: Only when I modify the code to use ray.get() to collect the results from the slow functions does the dashboard show all tasks running successfully:

# Modified version that works
refs = [slow_function.remote() for _ in range(4)]
ray.get(refs)  # Wait for all tasks to complete

I believe this is confusing for new users following the tutorial. The example code suggests these remote tasks will run in the background, but they're failing silently when the program exits before they complete.

Questions

  1. Is this the expected behavior?
  2. Is there a way to ensure background tasks complete without explicitly calling ray.get()?

Thank you for your help!

Link

No response

jankinf added the docs and triage labels Mar 10, 2025