[Core] Ray core tasks tutorial does not work. Error message: `Error Type: WORKER_DIED` #51214

jankinf · 2025-03-10T12:12:00Z

Description

Issue with Ray Remote Functions Tutorial Example

1. Severity of the issue: (select one)
[x] Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

Ray version: 2.42.0
Python version: 3.9.21
OS: Ubuntu 22.04
Cloud/Infrastructure: N/A
Other libs/tools (if relevant): N/A

3. What happened vs. what you expected:

Expected: All remote functions in the official Ray tasks tutorial execute successfully when the script is submitted as a Ray job.
Actual: Only tasks whose results are collected with an explicit ray.get() call complete successfully; tasks without ray.get() are shown as failed in the dashboard.

Problem Description

I'm following the official Ray tutorial for remote functions (https://docs.ray.io/en/releases-2.42.0/ray-core/tasks.html), but the example doesn't work as expected when submitted as a Ray job. The dashboard shows that my_function runs successfully, but all four slow_function tasks fail with the following error:

Error Type: WORKER_DIED

Job finishes (1d000000) as driver exits. Marking all non-terminal tasks as failed.

Here's the tutorial code I'm running:

import ray
import time


# A regular Python function.
def normal_function():
    return 1


# By adding the `@ray.remote` decorator, a regular Python function
# becomes a Ray remote function.
@ray.remote
def my_function():
    return 1


# To invoke this remote function, use the `remote` method.
# This will immediately return an object ref (a future) and then create
# a task that will be executed on a worker process.
obj_ref = my_function.remote()

# The result can be retrieved with ``ray.get``.
assert ray.get(obj_ref) == 1


@ray.remote
def slow_function():
    time.sleep(10)
    return 1


# Ray tasks are executed in parallel.
# All computation is performed in the background, driven by Ray's internal event loop.
for _ in range(4):
    # This doesn't block.
    slow_function.remote()

I'm running it with:

RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING=1 RAY_ADDRESS='http://xxx.xxx.xxx.xxx:8265' ray job submit --no-wait --working-dir . -- python ray_tutor/tasks.py

Important observation: Only when I modify the code to use ray.get() to collect the results from the slow functions does the dashboard show all tasks running successfully:

# Modified version that works
refs = [slow_function.remote() for _ in range(4)]
ray.get(refs)  # Wait for all tasks to complete

I believe this is confusing for new users following the tutorial. The example code suggests these remote tasks will run in the background, but they're failing silently when the program exits before they complete.

Questions

Is this the expected behavior?
Is there a way to ensure background tasks complete without explicitly calling ray.get()?

Thank you for your help!

Link

No response

jankinf added the docs and triage labels · Mar 10, 2025
