Skip to content

Avoid deepcopy when submitting graph #8633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 17, 2025
Merged

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented May 2, 2024

This avoids at least one deepcopy of the graph and therefore reduces overhead during submit

Typically, one would simply add ToPickle/Serialize to the dict value and pass the graph through directly. However, this would make it impossible to get clear error messages on (de-)serialization exceptions which is why this was pulled out to a manual call back then.

However, passing the header and frames as dictionary values directly causes msgpack to simply copy the bytes into the msgpack bytestream instead of us passing through the frames implicitly.

This avoids an unnecessary copy

Copy link
Contributor

github-actions bot commented May 2, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ±0      27 suites  ±0   11h 18m 56s ⏱️ + 5m 39s
 4 100 tests ±0   3 984 ✅ ±0    112 💤 ±0  4 ❌ ±0 
51 400 runs  ±0  49 084 ✅  - 3  2 309 💤 ±0  7 ❌ +3 

For more details on these failures, see this check.

Results for commit ef57b4a. ± Comparison against base commit 16aa189.

♻️ This comment has been updated with latest results.

@hendrikmakait hendrikmakait self-requested a review May 2, 2024 12:43
Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @fjetter! One suggestion, feel free to ignore.

@fjetter
Copy link
Member Author

fjetter commented May 3, 2024

well, that's disappointing

image

I hope/suspect that's the Serialize. I'll try with sticking to ToPickle

@hendrikmakait
Copy link
Member

What's the status of this PR?

@fjetter fjetter force-pushed the dont_use_topickle_hlg branch from a990563 to 7dbe294 Compare April 1, 2025 09:22
@fjetter
Copy link
Member Author

fjetter commented Apr 1, 2025

I rebased and removed the Scheduler->Worker msg change to isolate the graph submission change. I'll rerun benchmarks on this shortly

@fjetter
Copy link
Member Author

fjetter commented Apr 3, 2025

Perf tests on this look good for now (these are the ordinary tests). I also ran tpch but tpch didn't show any signal. Looks like most are not actually affected but there are some with strong gains. This is pretty much what I'd expect.

image

The strong red bar is test_set_index_on_uber_lyft which has a significant error

image

the noisy rechunking bars are also noise.

static-dashboard - 2025-04-01T174251.730.zip

From a perf perspective I consider this a win but CI seems to be extra flaky. I assume this changes some timings which makes the entire thing more fragile.

@fjetter
Copy link
Member Author

fjetter commented Apr 3, 2025

It seems like CI overall is currently pretty broken. Timing somewhat aligns with the HLG change but nor perfectly.

@fjetter fjetter merged commit 3f17ddb into dask:main Apr 17, 2025
25 of 31 checks passed
@fjetter fjetter deleted the dont_use_topickle_hlg branch April 17, 2025 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants