-
-
Notifications
You must be signed in to change notification settings - Fork 734
Avoid deepcopy when submitting graph #8633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ±0 27 suites ±0 11h 18m 56s ⏱️ + 5m 39s For more details on these failures, see this check. Results for commit ef57b4a. ± Comparison against base commit 16aa189. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @fjetter! One suggestion, feel free to ignore.
What's the status of this PR? |
a990563
to
7dbe294
Compare
I rebased and removed the Scheduler->Worker msg change to isolate the graph submission change. I'll rerun benchmarks on this shortly |
Perf tests on this look good for now (these are the ordinary tests). I also ran tpch but tpch didn't show any signal. Looks like most are not actually affected but there are some with strong gains. This is pretty much what I'd expect. The strong red bar is test_set_index_on_uber_lyft which has a significant error the noisy rechunking bars are also noise. static-dashboard - 2025-04-01T174251.730.zip From a perf perspective I consider this a win but CI seems to be extra flaky. I assume this changes some timings which makes the entire thing more fragile. |
It seems like CI overall is currently pretty broken. Timing somewhat aligns with the HLG change but nor perfectly. |
This avoids at least one deepcopy of the graph and therefore reduces overhead during submit
Typically, one would simply add
ToPickle/Serialize
to the dict value and pass the graph through directly. However, this would make it impossible to get clear error messages on (de-)serialization exceptions which is why this was pulled out to a manual call back then.However, passing the header and frames as dictionary values directly causes msgpack to simply copy the bytes into the msgpack bytestream instead of us passing through the frames implicitly.
This avoids an unnecessary copy