-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace push implementation with map_overlap for Dask #9712
base: main
Are you sure you want to change the base?
Conversation
This can't work in general. We switched from map_overlap to cumreduction intentionally: #6118 . I fully support improving |
See this test: xarray/xarray/tests/test_duck_array_ops.py Lines 1008 to 1028 in a00bc91
|
Oh I didn't consider None as limit, sorry about that |
You could experiment with |
I thought more about this, I didn't consider large limit values properly.
Bunch of operations that can't be fused properly because of the interactions between the 2 different arrays. I'll think a bit more if we can reduce this down somehow, but there isn't anything obvious right away (at least not to me).
|
Can you add Also, you should be able to write this as a single |
Oh, I'll check that one out. Sure, there shouldn't be any reason not to add this. I'll check out the flox implementation, getting the number of tasks down would be nice, but adding it in Dask should be a good option anyway |
Ah I guess the core issue is a dependency on |
That should be fine, we can just raise if neither is installed, similar to what you are doing here |
whats-new.rst
api.rst
Our benchmarks here showed us that ffill alone adds 4.5 million tasks to the graph which isn't great (the dataset has 550k chunks, so a multiplication of 9).
Rewriting this with map_overlap gets this down to 1.5 million tasks, which is basically the number of chunks times 3, which is the minimum that we can get to at the moment.
We merged a few map_overlap improvements today on the dask side to make this possible, but it's now a nice improvement (also makes code on the xarray side easier).
cc @dcherian