-
Notifications
You must be signed in to change notification settings - Fork 38
Consistently use Xarray for Grid Reader, Improved Dask Support #1092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@philipc2 why is #954 linked here? Are you planning to tackle this issue in this pull request? It doesn't look like you have any code in place that handles it yet, I have added an initial implementation for #954 in PR 1101. I didn't realize the issue was linked here, and it was one of my tasks for this iteration. I don't want to step on your toes or anything though if you have done work on that specific thing. |
This PR was started after a discussion with @florianziemen on better utilizing Xarray and Dask for loading our Grids. I linked #954 to this one because I was going to explore how to delay the loading of connectivity from a file until the user needs it, which would effectively work as a minimal reader. Instead of having the user manually specify that they want a minimal grid to be read in, having it automatically delayed while still preserving the metadata (i.e. connectivity names that are available, dimensions, etc.) would better align with what Xarray does. The issue with a minimal reader is that the data would still be directly loaded into memory, including and computations that need to be done on it (i.e. converting to zero index, replacing fill values). For extremely large grids this leads to a very noticeable delay in the time it takes to print out information about a @florianziemen and @erogluorhan can share their experiences also. @aaronzedwick let's spend some time on Tuesday to look at this together, I might need some help anyways. |
The only thing I'd like to share now is that Dask should be brought into the game only iif the user explicitly wants it (i.e. feeding UXarray with chunked data, Dask arrays). That said, I am glad to see the direction this PR is going towards, i.e. avoiding explicit Numpy/Dask array conversions from our Xarray data, and instead sticking to Xarray data structures as they already cater great IO experience. |
…rray into philipc2/refactor-io
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is.a much needed improvement! There are just a few point to be addressed though. Please see my inline comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the documentation updates needed according to the changes of coordinates' default from "nodes" to "face centers", I am satisfied with this PR overall. Thanks! Once they are done, I'll approve.
Closes #1093 #1137
Overview
.values
throughout the Grid readers, which preserves the original data (i.e. Dask)Grid.n_nodes_per_face
that was triggering construction inside of therepr
Related
I'm not sure if we are able to use Xarray's Lazy Variables
https://docs.xarray.dev/en/latest/internals/internal-design.html
pydata/xarray#5081