-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow FieldTimeSeries
to pass keyword arguments to jldopen
#3739
Conversation
Right On ClimaOcean there's an example how we extended this infrastructure to a particular kind of NetCDF backend: https://github.com/CliMA/ClimaOcean.jl/blob/dd9148a8f702699ddf2947d3a1a094adddbd9bb1/src/DataWrangling/JRA55.jl#L218 The implementation in this PR seems simple enough though. @ali-ramadhan maybe can you provide a little syntax example how you expect this to be used? @navidcy take a look and see what you think |
I've used it to open JLD2 files in parallel using multiple threads with u = FieldTimeSeries("fields.jld2", "u";
backend = OnDisk(),
backend_kw = Dict(:parallel_read => true)
)
Nt = length(u)
temporal_reduction = zeros(Nt)
Threads.@threads for n in 1:Nt
temporal_reduction[n] = median(u[n])
end Of course, this post-processing is much slower than just doing it on-the-fly as the simulation runs. But it's still useful to be able to do this. One especially useful use case is parallel plotting from a |
Nice! Only API suggestion is that the extra kw could be directly passed to u = FieldTimeSeries("fields.jld2", "u"; backend = OnDisk(), parallel_read=true) Under the hood this can work with something like function FieldTimeSeries(path::String, name::String; backend=InMemory(), kw...)
# etc
backend_kw = Dict(kw)
# etc
end You're the primary user though so you can state your preference! |
I agree that I'm just wondering if there will be a use case in the future that will require passing a different set of kwargs. Then we'll need |
11d727d
to
14c81a0
Compare
I thought about this a bit more and further think we should stick to keeping |
14c81a0
to
7e9f66e
Compare
We could also change |
Definitely agree the two interfaces should be consistent. I'd be okay with either, although still slightly favor the current interface in case we will want to pass two sets of kwargs in the future. Would you support merging this PR and we can open an issue to discuss the best interface? |
Yes for sure support that |
Was hoping @navidcy might review but I can give it one |
return FieldTimeSeries{LX, LY, LZ}(data, grid, backend, boundary_conditions, | ||
indices, times, path, name, time_indexing) | ||
return FieldTimeSeries{LX, LY, LZ}(data, grid, backend, boundary_conditions, indices, | ||
times, path, name, time_indexing, backend_kw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think theere needs to be changes to adapt_structure
which I do not see in this PR. Possibly this is not tested right now. @simone-silvestri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Ah true. I guess originally we didn't envision passing FieldTimeSeries
into GPU kernels and I don't see a test that does this.
But with features like FieldTimeSeries
forcing (PR #3760) an adapt_structure
will be necessary. Actually not sure how tests will pass in that PR without adapting.
I can add some tests here though. There are no tests that use backend_kw
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think there are no tests about this. The adapt for a FieldTimeSeries returns a different type: GPUAdaptedFieldTimeSeries
that is a little slimmer in terms of parameter space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok but this PR breaks FieldTimeSeries
on GPU which will break everything in ClimaOcean. @simone-silvestri can you add some tests here?
Thanks @simone-silvestri for adding a GPU test for Actually I should add a test that uses the new kwargs. Would do a test where multiple threads open the same |
Updated the branch and added tests that |
Just bumping my review request 🥺 |
Co-authored-by: Simone Silvestri <[email protected]>
I am just not extremely sure about the name |
I do agree that But yeah there's a backend where the data is stored, e.g. in memory or on disk, but there's also how the data is read, e.g. JLD2 or NetCDF. "Backend kwargs" should go into the call to Maybe the better names will become clear once |
Curious if @glwagner has any thoughts on |
This PR takes some code I had to pass kwargs to
jldopen
throughFieldTimeSeries
.But from the dicussion in issue #3680 it sounds like maybe a more future-proof way to achieve this is to add the kwargs object to the backend object (either
InMemory
orOnDisk
)?I guess there's a bit of a conflict of wording here? Backend refers to how the data is stored for a
FieldTimeSeries
. So we could end up withOnDisk{NetCDF}
andOnDisk{JLD2}
for example. And maybe also forPartlyInMemory
. But for fullyInMemory
there's no "backend"?Resolves #3680