
Add xarray backends for fgmax and fgout gridded formats #611

Merged · 24 commits · Aug 23, 2024

Conversation

kbarnhart
Contributor

@rjleveque, @mandli -

Here is a PR that addresses #598.

It may not be the most efficient (I flipud and transpose each array to put them in the orientation that xarray wants), but I find that it works.

I've tested the fgmax method on dclaw and geoclaw (the chile 2010 fgout-fgmax example) with each option for the number of variables written.

I've tested the fgout method with dclaw and geoclaw for both binary and ascii type files.

One point of discussion is whether you like the variable names I've used for fgmax; they are not exactly the same as in the fgmax object (e.g., I use h_max instead of h). I'm happy to change them to be identical if that is important to either of you.

I added an example usage script in the chile2010 fgout-fgmax example as it seemed the most sensible place to put an example.

Happy to improve this as you see fit.

@rjleveque
Member

Thanks @kbarnhart for providing this! I haven't experimented with it much yet, but a few comments...

  1. I had to install rioxarray, which is not part of xarray.
  2. Nice that for binary .b files it seems to work fine regardless of whether binary32 or binary64 is used.
  3. When reading fgmax arrays, it seems like the topography B field should also be returned by default. This is often needed in processing the results (e.g. to plot h_max only on shore).

Eventually it would be nice to integrate this better with fgout_tools.py and fgmax_tools.py so that the xarray datasets could be worked with in the same way as the ndarray option currently supported.

I hope to revamp the fgmax output options in the near future, e.g. to allow binary output and to make it easier to specify exactly what variables should be monitored. We are also working on improving the arrival time monitoring to keep track of various different times, which we have found necessary for tsunami studies in regions of vertical land motion and for maritime modeling where drawdown is important as well as increases in depth. That might be a good time to rethink a lot of the interface and try to better coordinate these tools.

But for now it seems fine to merge this in, perhaps with some tweaks, so that it can be used for your purposes.

@mandli and @ketch: I note that our new attempt at CI throws an error for this PR in the setup step:

cd geoclaw
git checkout refs/pull/611/merge
error: pathspec 'refs/pull/611/merge' did not match any file(s) known to git

Not sure if this is something to investigate?

@kbarnhart
Contributor Author

@rjleveque thanks for your comments. I'll make some updates to address them and ping you here when they are ready to consider.

Happy to be involved in integrating these better.

@mandli
Member

mandli commented Jun 2, 2024

Would the file xarray_backends.py be the location where any xarray backend would go? For example would we stick a gauge xarray backend in there?

@kbarnhart
Contributor Author

@mandli - I thought a tiny amount about this and it seemed to me like there were two options. Either all the xarray backends go in one module or each xarray backend goes inside a module associated with the relevant clawpack data structure (e.g., the FGOutBackend goes in fgout_tools.py).

I have a slight preference towards the latter option. However, I implemented the former because I didn't want to make the fgout_tools and fgmax_tools files longer and make it hard to discover the backends.

All of this is to say that I'm not sure of the best place for these within clawpack.geoclaw.

@mandli
Member

mandli commented Jun 4, 2024

@kbarnhart Those were the two options that came to my mind as well. Given that gauges are not actually handled here in GeoClaw but in PyClaw (it's common across most of the Clawpack packages), it may make more sense to keep them in the associated data structures. Similarly, we could/should do something with other data formats, such as friction from land use or storm data, for which there may be multiple backends. This should be easy to change up until we merge this fully and make a release, though.

@mandli
Member

mandli commented Jun 4, 2024

When it makes sense, variables should probably be named the same thing to reduce confusion, although I do like the more descriptive variable names you mentioned.

@kbarnhart
Contributor Author

Some notes based on conversation with @rjleveque:

  • Keep the more descriptive names
  • Keep these backends in their current file
  • In addition to adding B as a variable, add level

@rjleveque
Member

@kbarnhart: I'm finally getting back to looking at this and I see that the flexibility recently added to fgout variables means that these xarray tools won't always work.

What you call _qelements_geoclaw is what I called qmap (with default lists for geoclaw or dclaw) in the latest fgout_tools.FGoutGrid class, but now a user-specified qmap can be used if the user does not output all variables or specifies a different fgout.q_out_vars, as described at https://www.clawpack.org/dev/fgout.html#specifying-q-out-vars.

Class fgout_tools.FGoutFrame includes various properties for computing B from h and eta or any one of these from the other two, giving more flexibility.

I'm not sure if it's worth trying to incorporate these changes in the xarray backend, or if it would be easier to read in using fgout_tools and then convert to xarrays, or if it's fine to merge this for now and work on enhancements later?

@rjleveque
Member

Another thing you might want to add to your sample chile2010_fgmax-fgout/use_xarray_backends.py:

# combine all fgout files into a single netcdf file:
import glob
import xarray as xr
fgout_files = glob.glob('_output/fgout0001.b*')
fgouts = []
for filename in fgout_files:
    ds = xr.open_dataset(filename, engine=FGOutBackend,
                         backend_kwargs={'epsg':epsg_code})
    fgouts.append(ds)

fgout_all = xr.concat(fgouts, dim='time')
fgout_all.to_netcdf('fgout0001.nc')

Or maybe there's a cleaner way to do this. At any rate it could be useful for reducing all the fgout files to a single file.

@kbarnhart
Contributor Author

kbarnhart commented Aug 20, 2024

@rjleveque - Today I've finally gotten around to updating this for compatibility with your changes. I think I've now made the changes needed to make it work, and I have tested it with the geoclaw example and the radial slide example over on geoflows/dclaw#17. Both work.

My only thought is that it would be really nice if dclaw could have a default that would not require always setting

fgout.q_out_vars=[...]

However, I don't think it is a big deal to always need to set this for dclaw. I've discussed with Dave and we agree that it is fine as is.

It is going to be great to not need to write all elements of q.

I added an example of opening all the files per your suggestion. There is a slightly easier way to do this with xr.open_mfdataset(), which I used.

I think that this is ready to merge.

"B": "meters",
"huc": "meters squared per second",
"hvc": "meters squared per second",
}
Contributor Author


@rjleveque I did have a question about whether the units for huc and hvc were correct. I don't know how the corrections are applied (whether they are added or multiplied or something else), so I wasn't sure what to put.

Member


@kbarnhart: Interesting question, I think for the SGN equations the units are "meters per second squared" but for the MS equations "meters squared per second squared" because of the different scalings that get used when using these terms to update hu and hv. So I'm not sure what to put here, maybe "varies"? Normally the user would not use these terms for anything directly and would only output them for debugging purposes, perhaps.

Another related question: For dclaw you call the 7th element of q "delta_a" whereas in qmap I called it "bdif" because in digclaw_module.f90, i_bdif=7 is the index name used. We should be consistent, is delta_a a better name?

Contributor Author


Thanks for the units information. I've changed it to 'varies'.

'bdif' is better than 'delta_a'. This has been called different things at different times. At present it is also called 'b_eroded' internally and in the dclaw data.py docstrings (I'll change that for consistency).

I've changed delta_a to bdif as well. Good catch.

elif qmap == "geoclaw":
_qelements = _qelements_geoclaw
elif qmap == "geoclaw-bouss":
_qelements = _qelements_geoclaw_bouss
Member


If qmap isn't one of these three choices, then I think _qelements will be left uninitialized, leading to an error later? Should this have:

          else:
               _qelements = list(qmap.keys())

or even just set _qelements this way in general without needing _qelements_geoclaw etc and this if test at all?

Also, maybe qmap can be used more directly below without needing _qelements at all?

But I'm not sure what drop_variables is intended for, is this to further drop things that were already output in the fgout files, or is it to list variables that were not output to begin with (which is information already included in qmap if that is set properly, I think)?

Contributor Author


Looks like I had made _qelements vestigial and not removed it. Now removed.

drop_variables is a requirement of the BackendEntrypoint class. It would be used to further drop things from the fgout file (e.g., h and eta in the fgout file but only h in the xarray dataset).

https://docs.xarray.dev/en/stable/generated/xarray.backends.BackendEntrypoint.html#xarray.backends.BackendEntrypoint
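As a sketch of what honoring drop_variables amounts to (the dataset contents below are invented for illustration; the real backend returns the fgout variables):

```python
import numpy as np
import xarray as xr

# Invented stand-in for a dataset the backend might return.
ds = xr.Dataset(
    {
        "h": (("y", "x"), np.ones((2, 3))),
        "eta": (("y", "x"), np.zeros((2, 3))),
    }
)

# Inside open_dataset, honoring drop_variables boils down to
# discarding the requested variables after reading the file:
drop_variables = ["eta"]
ds_subset = ds.drop_vars(drop_variables)
print(list(ds_subset.data_vars))  # ['h']
```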

@rjleveque
Member

Thanks @kbarnhart!

Reading a single fgout frame still works fine, but the new code in use_xarray_backends.py:

fgout_files = glob.glob("_output/fgout0001.b*")
ds_all = xr.open_mfdataset(
    fgout_files,
    engine=FGOutBackend,
    backend_kwargs={"epsg": epsg_code, "qmap": "geoclaw"},
)

throws an error for me when trying to read the second file:

$ python use_xarray_backends.py
Reading fgout grid info from 
    _output/fgout_grids.data
Reading input for fgno=1, point_style = 2 
Found fgout grid q_out_vars =  [1, 2, 3, 4]
Using this mapping to fgout variable names: 
      qmap =  {'h': 1, 'hu': 2, 'hv': 3, 'eta': 4, 'B': 5}
    Reading  Frame 1 at t = 0  from outdir = /Users/rjl/git/clawpack/geoclaw/examples/tsunami/chile2010_fgmax-fgout/_output
Reading fgout grid info from 
    _output/fgout_grids.data
Reading input for fgno=1, point_style = 2 
Found fgout grid q_out_vars =  [1, 2, 3, 4]
Using this mapping to fgout variable names: 
      qmap =  {'h': 1, 'hu': 2, 'hv': 3, 'eta': 4, 'B': 5}
    Reading  Frame 10 at t = 8100  from outdir = /Users/rjl/git/clawpack/geoclaw/examples/tsunami/chile2010_fgmax-fgout/_output
Traceback (most recent call last):
  File "/Users/rjl/git/clawpack/geoclaw/examples/tsunami/chile2010_fgmax-fgout/use_xarray_backends.py", line 36, in <module>
    ds_all = xr.open_mfdataset(
  File "/opt/homebrew/lib/python3.10/site-packages/xarray/backends/api.py", line 1012, in open_mfdataset
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/opt/homebrew/lib/python3.10/site-packages/xarray/backends/api.py", line 1012, in <listcomp>
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/opt/homebrew/lib/python3.10/site-packages/xarray/backends/api.py", line 576, in open_dataset
    ds = _dataset_from_backend_dataset(
  File "/opt/homebrew/lib/python3.10/site-packages/xarray/backends/api.py", line 371, in _dataset_from_backend_dataset
    ds = _chunk_ds(
  File "/opt/homebrew/lib/python3.10/site-packages/xarray/backends/api.py", line 319, in _chunk_ds
    chunkmanager = guess_chunkmanager(chunked_array_type)
  File "/opt/homebrew/lib/python3.10/site-packages/xarray/core/parallelcompat.py", line 93, in guess_chunkmanager
    raise ValueError(
ValueError: unrecognized chunk manager dask - must be one of: []

Apparently dask is required for this approach to work. Should we be requiring this, or would it be better to revert to the slightly clumsier approach I used?

(now tested for both absence of h, and absence of h, eta, and B)
@kbarnhart
Contributor Author

@rjleveque Just updated to have options for both with and without Dask. I also tested and improved the handling of the dry tolerance (previously hardcoded to 0.001). Let me know if it works for you.

If there is a way to read in the user-specified dry tolerance, that is one final improvement I can think of.

@rjleveque
Member

@kbarnhart: use_xarray_backends.py runs if I replace

except ImportError:  # if dask is not available, use xr.concat.

with

except ValueError:  # if dask is not available, use xr.concat.
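The reason for this change: when dask is missing, xr.open_mfdataset fails with a ValueError ("unrecognized chunk manager dask"), not an ImportError, so the fallback must catch ValueError. The dask-free fallback itself is plain concatenation, shown here as a self-contained sketch with synthetic in-memory frames standing in for files opened with the FGOutBackend (names and shapes are illustrative only):

```python
import numpy as np
import xarray as xr

# Two synthetic "fgout frames" in place of files read via the backend.
frames = [
    xr.Dataset(
        {"h": (("time", "y", "x"), np.full((1, 2, 3), float(i)))},
        coords={"time": [t]},
    )
    for i, t in enumerate([0.0, 8100.0])
]

# The dask-free path: stitch the frames together along the time axis.
ds_all = xr.concat(frames, dim="time")
print(ds_all.sizes["time"])  # 2
```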

@rjleveque
Member

Is it clear you always want to mask out dry values in B and/or eta? The user may need to know B even at dry points for plotting.

You could have a default in

  def open_dataset(... , dry_tol=None, ...)

for example to mean no masking, or let the user specify the value to use at this point?

@kbarnhart
Contributor Author

@rjleveque I've added user-specified dry_tolerance (default = 0.001, but dry_tolerance=None means no masking at all).

The mask is never applied to B or eta; this seems like a good default to me, but if you think it would be better to do something different, I'm happy to change it.
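A minimal sketch of that behavior with a toy dataset (the helper name and variables here are invented for illustration, not the backend's actual code): flow variables are masked where h is at or below dry_tolerance, B and eta are left untouched, and dry_tolerance=None disables masking entirely.

```python
import numpy as np
import xarray as xr

def apply_dry_mask(ds, dry_tolerance=0.001):
    # Hypothetical helper: mask all variables except B and eta
    # wherever h <= dry_tolerance; None means no masking at all.
    if dry_tolerance is None:
        return ds
    masked = ds.copy()
    for name in ds.data_vars:
        if name not in ("B", "eta"):
            masked[name] = ds[name].where(ds["h"] > dry_tolerance)
    return masked

# Toy frame: first cell dry, second cell wet.
ds = xr.Dataset(
    {
        "h": (("y", "x"), np.array([[0.0, 1.0]])),
        "hu": (("y", "x"), np.array([[0.5, 0.5]])),
        "B": (("y", "x"), np.array([[10.0, -5.0]])),
    }
)
out = apply_dry_mask(ds)
# hu becomes NaN in the dry cell; B is untouched everywhere.
```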

Also, if it would be easier for a quick zoom to discuss, let me know. I'm pretty free for the rest of the day. Thanks for all these comments!

@kbarnhart
Contributor Author

@kbarnhart: use_xarray_backends.py runs if I replace

except ImportError:  # if dask is not available, use xr.concat.

with

except ValueError:  # if dask is not available, use xr.concat.

This should be fixed now; I just got rid of the specific exception type.

@rjleveque
Member

I think you removed ImportError from the wrong line.

Otherwise it looks fine and the dry_tolerance makes sense to me.

@kbarnhart
Contributor Author

🤦‍♀️ Thanks. Should be fixed now.

@rjleveque
Member

Works for me, I'm going to merge it.

Thanks!!

@rjleveque rjleveque merged commit 88cdbc7 into clawpack:master Aug 23, 2024
1 check failed