🚸 ICESat-2 ATL11 pre-processing to analysis ready format #10
base: main
Conversation
Initial draft tutorial on preparing ICESat-2 ATL11 land ice height time-series data into an analysis ready format. Currently this is just a minimal script that imports a few Python libraries and prints out their versions.
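A minimal sketch of what that version-printing step could look like (the library list here is illustrative, not necessarily the exact set imported in the notebook):

# Print the version of each library used, skipping any that are missing.
import importlib

for name in ["numpy", "pandas", "xarray", "datatree", "icepyx", "s3fs"]:
    try:
        module = importlib.import_module(name)
        print(f"{name}={module.__version__}")
    except ImportError:
        print(f"{name} is not installed")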
Using a personal fork of datatree that patches an issue with file pointers. As of https://github.com/weiji14/datatree/commit/d18961b2d1efdf6cfbea3038d6d2f8761edceba4 it's still not perfect, though, as there is a ValueError about malformed variables.
Reading the ICESat-2 ATL11 HDF5 files directly from the AWS S3 bucket into xarray! Based on the many tutorials available at https://nasa-openscapes.github.io/earthdata-cloud-cookbook. Some bugfixes in xarray and/or xarray-datatree are still needed for this to work fully.
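A rough sketch of that S3 read, assuming NASA Earthdata S3 credentials have already been set up for s3fs (the granule path is the example quoted later in this thread; this is not the exact notebook code):

# Open one ATL11 granule straight from the NSIDC cloud bucket as a DataTree.
import datatree
import s3fs

fs = s3fs.S3FileSystem(anon=False)  # assumes temporary AWS credentials from an Earthdata login
url = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5"

with fs.open(url, mode="rb") as f:
    dt = datatree.open_datatree(f, engine="h5netcdf")
    print(dt)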
Not the most elegant solution, but manually reading the ATL11 pair track groups one by one in the for loop works. If the file pointer bug is fixed, a dictionary comprehension could be used to remove the for loop. Even better if the ValueError about malformed variables disappears, which would make this a one-liner!
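The per-group workaround looks roughly like this (a sketch reusing the fs and url objects from the snippet above; pt1/pt2/pt3 are the standard ATL11 pair track group names):

# Read each ATL11 pair track group separately to work around the file pointer bug.
import xarray as xr

pair_tracks = {}
for pt in ["pt1", "pt2", "pt3"]:
    with fs.open(url, mode="rb") as f:
        # .load() pulls the data into memory before the file handle is closed.
        pair_tracks[pt] = xr.open_dataset(f, group=pt, engine="h5netcdf").load()

# Once the file pointer bug is fixed, the loop collapses into a dictionary comprehension:
# pair_tracks = {pt: xr.open_dataset(fs.open(url), group=pt, engine="h5netcdf") for pt in ["pt1", "pt2", "pt3"]}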
Adopt geosmart Jupyter Book template and other content.
No longer using personal fork, use the official version instead!
Include placeholder DBSCAN subglacial lake finder demo.
Just for the sake of a quick demo.
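The placeholder is essentially scikit-learn's DBSCAN run on point coordinates. A toy sketch with synthetic data (the eps and min_samples values are purely illustrative, not tuned for real ATL11 data):

# Toy DBSCAN demo: cluster scattered points into candidate "lake" groups.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(seed=42)
# Three synthetic "lakes" (tight clusters of points, in metres) plus background noise.
centres = np.array([[0, 0], [50_000, 20_000], [-30_000, 40_000]])
blobs = np.concatenate([rng.normal(loc=c, scale=1_000, size=(50, 2)) for c in centres])
noise = rng.uniform(-100_000, 100_000, size=(200, 2))
points = np.concatenate([blobs, noise])

clusterer = DBSCAN(eps=3_000, min_samples=10)  # 3 km neighbourhood radius
labels = clusterer.fit_predict(points)
print(f"{labels.max() + 1} clusters found, {np.sum(labels == -1)} noise points")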
Bug was fixed in xarray=2022.12.0. Also linted the code a bit.
gran_ids = region.avail_granules(ids=True, cloud=True)
print(len(gran_ids[0]))
print(gran_ids[0][:10])
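For context, the region object in that snippet would be an icepyx Query constructed along these lines (the bounding box and date range below are placeholder assumptions, not the values used in the PR):

# Hypothetical icepyx query; the spatial extent and dates are illustrative only.
import icepyx as ipx

region = ipx.Query(
    product="ATL11",
    spatial_extent=[-180, -90, 180, -60],  # rough Antarctic bounding box (lon/lat)
    date_range=["2019-03-29", "2020-12-24"],
)
# region.avail_granules(ids=True, cloud=True) then returns the granule ID list printed above.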
@JessicaS11, I'm seeing this warning when getting the granule IDs
/srv/conda/envs/notebook/lib/python3.10/site-packages/icepyx/core/granules.py:86: UserWarning: We are still working in implementing ID generation for this data product.
warnings.warn("We are still working in implementing ID generation for this data product.", UserWarning)
Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat? Will the full s3 urls (e.g. s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5) be returned in the list, or just the HDF5 filename (e.g. ATL11_005411_0315_005_03.h5)? I'm hoping for the full s3 urls (because it's been a pain working out how the ATL11 files are organized in the S3 bucket) 🙏
I'm seeing this warning when getting the granule IDs
Are you using the latest dev version? I think I only noted it in Slack, but:
Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat?
Is correct and should fix it entirely (but hasn't been pushed through to a new release yet; I thought I took the warning off but perhaps not?). It returns the full s3 urls (agree on the pain point!), and the rest of this workflow (including reading in with datatree) worked successfully for me. If you're working in CryoCloud, you may need to jump through some extra hoops to use the dev install (so I've made a habit of checking my version at every import).
Ah cool, let me try things out on the dev branch, and yes I was working on the CryoCloud 😃
Yep, using pip install git+https://github.com/icesat2py/icepyx.git@development to get icepyx=0.7.1.dev9+g66e2863 returns the full s3 urls! I can't tell you how happy I am not having to manually generate a list of 4000+ urls in a file like this ATL11_to_download.txt anymore 😃
Gonna play with those ATL11 files and xarray-datatree a bit more now, hoping that this map_over_subtree method can save me from writing some nasty for-loops.
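For reference, map_over_subtree applies an ordinary Dataset-in, Dataset-out function to every group of the tree, so a per-pair-track operation only needs to be written once. A small sketch (the filtering function and the ATL11 variable names h_corr/quality_summary are assumptions for illustration):

# Sketch: run one function across all pt1/pt2/pt3 groups of an ATL11 DataTree.
from datatree import map_over_subtree

@map_over_subtree
def keep_good_heights(ds):
    # Hypothetical filter: keep only land ice heights flagged as good quality.
    if "h_corr" in ds and "quality_summary" in ds:
        return ds.where(ds["quality_summary"] == 0)
    return ds

# With dt being a DataTree opened from an ATL11 granule (as earlier in this thread):
# filtered = keep_good_heights(dt)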
Initial draft tutorial on preparing ICESat-2 ATL11 land ice height time-series data into an analysis ready format.
Preview at https://deploy-preview-10--precious-dragon-494161.netlify.app/chapters/03_prep_land_ice_height_time-series.html
Test on Pangeo Binder (note, may be broken...):
TODO:
Part of #3.