
🚸 ICESat-2 ATL11 pre-processing to analysis ready format #10

Draft · weiji14 wants to merge 18 commits into main
Conversation

weiji14 (Member) commented Nov 18, 2022

Initial draft tutorial on preparing ICESat-2 ATL11 land ice height time-series data into an analysis ready format.

Preview at https://deploy-preview-10--precious-dragon-494161.netlify.app/chapters/03_prep_land_ice_height_time-series.html

Test on Pangeo Binder (note: the Binder link may be broken).

TODO:

Part of #3.

Initial draft tutorial: currently just a minimal script with some Python library imports that prints a list of library versions.
weiji14 added the documentation and preview labels Nov 18, 2022
Using a personal fork of datatree that patches an issue with file pointers. As of https://github.com/weiji14/datatree/commit/d18961b2d1efdf6cfbea3038d6d2f8761edceba4, it's still not perfect though: there is still a ValueError about malformed variables.
Reading the ICESat-2 ATL11 HDF5 files directly from an AWS S3 bucket into xarray! Based on the many tutorials at https://nasa-openscapes.github.io/earthdata-cloud-cookbook. Still needs some bugfixes in xarray and/or xarray-datatree for this to work fully.
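Working out where the ATL11 files live in the bucket took some effort. A minimal sketch, assuming the key layout seen in one observed granule path (`atl11_s3_url` is a hypothetical helper, and the `005` version default is an assumption, not a documented convention):

```python
from datetime import date

# Bucket prefix where NSIDC hosts the cloud-based ICESat-2 products.
ATL11_PREFIX = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL11"


def atl11_s3_url(granule_date: date, filename: str, version: str = "005") -> str:
    """Build the full S3 URL for an ATL11 granule.

    Hypothetical helper based on one observed key layout:
    ATLAS/ATL11/{version}/{yyyy}/{mm}/{dd}/{filename}
    """
    return f"{ATL11_PREFIX}/{version}/{granule_date:%Y/%m/%d}/{filename}"


url = atl11_s3_url(date(2019, 9, 30), "ATL11_005411_0315_005_03.h5")

# Reading the file directly would need s3fs plus temporary NASA Earthdata
# S3 credentials, roughly:
#   import s3fs, xarray as xr
#   fs = s3fs.S3FileSystem(anon=False)
#   with fs.open(url, mode="rb") as f:
#       ds = xr.open_dataset(f, engine="h5netcdf", group="pt1")
```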
Not the most elegant solution, but manually reading the ATL11 pair-track groups one by one in the for loop works. Once the file-pointer bug is fixed, a dictionary comprehension could replace the for loop. Better still, if the ValueError about malformed variables disappears, this could become a one-liner!
Adopt geosmart Jupyter Book template and other content.
No longer using personal fork, use the official version instead!
Include placeholder DBSCAN subglacial lake finder demo.
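For the placeholder demo, a DBSCAN-based lake finder might look roughly like this (synthetic points, not real ATL11 data; the `eps` and `min_samples` values are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(seed=42)

# Synthetic x/y positions (metres) of points flagged as having anomalous
# height change: two tight blobs (candidate active subglacial lakes)
# plus scattered noise.
lake_a = rng.normal(loc=(0.0, 0.0), scale=200.0, size=(50, 2))
lake_b = rng.normal(loc=(5000.0, 5000.0), scale=200.0, size=(50, 2))
noise = rng.uniform(low=-20000.0, high=20000.0, size=(20, 2))
points = np.vstack([lake_a, lake_b, noise])

# eps: neighbourhood radius in metres; min_samples: reject stray points.
labels = DBSCAN(eps=500.0, min_samples=5).fit_predict(points)
n_lakes = len(set(labels) - {-1})  # label -1 marks noise
```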
Bug was fixed in xarray=2022.12.0. Also linted the code a bit.
Comment on lines +111 to +113
gran_ids = region.avail_granules(ids=True, cloud=True)
print(len(gran_ids[0]))
print(gran_ids[0][:10])
weiji14 (Member Author) commented Jun 1, 2023
@JessicaS11, I'm seeing this warning when getting the granule IDs:

/srv/conda/envs/notebook/lib/python3.10/site-packages/icepyx/core/granules.py:86: UserWarning: We are still working in implementing ID generation for this data product.
  warnings.warn("We are still working in implementing ID generation for this data product.", UserWarning)

Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat? Will the full s3 urls (e.g. s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5) be returned in the list, or just the HDF5 filename (e.g. ATL11_005411_0315_005_03.h5)? I'm hoping for the full s3 urls (because it's been a pain working out how the ATL11 files are organized in the S3 bucket) 🙏

JessicaS11 (Collaborator) replied:

> I'm seeing this warning when getting the granule IDs

Are you using the latest dev version? I think I only noted it in slack, but

> Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat?

Is correct and should fix it entirely (but hasn't been pushed through to a new release yet; I thought I took the warning off but perhaps not?). It returns the full s3 urls (agree on the pain point!), and the rest of this workflow (including reading in with datatree) worked successfully for me. If you're working in CryoCloud, you may need to jump through some extra hoops to use the dev install (so I've made a habit of checking my version at every import).

weiji14 (Member Author) replied:

Ah cool, let me try things out on the dev branch, and yes I was working on the CryoCloud 😃

weiji14 (Member Author) replied:

Yep, using pip install git+https://github.com/icesat2py/icepyx.git@development to get icepyx=0.7.1.dev9+g66e2863 returns the full s3 urls! I can't tell you how happy I am not having to manually generate a list of 4000+ urls in a file like this ATL11_to_download.txt anymore 😃

Gonna play with those ATL11 files and xarray-datatree a bit more now, hoping that this map_over_subtree method can save me from writing some nasty for-loops.
