🚸 ICESat-2 ATL11 pre-processing to analysis ready format #10
Draft: weiji14 wants to merge 18 commits into main from preprocess-atl11
- b9598dc :construction: ICESat-2 ATL11 pre-processing to analysis ready format (weiji14)
- 8498e17 :egg: Switch to using datatree patched for ICESat-2 HDF5 files (weiji14)
- 327c585 :twisted_rightwards_arrows: Merge branch 'main' into preprocess-atl11 (weiji14)
- 7d5b5d4 :alembic: Experimental code for cloud access to ICESat-2 ATL11 (weiji14)
- 5bfa893 :poop: Read three ATL11 pair tracks into datatree using for-loop (weiji14)
- 3b62509 :twisted_rightwards_arrows: Merge branch 'main' into preprocess-atl11 (weiji14)
- 3ee030f :pushpin: Pin xarray-datatree to 0.0.11 (weiji14)
- c43694b :twisted_rightwards_arrows: Merge branch 'main' into preprocess-atl11 (weiji14)
- 6d13551 :triangular_flag_on_post: Temporarily trigger Jupyter Book build to G… (weiji14)
- 88de11b simplify filenames and remove unneeded notebook (JessicaS11)
- f5a177b add about icesat2 and basic data access resources (JessicaS11)
- 13b086d add cloud data access via icepyx (JessicaS11)
- b11a132 work on debugging data access (in progress) (JessicaS11)
- 96bfa5d Merge branch 'main' into preprocess-atl11 (JessicaS11)
- 895d317 fix typos (JessicaS11)
- 9c07ad4 fix more typos and links/formatting (JessicaS11)
- ee105f9 confirm s3urls are obtained for atl11; open one in datatree (JessicaS11)
- 72da611 :adhesive_bandage: Remove .seek(0) workaround (weiji14)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
# -*- coding: utf-8 -*- | ||
# --- | ||
# jupyter: | ||
# jupytext: | ||
# text_representation: | ||
# extension: .py | ||
# format_name: percent | ||
# format_version: '1.3' | ||
# jupytext_version: 1.14.1 | ||
# kernelspec: | ||
# display_name: Python 3 (ipykernel) | ||
# language: python | ||
# name: python3 | ||
# --- | ||
|
||
# %% [markdown] tags=[] user_expressions=[] | ||
# # Accessing ICESat-2/ATL11 Land Ice Height time-series | ||
# | ||
# In this tutorial, we'll step through a data pipeline for turning point cloud | ||
# time-series data from the ICESat-2 laser altimeter into an analysis ready | ||
# format. By the end of this lesson, you should be able to: | ||
# | ||
# - Access ICESat-2 data in a Hierarchical Data Format (HDF5) from the cloud | ||
# - Construct a data pipeline that combines multiple ICESat-2 tracks and laser | ||
# beams into a flat data structure | ||
# - Store the pre-processed data in a cloud-optimized, analysis ready data | ||
# format | ||
# | ||
# References: | ||
# - https://github.com/weiji14/deepicedrain/blob/v0.4.2/atl06_to_atl11.ipynb | ||
# - https://github.com/weiji14/deepicedrain/blob/v0.4.2/atl11_play.ipynb | ||
|
||
# %% [markdown] | ||
# ## About ICESat-2 | ||
# | ||
# The Ice, Cloud, and land Elevation Satellite-2 ([ICESat-2](https://www.nasa.gov/content/goddard/about-icesat-2)) was launched in 2018. | ||
# The Advanced Topographic Laser Altimeter System, or ATLAS, is the only instrument on board. | ||
# ATLAS has a single green laser that is split into six beams, arranged in three pairs. | ||
# The 10,000 laser pulses emitted each second reach the earth and reflect off the surface before returning to the satellite, | ||
# where their travel time is recorded and ultimately used (in combination with information about the satellite's location) | ||
# to determine the height of the surface they reflected off. | ||
# | ||
# ### Data Products | ||
# | ||
# The photon travel times collected by ATLAS are ultimately processed into a series of [ICESat-2 data products](https://nsidc.org/data/icesat-2/products). | ||
# The data products are produced with multiple levels of processing, from geolocated photons (ATL03) to along-track and gridded time series (e.g. ATL11 and ATL15). | ||
# For this analysis, we use one of the highest level (3B) products: ATL11 Slope-Corrected Land Ice Height Time Series product {cite:p}`ATL11.003`. | ||
# | ||
# ### Data Access | ||
# | ||
# ICESat-2 data is available from NSIDC through several mechanisms, including local download and direct access in the cloud. | ||
# A compilation of resources for accessing and working with ICESat-2 data is available in [this resource guide](https://icepyx.readthedocs.io/en/latest/community/resources.html) | ||
# and through the [NSIDC website](https://nsidc.org/data/icesat-2/tools). | ||
# Here we will access data in the cloud by getting the appropriate s3urls using [icepyx](https://icepyx.readthedocs.io/en/latest/), a Python software library and community of ICESat-2 data users, developers, and data managers. | ||
# %% [markdown] | ||
# ## Getting started | ||
# | ||
# These are the tools you’ll need. | ||
|
||
# %% | ||
import datatree | ||
import earthaccess | ||
import icepyx as ipx | ||
import xarray as xr | ||
|
||
# %% [markdown] user_expressions=[] | ||
# Just to make sure we’re on the same page, | ||
# let’s check that we’ve got compatible versions installed. | ||
|
||
# %% | ||
xr.show_versions() | ||
|
||
|
||
# %% [markdown] user_expressions=[] | ||
# ## Cloud access to ICESat-2 ATL11 files | ||
# | ||
# In this book we use [icepyx](https://icepyx.readthedocs.io/en/latest/) for gathering the necessary s3 urls to access ICESat-2 data in the cloud. | ||
# icepyx can also be used (with nearly identical syntax) to download data to your local machine. | ||
# For more details on accessing ICESat-2 data in the cloud, please check out the references below! | ||
# | ||
# References: | ||
# - https://github.com/icesat2py/icepyx/blob/development/doc/source/example_notebooks/IS2_cloud_data_access.ipynb | ||
# - https://nsidc.github.io/earthaccess/tutorials/demo/ | ||
# - https://nasa-openscapes.github.io/earthdata-cloud-cookbook/examples/NSIDC/ICESat2-CMR-AWS-S3.html#data-access-using-aws-s3 | ||
# - https://nsidc.org/data/user-resources/help-center/nasa-earthdata-cloud-data-access-guide | ||
# - https://book.cryointhecloud.com/tutorials/IS2_ATL15_surface_height_anomalies/IS2_ATL15_surface_height_anomalies.html | ||
|
||
# %% [markdown] user_expressions=[] | ||
# ### Providing credentials | ||
# | ||
# Accessing NASA data requires you to have an Earthdata Login. | ||
# You can sign up for one free at https://data.nsidc.earthdatacloud.nasa.gov/ | ||
# and learn more about NASA authentication and managing your credentials [via Earthaccess](https://nsidc.github.io/earthaccess/) and | ||
# [here](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/04_NASA_Earthdata_Authentication.html#authentication-for-nasa-earthdata). | ||
# | ||
# By obtaining your s3 urls via icepyx, you are also able to authenticate for cloud data access (note: Earthaccess is used under the hood to do this). | ||
# %% | ||
# First we must let icepyx know where (and when) we would like data from. | ||
|
||
short_name = "ATL11" # The data product we would like to query | ||
spatial_extent = [-180.0, -85.0, 180.0, -60.0] # bounding box for Antarctica | ||
date_range = ["2018-09-15", "2023-05-31"] # entire satellite record | ||
# %% | ||
# Set up the Query object | ||
region = ipx.Query(short_name, spatial_extent, date_range) | ||
# %% | ||
region.visualize_spatial_extent() | ||
|
||
# %% | ||
# Get the granule IDs and cloud access urls (note that due to some missing ICESat-2 product metadata, icepyx is still working to provide s3 urls for some products) | ||
gran_ids = region.avail_granules(ids=True, cloud=True) | ||
print(len(gran_ids[0])) | ||
print(gran_ids[0][:10]) | ||
|
||
# %% | ||
# Authenticate using your NASA Earthdata Login credentials; enter your user ID and password when prompted | ||
region.earthdata_login(s3token=True) | ||
|
||
# %% | ||
# set up our s3 file system using our credentials | ||
fs_s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=region._s3login_credentials) | ||
|
||
# %% [markdown] user_expressions=[] | ||
# ## Loading into xarray | ||
# | ||
# Let's read a single ICESat-2 ATL11 HDF5 file into an `xarray` data structure! | ||
# | ||
# First we'll take a quick look at an example of an ATL11 HDF5 file. | ||
# We'll read it using [`xarray.open_dataset`](https://docs.xarray.dev/en/v2022.11.0/generated/xarray.open_dataset.html). | ||
|
||
# %% | ||
# s3_url = gran_ids[0][3] | ||
s3_url = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5" | ||
|
||
with fs_s3.open(path=s3_url) as h5file: | ||
    ds = xr.open_dataset(h5file, engine="h5netcdf") | ||
ds | ||
|
||
# %% [markdown] | ||
# Hmm, so there are a bunch of attributes, but no data variables. | ||
# This is because the ICESat-2 laser altimeter data is stored in 'groups' per laser beam. | ||
# | ||
# For ATL11, the 6 laser beams have been combined into 3 pair tracks (pt1, pt2, pt3). | ||
# To read the nested data structure, we can either loop over each of these groups, | ||
# and/or use something like [`datatree.open_datatree`](https://xarray-datatree.readthedocs.io/en/latest/generated/datatree.open_datatree.html). | ||
# | ||
# References: | ||
# - https://medium.com/pangeo/easy-ipcc-part-1-multi-model-datatree-469b87cf9114 | ||
|
||
|
||
# %% | ||
with fs_s3.open(path=s3_url) as h5file: | ||
    pair_track_dict = {} | ||
    for pair_track in ["pt1", "pt2", "pt3"]: | ||
        pair_track_dict[pair_track] = xr.open_dataset( | ||
            filename_or_obj=h5file, engine="h5netcdf", group=pair_track | ||
        ) | ||
    dt = datatree.DataTree.from_dict(d=pair_track_dict) | ||
dt | ||
|
||
# %% |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,4 +10,4 @@ dependencies: | |
- pyarrow=9.0.0 | ||
- python=3.9 | ||
- s3fs=2022.11.0 | ||
- xarray-datatree=0.0.9 | ||
- xarray-datatree=0.0.11 |
@JessicaS11, I'm seeing this warning when getting the granule IDs

Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat? Will the full s3 urls (e.g. `s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5`) be returned in the list, or just the HDF5 filename (e.g. `ATL11_005411_0315_005_03.h5`)? I'm hoping for the full s3 urls (because it's been a pain working out how the ATL11 files are organized in the S3 bucket) 🙏
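
To make the difference between the two forms concrete, here is a minimal sketch (standard library only) that splits the example s3 url above into bucket, key and filename, and then reads the fields out of the filename. The helper names `split_s3_url` and `parse_atl11_filename` are hypothetical, and the regular expression assumes the ATL11 naming convention `ATL11_ttttss_c1c2_rrr_vv.h5` (reference ground track, region, first and last cycle, release, version); please check it against the product documentation before relying on it.

```python
import posixpath
import re
from urllib.parse import urlparse

# Assumed ATL11 filename layout: ATL11_ttttss_c1c2_rrr_vv.h5
ATL11_PATTERN = re.compile(
    r"^ATL11_(?P<track>\d{4})(?P<region>\d{2})_"
    r"(?P<first_cycle>\d{2})(?P<last_cycle>\d{2})_"
    r"(?P<release>\d{3})_(?P<version>\d{2})\.h5$"
)


def split_s3_url(s3_url):
    """Split an s3:// url into (bucket, key, filename)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError(f"Not an s3 url: {s3_url}")
    key = parsed.path.lstrip("/")
    return parsed.netloc, key, posixpath.basename(key)


def parse_atl11_filename(filename):
    """Return the filename fields as a dict of strings (hypothetical helper)."""
    match = ATL11_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected ATL11 filename: {filename}")
    return match.groupdict()


s3_url = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5"
bucket, key, filename = split_s3_url(s3_url)
fields = parse_atl11_filename(filename)
print(bucket)
print(filename)
print(fields)
```

The filename alone does not carry the dated prefix (`2019/09/30` in this example), which is why a list of full s3 urls is more useful than a list of filenames.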
Are you using the latest dev version? I think I only noted it in Slack, but
Is correct and should fix it entirely (but hasn't been pushed through to a new release yet; I thought I took the warning off but perhaps not?). It returns the full s3 urls (agree on the pain point!), and the rest of this workflow (including reading in with datatree) worked successfully for me. If you're working in CryoCloud, you may need to jump through some extra hoops to use the dev install (so I've made a habit of checking my version at every import).
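
One way to do that version check, sketched with the standard library only: `importlib.metadata.version` reports the installed version of a distribution without importing it. The helper names `installed_version` and `is_dev_build` are hypothetical, and treating any version string containing `.dev` as a development install is an assumption based on the PEP 440 convention.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def is_dev_build(version_string):
    """Assumption: development installs carry a '.dev' segment in their version."""
    return version_string is not None and ".dev" in version_string


icepyx_version = installed_version("icepyx")
if icepyx_version is None:
    print("icepyx is not installed in this environment")
elif is_dev_build(icepyx_version):
    print(f"icepyx {icepyx_version} (development install)")
else:
    print(f"icepyx {icepyx_version} (release install)")
```

Running this in the first notebook cell makes it obvious when a shared environment has fallen back to the released package.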
Ah cool, let me try things out on the dev branch, and yes I was working on the CryoCloud 😃
Yep, using `pip install git+https://github.com/icesat2py/icepyx.git@development` to get `icepyx=0.7.1.dev9+g66e2863` returns the full s3 urls! I can't tell you how happy I am not having to manually generate a list of 4000+ urls in a file like this `ATL11_to_download.txt` anymore 😃

Gonna play with those ATL11 files and xarray-datatree a bit more now, hoping that this `map_over_subtree` method can save me from writing some nasty for-loops.