🚸 ICESat-2 ATL11 pre-processing to analysis ready format #10

Draft · wants to merge 18 commits into base: main

2 changes: 1 addition & 1 deletion .github/workflows/deploy-book.yml
@@ -4,7 +4,7 @@ name: deploy-book
 # Only run this when the main branch changes
 on:
   # Uncomment the 'pull_request' line below to manually re-build Jupyter Book
-  # pull_request:
+  pull_request:
   push:
     branches:
     - main

3 changes: 2 additions & 1 deletion book/_toc.yml
@@ -22,7 +22,8 @@ parts:
 
   - caption: Chapter Three
     chapters:
-    - file: chapters/data
+    - title: Data Preparation
+      file: chapters/03_data_prep
 
   - caption: Chapter Four
     chapters:

161 changes: 161 additions & 0 deletions book/chapters/03_data_prep.py
@@ -0,0 +1,161 @@
# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.14.1
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---

# %% [markdown] tags=[] user_expressions=[]
# # Accessing ICESat-2/ATL11 Land Ice Height time-series
#
# In this tutorial, we'll step through a data pipeline for turning point cloud
# time-series data from the ICESat-2 laser altimeter into an analysis ready
# format. By the end of this lesson, you should be able to:
#
# - Access ICESat-2 data in a Hierarchical Data Format (HDF5) from the cloud
# - Construct a data pipeline that combines multiple ICESat-2 tracks and laser
# beams into a flat data structure
# - Store the pre-processed data in a cloud-optimized, analysis ready data
# format
#
# References:
# - https://github.com/weiji14/deepicedrain/blob/v0.4.2/atl06_to_atl11.ipynb
# - https://github.com/weiji14/deepicedrain/blob/v0.4.2/atl11_play.ipynb

# %% [markdown]
# ## About ICESat-2
#
# The Ice, Cloud, and land Elevation Satellite-2 ([ICESat-2](https://www.nasa.gov/content/goddard/about-icesat-2)) was launched in 2018.
# The Advanced Topographic Laser Altimeter System, or ATLAS, is the only instrument on board.
# ATLAS has a single green laser that is split into six beams, arranged in three pairs.
# The 10,000 laser pulses emitted each second reach the earth and reflect off the surface before returning to the satellite,
# where their travel time is recorded and ultimately used (in combination with information about the satellite's location)
# to determine the height of the surface they reflected off of.
#
# ### Data Products
#
# The photon travel times collected by ATLAS are ultimately processed into a series of [ICESat-2 data products](https://nsidc.org/data/icesat-2/products).
# The data products are produced at multiple levels of processing, from geolocated photons (ATL03) up to higher-level time series products (e.g. ATL11).
# For this analysis, we use one of the highest-level (Level 3B) products: the ATL11 Slope-Corrected Land Ice Height Time Series {cite:p}`ATL11.003`.
#
# ### Data Access
#
# ICESat-2 data access is available from NSIDC through several mechanisms, including local download and direct access in the cloud.
# A compilation of resources for accessing and working with ICESat-2 data is available in [this resource guide](https://icepyx.readthedocs.io/en/latest/community/resources.html)
# and through the [NSIDC website](https://nsidc.org/data/icesat-2/tools).
# Here we will access data in the cloud by getting the appropriate s3urls using [icepyx](https://icepyx.readthedocs.io/en/latest/), a Python software library and community of ICESat-2 data users, developers, and data managers.
# %% [markdown]
# ## Getting started
#
# These are the tools you’ll need.

# %%
import datatree
import earthaccess
import icepyx as ipx
import xarray as xr

# %% [markdown] user_expressions=[]
# Just to make sure we’re on the same page,
# let’s check that we’ve got compatible versions installed.

# %%
xr.show_versions()
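
# %% [markdown] user_expressions=[]
# Since cloud access support in icepyx is still evolving (see the review discussion
# further down), it also helps to check which versions of icepyx, earthaccess and
# datatree are installed. This extra cell is a small sketch, assuming each library
# exposes a standard `__version__` attribute.

# %%
# Print the versions of the cloud-access and tree-handling libraries we rely on
for lib in (ipx, earthaccess, datatree):
    print(f"{lib.__name__}: {lib.__version__}")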


# %% [markdown] user_expressions=[]
# ## Cloud access to ICESat-2 ATL11 files
#
# In this book we use [icepyx](https://icepyx.readthedocs.io/en/latest/) for gathering the necessary s3 urls to access ICESat-2 data in the cloud.
# icepyx can also be used (with nearly identical syntax) to download data to your local machine.
# For more details on accessing ICESat-2 data in the cloud, please check out the references below!
#
# References:
# - https://github.com/icesat2py/icepyx/blob/development/doc/source/example_notebooks/IS2_cloud_data_access.ipynb
# - https://nsidc.github.io/earthaccess/tutorials/demo/
# - https://nasa-openscapes.github.io/earthdata-cloud-cookbook/examples/NSIDC/ICESat2-CMR-AWS-S3.html#data-access-using-aws-s3
# - https://nsidc.org/data/user-resources/help-center/nasa-earthdata-cloud-data-access-guide
# - https://book.cryointhecloud.com/tutorials/IS2_ATL15_surface_height_anomalies/IS2_ATL15_surface_height_anomalies.html

# %% [markdown] user_expressions=[]
# ### Providing credentials
#
# Accessing NASA data requires you to have an Earthdata Login.
# You can sign up for one for free at https://data.nsidc.earthdatacloud.nasa.gov/
# and learn more about NASA authentication and managing your credentials [via earthaccess](https://nsidc.github.io/earthaccess/) and
# [here](https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/04_NASA_Earthdata_Authentication.html#authentication-for-nasa-earthdata).
#
# By obtaining your s3 urls via icepyx, you are also able to authenticate for cloud data access (note: earthaccess is used under the hood to do this).
# %%
# First we must let icepyx know where (and when) we would like data from.

short_name = "ATL11" # The data product we would like to query
spatial_extent = [-180.0, -85.0, 180.0, -60.0] # bounding box for Antarctica
date_range = ["2018-09-15", "2023-05-31"] # entire satellite record
# %%
# Set up the Query object
region = ipx.Query(short_name, spatial_extent, date_range)
# %%
region.visualize_spatial_extent()

# %%
# Get the granule IDs and cloud access urls (note that due to some missing ICESat-2
# product metadata, icepyx is still working to provide s3 urls for some products)
gran_ids = region.avail_granules(ids=True, cloud=True)
print(len(gran_ids[0]))
print(gran_ids[0][:10])
Comment on lines +111 to +113

weiji14 (Member, Author) commented on Jun 1, 2023:

@JessicaS11, I'm seeing this warning when getting the granule IDs:

    /srv/conda/envs/notebook/lib/python3.10/site-packages/icepyx/core/granules.py:86: UserWarning: We are still working in implementing ID generation for this data product.
      warnings.warn("We are still working in implementing ID generation for this data product.", UserWarning)

Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat? Will the full s3 urls (e.g. s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5) be returned in the list, or just the HDF5 filename (e.g. ATL11_005411_0315_005_03.h5)? I'm hoping for the full s3 urls (because it's been a pain working out how the ATL11 files are organized in the S3 bucket) 🙏

Collaborator replied:

> I'm seeing this warning when getting the granule IDs

Are you using the latest dev version? I think I only noted it in slack, but

> Assuming that the PR at icesat2py/icepyx#426 fixes this somewhat?

is correct and should fix it entirely (but hasn't been pushed through to a new release yet; I thought I took the warning off but perhaps not?). It returns the full s3 urls (agree on the pain point!), and the rest of this workflow (including reading in with datatree) worked successfully for me. If you're working in CryoCloud, you may need to jump through some extra hoops to use the dev install (so I've made a habit of checking my version at every import).

weiji14 (Member, Author) replied:

Ah cool, let me try things out on the dev branch, and yes I was working on the CryoCloud 😃

weiji14 (Member, Author) replied:

Yep, using `pip install git+https://github.com/icesat2py/icepyx.git@development` to get icepyx=0.7.1.dev9+g66e2863 returns the full s3 urls! I can't tell you how happy I am not having to manually generate a list of 4000+ urls in a file like this ATL11_to_download.txt anymore 😃

Gonna play with those ATL11 files and xarray-datatree a bit more now, hoping that this map_over_subtree method can save me from writing some nasty for-loops.


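# %% [markdown] user_expressions=[]
# As noted in the review thread above, recent icepyx versions return the full s3 urls
# alongside the granule IDs. The cell below is a small sketch (not part of the original
# pipeline) that assumes the cloud access urls come back as the last element of the
# list returned by `avail_granules(ids=True, cloud=True)`; adjust the indexing if your
# icepyx version structures the output differently.

# %%
s3_urls = gran_ids[-1]  # cloud access urls are expected as the last element
print(s3_urls[:3])
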
# %%
# Authenticate using your NASA Earthdata Login credentials; enter your username and password when prompted
region.earthdata_login(s3token=True)

# %%
# set up our s3 file system using our credentials
fs_s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=region._s3login_credentials)

# %% [markdown] user_expressions=[]
# ## Loading into xarray
#
# Let's read a single ICESat-2 ATL11 HDF5 file into an `xarray` data structure!
#
# First we'll take a quick look at an example of an ATL11 HDF5 file.
# We'll read it using [`xarray.open_dataset`](https://docs.xarray.dev/en/v2022.11.0/generated/xarray.open_dataset.html).

# %%
# s3_url = gran_ids[0][3]
s3_url = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL11/005/2019/09/30/ATL11_005411_0315_005_03.h5"

with fs_s3.open(path=s3_url) as h5file:
    ds = xr.open_dataset(h5file, engine="h5netcdf")
ds

# %% [markdown]
# Hmm, so there are a bunch of attributes, but no data variables.
# This is because the ICESat-2 height data is stored in separate HDF5 'groups', one per laser beam pair.
#
# For ATL11, the 6 laser beams have been combined into 3 pair tracks (pt1, pt2, pt3).
# To read this nested data structure, we can loop over each of these groups,
# or use something like [`datatree.open_datatree`](https://xarray-datatree.readthedocs.io/en/latest/generated/datatree.open_datatree.html).
# Below, we first peek at the group names, then read each pair track into a `DataTree`.
#
# References:
# - https://medium.com/pangeo/easy-ipcc-part-1-multi-model-datatree-469b87cf9114
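
# %% [markdown] user_expressions=[]
# As a quick sanity check, we can list the top-level HDF5 groups directly.
# This is a small sketch that assumes `h5py` is available (it is installed as a
# dependency of `h5netcdf`); the three pair track groups should show up alongside
# metadata groups such as `orbit_info` and `ancillary_data`.

# %%
import h5py

with fs_s3.open(path=s3_url) as h5file:
    with h5py.File(h5file, mode="r") as h5:
        print(list(h5.keys()))  # top-level groups in the ATL11 file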


# %%
with fs_s3.open(path=s3_url) as h5file:
    pair_track_dict = {}
    for pair_track in ["pt1", "pt2", "pt3"]:
        pair_track_dict[pair_track] = xr.open_dataset(
            filename_or_obj=h5file, engine="h5netcdf", group=pair_track
        )
    dt = datatree.DataTree.from_dict(d=pair_track_dict)
dt

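# %% [markdown] user_expressions=[]
# One benefit of having the pair tracks in a single `DataTree` is that we can operate
# on all of them at once. The cell below is a small sketch (not part of the original
# pipeline) using `DataTree.map_over_subtree`, and it assumes each pair track group
# contains the standard ATL11 `h_corr` (corrected height) variable.

# %%
# Apply the same subsetting to pt1, pt2 and pt3 without writing an explicit loop
height_dt = dt.map_over_subtree(lambda ds: ds[["h_corr"]])
height_dt
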
# %%
42 changes: 0 additions & 42 deletions book/chapters/data.ipynb

This file was deleted.

7 changes: 3 additions & 4 deletions book/chapters/motivation.ipynb
@@ -35,11 +35,10 @@
     "\n",
     "### Data\n",
     "\n",
-    "Basics of IS2?\n",
-    "\n",
-    "For this analysis, we will use the ATLAS/ICESat-2 ATL11 Slope-Corrected Land Ice Height Time Series product {cite:p}`ATL11.003.\n",
+    "For this analysis, we will use the ATLAS/ICESat-2 ATL11 Slope-Corrected Land Ice Height Time Series product {cite:p}`ATL11.003`.\n",
     "This data product provides a time series of land-ice surface heights.\n",
     "It is a spatially organized and relatively compact product with height-change information derived from ICESat-2 observations.\n",
+    "More information on the ICESat-2 mission and data products is included in the [Data Preparation Chapter](./03_data_prep.py).\n",
     "\n",
     "### Challenges\n",
     "\n",
@@ -72,7 +71,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:23:07) [MSC v.1927 32 bit (Intel)]"
+   "version": "3.7.3"
   },
   "vscode": {
    "interpreter": {

2 changes: 1 addition & 1 deletion environment.yml
@@ -10,4 +10,4 @@ dependencies:
 - pyarrow=9.0.0
 - python=3.9
 - s3fs=2022.11.0
-- xarray-datatree=0.0.9
+- xarray-datatree=0.0.11