data handling tutorial added
PennyHow committed Nov 6, 2024
1 parent 86791fa commit 64c8d2e
Showing 9 changed files with 228 additions and 99 deletions.
4 changes: 2 additions & 2 deletions docs/background.md
@@ -8,11 +8,11 @@ The **GrIML** processing package is for classifying water bodies from satellite

The **GrIML** post-processing chain follows a linear workflow. Initial rasterised binary classifications denoting water bodies can be inputted to **convert**, **filter** and **merge** into a cohesive ice marginal lake vector dataset, populated with useful **metadata** and analysed with relevant **statistical information**.

- <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_workflow_without_gee.png" align="center" width="400">
+ <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_workflow_without_gee.png?raw=true" align="center" width="400">

Each of these post-processing steps is contained within GrIML's modules, and called in turn to perform the entire processing chain. The `griml()` function invokes all post-processing steps.

- <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_package_structure.png" align="center" width="400">
+ <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_package_structure.png?raw=true" align="center" width="400">


## Project motivation
Binary file added docs/figures/iml_basic_plot.png
Binary file added docs/figures/iml_pt_plot.png
Binary file added docs/figures/iml_time_series_plot.png
184 changes: 181 additions & 3 deletions docs/tutorial-data.md
@@ -51,11 +51,189 @@ Each inventory in the inventory series contains the following metadata informati

## Getting started

- Loading the dataset: Data available at [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N).
+ The dataset is available on the [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N) and can be downloaded and unzipped either using wget:

- Quicklook plotting of the dataset
```bash
$ wget -r -e robots=off -nH --cut-dirs=3 --content-disposition "https://dataverse.geus.dk/api/datasets/:persistentId/dirindex?persistentId=doi:10.22008/FK2/MBKW9N"
$ unzip dataverse_files.zip
```

Or with Python:

```python
import urllib.request
import zipfile

# Define url and extraction directory
url = "https://dataverse.geus.dk/api/datasets/:persistentId/dirindex?persistentId=doi:10.22008/FK2/MBKW9N"
extract_dir = "dataverse_files"

# Fetch zipped files
zip_path, _ = urllib.request.urlretrieve(url)

# Unzip files to directory
with zipfile.ZipFile(zip_path, "r") as f:
    f.extractall(extract_dir)
```

One of the inventories in the dataset series can be opened and plotted in Python using geopandas. In this example, let's take the 2023 inventory:

```python
import geopandas as gpd
iml = gpd.read_file("dataverse_files/20230101-ESA-GRIML-IML-fv1.shp")
iml.plot(color="red")
```

<img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/docs/figures/iml_basic_plot.png?raw=true" align="center" width="100">


```{important}
Make sure the file path points to the unzipped dataset, otherwise the file will fail to load
```
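If loading fails, a quick existence check on the path helps diagnose whether the extraction directory is where you expect. A minimal sketch, using the 2023 inventory path from the example above (adjust it if you unzipped elsewhere):

```python
import os

def inventory_exists(path):
    """Return True if the inventory file exists at the given path."""
    return os.path.isfile(path)

# Path from the example above; adjust to wherever the dataset was unzipped
path = "dataverse_files/20230101-ESA-GRIML-IML-fv1.shp"
if not inventory_exists(path):
    print(f"Inventory not found at {path} - check the extraction directory")
```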

A plot of all lake shapes is difficult to read without zooming in and panning around. We can dissolve all polygons belonging to a common lake and plot their centroid points for a better overview:

```python
iml_d = iml.dissolve(by="lake_id")
iml_d["centroid"] = iml_d.geometry.centroid
iml_d["centroid"].plot(markersize=0.5)
```

<img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/docs/figures/iml_pt_plot.png?raw=true" align="center" width="100">

## Generating statistics

- Extracting statistics
+ We can extract basic statistics from an ice marginal lake inventory in the dataset series using simple querying. Let's take the 2022 inventory in this example and first determine the number of classified lakes, and then the number of unique lakes:

```python
import geopandas as gpd

# Load inventory
iml = gpd.read_file("dataverse_files/20220101-ESA-GRIML-IML-fv1.shp")

# Dissolve by lake id to get all unique lakes as dissolved polygons
iml_d = iml.dissolve(by='lake_id')

# Print counts
print(f"Total number of detected lakes: {len(iml)}")
print(f"Total number of unique lakes: {len(iml_d)}")
```

We can count the number of classifications from each method, where `SAR` denotes classifications from SAR backscatter, `VIS` denotes classifications from multi-spectral indices, and `DEM` denotes classifications using sink detection:

```python
# Count lakes by classifications
print(iml['method'].value_counts())
```
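For shares rather than raw counts, `value_counts` accepts `normalize=True`. A minimal sketch with stand-in values (the real values come from the inventory's `method` column loaded above):

```python
import pandas as pd

# Toy stand-in for the inventory's 'method' column
methods = pd.Series(["SAR", "SAR", "VIS", "DEM", "VIS", "SAR"])

# normalize=True returns proportions rather than raw counts
shares = methods.value_counts(normalize=True)
print(shares)
```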

Let's say we would like to count the number of ice marginal lakes that share a margin with the Greenland Ice Sheet, broken down by region. We can do this by first extracting all lakes that share a margin with the ice sheet (`ICE_SHEET`) and then counting lakes per region:

```python
# Filter to lakes with an ice sheet margin
iml_d_ice_sheet = iml_d[iml_d['margin'] == 'ICE_SHEET']

# Count lakes by region
print(iml_d_ice_sheet['region'].value_counts())
```

We can also determine the average, minimum and maximum lake size per region:

```python
# Calculate surface area of all unique lakes (dissolved)
iml_d['area_sqkm'] = iml_d.geometry.area/10**6

# Group lakes by region and determine average, min, max
print(iml_d.groupby(['region'])['area_sqkm'].mean())
print(iml_d.groupby(['region'])['area_sqkm'].min())
print(iml_d.groupby(['region'])['area_sqkm'].max())
```
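The three groupby calls can also be collapsed into a single `agg` call. A sketch with a toy stand-in for the dissolved inventory's `region` and `area_sqkm` columns:

```python
import pandas as pd

# Toy stand-in for the dissolved inventory (region and lake area columns)
iml_d = pd.DataFrame({
    "region": ["NW", "NW", "SE", "SE", "SE"],
    "area_sqkm": [0.5, 1.5, 2.0, 4.0, 6.0],
})

# One groupby call returns mean, min and max per region in a single table
stats = iml_d.groupby("region")["area_sqkm"].agg(["mean", "min", "max"])
print(stats)
```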

## Cross inventory comparison

All inventories in the ice marginal lake inventory series can be treated as a time series to examine change in lake abundance and size over time. Let's take an example where we generate a time series of lake abundance change at the margins of Greenland's peripheral ice caps and glaciers. First, we load all inventories as a list of GeoDataFrames:

```python
import glob
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt

# Define directory
in_dir = 'dataverse_files/*IML-fv1.shp'

# Iterate through inventories
gdfs = []
for f in sorted(glob.glob(in_dir)):

    # Load inventory and dissolve by unique lake identifier
    gdf = gpd.read_file(f)
    gdf = gdf.dissolve(by='lake_id')
    gdf['area_sqkm'] = gdf.geometry.area/10**6

    # Append to list
    gdfs.append(gdf)
```

Then we count lakes with a shared ice cap/glacier margin from each inventory, splitting counts by region:

```python
# Create empty lists for region counts
b=['NW', 'NO', 'NE', 'CE', 'SE', 'SW', 'CW']
ic_nw=[]
ic_no=[]
ic_ne=[]
ic_ce=[]
ic_se=[]
ic_sw=[]
ic_cw=[]
ice_cap_abun = [ic_nw, ic_no, ic_ne, ic_ce, ic_se, ic_sw, ic_cw]

# Iterate through geodataframes
for g in gdfs:

    # Filter by margin type
    icecap = g[g['margin'] == 'ICE_CAP']

    # Append regional lake counts
    for i in range(len(b)):
        ice_cap_abun[i].append(icecap['region'].value_counts()[b[i]])
```
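Note that indexing `value_counts()` with a region label raises a `KeyError` if that region has no lakes in a given inventory. Reindexing against the region list with a fill value of zero is a more defensive pattern; a sketch with stand-in region values:

```python
import pandas as pd

b = ['NW', 'NO', 'NE', 'CE', 'SE', 'SW', 'CW']

# Toy stand-in for icecap['region'] in one inventory; several regions absent
regions = pd.Series(["NW", "NW", "SE", "SW", "NO"])

# reindex fills missing regions with 0 instead of raising a KeyError,
# and returns the counts already in the order of b
counts = regions.value_counts().reindex(b, fill_value=0)
print(counts)
```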

We can then plot all of our lake counts as a stacked bar plot:

```python
# Define plotting attributes
years=list(range(2016,2024,1))
col=['#045275', '#089099', '#7CCBA2', '#FCDE9C', '#F0746E', '#DC3977', '#7C1D6F']
bottom=np.zeros(8)

# Prime plotting area
fig, ax = plt.subplots(1, figsize=(10,5))

# Plot lake counts as stacked bar plots
for i in range(len(ice_cap_abun)):
    p = ax.bar(years, ice_cap_abun[i], 0.5, color=col[i], label=b[i], bottom=bottom)
    bottom += ice_cap_abun[i]
    ax.bar_label(p, label_type='center', fontsize=8)

# Add legend
ax.legend(bbox_to_anchor=(1.01,0.7))

# Change plotting aesthetics
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray', linestyle='dashed', linewidth=0.5)
ax.set_facecolor("#f2f2f2")

# Add title
props = dict(boxstyle='round', facecolor='#6CB0D6', alpha=0.3)
ax.text(0.01, 1.05, 'Periphery ice caps/glaciers lake abundance change',
fontsize=14, horizontalalignment='left', bbox=props, transform=ax.transAxes)

# Add axis labels
ax.set_xlabel('Year', fontsize=14)
ax.set_ylabel('Lake abundance', fontsize=14)

# Show plot
plt.show()
```
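To keep the figure rather than just display it, `fig.savefig` can be called in place of (or before) `plt.show()`. A minimal sketch with a hypothetical output filename:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs headless
import matplotlib.pyplot as plt

# Minimal stand-in figure; in the tutorial, reuse the fig/ax built above
fig, ax = plt.subplots(1, figsize=(10, 5))
ax.bar([2016, 2017], [3, 5], 0.5)

# Hypothetical output name; bbox_inches='tight' keeps the legend in frame
fig.savefig("lake_abundance.png", dpi=300, bbox_inches="tight")
```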

61 changes: 0 additions & 61 deletions docs/tutorials-data.md

This file was deleted.

23 changes: 9 additions & 14 deletions docs/tutorials-processing.md
@@ -33,11 +33,10 @@ start='20170701'
end='20170831'
```

- Next we need to define the location of our raster file. A test raster file is provided with GrIML, which can be located using the `os` module.
+ Next we need to define the location of our raster file. A test raster file is provided with GrIML, which can be found in the top-level test directory of the repository.

```python
- import os
- infile = os.path.join(os.path.dirname(griml.__file__),'test/test_north_greenland.tif')
+ infile = 'test/test_north_greenland.tif'
```

And then the file can be converted from raster to vector classifications using the `convert` function and the input variables.
@@ -73,9 +72,8 @@ GrIML is provided with a test vector file for filtering, and a test ice mask whi

```python
# Define input files
- import os
- infile1 = os.path.join(os.path.dirname(griml.__file__),'test/test_filter.shp')
- infile2 = os.path.join(os.path.dirname(griml.__file__),'test/test_icemask.shp')
+ infile1 = 'test/test_filter.shp'
+ infile2 = 'test/test_icemask.shp'

# Define output directory to write filtered vectors to
outdir = "PATH/TO/OUTPUT_DIRECTORY"
@@ -110,9 +108,8 @@ filter_vectors(all_infiles, infile2, outdir)
When covering large areas, the classifications are usually split into different files. At this stage, we will merge all files together, to form a complete inventory of ice marginal lake classifications. Test files are provided with GrIML to perform this.

```python
- import os
- infile1 = os.path.join(os.path.dirname(griml.__file__),'test/test_merge_1.shp')
- infile2 = os.path.join(os.path.dirname(griml.__file__),'test/test_merge_2.shp')
+ infile1 = 'test/test_merge_1.shp'
+ infile2 = 'test/test_merge_2.shp'
```

The `merge_vectors` function is used to merge all valid vectors within a list of files. In this case, we will merge all vectors from the two files defined previously.
@@ -149,16 +146,14 @@ Metadata can be added to the inventory with GrIML's `metadata` module. This incl
Input files are needed for assigning a placename and a region to each lake. The placename file is a point vector file containing all placenames for a region. We use the placename database from [Oqaasileriffik](https://oqaasileriffik.gl/en/), the Language Secretariat of Greenland, for which there is an example data subset provided with GrIML. The region file is a polygon vector file containing all regions and their names. We use the Greenland Ice Sheet drainage basin regions as defined by [Mouginot and Rignot (2019)](https://doi.org/10.7280/D1WT11), a dataset which is provided with GrIML.

```python
- import os

# Input test inventory for adding metadata to
- infile1 = os.path.join(os.path.dirname(griml.__file__),'test/test_merge_2.shp')
+ infile1 = 'test/test_merge_2.shp'

# Placenames file
- infile2 = os.path.join(os.path.dirname(griml.__file__),'test/test_placenames.shp')
+ infile2 = 'test/test_placenames.shp'

# Regions file
- infile3 = os.path.join(os.path.dirname(griml.__file__),'test/greenland_basins_polarstereo.shp')
+ infile3 = 'test/greenland_basins_polarstereo.shp'
```

All metadata can be added with the `add_metadata` function.
