data handling tutorial added
PennyHow committed Nov 6, 2024
1 parent 86791fa commit 64c8d2e
Showing 9 changed files with 228 additions and 99 deletions.
4 changes: 2 additions & 2 deletions docs/background.md
@@ -8,11 +8,11 @@ The **GrIML** processing package is for classifying water bodies from satellite

The **GrIML** post-processing chain follows a linear workflow. Initial rasterised binary classifications denoting water bodies can be inputted to **convert**, **filter** and **merge** into a cohesive ice marginal lake vector dataset, populated with useful **metadata** and analysed with relevant **statistical information**.

- <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_workflow_without_gee.png" align="center" width="400">
+ <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_workflow_without_gee.png?raw=true" align="center" width="400">

Each of these post-processing steps is contained within GrIML's modules, and called in turn to perform the entire processing chain. The `griml()` function invokes all post-processing steps.

- <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_package_structure.png" align="center" width="400">
+ <img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/other/reporting/figures/griml_package_structure.png?raw=true" align="center" width="400">


## Project motivation
Binary file added docs/figures/iml_basic_plot.png
Binary file added docs/figures/iml_pt_plot.png
Binary file added docs/figures/iml_time_series_plot.png
184 changes: 181 additions & 3 deletions docs/tutorial-data.md
@@ -51,11 +51,189 @@ Each inventory in the inventory series contains the following metadata informati

## Getting started

- Loading the dataset: Data available at [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N).
+ The dataset is available on the [GEUS Dataverse](https://doi.org/10.22008/FK2/MBKW9N) and can be downloaded and unzipped either using wget:

- Quicklook plotting of the dataset
```bash
$ wget -r -e robots=off -nH --cut-dirs=3 --content-disposition "https://dataverse.geus.dk/api/datasets/:persistentId/dirindex?persistentId=doi:10.22008/FK2/MBKW9N"
$ unzip dataverse_files.zip
```

Or with Python:

```python
import urllib.request
import zipfile

# Define url and extraction directory
url = "https://dataverse.geus.dk/api/datasets/:persistentId/dirindex?persistentId=doi:10.22008/FK2/MBKW9N"
extract_dir = "dataverse_files"

# Fetch zipped files
zip_path, _ = urllib.request.urlretrieve(url)

# Unzip files to directory
with zipfile.ZipFile(zip_path, "r") as f:
    f.extractall(extract_dir)
```

One of the inventories in the dataset series can be opened and plotted in Python using geopandas. In this example, let's take the 2023 inventory:

```python
import geopandas as gpd
iml = gpd.read_file("dataverse_files/20230101-ESA-GRIML-IML-fv1.shp")
iml.plot(color="red")
```

<img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/docs/figures/iml_basic_plot.png?raw=true" align="center" width="100">


```{important}
Make sure the file path points to the unzipped dataset, otherwise the file will fail to load
```
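If loading fails, a quick existence check on the path helps diagnose whether the extraction directory is where you expect. A minimal sketch, using the 2023 inventory path from the example above (adjust it if you unzipped elsewhere):

```python
import os

def inventory_exists(path):
    """Return True if the inventory file exists at the given path."""
    return os.path.isfile(path)

# Path from the example above; adjust to wherever the dataset was unzipped
path = "dataverse_files/20230101-ESA-GRIML-IML-fv1.shp"
if not inventory_exists(path):
    print(f"Inventory not found at {path} - check the extraction directory")
```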

A plot of all lake shapes is difficult to read without zooming in and panning around. We can dissolve all polygons belonging to a common lake and plot their centroid points for a better overview:

```python
iml_d = iml.dissolve(by="lake_id")
iml_d["centroid"] = iml_d.geometry.centroid
iml_d["centroid"].plot(markersize=0.5)
```

<img src="https://raw.githubusercontent.com/GEUS-Glaciology-and-Climate/GrIML/refs/heads/main/docs/figures/iml_pt_plot.png?raw=true" align="center" width="100">

## Generating statistics

- Extracting statistics
+ We can extract basic statistics from an ice marginal lake inventory in the dataset series using simple querying. Let's take the 2022 inventory in this example and first determine the number of classified lakes, and then the number of unique lakes:

```python
import geopandas as gpd

# Load inventory
iml = gpd.read_file("dataverse_files/20220101-ESA-GRIML-IML-fv1.shp")

# Dissolve by lake id to get all unique lakes as dissolved polygons
iml_d = iml.dissolve(by='lake_id')

# Print counts
print(f"Total number of detected lakes: {len(iml)}")
print(f"Total number of unique lakes: {len(iml_d)}")
```

We can count the number of classifications from each method, where `SAR` denotes classifications from SAR backscatter, `VIS` denotes classifications from multi-spectral indices, and `DEM` denotes classifications using sink detection:

```python
# Count lakes by classifications
print(iml['method'].value_counts())
```
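For shares rather than raw counts, `value_counts` accepts `normalize=True`. A minimal sketch with stand-in values (the real values come from the inventory's `method` column loaded above):

```python
import pandas as pd

# Toy stand-in for the inventory's 'method' column
methods = pd.Series(["SAR", "SAR", "VIS", "DEM", "VIS", "SAR"])

# normalize=True returns proportions rather than raw counts
shares = methods.value_counts(normalize=True)
print(shares)
```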

Let's say we would like to count the number of ice marginal lakes that share a margin with the Greenland Ice Sheet, broken down by region. We can do this by first extracting all lakes that share a margin with the ice sheet (`ICE_SHEET`) and then counting lakes per region:

```python
# Filter to lakes with an ice sheet margin
iml_d_ice_sheet = iml_d[iml_d['margin'] == 'ICE_SHEET']

# Count lakes by region
print(iml_d_ice_sheet['region'].value_counts())
```

We can also determine the average, minimum and maximum lake size per region:

```python
# Calculate surface area of all unique lakes (dissolved)
iml_d['area_sqkm'] = iml_d.geometry.area/10**6

# Group lakes by region and determine average, min, max
print(iml_d.groupby(['region'])['area_sqkm'].mean())
print(iml_d.groupby(['region'])['area_sqkm'].min())
print(iml_d.groupby(['region'])['area_sqkm'].max())
```
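The three groupby calls can also be collapsed into a single `agg` call. A sketch with a toy stand-in for the dissolved inventory's `region` and `area_sqkm` columns:

```python
import pandas as pd

# Toy stand-in for the dissolved inventory (region and lake area columns)
iml_d = pd.DataFrame({
    "region": ["NW", "NW", "SE", "SE", "SE"],
    "area_sqkm": [0.5, 1.5, 2.0, 4.0, 6.0],
})

# One groupby call returns mean, min and max per region in a single table
stats = iml_d.groupby("region")["area_sqkm"].agg(["mean", "min", "max"])
print(stats)
```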

## Cross inventory comparison

All inventories in the ice marginal lake inventory series can be treated as a time series to examine change in lake abundance and size over time. Let's take an example where we generate a time series of lake abundance change at the margins of Greenland's peripheral ice caps and glaciers. First, we load all inventories as a list of GeoDataFrames:

```python
import glob
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt

# Define directory
in_dir = 'dataverse_files/*IML-fv1.shp'

# Iterate through inventories
gdfs = []
for f in sorted(glob.glob(in_dir)):

    # Load inventory and dissolve by unique lake identifier
    gdf = gpd.read_file(f)
    gdf = gdf.dissolve(by='lake_id')
    gdf['area_sqkm'] = gdf.geometry.area/10**6

    # Append to list
    gdfs.append(gdf)
```

Then we count lakes with a shared ice cap/glacier margin from each inventory, splitting counts by region:

```python
# Create empty lists for region counts
b=['NW', 'NO', 'NE', 'CE', 'SE', 'SW', 'CW']
ic_nw=[]
ic_no=[]
ic_ne=[]
ic_ce=[]
ic_se=[]
ic_sw=[]
ic_cw=[]
ice_cap_abun = [ic_nw, ic_no, ic_ne, ic_ce, ic_se, ic_sw, ic_cw]

# Iterate through geodataframes
for g in gdfs:

    # Filter by margin type
    icecap = g[g['margin'] == 'ICE_CAP']

    # Append regional lake counts
    for i in range(len(b)):
        ice_cap_abun[i].append(icecap['region'].value_counts()[b[i]])
```
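Note that indexing `value_counts()` with a region label raises a `KeyError` if that region has no lakes in a given inventory. Reindexing against the region list with a fill value of zero is a more defensive pattern; a sketch with stand-in region values:

```python
import pandas as pd

b = ['NW', 'NO', 'NE', 'CE', 'SE', 'SW', 'CW']

# Toy stand-in for icecap['region'] in one inventory; several regions absent
regions = pd.Series(["NW", "NW", "SE", "SW", "NO"])

# reindex fills missing regions with 0 instead of raising a KeyError,
# and returns the counts already in the order of b
counts = regions.value_counts().reindex(b, fill_value=0)
print(counts)
```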

We can then plot all of our lake counts as a stacked bar plot:

```python
# Define plotting attributes
years=list(range(2016,2024,1))
col=['#045275', '#089099', '#7CCBA2', '#FCDE9C', '#F0746E', '#DC3977', '#7C1D6F']
bottom=np.zeros(8)

# Prime plotting area
fig, ax = plt.subplots(1, figsize=(10,5))

# Plot lake counts as stacked bar plots
for i in range(len(ice_cap_abun)):
    p = ax.bar(years, ice_cap_abun[i], 0.5, color=col[i], label=b[i], bottom=bottom)
    bottom += ice_cap_abun[i]
    ax.bar_label(p, label_type='center', fontsize=8)

# Add legend
ax.legend(bbox_to_anchor=(1.01,0.7))

# Change plotting aesthetics
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray', linestyle='dashed', linewidth=0.5)
ax.set_facecolor("#f2f2f2")

# Add title
props = dict(boxstyle='round', facecolor='#6CB0D6', alpha=0.3)
ax.text(0.01, 1.05, 'Periphery ice caps/glaciers lake abundance change',
fontsize=14, horizontalalignment='left', bbox=props, transform=ax.transAxes)

# Add axis labels
ax.set_xlabel('Year', fontsize=14)
ax.set_ylabel('Lake abundance', fontsize=14)

# Show plot
plt.show()
```
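To keep the figure rather than just display it, `fig.savefig` can be called in place of (or before) `plt.show()`. A minimal sketch with a hypothetical output filename:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs headless
import matplotlib.pyplot as plt

# Minimal stand-in figure; in the tutorial, reuse the fig/ax built above
fig, ax = plt.subplots(1, figsize=(10, 5))
ax.bar([2016, 2017], [3, 5], 0.5)

# Hypothetical output name; bbox_inches='tight' keeps the legend in frame
fig.savefig("lake_abundance.png", dpi=300, bbox_inches="tight")
```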

61 changes: 0 additions & 61 deletions docs/tutorials-data.md

This file was deleted.

23 changes: 9 additions & 14 deletions docs/tutorials-processing.md
@@ -33,11 +33,10 @@ start='20170701'
end='20170831'
```

- Next we need to define the location of our raster file. A test raster file is provided with GrIML, which can be located using the `os` module.
+ Next we need to define the location of our raster file. A test raster file is provided with GrIML, which can be found in the top-level test directory of the repository.

```python
- import os
- infile = os.path.join(os.path.dirname(griml.__file__),'test/test_north_greenland.tif')
+ infile = 'test/test_north_greenland.tif'
```

And then the file can be converted from raster to vector classifications using the `convert` function and the input variables.
@@ -73,9 +72,8 @@ GrIML is provided with a test vector file for filtering, and a test ice mask whi

```python
# Define input files
- import os
- infile1 = os.path.join(os.path.dirname(griml.__file__),'test/test_filter.shp')
- infile2 = os.path.join(os.path.dirname(griml.__file__),'test/test_icemask.shp')
+ infile1 = 'test/test_filter.shp'
+ infile2 = 'test/test_icemask.shp'

# Define output directory to write filtered vectors to
outdir = "PATH/TO/OUTPUT_DIRECTORY"
@@ -110,9 +108,8 @@ filter_vectors(all_infiles, infile2, outdir)
When covering large areas, the classifications are usually split into different files. At this stage, we will merge all files together, to form a complete inventory of ice marginal lake classifications. Test files are provided with GrIML to perform this.

```python
- import os
- infile1 = os.path.join(os.path.dirname(griml.__file__),'test/test_merge_1.shp')
- infile2 = os.path.join(os.path.dirname(griml.__file__),'test/test_merge_2.shp')
+ infile1 = 'test/test_merge_1.shp'
+ infile2 = 'test/test_merge_2.shp'
```

The `merge_vectors` function is used to merge all valid vectors within a list of files. In this case, we will merge all vectors from the two files defined previously.
@@ -149,16 +146,14 @@ Metadata can be added to the inventory with GrIML's `metadata` module. This incl
Input files are needed for assigning a placename and a region to each lake. The placename file is a point vector file containing all placenames for a region. We use the placename database from [Oqaasileriffik](https://oqaasileriffik.gl/en/), the Language Secretariat of Greenland, for which there is an example data subset provided with GrIML. The region file is a polygon vector file containing all regions and their names. We use the Greenland Ice Sheet drainage basin regions as defined by [Mouginot and Rignot (2019)](https://doi.org/10.7280/D1WT11), a dataset which is provided with GrIML.

```python
- import os

# Input test inventory for adding metadata to
- infile1 = os.path.join(os.path.dirname(griml.__file__),'test/test_merge_2.shp')
+ infile1 = 'test/test_merge_2.shp'

# Placenames file
- infile2 = os.path.join(os.path.dirname(griml.__file__),'test/test_placenames.shp')
+ infile2 = 'test/test_placenames.shp'

# Regions file
- infile3 = os.path.join(os.path.dirname(griml.__file__),'test/greenland_basins_polarstereo.shp')
+ infile3 = 'test/greenland_basins_polarstereo.shp'
```

All metadata can be added with the `add_metadata` function.
