From 5bc57b0be7a35272726adae97ff240259acf91a0 Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Tue, 24 Sep 2024 09:12:29 +0200 Subject: [PATCH 1/8] Started doc --- .gitignore | 3 ++- documentation/_quarto.yaml | 3 +++ documentation/introduction.ipynb | 23 +++++++++++++++++++++++ 3 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 documentation/_quarto.yaml create mode 100644 documentation/introduction.ipynb diff --git a/.gitignore b/.gitignore index 6d012c8..d2013b7 100644 --- a/.gitignore +++ b/.gitignore @@ -162,4 +162,5 @@ cython_debug/ tests/test_openeo_gfmap/results/ notebook/ -data/ \ No newline at end of file +data/ +docs/ diff --git a/documentation/_quarto.yaml b/documentation/_quarto.yaml new file mode 100644 index 0000000..e0e0f48 --- /dev/null +++ b/documentation/_quarto.yaml @@ -0,0 +1,3 @@ +project: + type: website +output-dir: ../docs \ No newline at end of file diff --git a/documentation/introduction.ipynb b/documentation/introduction.ipynb new file mode 100644 index 0000000..49eb15a --- /dev/null +++ b/documentation/introduction.ipynb @@ -0,0 +1,23 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# OpenEO Geneal Framework for Mapping" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 0f5cbab2c4e1b118d320429a074ca6f73246a275 Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Tue, 24 Sep 2024 12:40:03 +0200 Subject: [PATCH 2/8] Wrote introduction --- documentation/features.qmd | 0 documentation/fetching.qmd | 1 + documentation/inference.qmd | 0 documentation/introduction.ipynb | 23 ---------------- documentation/introduction.qmd | 46 ++++++++++++++++++++++++++++++++ documentation/managers.qmd | 0 documentation/preprocessing.qmd | 0 7 files changed, 47 insertions(+), 23 deletions(-) create mode 100644 documentation/features.qmd create mode 100644 documentation/fetching.qmd create mode 100644 documentation/inference.qmd delete mode 100644 documentation/introduction.ipynb create mode 100644 documentation/introduction.qmd create mode 100644 documentation/managers.qmd create mode 100644 documentation/preprocessing.qmd diff --git a/documentation/features.qmd b/documentation/features.qmd new file mode 100644 index 0000000..e69de29 diff --git a/documentation/fetching.qmd b/documentation/fetching.qmd new file mode 100644 index 0000000..5c23e59 --- /dev/null +++ b/documentation/fetching.qmd @@ -0,0 +1 @@ +# Data Fetching diff --git a/documentation/inference.qmd b/documentation/inference.qmd new file mode 100644 index 0000000..e69de29 diff --git a/documentation/introduction.ipynb b/documentation/introduction.ipynb deleted file mode 100644 index 49eb15a..0000000 --- a/documentation/introduction.ipynb +++ /dev/null @@ -1,23 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# OpenEO Geneal Framework for Mapping" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/documentation/introduction.qmd b/documentation/introduction.qmd new file mode 100644 index 0000000..b02123a --- /dev/null +++ b/documentation/introduction.qmd @@ -0,0 +1,46 @@ +# OpenEO General Framework for Mapping + +## Introduction + +OpenEO GFMAP is a centralized repository of tools and utilities to build mapping pipelines with OpenEO. It is designed with four core principles in mind: + +1. Backend agnostic: The users shouldn’t have to take care of backend related configurations. The use of OpenEO can vary depending on the backend that is currently in use (for example, the name of data collections). The framework takes care of those differences, while the users only specify the backend they desire to use. + +2. Data consistent: providing a common pipeline for training and for inference. The best way of making sure data is processed the same way during the construction of a training dataset than during inference, is to re-use as much as code as possible. The users should be able to extract and preprocess training data with the same configuration. OpenEO leaves the possibility to perform data extraction on sparse points/polygons or directly on dense datacubes. This leaves the possibility to implement a framework that could do both tile-based inference and pixel-based or parcel-based data extraction/preprocessing using the same code. + +3. Easy and Collaborative: Pre-implementing common preprocessing/postprocessing routines. Many operations, such a compositing or linear interpolation, are very common within Remote Sensing applications and should be already implemented in the framework. This will avoid code duplication among the personal code of the framework’s users and encourage collaboration for improvement and optimization of existing techniques. + +4. Performant: Leverage OpenEO processes as much as possible for preprocessing. In the cropclass and worldcereal projects, preprocessing is performed with a combination of OpenEO processes (masking, compositing, linear interpolation) and the implementation of a Feature Extractor within an UDF (computing indices, percentiles). Ideally, OpenEO features should be used as much as possible, while the features extractor should be as simple as possible, only implementing what is currently not possible within OpenEO. + + +## How can I use GFMAP for my mapping project. + +Nowadays, mapping projects use Machine Learning moders to perform predictions. From large data catalogues (Sentinel2, Sentinel1, ...) data is being processed and then used to perform training or inference. + +One of the core difficulties lies in the workflow difference between training and inference. For example, a Ground Truth dataset could be based of sparse points scattered aroud a region, while the inference objective is to produce continuous raster maps. The data used for the training could be extracted and collected into a DataFrame, while GeoTiff or NetCDF files should be used for the input/outputs of the inference workflow. Very often, both workflows are distributed around different members of a team: while the data scientist will concentrate on the training of the model, the data engineer needs to design an efficient inference workflow to produce global maps. + +It is therefore crucial to ensure that both workflows go through as many common code paths as possible. In this way, both pipelines can process the different data types the same way. This reflects the Data Consistent core principle of GFMAP. + +Another aspect is the scalability. While OpenEO supports processing over large areas, there are limitations in how big an extent should be for every job. OpenEO GFMAP provides utilities to split, manage and archive thousands of jobs at the same time. For example, croptype data of WorldCereal consits of hundres of thousands of points scattered around the globe. In order to perform efficient extraction, GFMAP is used to split the datasets in sub-datasets based on a specific extent, then the Job Manager of GFMAP runs the jobs, tracks their status and automatically manages their assets and their STAC metadata once the job is finished. + +For a training workflow the Data Fetching, Preprocessing, Features and Job Manager sections are worth a read. + +For a inference workflow all the above as well as the Inference features are interesting. + +![OpenEO GFMAP Core Worklow](../worklow.png) + +## Sections + +This documentation covers the different building blocks that an user can use on OpenEO GFMAP: + +* [Data Fetching](#fetching): Allows to fetch multiple data sources and performs source specific preprocessing (backscatter computation for Sentinel1, resolution resampling, ...). +* Preprocessing: Offers preprocessing functionalities common to all data sources. +* Features: Assists in the creation of an UDF to compute features. Reducing the `(bands, time)` dimension into `features` (running PCA, Foundation Models, Vegetation indices...). +* Inference: Assists in the creation of an UDF to run a classification/regression model to transform the `features` into predictions. +* Managers: Split large tasks into multiple jobs, tracks the jobs, manage their STAC metadata and their assets. +* Utils: Miscalleneous utilities (choosing the best Sentinel1 orbit, managing STAC catalogues, working with NetCDF files...) + +## Related projects + +The following projects were build using OpenEO GFMAP: +* WorldCereal: An user-friendly global croptype mapping framework. [(Public GitHub project)](https://github.com/WorldCereal/worldcereal-classification/) diff --git a/documentation/managers.qmd b/documentation/managers.qmd new file mode 100644 index 0000000..e69de29 diff --git a/documentation/preprocessing.qmd b/documentation/preprocessing.qmd new file mode 100644 index 0000000..e69de29 From ecd9e3a177968c9ee66e84c1e77bcc23e2c26d02 Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Tue, 24 Sep 2024 13:00:37 +0200 Subject: [PATCH 3/8] Formatting --- documentation/_quarto.yaml | 24 +++++++++++++++++++++++- documentation/introduction.qmd | 25 +++++++++++++++---------- documentation/utilities.qmd | 0 3 files changed, 38 insertions(+), 11 deletions(-) create mode 100644 documentation/utilities.qmd diff --git a/documentation/_quarto.yaml b/documentation/_quarto.yaml index e0e0f48..dd809e1 100644 --- a/documentation/_quarto.yaml +++ b/documentation/_quarto.yaml @@ -1,3 +1,25 @@ project: type: website -output-dir: ../docs \ No newline at end of file +output-dir: ../docs +resources: + - "data/**" +navbar: + title: "GFMAP" + right: + - text: "Introduction" + href: introduction.qmd + - text: "Sections" + dropdown: + - text: "Data Fetching" + href: fetching.qmd + - text: "Preprocessing" + href: preprocessing.qmd + - text: "Features" + href: features.qmd + - text: "Inference" + href: inference.qmd + - text: "Managers" + href: managers.qmd + - text: "Utils" + href: utilities.qmd + \ No newline at end of file diff --git a/documentation/introduction.qmd b/documentation/introduction.qmd index b02123a..3065c02 100644 --- a/documentation/introduction.qmd +++ b/documentation/introduction.qmd @@ -1,16 +1,20 @@ +--- +title: "GFMAP - Introduction" +--- + # OpenEO General Framework for Mapping ## Introduction OpenEO GFMAP is a centralized repository of tools and utilities to build mapping pipelines with OpenEO. It is designed with four core principles in mind: -1. Backend agnostic: The users shouldn’t have to take care of backend related configurations. The use of OpenEO can vary depending on the backend that is currently in use (for example, the name of data collections). The framework takes care of those differences, while the users only specify the backend they desire to use. +1. Backend agnostic: The users shouldn’t have to take care of backend related configurations. The use of OpenEO can vary depending on the backend that is currently in use (for example, the name of data collections). The framework takes care of those differences, while the users only specify the backend they desire to use. -2. Data consistent: providing a common pipeline for training and for inference. The best way of making sure data is processed the same way during the construction of a training dataset than during inference, is to re-use as much as code as possible. The users should be able to extract and preprocess training data with the same configuration. OpenEO leaves the possibility to perform data extraction on sparse points/polygons or directly on dense datacubes. This leaves the possibility to implement a framework that could do both tile-based inference and pixel-based or parcel-based data extraction/preprocessing using the same code. +2. Data consistent: providing a common pipeline for training and for inference. The best way of making sure data is processed the same way during the construction of a training dataset than during inference, is to re-use as much as code as possible. The users should be able to extract and preprocess training data with the same configuration. OpenEO leaves the possibility to perform data extraction on sparse points/polygons or directly on dense datacubes. This leaves the possibility to implement a framework that could do both tile-based inference and pixel-based or parcel-based data extraction/preprocessing using the same code. -3. Easy and Collaborative: Pre-implementing common preprocessing/postprocessing routines. Many operations, such a compositing or linear interpolation, are very common within Remote Sensing applications and should be already implemented in the framework. This will avoid code duplication among the personal code of the framework’s users and encourage collaboration for improvement and optimization of existing techniques. +3. Easy and Collaborative: Pre-implementing common preprocessing/postprocessing routines. Many operations, such a compositing or linear interpolation, are very common within Remote Sensing applications and should be already implemented in the framework. This will avoid code duplication among the personal code of the framework’s users and encourage collaboration for improvement and optimization of existing techniques. -4. Performant: Leverage OpenEO processes as much as possible for preprocessing. In the cropclass and worldcereal projects, preprocessing is performed with a combination of OpenEO processes (masking, compositing, linear interpolation) and the implementation of a Feature Extractor within an UDF (computing indices, percentiles). Ideally, OpenEO features should be used as much as possible, while the features extractor should be as simple as possible, only implementing what is currently not possible within OpenEO. +4. Performant: Leverage OpenEO processes as much as possible for preprocessing. In the cropclass and worldcereal projects, preprocessing is performed with a combination of OpenEO processes (masking, compositing, linear interpolation) and the implementation of a Feature Extractor within an UDF (computing indices, percentiles). Ideally, OpenEO features should be used as much as possible, while the features extractor should be as simple as possible, only implementing what is currently not possible within OpenEO. ## How can I use GFMAP for my mapping project. @@ -21,20 +25,20 @@ One of the core difficulties lies in the workflow difference between training an It is therefore crucial to ensure that both workflows go through as many common code paths as possible. In this way, both pipelines can process the different data types the same way. This reflects the Data Consistent core principle of GFMAP. -Another aspect is the scalability. While OpenEO supports processing over large areas, there are limitations in how big an extent should be for every job. OpenEO GFMAP provides utilities to split, manage and archive thousands of jobs at the same time. For example, croptype data of WorldCereal consits of hundres of thousands of points scattered around the globe. In order to perform efficient extraction, GFMAP is used to split the datasets in sub-datasets based on a specific extent, then the Job Manager of GFMAP runs the jobs, tracks their status and automatically manages their assets and their STAC metadata once the job is finished. +Another aspect is the scalability: While OpenEO supports processing over large areas, there are limitations in how big an extent should be for every job. OpenEO GFMAP provides utilities to split, manage and archive thousands of jobs at the same time. For example, croptype data of WorldCereal consits of hundres of thousands of points scattered around the globe. In order to perform efficient extraction, GFMAP is used to split the datasets in sub-datasets based on a specific extent, then the Job Manager of GFMAP runs the jobs, tracks their status and automatically manages their assets and their STAC metadata once the job is finished. -For a training workflow the Data Fetching, Preprocessing, Features and Job Manager sections are worth a read. +For a training workflow the Data Fetching, Preprocessing, Features and Job Manager sections are worth a read. -For a inference workflow all the above as well as the Inference features are interesting. +For a inference workflow all the above as well as the Inference features are interesting. -![OpenEO GFMAP Core Worklow](../worklow.png) +![OpenEO GFMAP Core Worklow](worklow.png) ## Sections This documentation covers the different building blocks that an user can use on OpenEO GFMAP: -* [Data Fetching](#fetching): Allows to fetch multiple data sources and performs source specific preprocessing (backscatter computation for Sentinel1, resolution resampling, ...). -* Preprocessing: Offers preprocessing functionalities common to all data sources. +* [Data Fetching](fetching.qmd): Allows to fetch multiple data sources and performs source specific preprocessing (backscatter computation for Sentinel1, resolution resampling, ...). +* [Preprocessing](preprocessing.qmd): Offers preprocessing functionalities common to all data sources. * Features: Assists in the creation of an UDF to compute features. Reducing the `(bands, time)` dimension into `features` (running PCA, Foundation Models, Vegetation indices...). * Inference: Assists in the creation of an UDF to run a classification/regression model to transform the `features` into predictions. * Managers: Split large tasks into multiple jobs, tracks the jobs, manage their STAC metadata and their assets. @@ -43,4 +47,5 @@ This documentation covers the different building blocks that an user can use on ## Related projects The following projects were build using OpenEO GFMAP: + * WorldCereal: An user-friendly global croptype mapping framework. [(Public GitHub project)](https://github.com/WorldCereal/worldcereal-classification/) diff --git a/documentation/utilities.qmd b/documentation/utilities.qmd new file mode 100644 index 0000000..e69de29 From 9ef2c03bd6e79c169898fa17bf4a0e17e30e366d Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Tue, 24 Sep 2024 13:05:43 +0200 Subject: [PATCH 4/8] Fixing navbar --- documentation/_quarto.yaml | 39 +++++++++++++++++----------------- documentation/introduction.qmd | 6 ++---- 2 files changed, 22 insertions(+), 23 deletions(-) diff --git a/documentation/_quarto.yaml b/documentation/_quarto.yaml index dd809e1..5e79ee0 100644 --- a/documentation/_quarto.yaml +++ b/documentation/_quarto.yaml @@ -3,23 +3,24 @@ project: output-dir: ../docs resources: - "data/**" -navbar: - title: "GFMAP" - right: - - text: "Introduction" - href: introduction.qmd - - text: "Sections" - dropdown: - - text: "Data Fetching" - href: fetching.qmd - - text: "Preprocessing" - href: preprocessing.qmd - - text: "Features" - href: features.qmd - - text: "Inference" - href: inference.qmd - - text: "Managers" - href: managers.qmd - - text: "Utils" - href: utilities.qmd +website: + navbar: + title: "GFMAP" + right: + - text: "Introduction" + href: introduction.qmd + - text: "Sections" + dropdown: + - text: "Data Fetching" + href: fetching.qmd + - text: "Preprocessing" + href: preprocessing.qmd + - text: "Features" + href: features.qmd + - text: "Inference" + href: inference.qmd + - text: "Managers" + href: managers.qmd + - text: "Utils" + href: utilities.qmd \ No newline at end of file diff --git a/documentation/introduction.qmd b/documentation/introduction.qmd index 3065c02..d27d354 100644 --- a/documentation/introduction.qmd +++ b/documentation/introduction.qmd @@ -1,9 +1,7 @@ --- -title: "GFMAP - Introduction" +title: "OpenEO General Framework for Mapping" --- -# OpenEO General Framework for Mapping - ## Introduction OpenEO GFMAP is a centralized repository of tools and utilities to build mapping pipelines with OpenEO. It is designed with four core principles in mind: @@ -31,7 +29,7 @@ For a training workflow the Data Fetching, Preprocessing, Features and Jo For a inference workflow all the above as well as the Inference features are interesting. -![OpenEO GFMAP Core Worklow](worklow.png) +![OpenEO GFMAP Core Worklow](./data/worklow.png) ## Sections From 2ac53cef5949ea547086d43d6267c642a3e781a7 Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Tue, 24 Sep 2024 13:09:17 +0200 Subject: [PATCH 5/8] Working on yaml --- documentation/_quarto.yaml | 40 ++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/documentation/_quarto.yaml b/documentation/_quarto.yaml index 5e79ee0..cb749df 100644 --- a/documentation/_quarto.yaml +++ b/documentation/_quarto.yaml @@ -4,23 +4,25 @@ output-dir: ../docs resources: - "data/**" website: - navbar: - title: "GFMAP" - right: - - text: "Introduction" - href: introduction.qmd - - text: "Sections" - dropdown: - - text: "Data Fetching" - href: fetching.qmd - - text: "Preprocessing" - href: preprocessing.qmd - - text: "Features" - href: features.qmd - - text: "Inference" - href: inference.qmd - - text: "Managers" - href: managers.qmd - - text: "Utils" - href: utilities.qmd + sidebar: + style: "floating" + collapse-level: 1 + - section: "Introduction" + contents: + - text: "Introduction" + href: introduction.qmd + - section: "Sections" + contents: + - text: "Data Fetching" + href: fetching.qmd + - text: "Preprocessing" + href: preprocessing.qmd + - text: "Features" + href: features.qmd + - text: "Inference" + href: inference.qmd + - text: "Managers" + href: managers.qmd + - text: "Utils" + href: utilities.qmd \ No newline at end of file From 3f60fda4222eb1e3e398bbffc57a262df6abcba7 Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Tue, 24 Sep 2024 13:10:59 +0200 Subject: [PATCH 6/8] Working on yaml --- documentation/_quarto.yaml | 37 +++++++++++++++++++------------------ 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/documentation/_quarto.yaml b/documentation/_quarto.yaml index cb749df..bb72383 100644 --- a/documentation/_quarto.yaml +++ b/documentation/_quarto.yaml @@ -7,22 +7,23 @@ website: sidebar: style: "floating" collapse-level: 1 - - section: "Introduction" - contents: - - text: "Introduction" - href: introduction.qmd - - section: "Sections" - contents: - - text: "Data Fetching" - href: fetching.qmd - - text: "Preprocessing" - href: preprocessing.qmd - - text: "Features" - href: features.qmd - - text: "Inference" - href: inference.qmd - - text: "Managers" - href: managers.qmd - - text: "Utils" - href: utilities.qmd + contents: + - section: "Introduction" + contents: + - text: "Introduction" + href: introduction.qmd + - section: "Sections" + contents: + - text: "Data Fetching" + href: fetching.qmd + - text: "Preprocessing" + href: preprocessing.qmd + - text: "Features" + href: features.qmd + - text: "Inference" + href: inference.qmd + - text: "Managers" + href: managers.qmd + - text: "Utils" + href: utilities.qmd \ No newline at end of file From 534cfea985c1b7a0ab372421df3e9fbb27a0abc6 Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Wed, 25 Sep 2024 15:47:35 +0200 Subject: [PATCH 7/8] Updated documentation --- documentation/_quarto.yaml | 5 +- documentation/features.qmd | 121 +++++++++++++++++++ documentation/fetching.qmd | 205 +++++++++++++++++++++++++++++++- documentation/inference.qmd | 15 +++ documentation/introduction.qmd | 10 +- documentation/managers.qmd | 4 + documentation/preprocessing.qmd | 8 ++ documentation/utilities.qmd | 27 +++++ 8 files changed, 386 insertions(+), 9 deletions(-) diff --git a/documentation/_quarto.yaml b/documentation/_quarto.yaml index bb72383..99b683c 100644 --- a/documentation/_quarto.yaml +++ b/documentation/_quarto.yaml @@ -6,9 +6,9 @@ resources: website: sidebar: style: "floating" - collapse-level: 1 + collapse-level: 2 contents: - - section: "Introduction" + - section: "Home" contents: - text: "Introduction" href: introduction.qmd @@ -26,4 +26,3 @@ website: href: managers.qmd - text: "Utils" href: utilities.qmd - \ No newline at end of file diff --git a/documentation/features.qmd b/documentation/features.qmd index e69de29..1fbe52e 100644 --- a/documentation/features.qmd +++ b/documentation/features.qmd @@ -0,0 +1,121 @@ +--- +title: "Features extraction" +--- + +In mapping pipelines, feature extraction typically occurs after the pre-processing of data fetched from the sources. Feature extraction allows to create new features from source data, and reduce timeseries signals in specific features. For example, timeseries Sentinel2 data is can be derived into spectral indices such as NDVI, but also vegetation indices such as Start of Season, Peak of Season, etc... + +Additionally, reducing timeseries data into features is an ideal task for foundation models. The flagship project of GFMAP WorldCereal is using [Presto](https://github.com/nasaharvest/presto) to perform data fusion of multiple data sources and to return vectors of 128 features. + +Feature extractors are therefore python classes that will be transformed in an UDF, allowing to transform a cube of dimensions `(bands, t, x, y)` to `(features, x, y)`. Essentially, this is a transformation from a 4D cube to a 3D cube. To implement your own feature extractor, you simply need to create your derived class based on `openeo_gfmap.features.feature_extractor.PatchFeatureExtractor`. + +### Example of feature extractor + +Let's imagine a feature extractor that performs spatial smoothing and computes the median of the signal for the red, green and blue bands and well as the latitude and longitude values. + + +```python +from openeo_gfmap.features.feature_extractor import PatchFeatureExtractor, apply_feature_extractor + +class MyExtractor(PatchFeatureExtractor): + + def output_labels(self) -> List[str]: + if self._parameters.get("extract_latlons", False): + return ["red", "green", "blue", "lat", "lon"] + else: + return ["red", "green", "blue"] + + def execute(self, inarr: xr.DataArray): + import xarray as xr + from scipy.ndimage import gaussian_filter + + # Performs some gaussian filtering to blur the RGB bands + rgb_bands = inarr.sel(bands=["S2-L2A-B04", "S2-L2A-B03", "S2-L2A-B02"]) + + for band in rgb_bands.bands: + for timestamp in rgb_bands.t: + rgb_bands.loc[{"bands": band, "t": timestamp}] = gaussian_filter( + rgb_bands.loc[{"bands": band, "t": timestamp}], sigma=1.0 + ) + + # Compute the median on the time band + rgb_bands = rgb_bands.median(dim="t").assign_coords( + {"bands": ["red", "green", "blue"]} + ) + + if self._parameters.get("extract_latlons", False): + # Compute the latlons, feature implemented in the superclass PatchFeatureExtractor + latlons = self.get_latlons(inarr) + return xr.concat([rgb_bands, latlons], dim="bands") + + return rgb_bands + +``` + +* The `output_labels` function returns what bands are expected to be returned by the feature extractor. The function can also read custom parameters for the extractor through `self._parameters.get(...)`, allowing to dynamically evaluate what are the expected output labels. + +* The execute function is the entrypoint of user code, and is expected to return a 3D `xarray.DataArray` with dimension names (bands, y, x). In this output cube, the `bands` dimensions are actually what we consider `features`, but must be named as `bands` for OpenEO compatibility. + +* Imports of libraries should be included in the individual functions or inside the class. The class source code will be read and added in a generated UDF, but code outside the class will not be exported. + +* In this example of `MyExtractore.execute` function, we compute the latlons with a function defined in a superclass `PatchFeatureExtractor.get_latlons`. + + +More advanced Feature Extractors can be found in the [WorldCereal Repository](https://github.com/WorldCereal/worldcereal-classification/blob/main/src/worldcereal/openeo/feature_extractor.py). + + +Here is an example of executing this feature extraction on a patch: + +```python + +# We first fetch the data from sources. +from openeo_gfmap import BoundingBoxExtent, SpatialContext, TemporalContext +from openeo_gfmap.backend import BACKEND_CONNECTIONS, Backend, BackendContext, cdse_connection +from openeo_gfmap.fetching import ( + FetchType, + build_sentinel2_l2a_extractor, +) + +SPATIAL_EXTENT = BoundingBoxExtent( + west=5.05, + south=51.21, + east=5.06, + north=51.22, + epsg=4326, +) + +# Recent dates for first extent +TEMPORAL_CONTEXT = ["2023-04-01", "2023-05-01"] + +# Bands to extract +bands = [ + "S2-L2A-B04", + "S2-L2A-B03", + "S2-L2A-B02", +] + +context = BackendContext(Backend.CDSE) + +# Creating an OpenEO client connection +connection = cdse_connection() + +fetcher = build_sentinel2_l2a_extractor( + backend_context=context, bands=bands, fetch_type=FetchType.TILE +) + +cube = fetcher.get_cube(connection, spatial_extent, temporal_extent) + +# We then run the extractor +from openeo_gfmap.features.feature_extractor import apply_feature_extractor + +# Parameters, can be dinamically changed by user inputs... +parameters = { + "extract_latlons": True +} + +features = apply_feature_extractor(MyExtractor, cube, parameters) + + +# Start the job asynchronously. +``` + +Note: you can test your Feature Extractor locally using `openeo_gfmap.features.feature_extractor.apply_feature_extractor_local`. \ No newline at end of file diff --git a/documentation/fetching.qmd b/documentation/fetching.qmd index 5c23e59..2f91ed1 100644 --- a/documentation/fetching.qmd +++ b/documentation/fetching.qmd @@ -1 +1,204 @@ -# Data Fetching +--- +title: "Data Fetching" +--- + +### Supported collections + +The submodule `openeo_gfmap.fetching` provides ways to access data catalogues and perform source-specific preprocessing for multiple collections. + +So far, the following collections are well supported, provinding standardized band mappings and preprocessing routines: + +* Sentinel 2 L2A +* Sentinel 1 L1C +* Copernicus 30 DEM +* AGERA5 + +For example, the output band names for Sentinel2 are: + +```python +from openeo_gfmap.fetching.s2 import BASE_SENTINEL2_L2A_MAPPING + +BASE_SENTINEL2_L2A_MAPPING +``` + +### Additional collections + +Using the generic extractor in `openeo_gfmap.fetching.generic`, you can also provide other collections available in OpenEO backends, or other collections through a STAC catalogue. Those collections band names will not be set to standardized names but instead names as provided in the collection or in the STAC catalogue. + +Ideally, GFMAP should support in priority collections that are well used. If an unsupported collections is frequently used, then GFMAP should work to integrate it. + +### Different fetch types + +GFMAP contains different data types, categorized under the `openeo_gfmap.FetchType` enum: + +* Tile based data, which consists of raster data over a continuous area. +* Polygon based data, which is also raster data, but multiple patches are scattered around a region. +* Point base data, where the output is a `VectorCube` and results are saved in a DataFrame. + +```python +from openeo_gfmap import FetchType + +[value for value in FetchType] +``` + +Whenever using a collection fetcher, the user is required to specify which Fetch Type will be used. + +### Fetcher parameters + +Whenever constructing a collection fetcher, additional parameters can be provided in the form of a python dictionnary. + +Parameters related to collection loading are provided in a subdictionnary under the `load_collection` key. + +| Parameter name | Description | Supported Collections | Example | +|:--------------:|:------------|:----------------------|:--------| +|tileID | Filters on a specific MGRS tile. | Sentinel2 L2A | "20LMR" | +|eo:cloud_cover | Filters on estimated cloud cover, between 0 and 100 | Sentinel2 L2A | `lambda val: val <= 95.0` | +|sat:orbit_state | Filters on satellite orbit direction, either `ASCENDING` or `DESCENDING` | Sentinel1 L1C | `lambda val: val == "ASCENDING" | +|polarisation | Filters on the polarisations present in the product | Sentinel1 L1C | `lambda val: val == "VV&VH"` | + +Other collection parameters exists and are dependant on the collection that is being fetched. Refer to the [OpenEO's Python Client API](https://open-eo.github.io/openeo-python-client/api.html#openeo.rest.datacube.DataCube.load_collection) and the [collection properties](https://hub.openeo.org/) for more parameters. + +In addition to collection parameters, other parameters are available and related to the source-specific preprocessing operations. + +| Parameter name | Description | Supported Collections | Example | +|:--------------:|:------------|:----------------------|:--------| +|tilesize | Size of the tile to load at a time. | All | {"update_arguments": {"feature_flag": {"tilesize": 1}}} | +|target_resolution | Resolution on which to sample the collection, experessed in meters. | All | 10 | +|resampling_method | Method on which to resample. | All | "near" | +|target_crs | CRS code - in EPSG - on which to warp the fetched data. | All | 32631 | +|elevation_model | Elevation model to use to compute the Backscatter. | Sentinel 1 | "COPERNICUS_30" | +|coefficient | Backscatter coefficient to compute. | Sentinel 1 | "sigma0-ellipsoid" | + +### Example: extracting a tile from Sentinel2. + +Extracting a continuous tile from the Sentinel 2 collection. + +```python + +from openeo_gfmap import BoundingBoxExtent, SpatialContext, TemporalContext +from openeo_gfmap.backend import BACKEND_CONNECTIONS, Backend, BackendContext, cdse_connection +from openeo_gfmap.fetching import ( + FetchType, + build_sentinel2_l2a_extractor, +) + +SPATIAL_EXTENT = BoundingBoxExtent( + west=5.0515130512706845, + south=51.215806593713, + east=5.060320484557499, + north=51.22149744530769, + epsg=4326, +) + +# Recent dates for first extent +TEMPORAL_CONTEXT = ["2023-04-01", "2023-05-01"] + +# Bands to extract +bands = [ + "S2-L2A-B01", + "S2-L2A-B04", + "S2-L2A-B08", + "S2-L2A-B11", + "S2-L2A-SCL", + "S2-L2A-AOT", +] + +context = BackendContext(Backend.CDSE) + +# Creating an OpenEO client connection +connection = cdse_connection() + +fetcher = build_sentinel2_l2a_extractor( + backend_context=context, bands=bands, fetch_type=FetchType.TILE +) + +cube = fetcher.get_cube(connection, spatial_extent, temporal_extent) + +# Create a job asynchronously or run it synchronously; +``` + +### Example: extracting sparse polygons from Sentinel2 + +Extracting sparse polygons from a large extent in Sentinel2, note the parameters used used in this +case. + +```python +from pathlib import Path + +import geopandas as gpd +import geojson + +POINTS_EXTRACTIONS_DF = Path(__file__).parent.parent / "tests/test_openeo_gfmap/resources/puglia_extractions_polygons.gpkg" + +# Converting the files to a Lat/LON GeoJSON format +extraction_df = gpd.read_file(POINTS_EXTRACTIONS_DF) +geojson_feats = extraction_df.geometry.__geo_interface__ +spatial_context = geojson.GeoJSON( + {"type": "FeatureCollection", "features": geojson_features["features"]} +) + +# Alternatively, you can submit a GeoParquet file in an artifactory bucket and use it from a public +# URL. This will allow you to specify polygons in a specific UTM projection. + +# Define the temporal context +temporal_context = TemporalContext( + start_date=extraction_df.iloc[0]["start_date"], + end_date=extraction_df.iloc[0]["end_date"], +) + +fetching_parameters = { + "target_crs": 3035, + "load_collection": { + "eo:cloud_cover": lambda val: val <= 80.0 # Max 80% cloud cover per product + } +} + +fethcer = build_sentinel2_l2a_extractor( + backend_context=context, + bands=bands, + fetch_type=FetchType.POLYGON, + **fetching_parameters, +) + +cube = fethcer.get_cube(connection, spatial_context, temporal_context) + +# Create a job asynchronously or run it synchronously; +``` + +### Example: extracting a dataset of points from Sentinel2 + +Extracting a dataset of points is similar, however you will be required to add manually an aggregation operation after getting the cube from the fetcher. + +Whenever extracting points, processing is done as it if was a set of polygons of small size, allowing to use the same processes and workflow in OpenEO than whenever processing tiles or sparse polyongs. To get points, you then need to run an aggregating operation, for example: `cube.aggregate_spatial(spatial_context, reducer="mean")`. + +```python + +# Dataset of polygons for POINT based extraction +POINT_EXTRACTION_DF = ( + Path(__file__).parent.parent / "tests/test_openeo_gfmap/resources/malawi_extraction_polygons.gpkg" +) + +extraction_df = gpd.read_file(POINT_EXTRACTION_DF) + +# Convert GeoDataFrame to feature collection to build spatial context +geojson_features = extraction_df.geometry.__geo_interface__ +spatial_context = geojson.GeoJSON( + {"type": "FeatureCollection", "features": geojson_features["features"]} +) + +# Build the temporal context +temporal_context = TemporalContext( + start_date=extraction_df.iloc[0]["start_date"], + end_date=extraction_df.iloc[0]["end_date"], +) + +extractor = build_sentinel2_l2a_extractor( + backend_context=context, + bands=bands, + fetch_type=FetchType.POINT, +) + +cube = extractor.get_cube(connection, spatial_context, temporal_context) + +cube = cube.aggregate_spatial(spatial_context, reducer="mean") +``` diff --git a/documentation/inference.qmd b/documentation/inference.qmd index e69de29..ecd2782 100644 --- a/documentation/inference.qmd +++ b/documentation/inference.qmd @@ -0,0 +1,15 @@ +--- +title: "Model Inference" +--- + +Once features are computed, models predictions (classification, regression) are done in the Model Inference class. + +While being similar in logic on how the [Feature Extractor](features.qmd) is working, the Model Inference doesn't reduce 4 dimensions into 3 but insteads transforms the `features` dimensions in `predictions`. + +Essentially, a Model Inference class will load a model run inference and write probabilities. It can also be used to perfrom Postprocessing on predictions, as done in [WorldCereal](https://github.com/WorldCereal/worldcereal-classification/blob/main/src/worldcereal/openeo/postprocess.py). + +### Running a model inference. + +An example of a ModelInference to perform predictions can be found [here](https://github.com/WorldCereal/worldcereal-classification/blob/main/src/worldcereal/openeo/inference.py#L86). As whenever implementing a `FeatureExtractor`, two functions need to be at minimum implemented: `output_labels` and `execute`. The execute function is expected to return a 3D cube, with the dimensions `(bands, x, y)`. + +To then run a model inference class, use the function `openeo_gfmap.inference.model_inferece.apply_model_inference`. diff --git a/documentation/introduction.qmd b/documentation/introduction.qmd index d27d354..9a5cfa9 100644 --- a/documentation/introduction.qmd +++ b/documentation/introduction.qmd @@ -29,7 +29,7 @@ For a training workflow the Data Fetching, Preprocessing, Features and Jo For a inference workflow all the above as well as the Inference features are interesting. -![OpenEO GFMAP Core Worklow](./data/worklow.png) +![OpenEO GFMAP Core Worklow](data/worklow.png) ## Sections @@ -37,10 +37,10 @@ This documentation covers the different building blocks that an user can use on * [Data Fetching](fetching.qmd): Allows to fetch multiple data sources and performs source specific preprocessing (backscatter computation for Sentinel1, resolution resampling, ...). * [Preprocessing](preprocessing.qmd): Offers preprocessing functionalities common to all data sources. -* Features: Assists in the creation of an UDF to compute features. Reducing the `(bands, time)` dimension into `features` (running PCA, Foundation Models, Vegetation indices...). -* Inference: Assists in the creation of an UDF to run a classification/regression model to transform the `features` into predictions. -* Managers: Split large tasks into multiple jobs, tracks the jobs, manage their STAC metadata and their assets. -* Utils: Miscalleneous utilities (choosing the best Sentinel1 orbit, managing STAC catalogues, working with NetCDF files...) +* [Features](features.qmd): Assists in the creation of an UDF to compute features. Reducing the `(bands, time)` dimension into `features` (running PCA, Foundation Models, Vegetation indices...). +* [Inference](inference.qmd): Assists in the creation of an UDF to run a classification/regression model to transform the `features` into predictions. +* [Managers](managers.qmd): Split large tasks into multiple jobs, tracks the jobs, manage their STAC metadata and their assets. +* [Utils](utils.qmd): Miscalleneous utilities (choosing the best Sentinel1 orbit, managing STAC catalogues, working with NetCDF files...) ## Related projects diff --git a/documentation/managers.qmd b/documentation/managers.qmd index e69de29..8953e38 100644 --- a/documentation/managers.qmd +++ b/documentation/managers.qmd @@ -0,0 +1,4 @@ +--- +title: "Job Managers": +--- + diff --git a/documentation/preprocessing.qmd b/documentation/preprocessing.qmd index e69de29..c91ed8c 100644 --- a/documentation/preprocessing.qmd +++ b/documentation/preprocessing.qmd @@ -0,0 +1,8 @@ +--- +title: "Preprocessing utilities" +--- + +The functions available in the `openeo_gfmap.preprocessing` submodule are mostly wrappers around OpenEO's processes. While GFMAP offers nothing new in terms of compositing, interpolation and band scaling, it collects a series of interesting tools not yet implemented natively in OpenEO: + +* Cloud Masking: Best Available Pixel (BAP) Cloud Masking, Deep Learning Cloud Masking (to come soon?) +* SAR Preprocessing: Transform floating values of SAR backscatter into a compressed uint16 representation. diff --git a/documentation/utilities.qmd b/documentation/utilities.qmd index e69de29..86c34c7 100644 --- a/documentation/utilities.qmd +++ b/documentation/utilities.qmd @@ -0,0 +1,27 @@ +--- +title: "Utilities" +--- + +There are several utilies implemented in OpenEO GFMAP + + +## Source catalogue related utilities + +* `select_s1_orbitstate_vvvh`: Allows you to choose which is the best orbit to select from a SENTINEL1 catalogue between "ASCENDING" and "DESCENDING" from a spatio-temporal context. Works at the moment only for the `CDSE`, `CDSE-STAGING` and `FED` backend, as it is using CloudFerro's catalogues. Is using a multitude of factors, please refer to the functions docstring. + + +## Intervals + +* `quintad_intervals`: Allows you to compute 5-days interval that never overarch two months, similarly to "dekad" (10-days) intervals. + +## STAC catalogues + +* `split_collection_by_epsg`: Splits a given stac catalogue or path to a stac catalogue into multiple catalogues based on the local projection of each item. + +## Build dataframe + +* Builds a proper dataframe from a geojson result (from a VectorCube). + +## NetCDF + +* `update_nc_attributes`: Allows to update the attributes of a netcdf file without needing to rewrite it (useful for large files). From c37d73547ad5e7e523b0c588d48878e77cd40b0c Mon Sep 17 00:00:00 2001 From: Darius Couchard Date: Wed, 25 Sep 2024 15:51:46 +0200 Subject: [PATCH 8/8] Removed typo --- documentation/managers.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/managers.qmd b/documentation/managers.qmd index 8953e38..2746506 100644 --- a/documentation/managers.qmd +++ b/documentation/managers.qmd @@ -1,4 +1,4 @@ --- -title: "Job Managers": +title: "Job Managers" ---