Release version 1.5.0, Merge pull request #234 from sentinel-hub/develop
Release version 1.5.0
zigaLuksic authored Apr 25, 2023
2 parents ec43cd0 + 0cb02b8 commit 0e2fa52
Showing 163 changed files with 4,672 additions and 5,452 deletions.
14 changes: 0 additions & 14 deletions .coveragerc

This file was deleted.

4 changes: 2 additions & 2 deletions .github/workflows/ci_action.yml
@@ -51,7 +51,7 @@ jobs:

- name: Install packages
run: |
pip install -e .[DEV]
pip install -e .[DEV,ML]
- name: Run mypy
run: |
@@ -88,7 +88,7 @@ jobs:
sudo apt-get install -y build-essential gdal-bin libgdal-dev graphviz proj-bin gcc libproj-dev libspatialindex-dev
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
pip install -e .[DEV]
pip install -e .[DEV,ML]
pip install gdal==$(gdal-config --version | awk -F'[.]' '{print $1"."$2}')
- name: Run fast tests
23 changes: 15 additions & 8 deletions .pre-commit-config.yaml
@@ -12,20 +12,27 @@ repos:
- id: check-merge-conflict
- id: debug-statements

- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.0.0-alpha.6"
hooks:
- id: prettier
exclude: "tests/(test_stats|test_project)/"
types_or: [json]

- repo: https://github.com/psf/black
rev: 22.12.0
rev: 23.3.0
hooks:
- id: black
language_version: python3

- repo: https://github.com/pycqa/isort
rev: 5.11.4
rev: 5.12.0
hooks:
- id: isort
name: isort (python)

- repo: https://github.com/PyCQA/autoflake
rev: v2.0.0
rev: v2.0.2
hooks:
- id: autoflake
args:
@@ -40,13 +47,13 @@ repos:
hooks:
- id: flake8
additional_dependencies:
- flake8-bugbear
- flake8-comprehensions
- flake8-simplify
- flake8-typing-imports
- flake8-bugbear==23.2.13
- flake8-comprehensions==3.10.1
- flake8-simplify==0.19.3
- flake8-typing-imports==1.14.0

- repo: https://github.com/nbQA-dev/nbQA
rev: 1.6.0
rev: 1.7.0
hooks:
- id: nbqa-black
- id: nbqa-isort
4 changes: 0 additions & 4 deletions MANIFEST.in

This file was deleted.

6 changes: 2 additions & 4 deletions Makefile
@@ -1,17 +1,15 @@
# Makefile for creating a new release of the package and uploading it to PyPI

PYTHON = python3

help:
@echo "Use 'make upload' to upload the package to PyPI"

upload:
rm -r dist | true
$(PYTHON) setup.py sdist bdist_wheel
python -m build --sdist --wheel
twine upload --skip-existing dist/*

# For testing:
test-upload:
rm -r dist | true
$(PYTHON) setup.py sdist bdist_wheel
python -m build --sdist --wheel
twine upload --repository testpypi --skip-existing dist/*
2 changes: 2 additions & 0 deletions README.md
@@ -65,6 +65,8 @@ Running pipelines is easiest by using the CLI provided by **`eo-grow`**. For all

## Documentation

For more information on the package visit [readthedocs](https://eo-grow.readthedocs.io/en/latest/).

Explanatory examples can be found [here](https://github.com/sentinel-hub/eo-grow/tree/main/examples).

More details on the config language used by **`eo-grow`** can be found [here](https://github.com/sentinel-hub/eo-grow/tree/main/docs/source/config-language.md).
202 changes: 202 additions & 0 deletions docs/source/common-configuration-patterns.md
@@ -0,0 +1,202 @@
# Common Configuration Patterns

## Using config templates

When you need to write a config for a pipeline, you can avoid rummaging through documentation by using the template command `eogrow-template`.

Invoking `eogrow-template "eogrow.pipelines.zipmap.ZipMapPipeline" "zipmap.json"` creates a file with the content:
```json
{
  "pipeline": "eogrow.pipelines.zipmap.ZipMapPipeline",
  "pipeline_name": "<< Optional[str] >>",
  "workers": "<< 1 : int >>",
  "use_ray": "<< 'auto' : Union[Literal['auto'], bool] >>",
  "input_features": {
    "<< type >>": "List[InputFeatureSchema]",
    "<< nested schema >>": "<class 'eogrow.pipelines.zipmap.InputFeatureSchema'>",
    "<< sub-template >>": {
      "feature": "<< Tuple[FeatureType, str] >>",
      "folder_key": "<< str >>",
      "include_bbox_and_timestamp": "<< True : bool >>"
    }
  },
  ...
}
```
You can now remove any parameters you do not need and fill out the rest.

Parameter values are of the form `"<< default : type >>"`, or `"<< default : type // description >>"` if you use the `--add-description` flag.

The parameters appear in order of definition, which is why the `ZipMapPipeline`-specific parameters come at the end (we switched the order a bit in the example above).

For nested schemas you get output like the one above for `"input_features"`, which tells you the type of the nesting and provides a sub-template for the nested pydantic model.

The functionality is not restricted to pipelines. While pipeline templates do not expand manager schemas directly, you can invoke `eogrow-template "eogrow.core.logging.LoggingManager" "logging_manager.json"` to get a template for the logging manager.
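Under the hood, the template mechanism walks the schema's fields and renders each as a placeholder string. A rough sketch of the idea, using a plain dataclass as a hypothetical stand-in for the real pydantic schemas (names and output format invented for illustration):

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


def build_template(model_cls) -> dict:
    """Render a dataclass as an eogrow-style template dict (rough sketch only)."""
    template = {}
    for f in fields(model_cls):
        # Use the short class name for plain types, the full string for generics
        type_name = f.type.__name__ if isinstance(f.type, type) else str(f.type)
        if f.default is not MISSING:
            template[f.name] = f"<< {f.default!r} : {type_name} >>"
        else:
            template[f.name] = f"<< {type_name} >>"
    return template


@dataclass
class ZipMapParams:  # hypothetical stand-in for the real pydantic schema
    pipeline_name: Optional[str] = None
    workers: int = 1


print(build_template(ZipMapParams)["workers"])  # << 1 : int >>
```

The real implementation works on pydantic models and handles nested schemas; this sketch only shows the "default and type rendered as a placeholder" idea.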

## Global config

Most of the configuration files have a lot in common. This tends to be especially true for fields describing managers:
- `area`
- `storage`
- `logging`

In our experience, it is often easiest to create a so-called *global configuration* which contains all such fields.

```
{ // global_config.json
  "area": {
    ...
  },
  "storage": {
    ...
  },
  "logging": {
    ...
  }
}
```

This is then used in pipeline configurations.

```
{ // export.json
  "pipeline": "eogrow.pipelines.export_maps.ExportMapsPipeline",
  "**global_config": "${config_path}/global_config.json",
  "feature": ["data", "BANDS"],
  "map_dtype": "int16",
  "cogify": true,
  ...
}
```

This keeps pipeline configs shorter and more readable. One can also use multiple such files, for instance one per manager. That makes it easy to have pipelines that work at different resolutions, where one can simply switch between `"**area_config": "${config_path}/area_10m.json"` and `"**area_config": "${config_path}/area_30m.json"`.

How fine-grained your config structure becomes is usually project-specific. Spreading it too thinly makes it harder to follow what precisely ends up in the final config.

### Adjusting settings from the global config

In some cases, the settings from a global config (or from a different config file that you are importing) need to be overridden. Imagine that a pipeline produces a ton of useless warnings, and you only wish to ignore them for that specific pipeline.

```
{ // export.json
  "pipeline": "eogrow.pipelines.export_maps.ExportMapsPipeline",
  "**global_config": "${config_path}/global_config.json",
  "logging": {
    "capture_warnings": false
  },
  "feature": ["data", "BANDS"],
  "map_dtype": "int16",
  "cogify": true,
  ...
}
```

The processed configuration will have all the logging settings from `global_config.json`, except for `"capture_warnings"`. See the [config language rules](config-language.html) for details on how configs are joined.
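The join behaves roughly like a recursive dictionary merge in which the pipeline config wins on conflicts. A minimal sketch of that idea (not the actual eo-grow implementation):

```python
def join_configs(base: dict, override: dict) -> dict:
    """Recursively join two configs, with `override` winning on conflicts.

    Sketch of the join idea only; the real eo-grow config language may
    differ in details.
    """
    joined = dict(base)
    for key, value in override.items():
        if isinstance(joined.get(key), dict) and isinstance(value, dict):
            # Both sides are dicts: merge recursively instead of replacing
            joined[key] = join_configs(joined[key], value)
        else:
            joined[key] = value
    return joined


global_config = {"logging": {"capture_warnings": True, "save_logs": True}}
pipeline_config = {"logging": {"capture_warnings": False}, "cogify": True}

merged = join_configs(global_config, pipeline_config)
# merged["logging"] keeps "save_logs" from the global config,
# but "capture_warnings" is overridden to False
```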

## Pipeline chains

Pipeline chains are briefly touched on in the config language docs, but only at the syntax level. Here we'll show two common usage patterns.

### End-to-end pipeline chain

In certain use cases we have multiple pipelines that are meant to be run in a specific succession. A great way of organizing them is order-prefixed naming, so that `03_export_pipeline.json` is run as the third pipeline.

But the user still needs to run them by hand and in the correct order. This can be automated with a simple pipeline chain that links them together:
```
[ // end_to_end_run.json
  {"**download": "${config_path}/01_download.json"},
  {"**preprocess": "${config_path}/02_preprocess_data.json"},
  {"**predict": "${config_path}/03_use_model.json"},
  {"**export": "${config_path}/04_export_maps.json"},
  {"**ingest": "${config_path}/05_ingest_byoc.json"}
]
```

A simple `eogrow end_to_end_run.json` now runs all of these pipelines one after another.
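Conceptually, running a chain amounts to loading the list and executing each entry in order. A simplified sketch of that behaviour (the real CLI additionally resolves the config language, e.g. `**` imports, variables, and `//` comments, before running anything):

```python
import json
from pathlib import Path


def run_chain(chain_path: str, run_pipeline) -> None:
    """Run each config of a pipeline chain in order (illustrative sketch)."""
    chain = json.loads(Path(chain_path).read_text())
    if not isinstance(chain, list):
        chain = [chain]  # treat a single pipeline config as a one-element chain
    for entry in chain:
        run_pipeline(entry)
```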

### Rerunning with different parameters

In experimentation we often want to run the same pipeline for multiple parameter values. With a tiny bit of boilerplate this can also be taken care of with config chains.

```
[ // run_threshold_experiments.json
  {
    "variables": {"threshold": 0.1},
    "**pipeline": "${config_path}/extract_trees.json"
  },
  {
    "variables": {"threshold": 0.2},
    "**pipeline": "${config_path}/extract_trees.json"
  },
  {
    "variables": {"threshold": 0.3},
    "**pipeline": "${config_path}/extract_trees.json"
  },
  {
    "variables": {"threshold": 0.4},
    "**pipeline": "${config_path}/extract_trees.json"
  }
]
```
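Such repetitive chains need not be written by hand. A sketch of generating one with a short script (file and variable names taken from the example above, not from eo-grow itself):

```python
import json

# Generate one chain entry per threshold value
thresholds = [0.1, 0.2, 0.3, 0.4]
chain = [
    {"variables": {"threshold": t}, "**pipeline": "${config_path}/extract_trees.json"}
    for t in thresholds
]

with open("run_threshold_experiments.json", "w") as f:
    json.dump(chain, f, indent=2)
```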

### Using variables with pipelines

While there is no syntactic sugar for specifying pipeline-chain-wide variables in JSON files, one can do that through the CLI. Running `eogrow end_to_end_run.json -v "year:2019"` sets the variable `year` to 2019 for all pipelines in the chain.
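The `-v` option takes `name:value` strings. A sketch of how such pairs might be parsed into a variables dict (hypothetical helper, not the actual CLI code):

```python
def parse_cli_variables(pairs):
    """Parse `-v "name:value"` strings into a variables dict (hypothetical helper)."""
    variables = {}
    for pair in pairs:
        name, sep, value = pair.partition(":")
        if not sep or not name:
            raise ValueError(f"Expected 'name:value', got {pair!r}")
        variables[name] = value
    return variables


print(parse_cli_variables(["year:2019", "quarter:Q1"]))  # {'year': '2019', 'quarter': 'Q1'}
```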

## Path modification via variables

In some cases one wants fine-grained control over path specifications. The following is a simplified example of how one can provide separate download paths for a large number of batch pipelines.

```
{ // global_config.json
  "storage": {
    "structure": {
      "batch_tiffs": "batch-download/tiffs/year-${var:year}-${var:quarter}",
      ...
    },
    ...
  },
  ...
}
```

```
{ // batch_download.json
  "pipeline": "eogrow.pipelines.download_batch.BatchDownloadPipeline",
  "**global_config": "${config_path}/global_config.json",
  "output_folder_key": "batch_tiffs",
  "inputs": [
    {
      "data_collection": "SENTINEL2_L2A",
      "time_period": "${var:year}-${var:quarter}"
    },
    ...
  ],
  ...
}
```

We now just need to provide the variables when running the config. This can be done either through the CLI via `eogrow batch_download.json -v "year:2019" -v "quarter:Q1"` or (for increased reproducibility) by creating configs with the variables specified in advance:

```
{ // batch_download_2019_Q4.json
  "**pipeline": "${config_path}/batch_download.json",
  "variables": {"year": 2019, "quarter": "Q4"}
}
```

In such cases, we advise against providing any variable values in the core pipeline configuration (i.e. `batch_download.json`), so that config parsing fails if not all variables are specified. Otherwise you risk typo-related problems, such as specifying a value for `"yaer"`, which will not override the `"year"` variable (and you end up downloading data for 2019 instead of 2020).
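The fail-fast behaviour can be pictured as placeholder substitution that raises on unresolved names instead of silently keeping a default. A sketch under that assumption (not the actual parser):

```python
import re

VAR_PATTERN = re.compile(r"\$\{var:([^}]+)\}")


def resolve_variables(text: str, variables: dict) -> str:
    """Substitute `${var:name}` placeholders, failing loudly on missing names."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value provided for variable {name!r}")
        return str(variables[name])

    return VAR_PATTERN.sub(replace, text)


path = resolve_variables(
    "batch-download/tiffs/year-${var:year}-${var:quarter}",
    {"year": 2019, "quarter": "Q4"},
)
print(path)  # batch-download/tiffs/year-2019-Q4
```

With this behaviour, a typo like `{"yaer": 2020}` raises an error immediately rather than producing output for the wrong year.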

A similar specific-paths mechanism can also be achieved by modifying the storage manager directly in the final config:
```
{ // batch_download_2019_Q4.json
  "**pipeline": "${config_path}/batch_download.json",
  "variables": {"year": 2019, "quarter": "Q4"},
  "storage": {
    "structure": {
      "batch_tiffs": "batch-download/tiffs/year-2019-Q4"
    }
  }
}
```
While that is sufficient for many cases and more explicit, variables are preferred and tend to be less error-prone for complex folder structures.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -59,7 +59,7 @@
"sphinx.ext.githubpages",
"nbsphinx",
"sphinx_rtd_theme",
"m2r2",
"sphinx_mdinclude",
"sphinxcontrib.autodoc_pydantic",
]

4 changes: 3 additions & 1 deletion docs/source/config-language.md
@@ -25,7 +25,7 @@ Additional notes:
- So far, the config language is not completely OS-agnostic and might not support Windows file paths.


### Pipeline joins
### Pipeline chains

A typical configuration is a dictionary with pipeline parameters. However, it can also be a list of dictionaries. In this case each dictionary must contain parameters of a single pipeline. The order of dictionaries defines the consecutive order in which pipelines will be run. Example:

@@ -44,3 +44,5 @@ A typical configuration is a dictionary with pipeline parameters. However, it ca
...
]
```

There is currently no functionality for merging multiple pipeline chains, except manually concatenating their contents into a single file.