transport network creation docs
thomas-fred committed Jul 21, 2023
1 parent d29b6fc commit 3ab0e79
Showing 1 changed file (`README.md`) with 147 additions and 38 deletions.
@@ -4,11 +4,9 @@
[![pyTest](https://github.com/nismod/open-gira/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/nismod/open-gira/actions/workflows/test.yml)
[![snakemake workflow](https://img.shields.io/badge/snakemake-open--gira-informational)](https://snakemake.github.io/snakemake-workflow-catalog/?usage=nismod/open-gira)

This open-source [snakemake](https://snakemake.readthedocs.io/en/stable/)
workflow can be used to analyse physical climate risks to infrastructure
networks using global open data.

> Work in Progress
>
@@ -27,13 +25,13 @@ some of the core vector-raster intersection functionality.
## Installation

### Conda packages

This repository comes with an `environment.yml` file describing the `conda` and
`PyPI` packages required to run `open-gira`. The `open-gira` developers
recommend using either [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html#micromamba)
or [mamba](https://mamba.readthedocs.io/en/latest/index.html) to install and
manage these `conda` packages.

#### Locally

@@ -48,7 +46,8 @@ And to activate the environment:
```bash
micromamba activate open-gira
```

You are now ready to request result files, triggering analysis jobs in the
process.

#### Cluster

@@ -59,10 +58,16 @@
```bash
micromamba create -n snakemake python=3.9 snakemake
```

In this context, `snakemake` itself can manage the other required dependencies,
creating other environments as necessary. To activate the orchestration
environment:
```bash
micromamba activate snakemake
```

To run the workflow on a [cluster](https://snakemake.readthedocs.io/en/stable/executing/cluster.html),
you will need to provide a [profile](https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles)
and request targets as follows:
```bash
snakemake --profile <path_to_cluster_config> -- <target_file>
```

@@ -75,43 +80,113 @@ installation instructions [here](https://github.com/isciences/exactextract).

## Running tests

Workflow steps are tested using a small sample dataset. To run the tests:
```bash
python -m pytest tests
```

## Usage

`open-gira` comprises a set of `snakemake` rules which call scripts and
library code to request data, process it and produce results.

The key idea of `snakemake` is similar to `make` in that the workflow is
determined from the end (the files users want) to the beginning (the files
users have, if any) by applying general rules with pattern matching on file and
folder names.

An example invocation looks like:
```
snakemake --cores 2 -- results/wales-latest_filter-road/edges.geoparquet
```

Here, we ask `snakemake` to use up to 2 CPUs to produce a target file: in this
case, the edges of the Welsh road network. To check what work will be requested
before commencing, use the `-n` (dry run) flag:
```
snakemake -n --cores 2 -- results/wales-latest_filter-road/edges.geoparquet
```

This lists the rules that must run to produce the target file, without
executing them. It may also be helpful to [visualise](https://snakemake.readthedocs.io/en/stable/executing/cli.html#visualization)
which rules are expected to run.
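To make the pattern matching concrete, here is a minimal Python sketch of how wildcard values could be recovered from a requested file name. It is an illustration, not Snakemake's implementation: the output pattern `results/{dataset}_filter-{network_type}/edges.geoparquet` and the helper names are hypothetical, the real `open-gira` rules may be written differently, and Snakemake adds features such as wildcard constraints.

```python
import re
from typing import Dict, Optional


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Convert a pattern with {wildcard} placeholders into a compiled regex.

    Literal text is escaped and each placeholder becomes a named group that
    matches one or more characters.
    """
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between wildcards
        else:
            regex += f"(?P<{part}>.+)"  # a wildcard: one or more characters
    return re.compile(regex + r"\Z")


def match_target(pattern: str, target: str) -> Optional[Dict[str, str]]:
    """Return the wildcard values that make `pattern` produce `target`, or None."""
    m = pattern_to_regex(pattern).match(target)
    return m.groupdict() if m else None


# Hypothetical output pattern of a network-creation rule.
RULE_OUTPUT = "results/{dataset}_filter-{network_type}/edges.geoparquet"

print(match_target(RULE_OUTPUT, "results/wales-latest_filter-road/edges.geoparquet"))
# {'dataset': 'wales-latest', 'network_type': 'road'}
```

Once the wildcard values are known, Snakemake can work backwards to the input files such a rule would need, repeating the process until it reaches files that already exist.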

### Configuration

The snakemake configuration details are in `config/config.yaml`. You can edit
this to set the target OSM infrastructure datasets, number of spatial slices, and
hazard datasets. See below and [config/README.md](https://github.com/nismod/open-gira/blob/main/config/README.md)
for more details on the configuration variables.

### Available pipelines

#### Network creation

`open-gira` can currently create several types of connected infrastructure
networks from open data.

##### Road

We can create a topologically connected road network for a given area from
[OpenStreetMap](https://www.openstreetmap.org) (OSM) data. The resulting
network can be annotated with data retrieved from OSM (e.g. highway
classification, surface type), along with data looked up from user-supplied
sources (e.g. rehabilitation costs). Each network edge is labelled with its
'from' and 'to' nodes, describing how the network is connected.
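
The idea of labelling edges with 'from' and 'to' nodes can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the `open-gira` or snkit code: edges are straight segments given as coordinate pairs, the `from_id`/`to_id` names and the rounding tolerance are hypothetical, and connectedness is checked by counting connected components with a union-find.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def add_topology(edges: List[Tuple[Point, Point]], precision: int = 6):
    """Assign an integer id to every distinct endpoint and label each edge
    with the ids of its start ("from") and end ("to") nodes."""
    node_ids: Dict[Point, int] = {}
    labelled = []
    for start, end in edges:
        ids = []
        for x, y in (start, end):
            # Round coordinates so tiny floating-point differences do not
            # create duplicate nodes (a crude tolerance).
            key = (round(x, precision), round(y, precision))
            if key not in node_ids:
                node_ids[key] = len(node_ids)
            ids.append(node_ids[key])
        labelled.append({"from_id": ids[0], "to_id": ids[1], "geometry": (start, end)})
    return labelled, node_ids


def count_components(labelled: List[dict], n_nodes: int) -> int:
    """Count connected components with a small union-find over from/to ids."""
    parent = list(range(n_nodes))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for edge in labelled:
        a, b = find(edge["from_id"]), find(edge["to_id"])
        if a != b:
            parent[a] = b
    return len({find(i) for i in range(n_nodes)})


# Two edges sharing the node (1, 0), plus one isolated edge.
segments = [
    ((0.0, 0.0), (1.0, 0.0)),
    ((1.0, 0.0), (1.0, 1.0)),
    ((5.0, 5.0), (6.0, 5.0)),
]
edges, nodes = add_topology(segments)
print([(e["from_id"], e["to_id"]) for e in edges])  # [(0, 1), (1, 2), (3, 4)]
print(count_components(edges, len(nodes)))  # 2
```

A component count above 1 flags parts of the network that cannot be reached from each other, which is often worth inspecting before running any analysis on the network.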

To specify a desired network:

- Review and amend the spreadsheets in `bundled_data/transport`; these supply
  information that is used to gap-fill or extend what can be determined from
  OSM alone.
- Review and amend `config/config.yaml`:
  - The `infrastructure_datasets` map should contain a key pointing to an
    `.osm.pbf` file URL for the desired area. There are currently entries for
    the planet, for (some definition of) continents and for several countries.
    We use the [geofabrik](http://download.geofabrik.de/) service for
    continent and country-level OSM extracts.
  - Check the OSM filter file pointed to by `network_filters.road`. This file
    specifies which [elements](https://wiki.openstreetmap.org/wiki/Elements)
    (nodes, ways or relations) to keep (or reject) from the multitude of data
    in an OSM file. See the filter expressions section
    [here](https://docs.osmcode.org/osmium/latest/osmium-tags-filter.html)
    for more information on the syntax of these files.
  - Check and amend `keep_tags.road`. This list of strings specifies which
    `tags` (attributes) to retain on the filtered elements we extract from the
    `.osm.pbf` file.
  - Review `slice_count`. This controls the degree of parallelism possible.
    With it set to 1, there is no spatial slicing (we create the network in a
    single chunk). To speed up network creation for large domains, it can be
    set to a larger square number. The first square number greater than your
    number of available CPUs is a good heuristic.
  - Check and amend the values of `transport.road`, which provide some
    defaults for OSM data gap-filling.

And to create the network, by way of example:
```
snakemake --cores all -- results/egypt-latest_filter-road/edges.geoparquet
```
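
The `slice_count` heuristic and a few of the configuration checks listed above can be sketched in Python. The helper names are hypothetical, and the example dictionary only mimics the keys mentioned in this README (`infrastructure_datasets`, `network_filters.road`, `keep_tags.road`, `slice_count`), assuming they are nested as shown; the values, including the Geofabrik URL and filter path, are illustrative rather than copied from the real configuration file.

```python
import math
import os
from typing import List


def suggest_slice_count(n_cpus: int) -> int:
    """Smallest square number strictly greater than n_cpus."""
    root = math.isqrt(n_cpus) + 1
    return root * root


def check_road_config(config: dict, dataset: str) -> List[str]:
    """Return a list of problems with the road-network settings (empty if none)."""
    problems = []
    url = config.get("infrastructure_datasets", {}).get(dataset)
    if url is None:
        problems.append(f"no infrastructure_datasets entry for {dataset!r}")
    elif not url.endswith(".osm.pbf"):
        problems.append(f"infrastructure_datasets[{dataset!r}] is not an .osm.pbf URL")
    if "road" not in config.get("network_filters", {}):
        problems.append("network_filters.road is missing")
    if not config.get("keep_tags", {}).get("road"):
        problems.append("keep_tags.road is missing or empty")
    n = config.get("slice_count", 1)
    if not isinstance(n, int) or n < 1 or math.isqrt(n) ** 2 != n:
        problems.append("slice_count should be a square number (1, 4, 9, ...)")
    return problems


# Illustrative values only (hypothetical, not the real configuration).
example_config = {
    "infrastructure_datasets": {
        "egypt-latest": "https://download.geofabrik.de/africa/egypt-latest.osm.pbf",
    },
    "network_filters": {"road": "config/osm_filters/road.txt"},  # hypothetical path
    "keep_tags": {"road": ["highway", "surface", "bridge"]},
    "slice_count": suggest_slice_count(os.cpu_count() or 1),
}

print(example_config["slice_count"])
print(check_road_config(example_config, "egypt-latest"))  # []
```

For example, on a machine with 8 CPUs the heuristic suggests 9 slices, and on one with 16 CPUs it suggests 25.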

##### Rail

The process for creating a rail network is essentially the same as for road.
Please see the road section above for the relevant options that can be configured.

An example network creation call would be:
```
snakemake --cores all -- results/egypt-latest_filter-rail/edges.geoparquet
```

Note that the nodes file, `results/egypt-latest_filter-rail/nodes.geoparquet`,
will by default contain the stations and their names.

##### Electricity grid creation

TODO

#### Risk assessment

TODO

##### Transport / flooding

The pipeline starts from an OpenStreetMap dataset (_e.g._
`europe-latest`) and produces network/flood hazard intersection data,
@@ -168,21 +243,39 @@
This is a directed acyclic graph (DAG) of a simplified version of the workflow
that uses just one OSM dataset, one hazard dataset, and one slice:
![DAG of the Snakefile workflow](docs/src/img/DAG-simple.png)

##### Electricity grid / tropical cyclone

TODO

#### Cleaning intermediate outputs

You can remove intermediate files by running the `clean` rule (to check what
would be run first, add the `-n` dry run flag):
```bash
snakemake -c1 -R clean
```

### Utilities

`open-gira` comes with a few small utilities outside the `snakemake` workflows.

#### Geoparquet -> Geopackage

As standard, we use the `.geoparquet` format to store vector data on disk.
Unfortunately, common GIS software such as QGIS may not yet support this file
format. To convert file(s) to GeoPackage, use:
```
python workflow/scripts/pq_to_gpkg.py <path_to_geoparquet_1> <path_to_geoparquet_2> <...>
```

This will write GeoPackage files beside their source `.geoparquet` files.
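
For readers who would rather do the conversion in their own code, here is a minimal sketch using geopandas (`geopandas.read_parquet` and `GeoDataFrame.to_file` with the `GPKG` driver). It is not the contents of `workflow/scripts/pq_to_gpkg.py`; the helper names and the `.gpkg` output naming are assumptions.

```python
from pathlib import Path


def gpkg_path_for(parquet_path: str) -> Path:
    """Path of the GeoPackage to write beside a .geoparquet file (assumed naming)."""
    return Path(parquet_path).with_suffix(".gpkg")


def convert(parquet_path: str) -> Path:
    """Read a GeoParquet file with geopandas and write it out as a GeoPackage."""
    import geopandas as gpd  # imported lazily: only needed for the actual conversion

    out_path = gpkg_path_for(parquet_path)
    gpd.read_parquet(parquet_path).to_file(out_path, driver="GPKG")
    return out_path
```

For example, `convert("results/wales-latest_filter-road/edges.geoparquet")` would write `results/wales-latest_filter-road/edges.gpkg`, which QGIS can open directly.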

#### Unpickling interactive plots

`matplotlib` plots can be interactive (zoom, pan, etc.), but not when saved as
static images. Some rules instead produce pickled plot files, which keep this
interactivity. To view these, use:
```
python workflow/scripts/unpickle_plot.py <path_to_pickled_plot>
```
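
A minimal sketch of how a plot might be pickled and later reopened with the standard `pickle` module. It assumes the rules pickle a `matplotlib` `Figure` object directly; it is not the contents of `workflow/scripts/unpickle_plot.py`, and the helper names are hypothetical. Zooming and panning require an interactive matplotlib backend.

```python
import pickle


def save_pickled_plot(fig, path: str) -> None:
    """Pickle a matplotlib Figure (or any picklable object) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(fig, f)


def load_pickled_plot(path: str):
    """Unpickle an object previously saved with save_pickled_plot.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def show_pickled_plot(path: str) -> None:
    """Load a pickled figure and open it in an interactive window."""
    import matplotlib.pyplot as plt  # an interactive backend is needed for zoom/pan

    load_pickled_plot(path)
    plt.show()
```

For example, `show_pickled_plot("<path_to_pickled_plot>")` should reopen the figure in a window where it can be explored.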

## Documentation

@@ -202,6 +295,22 @@ open book/index.html

Or run `mdbook serve` to run a server and rebuild the docs as you make changes.

## Related projects

Two libraries have been developed in tandem with `open-gira` and provide some
key functionality.

### snail

The open-source Python library [snail](https://github.com/nismod/snail)
is used for vector-raster intersection, e.g. identifying which road segments
might be affected by a set of flood map hazard rasters.
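
To illustrate what vector-raster intersection means, the sketch below estimates which cells of a small hazard raster a straight road segment passes through, by sampling points along the segment, and takes the largest hazard value over those cells. This crude sampling approach is a stand-in for illustration only, not how snail works, and it can miss a cell that the segment only clips at a corner. The grid, coordinates and helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col)


def cells_touched(start, end, origin, cell_size: float, step: float = 0.0) -> List[Cell]:
    """Approximate the raster cells crossed by a straight segment by sampling
    points along it. `origin` is the (x, y) of the raster's top-left corner."""
    (x0, y0), (x1, y1) = start, end
    step = step or cell_size / 4  # sample several points per cell
    n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
    cells: List[Cell] = []
    for i in range(n + 1):
        t = i / n
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        col = math.floor((x - origin[0]) / cell_size)
        row = math.floor((origin[1] - y) / cell_size)  # rows count down from the top
        if (row, col) not in cells:
            cells.append((row, col))
    return cells


def max_value(raster: Sequence[Sequence[float]], cells: List[Cell]) -> float:
    """Largest raster value over the given cells, ignoring cells off the grid."""
    values = [
        raster[r][c]
        for r, c in cells
        if 0 <= r < len(raster) and 0 <= c < len(raster[0])
    ]
    return max(values, default=0.0)


# A hypothetical 3-by-3 flood-depth grid with 1-unit cells, top-left corner at (0, 3).
depth = [
    [0.0, 0.0, 0.0],
    [0.0, 1.5, 0.0],
    [0.0, 0.0, 0.0],
]
road_a = ((0.5, 1.5), (2.5, 1.5))  # crosses the middle row
road_b = ((0.5, 2.5), (2.5, 2.5))  # crosses the top row

for name, (start, end) in {"road_a": road_a, "road_b": road_b}.items():
    cells = cells_touched(start, end, origin=(0.0, 3.0), cell_size=1.0)
    print(name, cells, max_value(depth, cells))
```

Here `road_a` passes through the flooded middle cell and picks up a depth of 1.5, while `road_b` stays dry.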

### snkit

The [snkit](https://github.com/tomalrussell/snkit) library is used for
network cleaning and assembly.

## Acknowledgments

This research received funding from the FCDO Climate Compatible Growth