Tutorial
Welcome dear contributor 🖐
This page explains how you can contribute to this ambitious project 💪💪💪 Various areas of contribution have been identified:
- Integrating a new city
- Extracting a new layer
- Referencing an extracted layer
- Computing a new final indicator
The UrbanShift project consists of delivering a set of layers and indicators at the city level in order to provide insights for specific themes: urban biodiversity status, greenspace monitoring... These indicators are calculated by computing zonal statistics aggregated at different administrative levels within the cities and based on available global open data.
The administrative boundaries of the target cities are key inputs for enabling this process.
For each city, we distinguish between two levels of analysis:
- The units of analysis: corresponding to the list of administrative entities within the selected city/region
- The areas of interest: corresponding to the city-wide areas that we build based on the union of units-of-analysis geometries
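As an illustration, here is a minimal sketch (using geopandas; the file names are only assumptions based on the San Jose example) of how an area-of-interest geometry can be built as the union of the unit-of-analysis geometries:

```python
import geopandas as gpd

# Load the units of analysis (sub-city administrative areas)
units = gpd.read_file("boundary-CRI-San_Jose-ADM2.geojson")

# Build the city-wide area of interest as the union of all unit geometries
aoi_geometry = units.geometry.unary_union
aoi = gpd.GeoDataFrame({"geometry": [aoi_geometry]}, crs=units.crs)

# Save the city-wide boundary (output name is illustrative)
aoi.to_file("boundary-CRI-San_Jose-ADM2union.geojson", driver="GeoJSON")
```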
To include a city in the indicators framework, we need to:
- Collect and store the boundary files corresponding to the administrative boundaries of the two levels of analysis, following a specific format and schema
- Fill in a parameter file that serves as our reference list of cities and is used as a key input for the whole process
The administrative boundaries of UrbanShift cities are stored as separate geojson files hosted in the AWS S3 bucket s3://cities-urbanshift/data/boundaries/v_0/. Every city is defined by two geojson files:
- one file corresponding to the list of geometries of the sub-city administrative areas (called units of analysis)
- one file corresponding to the union of the units of analysis and referring to the city-wide geometry (called areas of interest)
The geojson file names follow a specific format concatenating the country ISO 3 code, the city name and the administrative level. For example, the boundary file of the level 2 administrative areas of San Jose is named boundary-CRI-San_Jose-ADM2.geojson.
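As a small illustration of this naming convention (the helper function below is ours, not part of the project code):

```python
def boundary_file_name(country_iso3: str, city_name: str, admin_level: str) -> str:
    """Build a boundary file name: boundary-<ISO3>-<City_Name>-<admin level>.geojson."""
    return f"boundary-{country_iso3}-{city_name}-{admin_level}.geojson"

# Level 2 administrative areas of San Jose, Costa Rica
print(boundary_file_name("CRI", "San_Jose", "ADM2"))
# boundary-CRI-San_Jose-ADM2.geojson
```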
The geojson files must include at least these 4 fields:
| field | description |
|---|---|
| geo_id | The unique identifier of every geometry across all the cities. This field is very important since it is used for joining the indicator table and the geometries. It is generated as a concatenation of geo_name, geo_level and an iterator number. For example, if we have two geometries in the city CRI-San_Jose corresponding to admin level 2, their identifiers will be CRI-San_Jose_ADM2_1 and CRI-San_Jose_ADM2_2. |
| geo_level | The administrative level (ADM1, ADM2, ADM3...). If the geometry is generated as the union of a list of sub-geometries at admin level x, it should be defined as ADMx-union. |
| geo_name | The name of the city as a concatenation of the country ISO 3 code and the common name. For example, CRI-San_jose for San Jose in Costa Rica (CRI). |
| geo_parent_name | Since we consider two levels of analysis (the area of interest and the units of analysis), this field tracks the parent/child relationship between the geometries. |
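As a minimal sketch (assuming geopandas and the conventions above), these four fields could be populated for a units-of-analysis file as follows; the variable values and the choice of parent value are illustrative assumptions, not project code:

```python
import geopandas as gpd

geo_name = "CRI-San_Jose"   # <ISO3 country code>-<city name>
geo_level = "ADM2"          # administrative level of the units of analysis

units = gpd.read_file("boundary-CRI-San_Jose-ADM2.geojson")

units["geo_name"] = geo_name
units["geo_level"] = geo_level
# geo_id = geo_name + geo_level + iterator, e.g. CRI-San_Jose_ADM2_1, CRI-San_Jose_ADM2_2, ...
units["geo_id"] = [f"{geo_name}_{geo_level}_{i}" for i in range(1, len(units) + 1)]
# Assumption: the parent of every unit of analysis is the city-wide area of interest
units["geo_parent_name"] = geo_name

units.to_file("boundary-CRI-San_Jose-ADM2.geojson", driver="GeoJSON")
```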
You can find in this link an example of a geojson file corresponding to the area of interest for San Jose.
You can find in this link an example of a geojson file corresponding to the units of analysis for San Jose.
The parameter table is a key input to the whole process because it contains the list of cities we are processing and the paths to the different boundaries by level. It is stored as a csv file in the AWS S3 bucket: parameter file.
It should be updated every time we want to add a new city or change the area of interest or the units of analysis.
| field | description |
|---|---|
| geo_name | The technical name of the city as specified previously: a concatenation of the country code and the city name |
| level | The administrative level label (region, city, metropolitan...) |
| aoi_boundary_name | The administrative level name corresponding to the defined area of interest (ADM1, ADM1union, ADM2, ADM2union, ...) |
| units_boundary_name | The administrative level name corresponding to the defined units of analysis (ADM1, ADM2, ...) |
| city_name | The city label without the ISO 3 code, used for visualization |
| country_name | The country name |
| country_code | The country ISO 3 code |
| continent | The continent name (America, Africa, Europe...) |
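For illustration, the parameter table can be read with pandas and iterated to resolve the boundary levels of every city; the local file name below is only a placeholder for the csv stored in S3:

```python
import pandas as pd

# Placeholder path for the parameter file stored in the cities-urbanshift S3 bucket
parameters = pd.read_csv("city_parameters.csv")

for _, city in parameters.iterrows():
    print(
        city["geo_name"],
        "| area of interest:", city["aoi_boundary_name"],
        "| units of analysis:", city["units_boundary_name"],
    )
```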
One of the goals of the UrbanShift project is to extract and document geospatial data for our partner cities. The extracted data and the generated metadata describing it are shared through a Data Hub, where they can then be explored, visualized and downloaded by the cities.
In order to extract city-wide data we need:
- The city-wide administrative boundaries
- The global data source from which we want to extract a subset of data
Extracting a city-wide layer consists of generating a subset of the global data source based on the city-wide geometry. The generated subset needs to be stored in a specific file in AWS S3, where it will be reused by the apps and reports consuming the output data. Only geojson and GeoTIFF formats are accepted for the extracted layers. The names of the extracted layers should contain the city identifier (country code + city name) as defined in the administrative boundary files.
For example, the extracted layer from ESA World Cover data for Ningbo city is stored in this file: https://cities-urbanshift.s3.eu-west-3.amazonaws.com/data/land_use/esa_world_cover/v_0/CHN-Ningbo-ADM3union-ESA-world_cover-2000.tif. An example of a notebook used for extracting this layer is available in this link.
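As an illustration of such an extraction, here is a minimal sketch that clips a global raster with the city-wide boundary using rasterio; the file names are placeholders and the project notebooks may proceed differently:

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

# Placeholder inputs: the city-wide boundary and a global GeoTIFF data source
aoi = gpd.read_file("boundary-CRI-San_Jose-ADM2union.geojson")

with rasterio.open("esa_world_cover_global.tif") as src:
    # Reproject the boundary to the raster CRS before clipping
    aoi = aoi.to_crs(src.crs)
    clipped, transform = mask(src, aoi.geometry, crop=True)
    profile = src.profile
    profile.update(height=clipped.shape[1], width=clipped.shape[2], transform=transform)

# Name the subset with the city identifier, following the naming convention
with rasterio.open("CRI-San_Jose-ADM2union-ESA-world_cover.tif", "w", **profile) as dst:
    dst.write(clipped)
```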
Once we have extracted the layers for the different available cities, we need to reference the newly extracted data by adding metadata describing its various properties: data source, spatial and temporal resolution and extent, format...
For each layer, we generate a json metadata file compiling the metadata for the list of cities. This metadata file is stored in the same folder as the corresponding extracted layers and is named metadata.json. You can find an example of a metadata file for ESA World Cover layers in this link.
An example of a notebook used for generating the metadata file is available in this link.
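As a rough sketch, a metadata.json file can be assembled as a plain dictionary and serialized with the standard library; the fields and values below are illustrative only, the actual schema is the one used in the existing metadata files:

```python
import json

# Illustrative structure only; follow the schema of the existing metadata.json files
metadata = {
    "layer": "esa_world_cover",
    "data_source": "ESA WorldCover",
    "format": "GeoTIFF",
    "spatial_resolution": "10m",
    "temporal_extent": "2020",
    "cities": ["CRI-San_Jose", "CHN-Ningbo"],
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)
```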
If we are using an additional data source to compute a new indicator, we should start by extracting the city-wide layers and referencing them, as explained previously, before computing the indicator.
The indicators should be calculated at both levels of analysis: the areas of interest and the units of analysis.
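For example, an indicator such as the percent of tree cover can be computed per geometry with zonal statistics; the sketch below uses rasterstats on a categorical land-cover raster (file names and the tree-cover class value are assumptions, and the boundary is expected to share the raster CRS):

```python
import geopandas as gpd
from rasterstats import zonal_stats

# Units of analysis and a city-wide land cover raster (placeholder file names)
units = gpd.read_file("boundary-CRI-San_Jose-ADM2.geojson")

# Count the pixels of every land cover class inside each unit of analysis
stats = zonal_stats(units, "CRI-San_Jose-ADM2union-ESA-world_cover.tif", categorical=True)

# Assumption: tree cover is encoded as class 10 (as in ESA WorldCover)
TREE_CLASS = 10
units["tree_cover_percent"] = [
    100 * counts.get(TREE_CLASS, 0) / sum(counts.values()) if counts else None
    for counts in stats
]
```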
The computed indicators are stored in a csv file combining all the city and sub-city identifiers. Every new indicator should be added to this dataframe as a separate column. This can be done with a simple left join between the newly calculated indicator table and the existing indicator table using the geo_id field (see the sketch after the example table below). An example of a notebook used for computing the indicator "percent of tree cover" using the "Tree Mosaic Land" dataset is available in this link.
| geo_id | geo_level | geo_name | indicator 1 | indicator 2 | ... | indicator n |
|---|---|---|---|---|---|---|
| CRI-San_jose_ADM-2_1 | ADM2 | CRI-San_jose | x1 | x2 | ... | xn |
| CRI-San_jose_ADM-2_2 | ADM2 | CRI-San_jose | x1 | x2 | ... | xn |
| ... | | | | | | |
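As referenced above, a minimal sketch of the left join used to append a new indicator column (paths and column names are placeholders):

```python
import pandas as pd

# Existing indicator table and a newly computed indicator (placeholder paths)
indicators = pd.read_csv("cities_indicators.csv")
new_indicator = pd.read_csv("tree_cover_percent.csv")  # columns: geo_id, tree_cover_percent

# Append the new indicator as a separate column via a left join on geo_id
indicators = indicators.merge(new_indicator, on="geo_id", how="left")

indicators.to_csv("cities_indicators.csv", index=False)
```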
The indicator table compiling the list of computed indicators is stored in a specific csv file here: s3://cities-urbanshift/indicators/cities_indicators.csv. This file is updated every time we integrate a new indicator.
The indicator table should be initialized in these cases: integration of a new city, deletion of an existing city, or a change in the sub-city features. You can find the notebook to initialize the indicator table in this link.
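For reference, a rough sketch of what such an initialization could look like, rebuilding the identifier columns from the boundary files before the indicator columns are recomputed; the file list and column selection are assumptions, the actual procedure is the one in the linked notebook:

```python
import geopandas as gpd
import pandas as pd

# Placeholder list of boundary files covering all cities and levels
boundary_files = [
    "boundary-CRI-San_Jose-ADM2.geojson",
    "boundary-CRI-San_Jose-ADM2union.geojson",
]

frames = []
for path in boundary_files:
    gdf = gpd.read_file(path)
    frames.append(gdf[["geo_id", "geo_level", "geo_name"]].copy())

# Start a fresh indicator table containing only the geometry identifiers
cities_indicators = pd.concat(frames, ignore_index=True)
cities_indicators.to_csv("cities_indicators.csv", index=False)
```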