Tutorial
Welcome dear contributor 🖐
This page explains how you can contribute to this ambitious project 💪💪💪 Various areas of contribution have been identified:
- Integrating a new city
- Extracting a new layer
- Referencing an extracted layer
- Computing a new final indicator
The UrbanShift project consists of delivering a set of layers and indicators at the city level in order to provide insights for specific themes: urban biodiversity status, greenspace monitoring... These indicators are calculated by computing zonal statistics aggregated at different administrative levels within the cities and based on available global open data.
The administrative boundaries of the target cities are key inputs for enabling this process.
For each city, we distinguish between two levels of analysis:
- The units of analysis: corresponding to the list of administrative entities within the selected city/region
- The areas of interest: corresponding to the city-wide areas that we build based on the union of units-of-analysis geometries
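As an illustration, here is a minimal sketch (using geopandas; the file names are only assumptions based on the San Jose example) of how an area-of-interest geometry can be built as the union of the unit-of-analysis geometries:

```python
import geopandas as gpd

# Load the units of analysis (sub-city administrative areas)
units = gpd.read_file("boundary-CRI-San_Jose-ADM2.geojson")

# Build the city-wide area of interest as the union of all unit geometries
aoi_geometry = units.geometry.unary_union
aoi = gpd.GeoDataFrame({"geometry": [aoi_geometry]}, crs=units.crs)

# Save the city-wide boundary (output name is illustrative)
aoi.to_file("boundary-CRI-San_Jose-ADM2union.geojson", driver="GeoJSON")
```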
To include a city in the indicators framework, we need to:
- Collect and store the boundary files corresponding to the administrative boundaries of the two levels of analysis, following a specific format and schema
- Fill in a parameter file that serves as our reference list of cities and is used as a key input for the whole process
The administrative boundaries of UrbanShift cities are stored as separate geojson files hosted in the AWS S3 bucket s3://cities-urbanshift/data/boundaries/v_0/. Every city is defined by two geojson files:
- one file corresponding to the list of geometries of the sub-city administrative areas (called units of analysis)
- one file corresponding to the union of the units of analysis and referring to the city-wide geometry (called areas of interest)
The geojson file names follow a specific format concatenating the country ISO 3 code, the city name and the administrative level. For example, the boundary file of the level 2 administrative areas of San Jose is named boundary-CRI-San_Jose-ADM2.geojson.
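As a small illustration of this naming convention (the helper function below is ours, not part of the project code):

```python
def boundary_file_name(country_iso3: str, city_name: str, admin_level: str) -> str:
    """Build a boundary file name: boundary-<ISO3>-<City_Name>-<admin level>.geojson."""
    return f"boundary-{country_iso3}-{city_name}-{admin_level}.geojson"

# Level 2 administrative areas of San Jose, Costa Rica
print(boundary_file_name("CRI", "San_Jose", "ADM2"))
# boundary-CRI-San_Jose-ADM2.geojson
```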
The geojson files must include at least these 4 fields:
| field | description |
|---|---|
| geo_id | The unique identifier of every geometry across all the cities. This field is very important since it is used for joining the indicator table and the geometries. It is generated as a concatenation of geo_name, geo_level and an iterator number. For example, if we have two geometries in the city CRI-San_Jose corresponding to admin level 2, their identifiers will be CRI-San_Jose_ADM2_1 and CRI-San_Jose_ADM2_2. |
| geo_level | The administrative level (ADM1, ADM2, ADM3...). If the geometry is generated as the union of a list of sub-geometries at admin level x, it should be defined as ADMx-union. |
| geo_name | The name of the city as a concatenation of the country ISO 3 code and the common name. For example, CRI-San_jose for San Jose in Costa Rica (CRI). |
| geo_parent_name | Since we consider two levels of analysis (the area of interest and the units of analysis), this field tracks the parent/child relationship between the geometries. |
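As a minimal sketch (assuming geopandas and the conventions above), these four fields could be populated for a units-of-analysis file as follows; the variable values and the choice of parent value are illustrative assumptions, not project code:

```python
import geopandas as gpd

geo_name = "CRI-San_Jose"   # <ISO3 country code>-<city name>
geo_level = "ADM2"          # administrative level of the units of analysis

units = gpd.read_file("boundary-CRI-San_Jose-ADM2.geojson")

units["geo_name"] = geo_name
units["geo_level"] = geo_level
# geo_id = geo_name + geo_level + iterator, e.g. CRI-San_Jose_ADM2_1, CRI-San_Jose_ADM2_2, ...
units["geo_id"] = [f"{geo_name}_{geo_level}_{i}" for i in range(1, len(units) + 1)]
# Assumption: the parent of every unit of analysis is the city-wide area of interest
units["geo_parent_name"] = geo_name

units.to_file("boundary-CRI-San_Jose-ADM2.geojson", driver="GeoJSON")
```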
You can find in this link an example of a geojson file corresponding to the area of interest for San Jose.
You can find in this link an example of a geojson file corresponding to the units of analysis for San Jose.
The parameter table is a key input to the whole process because it contains the list of cities we are processing and the paths to the different boundaries by level. It is stored as a csv file in the AWS S3 bucket: parameter file.
It should be updated every time we want to add a new city or change the area of interest or the units of analysis.
| field | description |
|---|---|
| geo_name | The technical name of the city as specified previously: a concatenation of the country code and the city name |
| level | The administrative level label (region, city, metropolitan...) |
| aoi_boundary_name | The administrative level name corresponding to the defined area of interest (ADM1, ADM1union, ADM2, ADM2union, ...) |
| units_boundary_name | The administrative level name corresponding to the defined units of analysis (ADM1, ADM2, ...) |
| city_name | The city label without the ISO 3 code, used for visualization |
| country_name | The country name |
| country_code | The country ISO 3 code |
| continent | The continent name (America, Africa, Europe...) |
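For illustration, the parameter table can be read with pandas and iterated to resolve the boundary levels of every city; the local file name below is only a placeholder for the csv stored in S3:

```python
import pandas as pd

# Placeholder path for the parameter file stored in the cities-urbanshift S3 bucket
parameters = pd.read_csv("city_parameters.csv")

for _, city in parameters.iterrows():
    print(
        city["geo_name"],
        "| area of interest:", city["aoi_boundary_name"],
        "| units of analysis:", city["units_boundary_name"],
    )
```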
One of the goals of the UrbanShift project is to extract and document geospatial data for our partner cities. The extracted data and the generated metadata describing it are shared through a Data Hub, where they can then be explored, visualized and downloaded by the cities.
In order to extract city-wide data we need:
- The city-wide administrative boundaries
- The global data source from which we want to extract a subset of data
Extracting a city-wide layer consists of generating a subset of the global data source based on the city-wide geometry. The generated subset needs to be stored in a specific file in AWS S3, where it will be reused by the apps and reports consuming the output data. Only geojson and GeoTIFF formats are accepted for the extracted layers. The names of the extracted layers should contain the city identifier (country code + city name) as defined in the administrative boundary files.
For example, the extracted layer from ESA World Cover data for Ningbo city is stored in this file: https://cities-urbanshift.s3.eu-west-3.amazonaws.com/data/land_use/esa_world_cover/v_0/CHN-Ningbo-ADM3union-ESA-world_cover-2000.tif. An example of a notebook used for extracting this layer is available in this link.
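As an illustration of such an extraction, here is a minimal sketch that clips a global raster with the city-wide boundary using rasterio; the file names are placeholders and the project notebooks may proceed differently:

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

# Placeholder inputs: the city-wide boundary and a global GeoTIFF data source
aoi = gpd.read_file("boundary-CRI-San_Jose-ADM2union.geojson")

with rasterio.open("esa_world_cover_global.tif") as src:
    # Reproject the boundary to the raster CRS before clipping
    aoi = aoi.to_crs(src.crs)
    clipped, transform = mask(src, aoi.geometry, crop=True)
    profile = src.profile
    profile.update(height=clipped.shape[1], width=clipped.shape[2], transform=transform)

# Name the subset with the city identifier, following the naming convention
with rasterio.open("CRI-San_Jose-ADM2union-ESA-world_cover.tif", "w", **profile) as dst:
    dst.write(clipped)
```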
Once we have extracted the layers for the different available cities, we need to reference the newly extracted data by adding metadata describing its various properties: data source, spatial and temporal resolution and extent, format...
For each layer, we generate a json metadata file compiling the metadata for the list of cities. This metadata file is stored in the same folder as the corresponding extracted layers and is named metadata.json. You can find an example of a metadata file for ESA World Cover layers in this link.
An example of a notebook used for generating the metadata file is available in this link.
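As a rough sketch, a metadata.json file can be assembled as a plain dictionary and serialized with the standard library; the fields and values below are illustrative only, the actual schema is the one used in the existing metadata files:

```python
import json

# Illustrative structure only; follow the schema of the existing metadata.json files
metadata = {
    "layer": "esa_world_cover",
    "data_source": "ESA WorldCover",
    "format": "GeoTIFF",
    "spatial_resolution": "10m",
    "temporal_extent": "2020",
    "cities": ["CRI-San_Jose", "CHN-Ningbo"],
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)
```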
If we are using an additional data source to compute a new indicator, we should start by extracting the city-wide layers and referencing them, as explained previously, before computing the indicator.
The indicators should be calculated at both levels of analysis: the areas of interest and the units of analysis.
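For example, an indicator such as the percent of tree cover can be computed per geometry with zonal statistics; the sketch below uses rasterstats on a categorical land-cover raster (file names and the tree-cover class value are assumptions, and the boundary is expected to share the raster CRS):

```python
import geopandas as gpd
from rasterstats import zonal_stats

# Units of analysis and a city-wide land cover raster (placeholder file names)
units = gpd.read_file("boundary-CRI-San_Jose-ADM2.geojson")

# Count the pixels of every land cover class inside each unit of analysis
stats = zonal_stats(units, "CRI-San_Jose-ADM2union-ESA-world_cover.tif", categorical=True)

# Assumption: tree cover is encoded as class 10 (as in ESA WorldCover)
TREE_CLASS = 10
units["tree_cover_percent"] = [
    100 * counts.get(TREE_CLASS, 0) / sum(counts.values()) if counts else None
    for counts in stats
]
```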
The computed indicators are stored in a csv file combining all the city and sub-city identifiers. Every new indicator should be added to this dataframe as a separate column. This can be done with a simple left join between the newly calculated indicator table and the existing indicator table using the geo_id field (see the sketch after the example table below). An example of a notebook used for computing the indicator "percent of tree cover" using the "Tree Mosaic Land" dataset is available in this link.
| geo_id | geo_level | geo_name | indicator 1 | indicator 2 | ... | indicator n |
|---|---|---|---|---|---|---|
| CRI-San_jose_ADM-2_1 | ADM2 | CRI-San_jose | x1 | x2 | ... | xn |
| CRI-San_jose_ADM-2_2 | ADM2 | CRI-San_jose | x1 | x2 | ... | xn |
| ... | | | | | | |
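As referenced above, a minimal sketch of the left join used to append a new indicator column (paths and column names are placeholders):

```python
import pandas as pd

# Existing indicator table and a newly computed indicator (placeholder paths)
indicators = pd.read_csv("cities_indicators.csv")
new_indicator = pd.read_csv("tree_cover_percent.csv")  # columns: geo_id, tree_cover_percent

# Append the new indicator as a separate column via a left join on geo_id
indicators = indicators.merge(new_indicator, on="geo_id", how="left")

indicators.to_csv("cities_indicators.csv", index=False)
```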
The indicator table compiling the list of computed indicators is stored in a specific csv file here: s3://cities-urbanshift/indicators/cities_indicators.csv. This file is updated every time we integrate a new indicator.
The indicator table should be initialized in these cases: integration of a new city, deletion of an existing city, or a change in the sub-city features. You can find the notebook to initialize the indicator table in this link.
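For reference, a rough sketch of what such an initialization could look like, rebuilding the identifier columns from the boundary files before the indicator columns are recomputed; the file list and column selection are assumptions, the actual procedure is the one in the linked notebook:

```python
import geopandas as gpd
import pandas as pd

# Placeholder list of boundary files covering all cities and levels
boundary_files = [
    "boundary-CRI-San_Jose-ADM2.geojson",
    "boundary-CRI-San_Jose-ADM2union.geojson",
]

frames = []
for path in boundary_files:
    gdf = gpd.read_file(path)
    frames.append(gdf[["geo_id", "geo_level", "geo_name"]].copy())

# Start a fresh indicator table containing only the geometry identifiers
cities_indicators = pd.concat(frames, ignore_index=True)
cities_indicators.to_csv("cities_indicators.csv", index=False)
```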