We want to automate mapping of urban vegetation from street level imagery (SLI) to create an informative layer for exploring which areas of a city might be at greater risk during future heat waves and for engaging people in discussions about green and nature-based solutions for climate adaptation.
The collection and analysis of SLI can be used as part of a citizen science workflow. We hypothesize that a participatory approach to assessing issues like extreme heat hazards from climate change can empower communities by: giving them a greater understanding of the hazards facing their community, building their confidence in designing effective actions to reduce the impacts of those hazards, and strengthening connections between community members.
This project was inspired by the Treepedia project from MIT Senseable City Lab. Treepedia aimed to raise a proactive awareness of urban vegetation improvement, using computer vision techniques applied to Google Street View images. Treepedia measured and mapped the amount of vegetation cover along a city's streets by computing the Green View Index (GVI) on Google Street View (GSV) panoramas. Their method considered the obstruction of tree canopies and classified the images accordingly.
We plan to either collect imagery ourselves and use it in a local-only workflow, or we can leverage the crowd-sourced, openly licensed imagery uploaded to Mapillary. This will give us greater control over the recency of the images and the geographic coverage of the images, and it may lower costs.
Satellite imagery is a well-known source of data, but launching a satellite is a costly endeavor and there are limits to what we can interpret from it. SLI provides an accessible way to gather images from a different perspective with exciting implications for humanitarians.
- Effective hardware for SLI is no longer limited to vehicle-top rigs costing tens of thousands of dollars like those used by Google for their Street View service. There are a variety of options for high quality digital action cameras at a reasonable price that are durable, rugged, can capture at a frequent interval, and can automatically tag images with GPS coordinates.
- There are more powerful computing resources, better data storage and transfer options, and innovations in algorithms and machine learning tools are making it easier and more accessible to quickly collect data, process large amounts of data, and automate extraction of insights.
- There is a growing ecosystem of open tools and open data for SLI that organizations can use, build on, and contribute to.
This project is a collaboration with Civic Tech DC.
If you are interested in joining the project, please check out CONTRIBUTING.md
.
Note
This project has a Makefile
with some convenience commands that came be invoked like make <command name>
, e.g.. make requirements
. You can either use them or do things yourself manually.
- Create a Python virtual environment.
- You can use the shortcut command
make create_environment
. - The log will tell you how to activate the environment. Do so with:
source .venv/bin/activate
- You can use the shortcut command
- Install the project and its requirements.
pip install -e .
- You can use the shortcut command
make requirements
to do the same thing.
- You can use the shortcut command
- Put your raw OpenStreetMaps road vector data in
data/raw
.- Your raw data should be geospatial vector features of type
LineString
. The features must include standard OpenStreetMap keysosm_id
andhighway
. - For example, download
Three_Rivers_Michigan_USA_line.zip
todata/raw/Three_Rivers_Michigan_USA_line.zip
. Note that this Google Drive link is only accessible to approved project members.
- Your raw data should be geospatial vector features of type
- Make a copy of the
.env.example
file, removing the.example
from the end of the filename.- To download images from Mapillary you will need to create a (free) account and replace
MY_MAPILLARY_CLIENT_TOKEN
in the.env
file with your own token. See the "Setting up API access and obtaining a client token" section on this Mapillary help page. You only need to enable READ access scope on your token.
- To download images from Mapillary you will need to create a (free) account and replace
The first step is to sample points along the roads in your provided data. You can use the create_points.py
script to sample these points and write them out to a new file. This script filters out certain types of highways, and then samples points along each remaining road. By default, the sampling distance is 20 meters.
For example, if you're using the Three_Rivers_Michigan_USA_line.zip
data mentioned in the "Setup" section above:
python -m src.create_points data/raw/Three_Rivers_Michigan_USA_line.zip data/interim/Three_Rivers_Michigan_USA_points.gpkg
This will write out a zipped shapefile containing the sampled points to data/interim/Three_Rivers_Michigan_USA_points.gpkg
. The input and output formats can be any vector-based spatial data format supported by geopandas, such as shapefiles, GeoJSON, and GeoPackage. The output format is automatically inferred from the file extension.
For additional documentation on how to use create_points.py
, you can run:
python -m src.create_points --help
Both the input files and output files support any file formats that geopandas supports, so long as it can correctly infer the format from the file extension. See the geopandas documentation for more details.
We want a 360 image for each of the sampled points. There is more than option for the imagery source, but you have to choose one option. You cannot use multiple sources (at least at this time). You can use the assign_images.py
script to find the closest image to each point and generate a new file with the data included. The output will have _images
appended to the filename.
For example, if you're continuing from the example in previous steps and already generated a Three_Rivers_Michigan_USA_points.gpkg
file:
python -m src.assign_images data/interim/Three_Rivers_Michigan_USA_points.gpkg MAPILLARY data/raw/images/Three_Rivers_Michigan_USA/ data/interim/Three_Rivers_Michigan_USA_points_images.gpkg
Now that we have a point feature for each image, we want to calculate a Green View
Index (GVI) score for each image and assign that score to the relevant point feature.
We can use the assign_gvi_to_points.py
script
for this.
For more information on how GVI is calculated, see Li et al. (2015), Who lives in greener neighborhoods?, Urban Forestry & Urban Greening 14, pp.751--759.
This example follows from the files and directories created in previous steps and saves an output to a new file.
python -m src.assign_gvi_to_points data/raw/mapillary data/interim/Three_Rivers_Michigan_USA_points_images.gpkg data/processed/Three_Rivers_GVI.gpkg
![NOTE] Support for config files is a work in progress. We will add config file support progressively to the pipeline steps. See Issue #38 for progress.
All command-line options for the pipeline CLI steps can also be provided in a TOML-format configuration file. An example config file can be found in configs/example.toml
.
To use a config file, you can pass a config file using the --config
option flag. For example, if running create_points
, you can do:
python -m src.create_points \
data/raw/Three_Rivers_Michigan_USA_line.zip \
data/interim/Three_Rivers_Michigan_USA_points.gpkg \
--config configs/example.toml
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml <- Single source of truth for dependencies, build system, etc
└── src <- Source code for use in this project.
└── __init__.py <- Makes src a Python module
└── create_points.py <- Creates a list of points along the roads of an area
└── assign_images.py <- Matches images to the list of points (downloading or from local)
└── assign_gvi_to_points.py <- Calculates a Green View Index (GVI) from the images
Project structure based on the cookiecutter data science project template. #cookiecutterdatascience