Cm/dev #5

Merged
merged 19 commits into from
Aug 16, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,5 +1,8 @@
# Files
**/*/*.ipynb
config/clipImage.yaml
config/logReg.yaml
scripts/functions/merge_gt.py

# Folders
data/
118 changes: 76 additions & 42 deletions README.md
@@ -1,12 +1,12 @@
# Green roofs: automatic detection of roof vegetation, vegetation type and covered surface
# Green roofs: automatic detection of roof vegetation <!--, vegetation type and covered surface-->

This project provides a suite of Python scripts that use machine learning to detect green roofs within land survey building footprints, based on orthophotos.

## Hardware requirements

No specific requirements.

## Software Requirements
## Software requirements

* Python 3.9: The dependencies may be installed with either `pip` or `conda`, by making use of the provided `requirements.txt` file. The following method was tested successfully on a Windows system:

@@ -22,88 +22,122 @@ No specific requirements.
├── config # config files
├── data # data to process, see addendum
├───scripts
│ │ calculate_raster.py # compute the NDVI and luminosity raster of the orthoimage tiles
│ | clip_image.py # clip the orthoimages for the aoi extent.
│ | greenery.py # main workflow, greenery detection and logistic regression
│ | log_reg.py # workflow for logistic regression only (ex. from intermediate results of greenery.py)
│ | roof_stats.py # scripts to study and prepare the ground truth layer
│ │ calculate_raster.py # computes the NDVI and luminosity rasters of the orthoimage tiles
│ | clip_image.py # clips the orthoimages to the AOI extent
│ | greenery.py # detects potential greenery by applying thresholds to the NDVI and luminosity rasters
│ | infere_ml.py # infers with the trained machine learning algorithms
│ | train_ml.py # trains and tests machine learning algorithms (logistic regression or random forest)
│ | roof_stats.py # computes the descriptors for the machine learning algorithms
│ |
│ └───functions # set of functions used in python scripts
│ └───functions # set of functions used in Python scripts
└── setup # requirements for environment installation
```

## Scripts and Procedure
## Scripts and procedure

The following abbreviations are used:

* **AOI**: area of interest

* **GT**: ground truth

* **LR**: logistic regression
* **RF**: random forest

Scripts are run in combination with hard-coded configuration files in the following order:

1. `clip_image.py`
2. `calculate_raster.py`
3. `roof_stats.py`
4. `greenery.py`
5. `log_reg.py`
3. `greenery.py`
4. `roof_stats.py`
5. `train_ml.py`
6. `infere_ml.py`

### Input data

## Data preparation
1. `clip_image.py`: The goal of this script is to clip images with an AOI vector layer. First, the AOI is buffered by 50 m. The buffered layer is then used to clip the aerial imagery.
* Use clip_image.yaml to specify the input data.
2. `calculate_raster.py`: compute NDVI and luminosity rasters. Make sure the band numbers in the functions `calculate_ndvi` and `calculate_luminosity` match the imagery.
* Use logReg.yaml to specify the input data.
3. `roof_stats.py`: compute statistics of NDVI and luminosity values per roof to help define thresholds. Split the roofs into a training and a test dataset.
* Use logReg.yaml to specify the input data.
* Please verify that the join option ("predicate") in [`functions/fct_misc.py`](./scripts/functions/fct_misc.py) on line 83 is "within".
#### Ground truth

## Logistic regression approach
The logistic regression approach was inspired by Louis-Lucas et al. (1) and implemented for this project in `functions/fct_misc.py`.
The ground truth consists of a vector layer with the geometry of buildings from the land survey. Each building has a unique identifier, a `green_tag` label ("green or not") and a `green_cls` vegetation class: bare (b), terrace (t), spontaneous (s), extensive (e), lawn (l) or intensive (i).

4. `greenery.py`: identify greenery on roofs based on NDVI values and luminosity to make a selection of roofs before training a logistic regression.
* Use logReg.yaml to specify the inputs data.
#### Images

5. `log_reg`: focuses on the logistic regression part of the pipeline.
* Use logReg.yaml to specify the inputs data.
Images should be NRGB. If the band order is different, please edit `calculate_raster.py` and adjust the band ordering in `roof_stats.py`.
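
To illustrate why the band order matters, the two rasters can be sketched per pixel as follows. This is a minimal sketch under stated assumptions, not the code of `calculate_raster.py`: the function names are hypothetical, NDVI uses the standard (NIR - red) / (NIR + red) definition, and luminosity is assumed to be the sum of the three visible bands, which would be consistent with 765 being the "no thresholding" value for 8-bit images.

```python
# Hypothetical per-pixel helpers; the real scripts work on whole raster tiles.

def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index, in [-1, 1]."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat empty pixels as neutral
    return (nir - red) / total


def luminosity(red: float, green: float, blue: float) -> float:
    """Assumed definition: sum of the visible bands (0 to 765 for 8-bit)."""
    return red + green + blue


# One pixel in NRGB order: NIR, red, green, blue.
nir, red, green, blue = 200, 50, 120, 40
pixel_ndvi = ndvi(nir, red)                  # uses the NIR and red bands only
pixel_lum = luminosity(red, green, blue)     # uses the three visible bands
```

If the bands were read in another order, for instance RGBN, the NIR and red values would be swapped with other bands and both rasters would be wrong, hence the need to adjust the band numbering.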


## Data preparation
1. `clip_image.py`: The goal of this script is to clip images with an AOI vector layer. First, the AOI is buffered by 50 m. The buffered layer is then used to clip the aerial imagery.
* Use `clip_image.yaml` to specify the input data.
2. `calculate_raster.py`: computes NDVI and luminosity rasters. Make sure the band numbering in the functions `calculate_ndvi` and `calculate_luminosity` matches the imagery.
* Use `logReg.yaml` to specify the input and output directories.
* ortho_directory
* ndvi_directory
* lum_directory

Steps (3) and (4) prepare the descriptors for the ML algorithms. `greenery.py` produces a polygon vector layer of potential greenery on roofs based on NDVI and luminosity values, and computes the potential greenery ratio per roof. This script is optional, because one may want to compute the descriptors in (4) on entire roofs rather than on their potentially green parts.
* Use `logReg.yaml` to specify the input data common to `greenery.py` and `roof_stats.py`.
* tile_delimitation
* gt
* green_tag
* green_cls
* chm_layer
* results_directory
* egid_train_test
* th_ndvi
* th_lum
* epsg
3. `greenery.py`: produces a polygon vector layer of potential greenery on roofs based on NDVI and luminosity values, and computes the potential greenery ratio per roof. This script is optional: one may prefer to compute descriptors on the entire roof rather than on its potentially green parts.
* Use `logReg.yaml` to specify the input data.
* hydra:run:dir
* roofs_file
* roofs_layer

4. `roof_stats.py`: computes statistics of NDVI and luminosity values per roof and splits the roofs into a training and a test dataset.
* Use `logReg.yaml` to specify the input data.
* roofs_file
* roofs_layer
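
The thresholding behind the potential greenery detection can be sketched as below. It assumes that a pixel counts as potential greenery when its NDVI is at least `th_ndvi` and its luminosity is at most `th_lum`; these comparison directions are an assumption inferred from the "no thresholding" values given in the configuration (-1 for NDVI, 765 for 8-bit luminosity). The function names are hypothetical, and `greenery.py` itself produces polygons, whereas this sketch only counts pixels.

```python
def is_potential_greenery(ndvi, lum, th_ndvi=0.0, th_lum=500):
    """Assumed rule: vegetated enough (NDVI) and not too bright (luminosity)."""
    return ndvi >= th_ndvi and lum <= th_lum


def greenery_ratio(pixels, th_ndvi=0.0, th_lum=500):
    """Share of a roof's pixels flagged as potential greenery.

    `pixels` is a list of (ndvi, luminosity) pairs for one roof.
    """
    if not pixels:
        return 0.0
    green = sum(
        1 for ndvi, lum in pixels
        if is_potential_greenery(ndvi, lum, th_ndvi, th_lum)
    )
    return green / len(pixels)


# Four illustrative pixels of one roof.
roof_pixels = [(0.6, 210), (0.3, 480), (-0.1, 300), (0.5, 700)]
ratio = greenery_ratio(roof_pixels)
# With the "no thresholding" values, every pixel is kept.
no_threshold = greenery_ratio(roof_pixels, th_ndvi=-1, th_lum=765)
```

With the default thresholds, the third pixel is rejected for its negative NDVI and the fourth for its high luminosity.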


## Machine learning
The machine learning approach was inspired by Louis-Lucas et al. (1) and adapted to the specifics of the project. Since then, the machine learning algorithms and the descriptors used have diverged considerably from that work.
* Use `logReg.yaml` to specify the model parameters shared by training (`train_ml.py`) and inference (`infer_ml.py`).

* cls_ml
* model_ml
* trained_model_dir

1. `train_ml.py`: trains a logistic regression or a random forest and evaluates the trained algorithm on a test dataset.
* Use `logReg.yaml` to specify the input data.
* roofs_file
* roofs_layer
2. `infer_ml.py`: runs inference on descriptors computed with `roof_stats.py`.
* Use `logReg.yaml` to specify the input data.
* roofs_file
* roofs_layer
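
Training and evaluation rely on the split of the ground truth into training and test roofs, stored in the file given by `egid_train_test`. Such a split could be produced along these lines. This is only a sketch: the 80/20 proportion, the fixed seed, the column names `EGID` and `train`, and the function names are all assumptions, not taken from `roof_stats.py`.

```python
import csv
import io
import random


def split_egids(egids, test_fraction=0.2, seed=42):
    """Shuffle building identifiers reproducibly and split them in two."""
    shuffled = sorted(egids)               # sort first so the seed is decisive
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def write_split(train, test, handle):
    """Write one row per building with a 1/0 flag for train/test."""
    writer = csv.writer(handle)
    writer.writerow(["EGID", "train"])     # hypothetical column names
    for egid in train:
        writer.writerow([egid, 1])
    for egid in test:
        writer.writerow([egid, 0])


egids = list(range(1000, 1010))            # ten made-up identifiers
train_egids, test_egids = split_egids(egids)

buffer = io.StringIO()                     # stands in for the CSV file
write_split(train_egids, test_egids, buffer)
```

Keeping the split in a file rather than recomputing it makes it possible to reuse the same test roofs when comparing the logistic regression and the random forest.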

## Addendum

### Documentation
The full documentation of the project is available on the [STDL's technical website](https://tech.stdl.ch/PROJ-VEGROOFS/).

### Data

#### Ground truth

The ground truth consists of ...
* Labelling of ground truth by the beneficiaries (February 2024)


#### Folder structure
### Folder structure
The following is a proposed data structure.

```
├── data # dataset folder
├── 01_initial # initial data
├── AOI # AOI shape file
├── ground_truth # ground truth shape file
└── scratch # aerial images
├── aoi # AOI shape file
├── gt # ground truth shape file
└── images # aerial images
├── extent # tile extent computed at the beginning of the workflow
└── tiles # image tiles
├── 02_intermediate # intermediate results and processed data
├── th # hydra documentation of values tested for the thresholds.
├── th # hydra timestamp folders for the tested thresholds.
└── images
├── tiles # clipped images
├── extent # clipped tile extent
├── luminosity # luminosity tiles computed from NirRGB tiles
└── ndvi # NDVI tiles computed from NirRGB tiles
└── 03_results # results of the workflows (training and test partition)
└── scratch # roof stats, boxplots
└── image_gt # roof stats, boxplots, machine learning outputs on GT
└── image_inf # roof stats and machine learning outputs for inferences
```
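
The timestamped folders under `th` follow the `hydra.run.dir` pattern `02_intermediate/th/${now:%Y-%m-%d}/${now:%H-%M-%S}` set in the configuration. As a small sketch, the same path can be reproduced with the standard library; the function name is hypothetical, and Hydra builds the path itself at run time.

```python
from datetime import datetime
from pathlib import PurePosixPath


def run_dir(now, root="02_intermediate/th"):
    """Mimic the date/time sub-folders of the hydra run directory."""
    return PurePosixPath(root) / now.strftime("%Y-%m-%d") / now.strftime("%H-%M-%S")


# A run started on 15 August 2024 at 09:12:47.
example = run_dir(datetime(2024, 8, 15, 9, 12, 47))
```

Each run of the thresholding therefore writes to its own folder, so earlier threshold tests are not overwritten.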

### References
9 changes: 4 additions & 5 deletions config/clipImage.yaml
@@ -12,11 +12,10 @@
clip_image:
working_directory: C:/Users/cmarmy/Documents/STDL/proj-vegroofs/data
inputs:
ortho_directory: 01_initial/images/rs_tlm/ge/tiles/
aoi: 01_initial/aoi/STDL_GE_AOI.shp
ortho_directory: 01_initial/images/infer_moitie/tiles/
aoi: 01_initial/images/infer_moitie/extent/extent.shp
epsg: 'epsg:2056'
predicate_sjoin: 'intersects'
outputs:
clip_ortho_directory: 02_intermediate/images/rs_tlm/tiles
extent_ortho_directory: 01_initial/images/rs_tlm/extent
clip_ortho_directory: 02_intermediate/images/infer_moitie/tiles
extent_ortho_directory: 01_initial/images/infer_moitie/extent

11 changes: 5 additions & 6 deletions config/clipImage_dummy.yaml
@@ -1,11 +1,10 @@
clip_image:
working_directory: /proj-vegroofs/data
inputs:
ortho_directory: 01_initial/images/rs_tlm/tiles/
aoi: 01_initial/aoi/STDL_ZH_AOI.shp
epsg: 'epsg:2056'
predicate_sjoin: 'intersects'
ortho_directory: 01_initial/images/tiles/ # directory of the original images
aoi: 01_initial/aoi/aoi.shp # area of interest to cut the images with
epsg: 'epsg:2056' # EPSG of the project
outputs:
clip_ortho_directory: 02_intermediate/images/rs_tlm/tiles
extent_ortho_directory: 01_initial/images/rs_tlm/extent
clip_ortho_directory: 02_intermediate/images/tiles # output directory for the clipped images
extent_ortho_directory: 01_initial/images/extent # directory for the computed extent of the original images

51 changes: 18 additions & 33 deletions config/logReg.yaml
@@ -2,41 +2,26 @@ hydra:
run:
dir: 02_intermediate/th/${now:%Y-%m-%d}/${now:%H-%M-%S}

# dev:
# working_directory: C:/Users/cmarmy/Documents/STDL/proj-vegroofs/data_test
# ortho_directory: 02_intermediate/images/rs_tlm/tiles
# tile_delimitation: 02_intermediate/images/rs_tlm/extent/
# ndvi_directory: 02_intermediate/images/rs_tlm/ndvi
# lum_directory: 02_intermediate/images/rs_tlm/lum
# results_directory: 03_results/rs_tlm
# roofs_gt: 01_initial/gt/gt_test.shp
# roofs_layer: gt_zh
# roofs_lr: 02_intermediate/th/2024-06-14/10-10-21/0_765_gt_tot_green.shp
# chm_layer: 02_intermediate/autres/CHM_AOI.gpkg
# th_ndvi: 0 # no thresholding -1
# th_lum: 765 # no thresholding 765
# egid_train_test: egid_train_test_gt.csv
# predicate_sjoin: 'intersects'
# cls_lr: 'multi' # 'multi' 'multi_aggreg'
# model_ml: 'LR' # 'LR' 'RF'
# epsg: 'epsg:2056'

prod:
dev:
working_directory: C:/Users/cmarmy/Documents/STDL/proj-vegroofs/data
ortho_directory: 02_intermediate/images/rs_tlm/tiles
tile_delimitation: 02_intermediate/images/rs_tlm/extent/
ndvi_directory: 02_intermediate/images/rs_tlm/ndvi
lum_directory: 02_intermediate/images/rs_tlm/lum
roofs_gt: 02_intermediate/gt/gt_tot.gpkg # 02_intermediate/autres/swissbuilding_aoi.gpkg
roofs_layer: joined_layer
chm_layer: 02_intermediate/autres/CHM_AOI.gpkg
results_directory: 03_results/rs_tlm_sb_noratio/
ortho_directory: 02_intermediate/images/infer_moitie/tiles
tile_delimitation: 02_intermediate/images/infer_moitie/extent/
ndvi_directory: 02_intermediate/images/infer_moitie/ndvi
lum_directory: 02_intermediate/images/infer_moitie/lum
roofs_file: 02_intermediate/th/2024-08-15/09-12-47/0_500_green_roofs.shp # 02_intermediate/gt/inf_roofs.gpkg #
roofs_layer:
gt: False
green_tag: 'veg_new_3'
green_cls: 'class_3'
chm_layer: 02_intermediate/autres/CHM_AOI_inf.gpkg
results_directory: 03_results/infer_moitie/
egid_train_test: egid_train_test_gt.csv
roofs_lr: 02_intermediate/autres/swissbuilding_aoi.gpkg # 02_intermediate/gt/gt_tot.gpkg #02_intermediate/th/2024-06-14/15-02-41/0_765_gt_tot_green.shp
predicate_sjoin: 'within'
th_ndvi: -1 # no thresholding -1
th_lum: 210000 # no thresholding 765 or 210000
cls_lr: 'multi' # 'binary' 'multi' 'multi_aggreg'
th_ndvi: 0 # no thresholding -1
th_lum: 500 # no thresholding 765 or 210000
cls_ml: 'binary' # 'binary' 'multi' 'multi_aggreg'
model_ml: 'LR' # 'LR' 'RF'
trained_model_dir: 03_results/scratch_gt/
epsg: 'epsg:2056'



39 changes: 20 additions & 19 deletions config/logReg_dummy.yaml
@@ -1,25 +1,26 @@
hydra:
run:
dir: 02_intermediate/th/${now:%Y-%m-%d}/${now:%H-%M-%S}
dir: 02_intermediate/th/training_or_inference_outputs/${now:%Y-%m-%d}/${now:%H-%M-%S} # output directory for potential greenery detection (vector)

prod:
prod:
working_directory: /proj-vegroofs/data
ortho_directory: 02_intermediate/images/scratch/tiles
tile_delimitation: 02_intermediate/images/scratch/extent/
ndvi_directory: 02_intermediate/images/scratch/ndvi
lum_directory: 02_intermediate/images/scratch/lum
roofs_gt: 02_intermediate/gt/gt_tot.gpkg
roofs_layer: gt_to_label
chm_layer: 02_intermediate/autres/CHM_AOI.gpkg
results_directory: 03_results/scratch_gt/
egid_train_test: egid_train_test_gt.csv
roofs_lr: 02_intermediate/th/2024-06-14/10-10-21/0_765_gt_tot_green.shp
predicate_sjoin: 'within'
th_ndvi: 0 # no thresholding -1
th_lum: 765 # no thresholding 765
cls_lr: 'multi' # 'multi' 'multi_aggreg'
model_ml: 'LR' # 'LR' 'RF'
epsg: 'epsg:2056'
working_directory: /proj-vegroofs/data
ortho_directory: 02_intermediate/images/tiles # directory of the clipped images
tile_delimitation: 02_intermediate/images/extent/ # directory for the computed extent of the clipped images
ndvi_directory: 02_intermediate/images/ndvi # directory for the output NDVI rasters
lum_directory: 02_intermediate/images/lum # directory for the output luminosity rasters
roofs_file: 02_intermediate/gt/gt_tot.gpkg # gt building vector layer or `*green_roofs.gpkg` layer from `greenery.py`
roofs_layer: # if roofs_file is in the GPKG format and contains several layers
gt: True # True (when training/testing with GT) or False (when inferring)
green_tag: 'veg_new' # attribute field for "green or not" in roofs_file
green_cls: 'class' # attribute field for vegetation classes in roofs_file
chm_layer: 02_intermediate/autres/CHM_AOI.gpkg # canopy height vector layer for masking of overhanging vegetation
results_directory: 03_results/training_or_inference_outputs/ # directory for stats and machine learning outputs
egid_train_test: egid_train_test_gt.csv # CSV with split of the GT in train and test datasets
th_ndvi: 0 # no thresholding: -1. For greenery.py: e.g. 0.
th_lum: 765 # no thresholding: 765 (8-bit), 21000 (16-bit). For greenery.py: e.g. 500 (8-bit), 13725 (16-bit).
cls_ml: 'binary' # choice of classification scheme: 'binary' or 'multi'
model_ml: 'RF' # choice of algorithm: 'LR' or 'RF'
trained_model_dir: 03_results/training_outputs/ # directory where to save the trained model for reuse
epsg: 'epsg:2056' # EPSG of the project


4 changes: 2 additions & 2 deletions scripts/calculate_raster.py
@@ -86,7 +86,7 @@ def calculate_lum(tile, band_nbr_red=1, band_nbr_green=2, band_nbr_blue=3, path=

# load input parameters
with open(args.config_file) as fp:
cfg = yaml.load(fp, Loader=yaml.FullLoader)['prod']
cfg = yaml.load(fp, Loader=yaml.FullLoader)['dev']

logger.info('Defining constants...')

@@ -115,4 +115,4 @@ def calculate_lum(tile, band_nbr_red=1, band_nbr_green=2, band_nbr_blue=3, path=
lum_tile_path=os.path.join(LUM_DIR, tile.split('/')[-1].replace('.tif', '_lum.tif'))
_ = calculate_lum(tile, path=lum_tile_path)

logger.success(f'The files were written in the folder {NDVI_DIR} and {LUM_DIR}.')
logger.success(f'The files were written in the folders {NDVI_DIR} and {LUM_DIR}.')
8 changes: 3 additions & 5 deletions scripts/clip_image.py
@@ -38,7 +38,6 @@
INPUTS=cfg['inputs']
ORTHO_DIR=INPUTS['ortho_directory']
AOI=INPUTS['aoi']
PREDICATE=INPUTS['predicate_sjoin']
EPSG=INPUTS['epsg']

OUTPUTS=cfg['outputs']
@@ -48,6 +47,8 @@
os.chdir(WORKING_DIR)
fct_misc.ensure_dir_exists(OUTPUT_DIR)

fct_misc.generate_extent(ORTHO_DIR, TILE_DELIMITATION, EPSG)
tiles=gpd.read_file(TILE_DELIMITATION)

logger.info('Reading AOI geometries...')

@@ -59,10 +60,7 @@
row = row.copy()
aoi.loc[index, 'geometry'] = row.geometry.buffer(50,join_style=2)

fct_misc.generate_extent(ORTHO_DIR, TILE_DELIMITATION, EPSG)
tiles=gpd.read_file(TILE_DELIMITATION)

aoi_clipped=fct_misc.clip_labels(labels_gdf=aoi, tiles_gdf=tiles, predicate_sjoin=PREDICATE)
aoi_clipped=fct_misc.clip_labels(labels_gdf=aoi, tiles_gdf=tiles, predicate_sjoin='intersects')
aoi_clipped=aoi_clipped.reset_index(drop=True)

i=1