Feature/more misc issues #82

Draft: wants to merge 32 commits into main

@herbiebradley (Member) commented Jul 15, 2022

This PR collects general improvements to GeoGraph that are needed to run our case studies. So far, it contains code that improves loading speed for all geographs, plus an updated pre-commit configuration file.

The loading speed improvement comes from two sources:

  1. Detecting whether PyGEOS is installed and, if so, running bulk queries against the spatial index. The drawback is that PyGEOS causes a GDAL conflict with Shapely, which slows down operations such as habitat calculations. I am therefore not making PyGEOS a package requirement; the loading code simply takes a separate branch when PyGEOS is installed. Fortunately, PyGEOS will very soon be integrated into Shapely 2.0, which should give significant (probably >50%) reductions in loading time and significant benefits for other calculations.
  2. If PyGEOS is not installed, we still get around a 20% reduction in loading time simply by removing unnecessary node attributes, which took some time to calculate and were rarely used.
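
To make the optional-dependency branch in point 1 concrete, here is a minimal sketch of the pattern, not the actual GeoGraph loading code. The names `pygeos_available`, `candidate_pairs`, and the `index` object are hypothetical: `index.query(geom)` is assumed to yield positions of candidate neighbours for one geometry, and `index.query_bulk(geoms)` is assumed to return two parallel sequences (input positions, neighbour positions) for all geometries at once.

```python
import importlib.util
from typing import Any, List, Sequence, Tuple


def pygeos_available() -> bool:
    """Return True if the optional pygeos package can be found, without importing it."""
    return importlib.util.find_spec("pygeos") is not None


def candidate_pairs(geoms: Sequence[Any], index: Any, bulk: bool) -> List[Tuple[int, int]]:
    """Return sorted (i, j) pairs with i < j that the spatial index reports as candidates.

    `index` is a hypothetical spatial-index object (see the assumptions above);
    it stands in for whatever index the loading code really uses.
    """
    if bulk:
        # Bulk branch: a single call covers every geometry.
        sources, targets = index.query_bulk(geoms)
        pairs = zip(sources, targets)
    else:
        # Fallback branch: one index query per geometry.
        pairs = ((i, j) for i, geom in enumerate(geoms) for j in index.query(geom))
    # Drop self-matches and mirrored duplicates by keeping only i < j.
    return sorted({(int(i), int(j)) for i, j in pairs if i < j})
```

A caller would pick the branch once, for example `candidate_pairs(geoms, index, bulk=pygeos_available())`, so that PyGEOS stays an optional install rather than a requirement.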

I investigated the main bottlenecks in the most common graph operations. Although I had guessed that the networkx graph library might be one of them, I concluded that almost all of the code is bottlenecked by polygon and spatial index operations. Further speedups can mostly be gained from vectorising polygon operations (e.g. with PyGEOS), speed improvements in underlying libraries such as GDAL, and algorithmic improvements.
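
For anyone wanting to repeat this kind of bottleneck check, one possible approach is the standard-library profiler. The sketch below is an assumption about tooling, not a record of how the investigation above was done; `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains are listed first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Wrapping a graph-loading call in `profile_call` and reading the top entries of the report is one way to see whether time goes to graph bookkeeping or to polygon and spatial index operations.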

I also noticed significant performance improvements in all functions (around a 20-30% reduction in computation time) from upgrading to Python 3.10 and the latest versions of rasterio, fiona, Shapely, and geopandas, mostly thanks to performance improvements in the underlying GDAL. The requirements file will be sorted out in a separate PR.

TODOs:

  • Add more metrics, such as integral index of connectivity, so that we can compare with other papers.
  • Speed up calculation of the average component isolation metric so that it can run in decent time on our case studies.
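
As a starting point for the first TODO, here is a minimal sketch of the integral index of connectivity, assuming its commonly cited definition: the sum over all patch pairs (i, j) of a_i * a_j / (1 + nl_ij), divided by the squared landscape area, where nl_ij is the number of links on the shortest path between patches i and j and unreachable pairs contribute nothing. The function names and the plain-dictionary inputs are hypothetical and are not part of the GeoGraph API.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def hop_counts(adjacency: Dict[Hashable, Iterable[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search: links on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def integral_index_of_connectivity(
    areas: Dict[Hashable, float],
    adjacency: Dict[Hashable, Iterable[Hashable]],
    landscape_area: float,
) -> float:
    """Compute IIC under the assumed definition; adjacency must be symmetric."""
    total = 0.0
    for i, area_i in areas.items():
        # Only reachable patches appear here, so disconnected pairs add zero.
        for j, links in hop_counts(adjacency, i).items():
            total += area_i * areas[j] / (1 + links)
    return total / landscape_area ** 2
```

This runs one breadth-first search per patch, so it may be too slow for large case-study graphs without further work.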

@herbiebradley added the labels kind: enhancement (New feature or request), kind: feature, and kind: performance (performance improvement) on Jul 15, 2022