Roadmap
This document is a work in progress while this note appears. The release schedule may change following discussion and estimates.
- Enable processing of large-scale geospatial data
- Support ad-hoc analytic workflows
- Provide clear onboarding and project architecture documentation
- Enable machine learning workflows on satellite imagery
There are two types of potential users for GeoTrellis: (1) those focused on application development and (2) those focused on data analysis.
Product Developers
Product developers create an application or system with a GIS component. They are interested in a modular and stable API, and in key features that solve the "hard" problems. This has been the primary focus of GeoTrellis development leading up to the 1.0.0 release.
Data Scientists
Data scientists are interested in extracting information by combining multiple datasets through ad-hoc analysis. Much of the effort to address this use case has gone into GeoPySpark, but it also relates to the core goals of GeoTrellis:
- A data science focus reaches a wider audience than Spark/Scala application developers.
- Ad-hoc analysis puts more pressure on the maturity and composability of the API, which benefits all users.
- Ad-hoc analysis more often deals with data that is heterogeneous in projection and resolution, which exposes more performance problems.
- Many important social questions, such as measuring deforestation or the impact of climate change on a given area or industry, are best handled through ad-hoc analysis.
Objective: release a version at the end of every quarter.
Development of 2.0 features will start before the 1.2 release. The workflow will be to bump the version at HEAD to 2.0 as soon as the first 2.0 feature goes in, and to back-merge 1.2 PRs into a release branch as they are implemented.
The main focus is to support the data science use case for GeoTrellis, where the driving use case is GeoPySpark. This leads to a focus on a few key new features and on optimizing central operations.
New Operations
- Euclidean Distance
- Viewshed
- RDD Rasterization
Problems
- GeoTiff band interleave streaming
geotrellis-spark-sql
- Spark DataFrame Support
- SparkSQL Support
- SparkML Integration
Layer IO SPI
Decouple the different back-ends by using the Java Service Provider Interface (SPI) to load:
- AttributeStore
- LayerReader
- LayerWriter
The interface should be based on producing these classes from a URI which fully configures them.
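A rough illustration of the URI-driven lookup, written in Python for brevity (GeoTrellis itself would do this in Scala with `java.util.ServiceLoader`). A provider registers under a URI scheme and builds a fully configured reader from the URI alone. The registry, `register_provider`, `layer_reader_for` and `LayerReaderStub` are all hypothetical names; AttributeStore and LayerWriter lookups would follow the same pattern.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for ServiceLoader-discovered providers:
# one provider instance per URI scheme.
_PROVIDERS = {}


def register_provider(scheme):
    """Class decorator that registers a provider for a URI scheme."""
    def wrap(cls):
        _PROVIDERS[scheme] = cls()
        return cls
    return wrap


@dataclass
class LayerReaderStub:
    backend: str
    location: str
    options: dict


@register_provider("s3")
class S3Provider:
    def layer_reader(self, uri):
        parts = urlparse(uri)
        # Bucket, prefix and options all come from the URI itself.
        return LayerReaderStub("s3", parts.netloc + parts.path, parse_qs(parts.query))


@register_provider("file")
class FileProvider:
    def layer_reader(self, uri):
        parts = urlparse(uri)
        return LayerReaderStub("file", parts.path, parse_qs(parts.query))


def layer_reader_for(uri):
    """Pick the provider by URI scheme and let it configure the reader."""
    scheme = urlparse(uri).scheme
    if scheme not in _PROVIDERS:
        raise ValueError(f"no layer provider registered for scheme {scheme!r}")
    return _PROVIDERS[scheme].layer_reader(uri)
```

Resolving by scheme keeps each back-end in its own module: supporting a new store means registering one more provider rather than editing a central dispatch.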
TileView
Accumulates transformations on a tile without allocating intermediate tiles; the work is done once, when the view is finally turned back into a Tile.
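A minimal sketch of the idea in Python, assuming a tile is a flat, row-major list of cells. `TileView`, `map`, `get` and `to_tile` are hypothetical names, not the GeoTrellis API. Each `map` only records a function, and the single output list is allocated in `to_tile`.

```python
from typing import Callable, List, Tuple


class TileView:
    """Hypothetical lazy view over a row-major tile: transformations are
    recorded, not applied, until the view is materialized."""

    def __init__(self, cells: List[int], cols: int,
                 ops: Tuple[Callable[[int], int], ...] = ()):
        self.cells = cells
        self.cols = cols
        self.ops = ops

    def map(self, f: Callable[[int], int]) -> "TileView":
        # No cell array is allocated here; only the function is recorded.
        return TileView(self.cells, self.cols, self.ops + (f,))

    def _apply(self, v: int) -> int:
        for f in self.ops:
            v = f(v)
        return v

    def get(self, col: int, row: int) -> int:
        # Random access: run the accumulated functions on one cell.
        return self._apply(self.cells[row * self.cols + col])

    def to_tile(self) -> List[int]:
        # The only allocation: one output list, filled in a single pass.
        return [self._apply(v) for v in self.cells]
```

For example, `TileView([1, 2, 3, 4], cols=2).map(lambda x: x + 1).map(lambda x: x * 10).to_tile()` returns `[20, 30, 40, 50]` in one pass, with no intermediate tile for the `+ 1` step.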
Facets
- Local
- Focal: how do we implement the cursor?
  - Benchmark a focal view against the focal tile operation.
  - Check whether the cursor is compatible with a random-access strategy.
- Resample
- Reproject
What can be done for focal operations? The current focal methods are stateful in order to optimize the overall transformation. We can't carry that over, because the call to the focal operation will be hidden behind the produced view. This implies that the focal operation has to support random access.
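One way to make a focal operation random-access is to compute each output cell from a cell getter, with no cursor state carried between cells. The Python sketch below assumes a fixed 3x3 window, treats `None` as NoData (skipped), and uses hypothetical names (`focal_sum_at`, `focal_sum_view`).

```python
from typing import Callable, Optional

# A getter returns the value at (col, row), or None for NoData.
Getter = Callable[[int, int], Optional[float]]


def focal_sum_at(get: Getter, cols: int, rows: int, col: int, row: int) -> float:
    """Sum over the 3x3 window centered on (col, row), using random access only.
    Cells outside the tile and NoData cells are skipped."""
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            c, r = col + dc, row + dr
            if 0 <= c < cols and 0 <= r < rows:
                v = get(c, r)
                if v is not None:
                    total += v
    return total


def focal_sum_view(get: Getter, cols: int, rows: int) -> Getter:
    # The result is itself a getter, so it composes like any other view
    # and can be evaluated for any cell, in any order.
    return lambda col, row: focal_sum_at(get, cols, rows, col, row)
```

This is the trade-off the focal benchmark item is meant to measure: every output cell re-reads its whole neighborhood (up to 9 reads), whereas a stateful cursor can reuse the values shared by adjacent windows.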
Opens
- Lower memory footprint for Tile transformations
- Lower memory footprint for RDD transformations
- Map Algebra RPC
Optimize central operations
Optimizations here will have a positive impact on applications across the board.
- Pyramid: build in a single shuffle step
- Reproject: avoid collecting metadata after reprojection
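A sketch of one possible single-shuffle approach, in Python and on a plain dictionary rather than an RDD: every base-zoom tile is emitted once per coarser zoom level under its ancestor key `(zoom, col >> d, row >> d)`, and all levels are then grouped in one pass. Tiles are reduced to single numbers and merged with a mean purely as stand-ins for real resampling; `pyramid_single_pass` is a hypothetical name, not how GeoTrellis currently builds pyramids.

```python
from collections import defaultdict
from typing import Dict, Tuple


def pyramid_single_pass(tiles: Dict[Tuple[int, int], float],
                        base_zoom: int, min_zoom: int) -> Dict[Tuple[int, int, int], float]:
    """tiles maps (col, row) at base_zoom to a value. Returns a map keyed by
    (zoom, col, row) covering every level from base_zoom down to min_zoom."""
    # "Map" side: emit each tile once per level, keyed by its ancestor.
    emitted = []
    for (col, row), value in tiles.items():
        for zoom in range(base_zoom, min_zoom - 1, -1):
            d = base_zoom - zoom  # each level up halves col and row
            emitted.append(((zoom, col >> d, row >> d), value))

    # The single "shuffle": group all levels at once.
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)

    # Mean is only a stand-in for real resampling of the child tiles.
    return {key: sum(vs) / len(vs) for key, vs in groups.items()}
```

The cost is that each tile appears in the shuffle once per level, so the shuffled volume grows with the number of levels, in exchange for one shuffle instead of one per level.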
GeoTiff Layers: support for storing GeoTrellis layers as GeoTiffs and reading them directly
- Includes support for Cloud Optimized GeoTiffs
Batch Pipeline
https://github.com/locationtech/geotrellis/blob/master/docs/architecture/005-etl-pipeline.rst
Vector Tile updates
Bring the vector tile class up to date with VectorPipe development. Provide and document an API for simple Vector/VectorTile/Raster workflows.
LiDAR Support
Read and sort points from `.laz` files into Hadoop-friendly formats. Generate elevation rasters from LiDAR point clouds through either inverse distance weighting (IDW) or Delaunay triangulation. This will result in a `geotrellis.pdal` subproject.
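For reference, inverse distance weighting estimates a cell's elevation as the weighted mean of point elevations, with weights 1/d^p for distance d and power p. The brute-force Python sketch below evaluates it at every cell center; the function names, the default power of 2 and the grid layout (origin at the upper-left corner, rows running downward) are assumptions, and a real implementation would only consider points near each cell.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, elevation)


def idw(points: Sequence[Point], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance weighted estimate of elevation at (x, y)."""
    if not points:
        raise ValueError("IDW needs at least one point")
    num = den = 0.0
    for px, py, pz in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return pz  # the query sits exactly on a sample
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den


def rasterize_idw(points: Sequence[Point], xmin: float, ymax: float,
                  cell_size: float, cols: int, rows: int,
                  power: float = 2.0) -> List[List[float]]:
    """Evaluate IDW at each cell center; row 0 is the top row."""
    grid = []
    for r in range(rows):
        y = ymax - (r + 0.5) * cell_size
        grid.append([idw(points, xmin + (c + 0.5) * cell_size, y, power)
                     for c in range(cols)])
    return grid
```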
TensorFlow Integration
Use a TensorFlow model to label a raster layer.
This release will focus on addressing API issues that we and our users have hit while using GeoTrellis. Some new abstractions will be introduced to unify the multiple contexts in which the same operation is performed.
Cloud Optimized GeoTiff Layer
Ability to save GeoTrellis layers as a set of GeoTiffs. Each GeoTiff would be a meta-tile and provide a segment layout optimized for TMS fetches.
Objectives:
- reduce friction between GeoTrellis and other GIS tools
- provide meta-tile support
- enable layer reads that subset bands
Spatial Indexing
- SFCurve Dependency
- Temporal Binning
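To illustrate the two pieces, here is a hand-rolled Python sketch of a Z-order (Morton) index combined with fixed-width temporal bins. It is not the SFCurve API; `interleave_bits`, `temporal_bin` and `space_time_key` are hypothetical names, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) index: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def temporal_bin(t: datetime, bin_seconds: int) -> int:
    """Fixed-width time bin counted from the Unix epoch.
    Use timezone-aware datetimes; naive ones are interpreted as local time."""
    return int(t.timestamp()) // bin_seconds


def space_time_key(col: int, row: int, t: datetime, bin_seconds: int = 86400):
    # Time bin first, then Z-order within the bin.
    return (temporal_bin(t, bin_seconds), interleave_bits(col, row))


# Example: a daily bin for a tile at (1, 0) on 1970-01-02 UTC.
key = space_time_key(1, 0, datetime(1970, 1, 2, tzinfo=timezone.utc))
```

Putting the time bin first keeps each bin contiguous in key space, while the Z-curve keeps spatially nearby tiles close within a bin. This ordering is one choice among several; interleaving time bits into the curve itself is another.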
Machine Learning
- Converting spatio-temporal imagery into training sets
- Pattern of Life
- ML Model Application
MAML RPC
Support a project that creates TMS endpoints from MAML definitions
Cross-Resolution Raster Operations
Ability to predictably perform operations on rasters whose pixels are not aligned, a.k.a. Map Algebra over non-aligned rasters.
The product must have a specified resolution, so the inputs are resampled to that resolution. Intersecting these rasters requires a spatial join. Problems:
- tiles too big
- tiles intersect wrong
- how do you collect neighbors on non-tiled rasters?
- need some use-cases
Affected
[Figure: three example pairs of rasters, each labelled raster1 and raster2]
Requires
- NoData Semantics
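A minimal sketch of the "resample to the product resolution, then combine" step in Python, under strong simplifying assumptions: both rasters share the same CRS and upper-left origin, resampling is nearest-neighbor, and NoData is `None` and propagates (the `1 + ND = ND` option). The function names are hypothetical.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]  # rows of cells; None is NoData


def resample_nearest(src: Grid, src_cell: float, dst_cell: float,
                     dst_cols: int, dst_rows: int) -> Grid:
    """Nearest-neighbor resample onto a grid sharing src's upper-left origin."""
    out = []
    for r in range(dst_rows):
        row = []
        for c in range(dst_cols):
            # Map the destination cell center into source pixel coordinates.
            sc = int((c + 0.5) * dst_cell / src_cell)
            sr = int((r + 0.5) * dst_cell / src_cell)
            inside = 0 <= sr < len(src) and 0 <= sc < len(src[0])
            row.append(src[sr][sc] if inside else None)
        out.append(row)
    return out


def local_add(a: Grid, b: Grid) -> Grid:
    """Cell-wise add of two grids already at the same resolution.
    Assumes NoData propagates: x + None is None."""
    return [[None if x is None or y is None else x + y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]
```

Even this toy version depends on the NoData decision: `local_add` has to pick a rule for cells that are missing from one of the inputs.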
Lazy Layer IO…
Reading rasters metadata-first would make it possible to filter and join the future rasters before they are fully read. This is useful both for ingest and for reading layers through LayerReader.
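A sketch of metadata-first reading in Python: each raster carries its path and extent, the pixel read is deferred behind a callable, and filtering touches metadata only. `LazyRaster`, `select`, the read counter and the `(xmin, ymin, xmax, ymax)` extent layout are assumptions for illustration; a join on extents would work the same way.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class LazyRaster:
    path: str
    extent: Extent
    read_pixels: Callable[[str], List[List[float]]]  # deferred pixel read
    reads: int = 0  # counts real reads, to show when I/O happens

    def read(self) -> List[List[float]]:
        self.reads += 1
        return self.read_pixels(self.path)


def intersects(a: Extent, b: Extent) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select(rasters: List[LazyRaster], query: Extent) -> List[LazyRaster]:
    # Filtering looks at metadata only; no pixels are read here.
    return [r for r in rasters if intersects(r.extent, query)]
```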
Unified MapAlgebra API
- TileLike: abstracts over Tile, MultibandTile, TileView
- LayerLike: abstracts over RDD[(K, V)], Seq[(K, V)], Map[K, V]
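In Scala this kind of abstraction could be expressed as a type class; the Python sketch below approximates it with `functools.singledispatch`, so an operation written once works for every registered container. Dicts stand in for `Map[K, V]` and lists of pairs for `Seq[(K, V)]`, tiles are plain lists, and `map_values` and `local_add_constant` are hypothetical names. An RDD instance could be registered the same way, for example by delegating to PySpark's `RDD.mapValues`.

```python
from functools import singledispatch


@singledispatch
def map_values(layer, f):
    """Apply f to every tile in a layer-like container, keeping the keys."""
    raise TypeError(f"{type(layer).__name__} is not layer-like")


@map_values.register
def _(layer: dict, f):
    # Map[K, V]-style layer
    return {k: f(v) for k, v in layer.items()}


@map_values.register
def _(layer: list, f):
    # Seq[(K, V)]-style layer
    return [(k, f(v)) for k, v in layer]


def local_add_constant(layer, n):
    """Written once; works for every registered container type."""
    return map_values(layer, lambda tile: [v + n for v in tile])
```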
Separation of spark/collections API
There should be a way to perform collections operations without bringing in the heavyweight `spark-core` dependency. However, this will require moving some of the utility classes that describe tile layers, but do not require Spark, out of the `spark` package.
NoData Semantics
Parameterize this behavior so that either:
- 1 + ND = 1, or
- 1 + ND = ND
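A sketch of the parameterized behavior in Python, with `None` standing in for ND. `NoDataMode` and `nd_add` are hypothetical names, and treating ND + ND as ND in both modes is an assumption.

```python
from enum import Enum
from typing import Optional


class NoDataMode(Enum):
    IGNORE = "ignore"        # 1 + ND = 1
    PROPAGATE = "propagate"  # 1 + ND = ND


def nd_add(a: Optional[float], b: Optional[float], mode: NoDataMode) -> Optional[float]:
    """Add two cell values under the chosen NoData rule."""
    if a is None and b is None:
        return None  # ND + ND = ND in both modes
    if a is None or b is None:
        if mode is NoDataMode.PROPAGATE:
            return None
        return a if b is None else b  # keep the value that is present
    return a + b
```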
- OGC Standards
- GeoServer Plugin for GeoTrellis raster layers
- Further LiDAR work
- LayerReader/Writer
  - Relies only on Avro
  - Relies on SprayJson
  - No segmented reads
  - No abstraction over Tile and MultibandTile