Ross Bernet edited this page Jul 26, 2017 · 4 revisions

GeoTrellis Project Roadmap

This document is a work in progress while this note appears. The release schedule may change following discussion and estimation.

Overall Objectives

  • Enable processing of large-scale geospatial data
  • Support ad-hoc analytic workflows
  • Provide clear on-boarding and project architecture documentation
  • Enable machine learning workflows on satellite imagery

Focus on ad hoc workflows

There are two types of potential users for GeoTrellis: (1) those focused on application development and (2) those focused on data analysis.

  1. Product Developers

    Product developers create an application/system with a GIS component. They are interested in a modular, stable API and key features that solve the "hard" problems. This has been the primary focus for GeoTrellis development leading up to the 1.0.0 release.

  2. Data Scientists

    Data scientists are interested in extracting information by combining multiple datasets through ad-hoc analysis. Much of the effort to address this use case has gone into GeoPySpark, but it relates to the core goals of GeoTrellis.

    1. A data science focus reaches a wider audience than Spark/Scala application developers.
    2. Ad-hoc analysis puts more pressure on the maturity and composability of the API, aiding all users.
    3. Ad-hoc analysis more often deals with data that is heterogeneous in projection and resolution, exposing more performance problems.
    4. Many important social questions, like measuring deforestation or the impact of climate change on a given area or industry, are best handled through ad-hoc analysis.

Release Schedule

Objective: Release a version at the end of every quarter

Development of 2.0 features will start before the 1.2 release. The workflow will be to bump the head of the repository to 2.0 as soon as the first 2.0 feature goes in, and to back-merge 1.2 PRs into a release branch as they are implemented.

2017 July - Sept (GeoTrellis 1.2)

The main focus is to support the data science use case for GeoTrellis, where the driving use case is GeoPySpark. This results in a focus on key new features and on optimizing central operations.

  1. New Operations

    • Euclidean Distance
    • Viewshed
    • RDD Rasterization
  2. Problems

    • GeoTiff band interleave streaming
  3. geotrellis-spark-sql

    • Spark DataFrame Support
    • SparkSQL Support
    • SparkML Integration
  4. Layer IO SPI

    Decouple the different back-ends through use of the Java Service Provider Interface (SPI) to load:

    • AttributeStore
    • LayerReader
    • LayerWriter

    The interface should be based on producing these classes from a URI that fully configures them.
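
    A minimal sketch of what URI-driven provider resolution might look like, modeled loosely on `java.util.ServiceLoader` discovery (all class and method names here are hypothetical, not the GeoTrellis API):

```scala
import java.net.URI

trait AttributeStore
trait LayerReaderProvider {
  // A provider declares which URIs it can handle (e.g. by scheme).
  def canProcess(uri: URI): Boolean
  // The URI fully configures the produced object.
  def attributeStore(uri: URI): AttributeStore
}

// Example back-end keyed off the "file" scheme.
class FileLayerProvider extends LayerReaderProvider {
  def canProcess(uri: URI): Boolean = uri.getScheme == "file"
  def attributeStore(uri: URI): AttributeStore = new AttributeStore {}
}

// Resolution mimics ServiceLoader iteration: pick the first
// registered provider that accepts the URI.
def resolve(uri: URI, providers: Seq[LayerReaderProvider]): Option[LayerReaderProvider] =
  providers.find(_.canProcess(uri))
```

    In the real SPI the provider list would come from `META-INF/services` registration rather than being passed in explicitly.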

  5. TileView

    Accumulates transformations on a tile, avoiding intermediate allocations until the view is converted back into a Tile.

    1. Facets

      • Local
      • Focal: how do we cursor?
        • Benchmark focal view vs focal tile operation.
        • Check if cursor is compatible with random strategy.
      • Resample
      • Reproject

      What can be done for focal operations? Current focal methods are stateful in order to optimize the overall transformation. We can't translate that directly, because the focal call will be obscured behind the produced view. This implies that the view must be available for random access.

    2. Opens

      • Lower memory footprint for Tile transformations
      • Lower memory footprint for RDD transformations
      • Map Algebra RPC
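
    The allocation-avoiding idea behind TileView can be sketched in miniature (a toy model; `IntTile`, `TileView`, and their methods are illustrative assumptions, not the GeoTrellis API):

```scala
// A tiny single-band integer tile.
final case class IntTile(cols: Int, rows: Int, cells: Array[Int])

final class TileView(source: IntTile, f: Int => Int) {
  // Compose instead of allocating: no intermediate array is created here.
  def map(g: Int => Int): TileView = new TileView(source, f.andThen(g))
  // Random access, which focal operations over a view would require.
  def get(col: Int, row: Int): Int = f(source.cells(row * source.cols + col))
  // A single allocation when the view is turned back into a concrete tile.
  def toTile: IntTile = IntTile(source.cols, source.rows, source.cells.map(f))
}

def view(t: IntTile): TileView = new TileView(t, identity)
```

    Chaining `map` calls on a view composes functions; only `toTile` walks the cells, so a pipeline of n local operations allocates one output array instead of n.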
  6. Optimize central operations

    Optimizations here will have positive impact on applications across the board.

    • Pyramid: build all levels in a single shuffle step
    • Reproject: avoid collecting metadata again after reprojection
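
    The single-shuffle pyramid idea can be sketched with plain collections standing in for an RDD (`Key`, `ancestors`, and the summing merge are illustrative assumptions, not the GeoTrellis implementation):

```scala
// A spatial key at a given zoom level.
final case class Key(zoom: Int, col: Int, row: Int)

// Every ancestor key of a base tile, from zoom 0 up to its own zoom.
def ancestors(k: Key): Seq[Key] =
  (0 to k.zoom).map(z => Key(z, k.col >> (k.zoom - z), k.row >> (k.zoom - z)))

// Instead of shuffling once per zoom level, emit all ancestor keys for each
// base tile and reduce in one grouping step (the lone "shuffle").
def pyramid(base: Seq[(Key, Int)]): Map[Key, Int] =
  base
    .flatMap { case (k, v) => ancestors(k).map(_ -> v) }
    .groupBy(_._1)
    .map { case (k, vs) => k -> vs.map(_._2).sum }
```

    In the RDD version the `groupBy`/`map` pair would be a single `reduceByKey`, and the merge would mosaic tiles rather than sum integers.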
  7. GeoTiff Layers: Support for storing GeoTrellis layers as GeoTiffs and reading directly from them

  8. Batch Pipeline

    https://github.com/locationtech/geotrellis/blob/master/docs/architecture/005-etl-pipeline.rst

  9. Vector Tile updates

    Bring the VectorTile class up to date with VectorPipe development. Provide and document an API for simple Vector/VectorTile/Raster workflows.

  10. LiDAR Support

    Read and sort points from `.laz` files into Hadoop-friendly formats. Generate elevation rasters from LiDAR point clouds through either IDW or Delaunay triangulation. This will result in a `geotrellis.pdal` subproject.

  11. TensorFlow Integration

    Use a TensorFlow model to label a raster layer.

2017 Oct - Dec (GeoTrellis 2.0)

This release will focus on addressing API issues that we and our users have hit through usage of GeoTrellis. Some new abstractions will be introduced to unify the multiple contexts in which the same operation is performed.

  1. Cloud optimized GeoTiff Layer

    Ability to save GeoTrellis layers as a set of GeoTiffs. Each GeoTiff would act as a meta-tile and provide a segment layout optimized for TMS fetches.

    Objectives:

    • reduce friction between GeoTrellis and other GIS tools
    • provide meta-tile support
    • enable layer reads of a subset of bands
  2. Spatial Indexing

    • SFCurve Dependency
    • Temporal Binning
  3. Machine Learning

    • Converting spatio-temporal imagery into training sets
    • Pattern of Life
    • ML Model Application
  4. MAML RPC

    Support a project that creates TMS endpoints from MAML definitions.

  5. Cross-Resolution Raster Operations

    Ability to predictably perform operations on rasters whose pixel grids do not align; i.e., map algebra over arbitrary rasters.

    The product must have a specified resolution, so inputs are resampled to that resolution. Intersecting these rasters requires a spatial join. Problems:

    • tiles are too big
    • tiles intersect incorrectly
    • how do you collect neighbors on non-tiled rasters?
    • need some use-cases
    1. Affected

      [diagram: interleaved, misaligned raster1 / raster2 tiles]

    2. Requires

      • NoData Semantics
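
    The "resample inputs to the product's resolution" step can be illustrated with a toy square-grid raster and nearest-neighbor resampling (all names are assumptions for illustration, not a GeoTrellis API):

```scala
// A square single-band raster of Doubles.
final case class Grid(size: Int, cells: Array[Double]) {
  def get(c: Int, r: Int): Double = cells(r * size + c)
}

// Nearest-neighbor resample to a target size: map each target cell
// back to the source cell that covers it.
def resampleTo(g: Grid, target: Int): Grid = {
  val out = Array.tabulate(target * target) { i =>
    val (c, r) = (i % target, i / target)
    g.get(c * g.size / target, r * g.size / target)
  }
  Grid(target, out)
}

// A local operation becomes well-defined once both inputs share
// the product's resolution.
def localAdd(a: Grid, b: Grid, target: Int): Grid = {
  val (ra, rb) = (resampleTo(a, target), resampleTo(b, target))
  Grid(target, ra.cells.zip(rb.cells).map { case (x, y) => x + y })
}
```

    A real implementation would also need a resolved NoData policy (see below in this list) and a spatial join to pair up intersecting tiles.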
  6. Lazy Layer IO…

    Reading rasters metadata-first would give a chance to filter and join the future rasters before they are fully read. This is useful both for ingest and for reading layers through LayerReader.
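
    A toy illustration of the metadata-first idea (all names are hypothetical): filter on lightweight metadata, and only then perform the expensive read of the matching rasters.

```scala
// Lightweight per-raster metadata, cheap to list from a backend.
final case class Metadata(key: Int, bounds: (Int, Int))
// The heavyweight payload we want to avoid reading unnecessarily.
final case class Raster(key: Int, cells: Array[Int])

// Stand-in for the expensive backend read.
def readRaster(md: Metadata): Raster = Raster(md.key, Array.fill(4)(md.key))

// Filter first, read second: only matching rasters are materialized.
def lazyQuery(catalog: Seq[Metadata], keep: Metadata => Boolean): Seq[Raster] =
  catalog.filter(keep).map(readRaster)
```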

  7. Unified MapAlgebra API

    • TileLike: Abstracts over Tile, MultibandTile, TileView
    • LayerLike: Abstracts over RDD[(K,V)], Seq[(K,V)], Map[K, V]
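
    A minimal sketch of how a `TileLike` abstraction might let one local operation serve both single-band and multiband tiles (instances are passed explicitly for clarity; every name here is illustrative, not the proposed API):

```scala
// The abstraction: anything whose cells can be mapped over.
trait TileLike[T] {
  def mapCells(t: T)(f: Int => Int): T
}

// Toy stand-ins for Tile and MultibandTile.
final case class Band(cells: Array[Int])
final case class Multiband(bands: Vector[Band])

object BandOps extends TileLike[Band] {
  def mapCells(t: Band)(f: Int => Int): Band = Band(t.cells.map(f))
}
object MultibandOps extends TileLike[Multiband] {
  def mapCells(t: Multiband)(f: Int => Int): Multiband =
    Multiband(t.bands.map(b => Band(b.cells.map(f))))
}

// One definition of a local operation that works for either tile type.
def localAddN[T](t: T, n: Int, ops: TileLike[T]): T = ops.mapCells(t)(_ + n)
```

    A `LayerLike` abstraction over `RDD[(K, V)]`, `Seq[(K, V)]`, and `Map[K, V]` would follow the same shape, one level up.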
  8. Separation of spark/collections API

    There should be a way to perform collection operations without bringing in the heavyweight `spark-core` dependency. However, this will require moving some utility classes that describe tile layers but do not require Spark out of the spark package.

  9. NoData Semantics

    Parameterize this behavior: 1 + ND = 1 or 1 + ND = ND
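
    The two candidate semantics can be sketched with `Option[Int]` standing in for a possibly-NoData cell (a toy model of the parameterization, not the GeoTrellis implementation):

```scala
sealed trait NoDataRule
case object NoDataPropagates extends NoDataRule // 1 + ND = ND
case object NoDataIgnored    extends NoDataRule // 1 + ND = 1

// A local add whose NoData behavior is chosen by the caller.
def add(a: Option[Int], b: Option[Int], rule: NoDataRule): Option[Int] =
  (a, b) match {
    case (Some(x), Some(y)) => Some(x + y)
    case _ =>
      rule match {
        case NoDataPropagates => None
        case NoDataIgnored    => a.orElse(b) // keep the defined operand, if any
      }
  }
```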

Future Releases

  • OGC Standards
    • GeoServer Plugin for GeoTrellis raster layers
  • Further LiDAR work

API Refactor Candidates

  • LayerReader/Writer
    • Relies only on Avro
    • Relies on SprayJson
    • No segmented reads
  • No abstraction between Tile/MultibandTile