Update (June 2026)

A few things have changed since posting this, worth flagging for anyone following the thread.

The provider now has its own repo: Moved it out of the Waystones monorepo so it's easier to find, reference, and contribute to independently of everything else we're running.

Note on the demo: The demo URL is the same but no longer runs this provider. After running the pygeoapi stack in production we found cold start and response latency on remote object storage was the main constraint for our use case, so we ended up building a standalone Go OAPIF server (oapif-go) specifically for GeoParquet on object storage. The demo now runs on that. It's not a pygeoapi replacement (it has none of pygeoapi's provider breadth), but for the narrow GeoParquet-on-R2 path it gets cold start under 300ms. The DuckDB provider here remains the right approach for anyone who wants to stay on pygeoapi.

The two questions from the original post still stand: happy to write a proper PR if there's appetite, or accept that a community plugin listing is the better fit. Either way the code is now in a dedicated repo and will be maintained there.
I've been running a pygeoapi provider for GeoParquet files on remote object storage (Cloudflare R2) in production for a few months and wanted to gauge interest in upstreaming it before writing a formal PR.
The gap
The existing OGR provider lacks native Parquet predicate pushdown — bbox and pagination don't map to row-group pruning, which means unnecessary data transfer on remote storage. This provider uses DuckDB + httpfs to push LIMIT/OFFSET and bbox predicates directly into the Parquet scan, so only relevant row groups are fetched.
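To make the pushdown idea concrete, here is a minimal sketch of how a bbox filter and paging could be turned into a single DuckDB query. It is not the provider's actual code. The function name `build_items_query` is hypothetical, and the sketch assumes the file carries a GeoParquet-style `bbox` struct column with `xmin`/`ymin`/`xmax`/`ymax` fields. Plain range comparisons on such columns are the kind of predicate DuckDB can check against Parquet row-group statistics; whether pruning actually happens depends on the DuckDB version and on how the file was written.

```python
from typing import List, Optional, Sequence, Tuple


def build_items_query(
    url: str,
    bbox: Optional[Sequence[float]] = None,
    limit: int = 10,
    offset: int = 0,
) -> Tuple[str, List[object]]:
    """Build a parameterised DuckDB query for one page of features.

    Hypothetical helper: assumes a `bbox` struct column
    (xmin, ymin, xmax, ymax) exists in the Parquet file.
    """
    sql = "SELECT * FROM read_parquet(?)"
    params: List[object] = [url]

    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        # Standard rectangle-overlap test, expressed as plain column
        # comparisons so the filter can be pushed into the Parquet scan.
        sql += (
            " WHERE bbox.xmax >= ? AND bbox.xmin <= ?"
            " AND bbox.ymax >= ? AND bbox.ymin <= ?"
        )
        params += [minx, maxx, miny, maxy]

    # Paging is part of the same query, so rows are never
    # materialised in Python just to be sliced afterwards.
    sql += " LIMIT ? OFFSET ?"
    params += [int(limit), int(offset)]
    return sql, params
```

With the `duckdb` package installed and the httpfs extension loaded (`INSTALL httpfs; LOAD httpfs;`), the result would be run as `con.execute(sql, params)`. Note that without an `ORDER BY`, page contents are not guaranteed to be stable between requests.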
What it does
- s3:// (any S3-compatible endpoint) and https:// (public CDN)
- ST_Transform inside the DuckDB query: same pass as geometry fetch and GeoJSON serialization, no per-feature Python roundtrips
Demo and code
Live demo: https://demo.waystones.cloud
The demo runs on an on-demand Cloudflare Container with no persistent state — the first API request warms up the DuckDB worker, after which requests are fast.
Implementation: https://github.com/waystones-nexus/pygeoapi-duckdb-geoparquet
I'm also presenting on this stack at FOSS4G Hiroshima in September, which feels like a natural moment to point people toward an upstream contribution if there is one.
Two questions before I write a PR
- Is duckdb acceptable as an optional dependency alongside the existing optional deps?
Happy to discuss. The main design decision worth flagging upfront is the shared connection singleton: correct for Gunicorn's pre-fork model but worth documenting carefully for other deployment scenarios.
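For readers unfamiliar with the pattern, one way a shared connection can be kept safe under a pre-fork server is to tie it to the process ID, so each forked worker lazily opens its own. The sketch below is an illustration under that assumption, not the provider's implementation. `ProcessLocalConnection` and the injected `factory` are hypothetical names; in a real setup the factory would be something like `lambda: duckdb.connect()`.

```python
import os
import threading
from typing import Any, Callable, Optional


class ProcessLocalConnection:
    """Lazily create one connection per OS process (hypothetical helper)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._conn: Optional[Any] = None
        self._pid: Optional[int] = None

    def get(self) -> Any:
        pid = os.getpid()
        with self._lock:
            # A connection created before fork() belongs to the parent;
            # a worker with a different PID opens a fresh one instead.
            if self._conn is None or self._pid != pid:
                self._conn = self._factory()
                self._pid = pid
            return self._conn
```

In threaded workers, DuckDB's Python API provides `connection.cursor()` for per-thread use, so a request handler would likely call `shared.get().cursor()` rather than use the shared object directly. Deployment models that do not fork are where the behaviour is most worth documenting.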