Description
I have R versions of the cell logic from raster and terra in vaster, and similar logic for tiling in grout, both unpolished:
https://github.com/hypertidy/vaster
https://github.com/hypertidy/grout
I think these make sense on their own, grid logic independent of any file or data handling and I'd like to see (or build myself) python and other lang versions. I'm also interested to contribute some of this to GDAL itself, there's at least a few cases I would use it for features in the lib-apps I want, but that needs a broader review atm.
- what is the minimal or sensible set of functions
- funs are of dim, extent or a combination, should also have objects that provide methods (or closures that record a dim+extent)
- should funs be vectorized, i.e. multiple sets of dim+extent
- what index conversions needed for tiles-as-children rasters, netcdf vs gdal indexes etc
- grid alignment, compare gdal projwin to raster snap in,out,near - and ability for gdalwarp to act as RasterIO with a snap option
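To make the first couple of bullets concrete, here's a rough Python sketch of what a minimal set might look like: one function of dimension only, two of dimension + extent, and a closure that records a dim+extent. All the names are hypothetical (not the vaster API), and the conventions are assumptions: dimension is (ncol, nrow), extent is (xmin, xmax, ymin, ymax), and cells are 0-based, row-major from the top-left.

```python
from math import floor


def row_col_from_cell(dimension, cell):
    """Dimension only: 0-based (row, col) from a 0-based row-major cell index."""
    ncol, nrow = dimension
    if not 0 <= cell < ncol * nrow:
        raise IndexError("cell outside grid")
    return divmod(cell, ncol)


def xy_from_cell(dimension, extent, cell):
    """Dimension + extent: centre coordinate of a cell."""
    ncol, nrow = dimension
    xmin, xmax, ymin, ymax = extent
    row, col = row_col_from_cell(dimension, cell)
    xres = (xmax - xmin) / ncol
    yres = (ymax - ymin) / nrow
    # rows count down from the top edge (ymax)
    return (xmin + (col + 0.5) * xres, ymax - (row + 0.5) * yres)


def cell_from_xy(dimension, extent, x, y):
    """Dimension + extent: cell containing a point, or None if outside."""
    ncol, nrow = dimension
    xmin, xmax, ymin, ymax = extent
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None
    # points exactly on the right/bottom edge go to the last column/row
    col = min(floor((x - xmin) / (xmax - xmin) * ncol), ncol - 1)
    row = min(floor((ymax - y) / (ymax - ymin) * nrow), nrow - 1)
    return row * ncol + col


def grid(dimension, extent):
    """Closure that records a dim+extent and returns bound helper functions."""
    def xy(cell):
        return xy_from_cell(dimension, extent, cell)

    def cell(x, y):
        return cell_from_xy(dimension, extent, x, y)

    return xy, cell
```

With a 4x3 grid over (0, 4, 0, 3), `xy, cell = grid((4, 3), (0, 4, 0, 3))` gives functions that no longer need the dim+extent passed in, which is the "object or closure" option from the list.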
These are just brain dump ideas atm for things I've been doing in R and want more broadly in gdal and elsewhere 🙏
Activity
paleolimbot commented on Aug 31, 2022
Keep dumping here! I had a bunch scraped out here, too: https://github.com/paleolimbot/grd/blob/master/R/cell.R
mdsumner commented on Sep 1, 2022
I didn't forget about those ... not entirely anyway! But, it's been very illuminating to strip down to just the bare essentials, and then see that some functions are only a function of dimension, some are of extent and dimension, and some compare extents (basically the snap stuff).
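As a sketch of the "compare extents" category, here is one way the in/out/near snap could look in Python. This is an illustration under assumptions, not the raster or grd implementation: `snap_extent` is a hypothetical name, extent is (xmin, xmax, ymin, ymax), and the target grid is described by its resolution and an origin its cell edges pass through.

```python
from math import ceil, floor


def snap_extent(extent, res, origin=(0.0, 0.0), mode="out"):
    """Align an extent to the cell edges of a grid given by res and origin.

    mode "out" expands to enclosing edges, "in" shrinks to enclosed edges,
    and "near" moves each edge to the nearest grid edge.
    """
    xmin, xmax, ymin, ymax = extent
    xres, yres = res
    ox, oy = origin

    def nearest(q):
        # round half up, avoiding Python's round-half-to-even behaviour
        return floor(q + 0.5)

    if mode == "out":
        lower, upper = floor, ceil
    elif mode == "in":
        lower, upper = ceil, floor
    elif mode == "near":
        lower = upper = nearest
    else:
        raise ValueError("mode must be 'in', 'out' or 'near'")

    def snap(value, o, r, fun):
        return o + fun((value - o) / r) * r

    return (
        snap(xmin, ox, xres, lower),
        snap(xmax, ox, xres, upper),
        snap(ymin, oy, yres, lower),
        snap(ymax, oy, yres, upper),
    )
```

Note that "in" can return an empty or inverted extent when the input is smaller than one cell, so a real version would need to decide what to do in that case.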
I see you have pretty serious snap options in grd ... I'm resistant to having an object that is also for data and vis for this functionality (a 0-dim array is a nice trick but makes me uncomfortable) - which is why I didn't just run with grd ... but, I'm also drawn to having an OOP solution - I guess there are at root functions for grid logic, and then there's a hierarchy of tools - objects that do variously
I'm fleshing out my interactions with this logic as I slowly become independent from raster - recently I wrote raster::trim() from scratch, just to see what the logic is like - and like many vis and extraction and reprojection tasks for a given map, there's very often a back-and-forth: get enough data to find the "nearblack" margin, then apply that to a warp-streamed subset read. (That's a data-dependent task though, and perhaps better done by gdal with nearblack anyway - for some of these things I've been thinking of GDAL-api hooks that don't exist yet and that I could write.)
I'm interested very much in getting this family of grid logic that's entirely independent of data - things like polygon extractions from netcdf time series, where what you really want is the 2D cell index of those polygons, then batch those into netcdf chunks - and the key idea here is that the indexing logic and query plan is entirely independent of the actual data source. I'm fleshing this out at a low level with a colleague in the climate model space, and he has very large workflows of interest, it's not just me and my tools ;)
mdsumner commented on Sep 1, 2022
and like, GDAL is crazy fast to rasterize polygons, as is {fasterize} - but I don't want a polygon-value burned tif as output, I want a table of cell index and polygon ID that I use for this plan-query batching - and for that I need index-converters from global cell (extent+dimension) to chunk cell (tiled arithmetic converts a global cell to a chunk-in-memory index).
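A rough Python sketch of that global-cell to chunk-cell conversion, plus the batching of a (cell, polygon ID) table into a per-chunk plan. The function names are hypothetical, and the conventions are assumptions: dimension and chunk size are both (ncol, nrow), indexes are 0-based and row-major, and chunks on the right edge are stored at their clipped width rather than padded.

```python
def cell_to_chunk(dimension, chunk_dim, cell):
    """Convert a global cell index to (chunk index, cell index within chunk)."""
    ncol, nrow = dimension
    chunk_ncol, chunk_nrow = chunk_dim
    if not 0 <= cell < ncol * nrow:
        raise IndexError("cell outside grid")
    row, col = divmod(cell, ncol)
    chunk_row, row_in_chunk = divmod(row, chunk_nrow)
    chunk_col, col_in_chunk = divmod(col, chunk_ncol)
    chunks_across = -(-ncol // chunk_ncol)  # ceiling division
    chunk = chunk_row * chunks_across + chunk_col
    # assumption: a chunk on the right edge is only as wide as the grid allows
    width = min(chunk_ncol, ncol - chunk_col * chunk_ncol)
    return chunk, row_in_chunk * width + col_in_chunk


def plan_by_chunk(dimension, chunk_dim, cell_ids):
    """Group (global cell, polygon ID) pairs by the chunk they fall in."""
    plan = {}
    for cell, polygon_id in cell_ids:
        chunk, local = cell_to_chunk(dimension, chunk_dim, cell)
        plan.setdefault(chunk, []).append((local, polygon_id))
    return plan
```

The plan is a dict keyed by chunk, so each chunk only has to be read once, and nothing here touches the data source itself: it needs only the grid dimension and the chunk size.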
more thoughts than code atm, but I have a lot of these pieces around :)
mdsumner commented on Sep 4, 2022
at some point I'll fold in the logic for netcdf from tidync, and flesh out the translators I've been talking about, and then explore what's needed for a proper api vs just R funs
paleolimbot commented on Sep 5, 2022
I made a place for "cell logic" for you to get started! PR into https://github.com/paleolimbot/geoarrow-cpp/blob/main/src/geoarrow/index_math.hpp (and make sure to add tests into https://github.com/paleolimbot/geoarrow-cpp/blob/main/src/geoarrow/index_math_test.cc !). If you're interested, I'm happy to set up a meeting to set up your VSCode to get started 😄
mdsumner commented on Sep 5, 2022
👌
mdsumner commented on Sep 7, 2022
I definitely need the hand-holding! I think it would be valuable :)
paleolimbot commented on Sep 7, 2022
Let's do it! It's tough for me to meet outside 8am - 4pm America/Halifax because of the kids, or we can work through it via Twitter message. The gist of it is: open up geoarrow-cpp in VSCode, install the CMake extension, then open the "command palette" (Control-Shift-P) and choose CMake: configure, then CMake: build, then CMake: run tests.
mdsumner commented on Sep 14, 2022
related pydata/xarray#5081
mdsumner commented on Sep 18, 2022
just reading Danielle's blog with a couple of rasterization steps, we could use a sparse cell approach - not profound or anything but a clear example for some crossover discussion: https://blog.djnavarro.net/posts/2022-08-23_visualising-a-billion-rows/
mdsumner commented on Sep 18, 2022
all the more reason for me to get these funs in here, I keep realising implications, and variations on the index conversions 🙏