discuss these cell logic functions  #1

Open
@mdsumner

I have R versions of the cell logic from raster and terra in vaster, and similar for tiling in grout, both unpolished:

https://github.com/hypertidy/vaster

https://github.com/hypertidy/grout

I think these make sense on their own, as grid logic independent of any file or data handling, and I'd like to see (or build myself) Python and other language versions. I'm also interested in contributing some of this to GDAL itself; there are at least a few cases where I would use it for features in the lib-apps I want, but that needs a broader review atm.

  • what is the minimal or sensible set of functions
    funs are of dim, extent, or a combination; there should also be objects that provide methods (or closures that record a dim+extent)
    should funs be vectorized, i.e. over multiple sets of dim+extent
    what index conversions are needed for tiles-as-children rasters, netcdf vs gdal indexes, etc.
    grid alignment: compare gdal projwin to raster snap in, out, near, and the ability for gdalwarp to act as RasterIO with a snap option
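To make the split between dimension-only and dimension+extent functions concrete, here is a minimal Python sketch. It is not the vaster or raster API: the function names, the 0-based row-major cell index (raster and terra use 1-based cells), and the `(ncol, nrow)` / `(xmin, xmax, ymin, ymax)` tuple layouts are all assumptions made for illustration.

```python
def row_col_from_cell(dimension, cell):
    """Dimension only: 0-based (row, col) of a 0-based row-major cell index."""
    ncol, nrow = dimension
    if not 0 <= cell < ncol * nrow:
        raise IndexError("cell is outside the grid")
    return divmod(cell, ncol)


def cell_from_xy(dimension, extent, x, y):
    """Dimension + extent: cell containing (x, y), or None if outside the extent."""
    ncol, nrow = dimension
    xmin, xmax, ymin, ymax = extent
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None
    # Rows count down from the top edge (ymax); clamp so points lying exactly
    # on the right or bottom edge fall in the last column or row.
    col = min(int((x - xmin) / (xmax - xmin) * ncol), ncol - 1)
    row = min(int((ymax - y) / (ymax - ymin) * nrow), nrow - 1)
    return row * ncol + col


def xy_from_cell(dimension, extent, cell):
    """Dimension + extent: centre coordinate of a cell."""
    ncol, nrow = dimension
    xmin, xmax, ymin, ymax = extent
    row, col = row_col_from_cell(dimension, cell)
    xres = (xmax - xmin) / ncol
    yres = (ymax - ymin) / nrow
    return xmin + (col + 0.5) * xres, ymax - (row + 0.5) * yres
```

For a 4 x 3 grid over the extent (0, 4, 0, 3), the point (3.5, 0.5) falls in the last cell and `xy_from_cell` maps that cell back to the same centre coordinate.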

These are just brain dump ideas atm for things I've been doing in R and want more broadly in gdal and elsewhere 🙏

geoarrow/geoarrow#24 (comment)

paleolimbot (Owner) commented on Aug 31, 2022

Keep dumping here! I had a bunch scraped out here, too: https://github.com/paleolimbot/grd/blob/master/R/cell.R

mdsumner (Author) commented on Sep 1, 2022

I didn't forget about those ... not entirely anyway! But, it's been very illuminating to strip down to just the bare essentials, and then see that some functions are only a function of dimension, some are of extent and dimension, and some compare extents (basically the snap stuff).

I see you have pretty serious snap options in grd ... I'm resistant to having an object that is also for data and vis for this functionality (a 0-dim array is a nice trick but makes me uncomfortable), which is why I didn't just run with grd ... but I'm also drawn to having an OOP solution. I guess there are at root functions for grid logic, and then there's a hierarchy of tools or objects, each of which variously:

  • knows its raster-ness (extent + dimension)
  • knows only its alignment (the origin + resolution) - on reflection I guess that's what a (shear == 0) geotransform is ... hmm
  • knows only its dimension (a bare image, with a default extent - variously [0,1] or [0,dim] depending on context)
  • knows the above stuff and is ready to bare-metal read/vis/stream from sources that have these properties
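The "knows only its alignment" case in the list above can be sketched as a snap function that needs nothing but an origin and a resolution. This is a hypothetical illustration in Python, not grd's or raster's implementation: the name `snap_extent`, the `how` values "out", "in" and "near", and the tuple layouts are assumptions, and ties under "near" round upward here by choice.

```python
import math


def snap_extent(extent, origin, resolution, how="out"):
    """Snap (xmin, xmax, ymin, ymax) to the grid lines origin + k * resolution.

    how="out"  expands the extent to the enclosing grid lines,
    how="in"   shrinks it to the enclosed grid lines,
    how="near" moves each edge to the nearest grid line.
    """
    if how not in ("out", "in", "near"):
        raise ValueError("how must be 'out', 'in' or 'near'")
    xmin, xmax, ymin, ymax = extent
    ox, oy = origin
    rx, ry = resolution

    def snap(value, o, r, is_lower_edge):
        t = (value - o) / r  # position measured in grid steps from the origin
        if how == "near":
            k = math.floor(t + 0.5)
        elif (how == "out") == is_lower_edge:
            k = math.floor(t)  # lower edge moving out, or upper edge moving in
        else:
            k = math.ceil(t)   # upper edge moving out, or lower edge moving in
        return o + k * r

    return (
        snap(xmin, ox, rx, True),
        snap(xmax, ox, rx, False),
        snap(ymin, oy, ry, True),
        snap(ymax, oy, ry, False),
    )
```

Because the function only takes an origin and a resolution, the same call works whether the alignment comes from a full raster, a shear-free geotransform, or a target grid that has no data yet.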

I'm fleshing out my interactions with this logic as I slowly become independent from raster - recently I wrote raster::trim() from scratch, just to see what the logic is like - and like many vis and extraction and reprojection tasks for a given map, there's very often a back-and-forth: get enough data to find the "nearblack" margin, then apply that to a warp-streamed subset read. (That's a data-dependent task though, and perhaps better done by gdal with nearblack anyway - some of these things I've been thinking of as GDAL-api hooks that don't exist yet and that I could write.)

I'm very much interested in getting this family of grid logic that's entirely independent of data. For things like polygon extractions from netcdf time series, what you really want is the 2D cell index of those polygons, then to batch those into netcdf chunks - and the key idea here is that the indexing logic and query plan is entirely independent of the actual data source. I'm fleshing this out at a low level with a colleague in the climate model space, and he has very large workflows of interest, so it's not just me and my tools ;)

mdsumner (Author) commented on Sep 1, 2022

and like, GDAL is crazy fast to rasterize polygons, as is {fasterize} - but I don't want a polygon-value burned tif as output, I want a table of cell index and polygon ID that I use for this plan-query batching - and for that I need index-converters from global cell (extent+dimension) to chunk cell (tiled arithmetic converts a global cell to a chunk-in-memory index).
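The global-cell to chunk-cell conversion described here can be sketched in Python as below. This is an illustration under stated assumptions, not code from grout or tidync: cells and chunks are 0-based and row-major, the function names are hypothetical, and edge chunks are assumed to be clipped to the grid when held in memory (storage formats may pad them instead).

```python
from collections import defaultdict


def chunk_cell_from_cell(dimension, chunk_dimension, cell):
    """Map a global cell index to (chunk index, cell index within that chunk)."""
    ncol, nrow = dimension
    chunk_ncol, chunk_nrow = chunk_dimension
    row, col = divmod(cell, ncol)
    chunk_row, row_in_chunk = divmod(row, chunk_nrow)
    chunk_col, col_in_chunk = divmod(col, chunk_ncol)
    chunks_across = -(-ncol // chunk_ncol)  # ceiling division
    # Assumption: the last chunk in a row is clipped to the grid, so it may
    # be narrower than chunk_ncol.
    width = min(chunk_ncol, ncol - chunk_col * chunk_ncol)
    return chunk_row * chunks_across + chunk_col, row_in_chunk * width + col_in_chunk


def batch_by_chunk(dimension, chunk_dimension, cell_ids):
    """Group (global cell, polygon id) pairs into a per-chunk read plan."""
    plan = defaultdict(list)
    for cell, polygon_id in cell_ids:
        chunk, local = chunk_cell_from_cell(dimension, chunk_dimension, cell)
        plan[chunk].append((local, polygon_id))
    return dict(plan)
```

The plan depends only on the grid dimension and the chunk dimension, so it can be built before any data source is opened and then replayed chunk by chunk.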

more thoughts than code atm, but I have a lot of these pieces around :)

mdsumner (Author) commented on Sep 4, 2022

at some point I'll fold in the logic for netcdf from tidync, and flesh out the translators I've been talking about, and then explore what's needed for a proper api vs just R funs

paleolimbot (Owner) commented on Sep 5, 2022

I made a place for "cell logic" for you to get started! PR into https://github.com/paleolimbot/geoarrow-cpp/blob/main/src/geoarrow/index_math.hpp (and make sure to add tests into https://github.com/paleolimbot/geoarrow-cpp/blob/main/src/geoarrow/index_math_test.cc !). If you're interested, I'm happy to set up a meeting to set up your VSCode to get started 😄

mdsumner (Author) commented on Sep 5, 2022

👌

mdsumner (Author) commented on Sep 7, 2022

I definitely need the hand-holding! I think it would be valuable :)

paleolimbot (Owner) commented on Sep 7, 2022

Let's do it! It's tough for me to meet outside 8am - 4pm America/Halifax because of the kids, or we can work through it via Twitter message. The gist of it is: open up geoarrow-cpp in VSCode, install the CMake extension, then open the "command palette" (Control-Shift-P) and choose CMake: Configure, then CMake: Build, then CMake: Run Tests.

mdsumner (Author) commented on Sep 14, 2022

mdsumner (Author) commented on Sep 18, 2022

just reading Danielle's blog with a couple of rasterization steps, we could use a sparse cell approach - not profound or anything but a clear example for some crossover discussion: https://blog.djnavarro.net/posts/2022-08-23_visualising-a-billion-rows/
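One way to read "sparse cell approach" is to keep counts only for cells that actually receive points, keyed by cell index, rather than allocating a dense grid. The Python sketch below shows that reading; it is not taken from the blog post, and the function name, the 0-based row-major index, and the tuple layouts are assumptions.

```python
from collections import Counter


def sparse_cell_counts(dimension, extent, points):
    """Count points per cell, storing only the cells that are hit."""
    ncol, nrow = dimension
    xmin, xmax, ymin, ymax = extent
    counts = Counter()
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # points outside the extent are dropped
        # Clamp so points on the right or bottom edge land in the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * ncol), ncol - 1)
        row = min(int((ymax - y) / (ymax - ymin) * nrow), nrow - 1)
        counts[row * ncol + col] += 1
    return counts
```

The result is a table of cell index and count, which can be expanded to a dense array later or joined against other per-cell tables.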

mdsumner (Author) commented on Sep 18, 2022

all the more reason for me to get these funs in here, I keep realising implications, and variations on the index conversions 🙏

          discuss these cell logic functions · Issue #1 · paleolimbot/geoarrow-cpp-old