Welcome @mankoff! I've not heard of datalad before, but it looks quite cool. Didn't realize there was anything like it. It feels like a pretty full solution to actually managing data and tracking provenance. STAC, I think, is much simpler: we're mostly focused on just providing metadata about spatial 'assets', i.e. files that are located in space. So it's just a little JSON language that describes things in a common way. And yeah, as you said, STAC is really focused on rasters / point clouds. The STAC 'Collection' is aiming to align with the Open Geospatial Consortium 'Collection' construct; see https://github.com/cportele/ogcapi-building-blocks/blob/main/geo/common/json-collection.adoc for the simplest explanation of that. You could use that with vector data.

On top of that core, the STAC API enables searching of data, which maybe overlaps a little bit more? But I'd guess that datalad and STAC are mostly complementary. I'd see datalad as a way to handle all the syncing, and then perhaps standardize on using STAC to further describe the data you have, if it's geospatial. It'd be a sidecar file to the core data that you'd track provenance / fetch / etc.

So I think it could make sense for you to be able to 'understand' STAC, and maybe extract common fields into your core data model. And then you could also put a STAC API interface on your core data catalog, but I think you'd have that be just the geospatial portion of your catalog. I'm not sure what capabilities datalad provides, but I could imagine doing something like making a datalad backend to a STAC server like stac-fastapi so that users could do STAC searches against the geospatial parts of your catalog.

Hope this helps some, C
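To make the "little JSON language" / sidecar idea concrete, here is a minimal sketch of what a STAC Item might look like, built as a plain Python dict. The field names follow the STAC Item spec; the `id`, bbox, and asset URL are made-up illustrations, not real data.

```python
import json

# Hypothetical minimal STAC Item: a small JSON "sidecar" file describing
# one spatial asset. All concrete values (id, coordinates, href) are
# invented for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-50.0, 60.0], [-49.0, 60.0],
                         [-49.0, 61.0], [-50.0, 61.0], [-50.0, 60.0]]],
    },
    "bbox": [-50.0, 60.0, -49.0, 61.0],
    "properties": {"datetime": "2021-06-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            # The actual file datalad would fetch/track; STAC just points at it.
            "href": "https://example.com/data/example-scene.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

print(json.dumps(item, indent=2))
```

The point is that the sidecar carries only descriptive spatial metadata; datalad would still own syncing and provenance for the asset the `href` points at.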
This is a repost of the same question asked about EODAG at CS-SI/eodag#409 and stactools at stac-utils/stactools#249. I'm reposting here on the advice of @gadomski.
I've just discovered STAC, and the various tools that implement the STAC spec. I was working on a data meta-portal (not a meta-data portal) using datalad https://www.datalad.org/ (built on git-annex). datalad and STAC, as implemented by some tools, seem similar, but I'm new enough to not fully understand either. Can anyone here provide some insight into the pros/cons of datalad vs. stactools, complementary or divergent use cases, etc.?
Our use case is to build a data portal that provides access to datasets in other data portals, but end users don't need to know that dataA is in Zenodo and dataB is located elsewhere. Hence a data meta-portal: they can search from our portal, and then download. I will not have control over dataA or dataB or the servers they reside on, but if they can be downloaded over HTTP or FTP or other common protocols, then datalad will fetch them. The work we have to do is build some catalogs (currently datalad catalogs, but they could be STAC catalogs). Catalogs are built by crawling the existing dataset (a wrapper around wget or Python or similar) and scripting the catalog construction.
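The crawl-then-catalog step above can be sketched roughly as follows. This is a hypothetical illustration only: the entry fields, helper name, and URLs are invented, and real datalad or STAC catalogs have their own schemas.

```python
# Hypothetical sketch: turn a crawled list of remote file URLs into
# minimal catalog entries. Field names and URLs are illustrative, not
# datalad/STAC API calls.
from urllib.parse import urlparse

def make_entry(url):
    """Build a tiny catalog entry from a remote URL (hypothetical schema)."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    return {"id": filename, "source": url}

# Stand-ins for URLs a crawler (wget wrapper, etc.) might have collected.
crawled = [
    "https://zenodo.org/record/123/files/dataA.nc",
    "https://example.org/archive/dataB.csv",
]

catalog = {"entries": [make_entry(u) for u in crawled]}
for e in catalog["entries"]:
    print(e["id"], "->", e["source"])
```

The meta-portal would serve `catalog` for search, while datalad handles the actual fetch from each `source`.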
The first/biggest difference I see is that STAC requires rasters or point clouds (but not vector data?) while datalad is data agnostic. It is built on git-annex, and works with any remote binary blob.
Any insights would be much appreciated.
Thank you,
Ken Mankoff