Welcome @mankoff! I've not heard of datalad before, but it looks quite cool. Didn't realize there was anything like it. It feels like a pretty full solution to actually managing data and tracking provenance. STAC, I think, is much simpler: we're mostly focused on just providing metadata about spatial 'assets', i.e. files that are located in space. So it's just a little JSON language that describes things in a common way. And yeah, as you said, STAC is really focused on rasters / point clouds. The STAC 'Collection' is aiming to align with the Open Geospatial Consortium 'Collection' construct; see https://github.com/cportele/ogcapi-building-blocks/blob/main/geo/common/json-collection.adoc for the simplest explanation of that. You could use that with vector data.

On top of that core, the STAC API enables searching of data, which maybe overlaps a little bit more? But I'd guess that datalad and STAC are mostly complementary. I'd see datalad as a way to handle all the syncing, and then perhaps standardize on using STAC to further describe the data you have, if it's geospatial. It'd be a sidecar file to the core data that you'd track provenance / fetch / etc.

So I think it could make sense for you to be able to 'understand' STAC, and maybe extract common fields into your core data model. And then you could also put a STAC API interface on your core data catalog, but I think you'd have that be just the geospatial portion of your catalog. I'm not sure what capabilities datalad provides, but I could imagine doing something like making a datalad backend to a STAC server like stac-fastapi so that users could do STAC searches against the geospatial parts of your catalog.

Hope this helps some, C
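To make the "little JSON language" / sidecar idea concrete, here is a minimal sketch of what a STAC Item might look like, built as a plain Python dict. The field names follow the STAC Item spec; the `id`, bbox, and asset URL are made-up illustrations, not real data.

```python
import json

# Hypothetical minimal STAC Item: a small JSON "sidecar" file describing
# one spatial asset. All concrete values (id, coordinates, href) are
# invented for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-50.0, 60.0], [-49.0, 60.0],
                         [-49.0, 61.0], [-50.0, 61.0], [-50.0, 60.0]]],
    },
    "bbox": [-50.0, 60.0, -49.0, 61.0],
    "properties": {"datetime": "2021-06-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            # The actual file datalad would fetch/track; STAC just points at it.
            "href": "https://example.com/data/example-scene.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

print(json.dumps(item, indent=2))
```

The point is that the sidecar carries only descriptive spatial metadata; datalad would still own syncing and provenance for the asset the `href` points at.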
This is a repost of the same question asked about EODAG at CS-SI/eodag#409 and stactools at stac-utils/stactools#249. I'm reposting here on the advice of @gadomski.
I've just discovered STAC, and the various tools that implement the STAC spec. I was working on a data meta-portal (not a meta-data portal) using datalad https://www.datalad.org/ (built on git-annex). datalad and STAC, as implemented by some tools, seem similar, but I'm new enough to not fully understand either. Can anyone here provide some insight into the pros/cons of datalad vs. stactools, complementary or divergent use cases, etc.?
Our use case is to build a data portal that provides access to datasets in other data portals, but end users don't need to know that dataA is in Zenodo and dataB is located elsewhere. Hence a data meta-portal: they can search from our portal, and then download. I will not have control over dataA or dataB or the servers they reside on, but if they can be downloaded over HTTP or FTP or other common protocols, then datalad will fetch them. The work we have to do is build some catalogs (currently datalad catalogs, but they could be STAC catalogs). Catalogs are built by crawling the existing dataset (a wrapper around wget or Python or similar) and scripting the catalog construction.
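The crawl-then-catalog step above can be sketched roughly as follows. This is a hypothetical illustration only: the entry fields, helper name, and URLs are invented, and real datalad or STAC catalogs have their own schemas.

```python
# Hypothetical sketch: turn a crawled list of remote file URLs into
# minimal catalog entries. Field names and URLs are illustrative, not
# datalad/STAC API calls.
from urllib.parse import urlparse

def make_entry(url):
    """Build a tiny catalog entry from a remote URL (hypothetical schema)."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    return {"id": filename, "source": url}

# Stand-ins for URLs a crawler (wget wrapper, etc.) might have collected.
crawled = [
    "https://zenodo.org/record/123/files/dataA.nc",
    "https://example.org/archive/dataB.csv",
]

catalog = {"entries": [make_entry(u) for u in crawled]}
for e in catalog["entries"]:
    print(e["id"], "->", e["source"])
```

The meta-portal would serve `catalog` for search, while datalad handles the actual fetch from each `source`.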
The first/biggest difference I see is that STAC requires rasters or point clouds (but not vector data?) while datalad is data agnostic. It is built on git-annex, and works with any remote binary blob.
Any insights would be much appreciated.
Thank you,
Ken Mankoff