Replies: 15 comments
-
I don't know much about Jsonnet, but the docs are very Javascript-focussed. I think we'd need more information about how the interoperability with other languages will work. Also, the Open Data Cube has used YAML for some time, and it gets heavy to parse many documents... JSON is so easy and interoperable, I can't see the advantages in an abstraction... (But this is my very quick take, without lots of thought!)
-
@alexgleith, thanks for kicking off the conversation! Hopefully this makes it a bit more clear. Please ask anything you can think of, and I really want to know if I've made any incorrect assumptions. Basically, with one extra step, jsonnet can build a STAC json catalog tree from far fewer files, with repetitive material written only once and then pulled in by functions or variables. For example, if you have 4 NOAA GOES fire products that are very similar and differ only in naming and extent parameters, they can be represented by 1 jsonnet file that the jsonnet program converts into the 4 STAC json files that everyone is accustomed to processing.
I think your comment says I wasn't clear in my description of what is going on, or at least what I'm intending to have go on. There is no goal to alter how any other language interacts with STAC json. The jsonnet files are only meant to be passed through the jsonnet program (there are two versions: one in C++ and one in Go). So any use of the jsonnet source-of-truth requires first passing the jsonnet files through that program.
I am likely missing something. In general, it's possible for most yaml or json to be converted automatically to the other. There are a few things, like comments in yaml and multiple things (...), that don't carry over. Here is the same cut-down collection, first as json and then as yaml:
```json
{
  "stac_version": "1.0.0-beta.2",
  "stac_extensions": [
    "scientific"
  ],
  "id": "NOAA/GOES/16/FDCC",
  "title": "GOES-16 FDCC Series ABI Level 2 Fire/Hot Spot Characterization CONUS",
  "gee:type": "image_collection",
  "description": "[GOES](https://www.goes.noaa.gov) satellites are geostationary weather satellites run by NOAA.\n\nThe Fire (HSC) product contains four images: [SNIP]",
  "links": [
    {
      "rel": "self",
      "href": "https://earthengine-stac.storage.googleapis.com/catalog/NOAA_GOES_16_FDCC.json"
    },
    {
      "rel": "root",
      "href": "https://earthengine-stac.storage.googleapis.com/catalog/catalog.json"
    }
  ],
  "keywords": [
    "climate",
    "wildfire"
  ],
  "providers": [
    {
      "name": "NOAA",
      "roles": [
        "producer",
        "licensor"
      ],
      "url": "https://data.noaa.gov/da [SNIP]"
    }
  ],
  "extent": {
    "spatial": {
      "bbox": [
        [
          -152.11,
          14,
          -49.18,
          56.77
        ]
      ]
    },
    "temporal": {
      "interval": [
        [
          "2017-05-24T00:00:00Z",
          null
        ]
      ]
    }
  },
  "properties": {
    "gsd": 2000,
    "eo:bands": [
      {
        "name": "Area",
        "description": "Fire area",
        "gee:unit": "m^2",
        "gee:scale": 60.98,
        "gee:offset": 4000
      },
      {
        "name": "Mask",
        "description": "Fire mask categories. Pixel values in [SNIP]",
        "gee:classes": [
          {
            "value": 10,
            "color": "red",
            "description": "Processed fire"
          },
          {
            "value": 35,
            "color": "darkblue",
            "description": "Low probability fire, filtered"
          }
        ]
      }
    ]
  },
  "sci:citation": "Early characterization of the active fire [SNIP]",
  "sci:publications": [
    {
      "citation": "Schmit, T., Griffith, P., et al, (2016), A closer look [SNIP]",
      "doi": "10.1175/BAMS"
    }
  ],
  "summaries": {
    "Area": {
      "min": 0,
      "max": 16723,
      "gee:estimated_range": true
    }
  }
}
```
```yaml
# Some comments here before the document starts
---
# This STAC yaml (which isn't really a thing) is cut down from the original
# Earth Engine NOAA_GOES_16_FDCC.json
stac_version: 1.0.0-beta.2
stac_extensions:
- scientific
id: NOAA/GOES/16/FDCC
title: GOES-16 FDCC Series ABI Level 2 Fire/Hot Spot Characterization CONUS
gee:type: image_collection
description: |-
  [GOES](https://www.goes.noaa.gov) satellites are geostationary weather satellites run by NOAA.

  The Fire (HSC) product contains four images: [SNIP]
links:
- rel: self
  href: https://earthengine-stac.storage.googleapis.com/catalog/NOAA_GOES_16_FDCC.json
- rel: root
  href: https://earthengine-stac.storage.googleapis.com/catalog/catalog.json
keywords:
- climate
- wildfire
providers:
- name: NOAA
  roles:
  - producer
  - licensor
  url: https://data.noaa.gov/da [SNIP]
extent:
  spatial:
    bbox:
    - - -152.11
      - 14
      - -49.18
      - 56.77
  temporal:
    interval:
    - - '2017-05-24T00:00:00Z'
      -
properties:
  gsd: 2000
  eo:bands:
  - name: Area
    description: Fire area
    gee:unit: m^2
    gee:scale: 60.98
    gee:offset: 4000
  - name: Mask
    description: Fire mask categories. Pixel values in [SNIP]
    gee:classes:
    - value: 10
      color: red
      description: Processed fire
    - value: 35
      color: darkblue
      description: Low probability fire, filtered
sci:citation: Early characterization of the active fire [SNIP]
sci:publications:
- citation: Schmit, T., Griffith, P., et al, (2016), A closer look [SNIP]
  doi: 10.1175/BAMS
summaries:
  Area:
    min: 0
    max: 16723
    gee:estimated_range: true
```
A similar jsonnet file:
```jsonnet
local stac_const = import "stac_const.libsonnet";
local stac = import "stac_lib.libsonnet";
// Probably want a stac_goes.libsonnet here.
// sat is 16 or 17
// product is 'fdcc' or 'fdcf' (fdc for fire/hotspot; c for CONUS or f for full-disk)
local goes_fdc_template(sat, product) = {
  // Left out some local functions and hidden values here.
  id: goes_id(sat, product),  // <-- This is a function that makes things like "NOAA/GOES/16/FDCC"
  'gee:type': 'image_collection',
  stac_version: stac_const.stac_version,
  stac_extensions: ['scientific'],
  title: 'GOES-' + sat + ' ' + product + ' Series ABI Level 2 Fire/Hot Spot',
  description: |||
    [GOES](https://www.goes.noaa.gov) satellites are geostationary weather
    satellites run by NOAA.
    [SNIP]
    [README](https://www.ncdc.noaa.gov/data-access/satellite-data/goes-r-series-satellites#FDC)
    NOAA provides the following scripts for suggested categories, color
    maps, and visualizations:
    - [GOES-16-17_FireDetection.js](https://github.com/google/earthengine-community/blob/master/datasets/scripts/GOES-16-17_FireDetection.js)
    - [GOES-16-17_FireReclassification.js](https://github.com/google/earthengine-community/blob/master/datasets/scripts/GOES-16-17_FireReclassification.js)
  |||,
  // TODO(schwehr): Can we use https://spdx.org/licenses/CC0-1.0.html for GOES?
  license: 'proprietary',
  links: [
    {rel: 'self', href: self_url},
    {rel: 'parent', href: stac_const.catalog_url},
    {rel: 'root', href: stac_const.catalog_url},
    {rel: 'preview', href: sample_url},
    {rel: 'license', href: self_ee_catalog_url + '#terms-of-use'},
    {rel: 'source', href: gcs_path},
    {rel: 'cite-as', href: stac.doi_url(primary_doi)}
  ],
  local this_sat = sat_info[std.toString(sat)],
  keywords: fdc_keywords + [
    'goes-' + sat,
    this_sat['letter'],
    this_sat['region'],
  ],
  providers: [
    stac.producer_provider('NOAA', noaa_fdc_url),
    stac.host_provider(self_ee_catalog_url),
  ],
  extent: extents[product + '_' + std.toString(sat)],
  properties: {
    gsd: 2000.0,
    'eo:bands': [
      {
        name: 'Area',
        description: 'Fire area',
        'gee:unit': 'm^2',
        'gee:scale': 60.98,
        'gee:offset': 4000.0
      },
      {
        name: 'Mask',
        description: |||
          Fire mask categories. Pixel values in the fire mask image identify a
          fire category and diagnostic information associated with algorithm
          [SNIP]
        |||,
        'gee:classes': [
          stac.class_entry(10, 'red', 'Processed fire'),
          // [SNIP]
          stac.class_entry(35, 'darkblue', 'Low probability fire, filtered'),
        ]
      },
      // SNIP
    ],
  },
  'sci:citation': 'Early characterization of the active [SNIP]',
  'sci:publications': [
    {
      citation: 'Schmit, T., Griffith, P., et al, (2016), A [SNIP]',
      doi: '10.1175/BAMS'
    }
  ],
  summaries: {
    Area: stac.stats_obj(0, 16723, true),
    Temp: stac.stats_obj(0, 32642, true),
    Power: stac.stats_obj(0, 200000, false),
    DQF: stac.stats_obj(0, 5, false),
  },
};
// Generate the 4 json files.
{
  ['NOAA_GOES_' + sat + '_' + product + '.json']: goes_fdc_template(sat, product)
  for sat in [16, 17]
  for product in ['FDCC', 'FDCF']
}
```
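For readers who don't have jsonnet installed, the "one template, four output files" comprehension at the end of the jsonnet can be sketched in plain Python. This is only an illustration of the idea, not part of the Earth Engine tooling: the Python function names and the keyword list are assumptions, and the document is cut down much further than the jsonnet version.

```python
# Hypothetical stand-in for the value the jsonnet template pulls from
# stac_const.libsonnet; the version string is taken from the json example.
STAC_VERSION = "1.0.0-beta.2"


def goes_id(sat: int, product: str) -> str:
    """Build an id such as "NOAA/GOES/16/FDCC"."""
    return f"NOAA/GOES/{sat}/{product}"


def goes_fdc_template(sat: int, product: str) -> dict:
    """Return one heavily cut-down collection dict for a satellite/product pair."""
    return {
        "stac_version": STAC_VERSION,
        "stac_extensions": ["scientific"],
        "id": goes_id(sat, product),
        "title": f"GOES-{sat} {product} Series ABI Level 2 Fire/Hot Spot",
        "gee:type": "image_collection",
        # Assumed keyword list; the real fdc_keywords value is not shown above.
        "keywords": ["climate", "wildfire", f"goes-{sat}"],
    }


def build_all() -> dict:
    """Map each output filename to its document, like the jsonnet comprehension."""
    return {
        f"NOAA_GOES_{sat}_{product}.json": goes_fdc_template(sat, product)
        for sat in (16, 17)
        for product in ("FDCC", "FDCF")
    }
```

Writing each value of `build_all()` out with `json.dump` would give the same four-file layout. The difference from the jsonnet approach is that contributors would then need to read and write Python rather than something that looks like json.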
-
Hey @schwehr, I guess I was missing that this is templating the creation of JSON static documents, essentially! My points were more around if we used that standard to store text, in which case, my point around yaml is that it is heavy to process. (Despite being nicer to write for humans than JSON!) I'll step back from the conversation, though. Thanks for raising it!
-
@alexgleith I am curious what you've seen that says that yaml is heavy to process. One of the original ideas was to start with yaml rather than json and do some templating on top of the yaml. jsonnet means I'm not inventing yet another templating language, but yaml seems to me to be fairly lightweight when compared to many other serialization formats.
-
I haven't benchmarked myself, but we're careful to use the C implementation of the yaml parser, because it's slow otherwise. And even with the optimised parser it is heavy. This is me talking anecdotes more than stats, but anecdotes from people I trust! I don't think it's an issue with a templating tool. But when you have a million files to parse, things need optimising.
-
Jsonnet seems cool, and like you have a good templating process that's working for you. I'm a little confused about the goal of this issue: are you looking to gather general information about how folks are templating, are you proposing a new set of tooling, or are you proposing that jsonnet would be integrated into this repo/another STAC repo? This at least is a good tip for if and when I run into a situation that could use templating, and something that stactools would be too heavyweight for, so thank you! If you end up developing some tooling around its usage, I'd be interested in following, and also in adding it to the stac-utils org if possible.
-
@lossyrob, I didn't explicitly state my goals, so let me give it a go for what I am hoping to get out of this discussion.
tl;dr: Goals: share knowledge, get suggestions, compare with others doing templating.
I am planning to use this templating for Earth Engine and expose the Earth Engine catalog as jsonnet on GitHub. Within that project, I will definitely have a rule that triggers generating the static STAC json files for the catalog. I am expecting/hoping users will submit PRs against that repo for new public assets in jsonnet format (which can be just STAC json with the extension changed). I also plan to have the build system run at least one validator against the resulting STAC json. I hope to provide a small program that does simple cleanup of STAC json files to make them have the version be templated, fields without quotes (when there is no ...
By talking about this publicly before I deploy anything production ready, I hope to:
Not goals:
I do need to have another look at stactools to see if that influences anything I do with templating.
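To make the "small cleanup program" idea a bit more concrete, here is a rough Python sketch of what such a first-pass conversion might look like. It is an assumption about the shape of that tool, not the actual program: the function names are made up, and the keyword list is my reading of the Jsonnet reserved words. The only names taken from this thread are `stac_const.libsonnet` and `stac_const.stac_version`.

```python
import json
import re

# Jsonnet reserved words (as I understand the spec); a key that matches one
# of these has to stay quoted.
JSONNET_KEYWORDS = {
    "assert", "else", "error", "false", "for", "function", "if", "import",
    "importstr", "importbin", "in", "local", "null", "tailstrict", "then",
    "self", "super", "true",
}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def format_key(key: str) -> str:
    """Drop the quotes from a key when it is a plain jsonnet identifier."""
    if IDENTIFIER.match(key) and key not in JSONNET_KEYWORDS:
        return key
    return json.dumps(key)


def to_jsonnet(value, indent: int = 0, top_level: bool = False) -> str:
    """Render a decoded json value as jsonnet source text."""
    pad = "  " * indent
    inner = "  " * (indent + 1)
    if isinstance(value, dict):
        if not value:
            return "{}"
        lines = []
        for key, item in value.items():
            if top_level and key == "stac_version":
                # Replace the literal version with the shared constant.
                rendered = "stac_const.stac_version"
            else:
                rendered = to_jsonnet(item, indent + 1)
            lines.append(f"{inner}{format_key(key)}: {rendered},")
        return "{\n" + "\n".join(lines) + "\n" + pad + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        lines = [f"{inner}{to_jsonnet(item, indent + 1)}," for item in value]
        return "[\n" + "\n".join(lines) + "\n" + pad + "]"
    # Strings, numbers, booleans and null are written as plain json literals.
    return json.dumps(value)


def stac_json_to_jsonnet(text: str) -> str:
    """Convert the text of one STAC json file into a first-pass jsonnet file."""
    doc = json.loads(text)
    header = "local stac_const = import 'stac_const.libsonnet';\n\n"
    return header + to_jsonnet(doc, top_level=True) + "\n"
```

Keys such as `gee:type` keep their quotes because they are not valid identifiers, while `id` and `links` lose them. Anything smarter, like factoring out shared strings, would still be a manual follow-up step.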
-
Asked about using python + [templating language, database, pystac, etc.] examples here: https://gitter.im/SpatioTemporal-Asset-Catalog/python?at=6026b46d32e01b4f719002a9
-
I don't know enough about this, but it seems interesting. When I'm creating a STAC catalog for a new set of data I create the Collection JSON, then would have some Python code that creates the Items, like is currently done in PySTAC. Would there be an advantage to using Jsonnet here? Does it give us a reusable template we can use across programming languages? Is there a use case in here somewhere where given templates for STAC Items with different extensions it would make my job easier as a data provider? Or as a processor who is generating derived data? Not sure my questions are clear, just trying to wrap my head around it.
-
@matthewhanson Please keep asking questions. That's exactly what I need. Now to see if I can give a useful / constructive response.
I'm trying to figure out the workflow myself. For Earth Engine, we want to have a single starting point for all processes, one that users can contribute to. That collection of info will be used to produce the STAC JSON (if we started with STAC json, that would be a no-op), and it's the collection of things that will control the content here: https://developers.google.com/earth-engine/datasets. It will also get used for other dataset management tasks. I had thought about checking in PySTAC scripts that, on each change to the repo, produce the STAC json that is uploaded to the static STAC json catalog. While that would be super powerful and convenient for us Python coders, it seemed a bit hard to manage and hard on folks who don't know Python. Editing JSON isn't easy for many, but it's probably easier than writing Python for many. A user contributing to a catalog that is jsonnet based can do one of:
The system I envision here is programming language agnostic. The base data is jsonnet files (that might include .libsonnet libraries). Any processing done using the data would likely generate the STAC json (or read a cache of the json, like on cloud storage) and just use the STAC json as normal.
It does have at least one major drawback that I can think of. If someone is accustomed to using something like PySTAC to make updates to a STAC catalog (e.g. doing an upgrade to a new version of STAC), then that is going to be a bit of a mess if there is a lot of use of jsonnet variables, functions, and libraries.
For Earth Engine, we plan to release the jsonnet files as a part of a GitHub repo that can build json. This is in addition to the STAC json that will sit in a GCS bucket. That repo will have instructions for how to do a first pass conversion of STAC json into jsonnet to make a PR, but really, users can make a PR with foo.jsonnet where that file is just a rename of foo.json and it will work. We can then do things like have the stac version set from a variable in a follow-up commit.
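The "rename foo.json to foo.jsonnet and it will work" claim is easy to check mechanically. Below is a minimal Python sketch of such a check, assuming the `jsonnet` executable is on the PATH and is invoked as `jsonnet foo.jsonnet`, as in the original post. The function names are hypothetical and this is not the actual build rule.

```python
import json
import shutil
import subprocess


def same_json(a_text: str, b_text: str) -> bool:
    """True when two json texts decode to equal values, ignoring key order."""
    return json.loads(a_text) == json.loads(b_text)


def render_jsonnet(path: str) -> str:
    """Run the jsonnet program on one file and return the json it prints."""
    if shutil.which("jsonnet") is None:
        raise RuntimeError("jsonnet executable not found on PATH")
    result = subprocess.run(
        ["jsonnet", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def renamed_copy_round_trips(json_path: str, jsonnet_path: str) -> bool:
    """Check that a renamed json file renders back to the same document."""
    with open(json_path, encoding="utf-8") as f:
        original = f.read()
    return same_json(original, render_jsonnet(jsonnet_path))
```

Comparing decoded values rather than raw text sidesteps differences in key order and whitespace. Array order still matters, which is what you want for things like `bbox`.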
-
It's okay if no other projects use this strategy, but if folks are using some other templating language to drive their STAC catalogs, they might be able to find things in the Earth Engine use of jsonnet that apply to their system.
-
@schwehr I like this idea as a technique to allow language agnostic (or language atheist) users to update metadata easily that can then be applied to STAC. I wrote up some thoughts on an enhancement to stactools that would take advantage of this idea in stac-utils/stactools#68; curious about your thoughts!
-
Been doing a lot more on the jsonnet based STAC catalog. My latest draft is here and is stac_version 1.0.0-rc.2: https://storage.googleapis.com/earthengine-stac-experimental/catalog-v015/catalog.json
Until recently, I didn't realize that jsonnet makes it easy to include text from files into string fields, so I've got an initial example of the results in NOAA/GOES/catalog.json. The jsonnet looks like:
```jsonnet
local id = 'NOAA/GOES';
local ee_const = import 'earthengine_const.libsonnet';
local basename = 'catalog';
local base_filename = basename + '.json';
local base_url = ee_const.catalog_base + 'NOAA/GOES/';
local parent_url = ee_const.catalog_base + 'NOAA/catalog.json';
local self_url = base_url + base_filename;
{
  stac_version: ee_const.stac_version,
  id: id,
  title: 'GOES',
  description: importstr 'description.md',  // <------ importstr done here
  links: [
    { rel: 'root', href: ee_const.catalog_url },
    { rel: 'parent', href: parent_url },
    { rel: 'self', href: self_url },
    { rel: 'child', title: 'NOAA_GOES_16_FDCC', href: base_url + 'NOAA_GOES_16_FDCC.json' },
    { rel: 'child', title: 'NOAA_GOES_16_FDCF', href: base_url + 'NOAA_GOES_16_FDCF.json' },
    { rel: 'child', title: 'NOAA_GOES_16_MCMIPC', href: base_url + 'NOAA_GOES_16_MCMIPC.json' },
    { rel: 'child', title: 'NOAA_GOES_16_MCMIPF', href: base_url + 'NOAA_GOES_16_MCMIPF.json' },
    { rel: 'child', title: 'NOAA_GOES_16_MCMIPM', href: base_url + 'NOAA_GOES_16_MCMIPM.json' },
    { rel: 'child', title: 'NOAA_GOES_17_FDCC', href: base_url + 'NOAA_GOES_17_FDCC.json' },
    { rel: 'child', title: 'NOAA_GOES_17_FDCF', href: base_url + 'NOAA_GOES_17_FDCF.json' },
    { rel: 'child', title: 'NOAA_GOES_17_MCMIPC', href: base_url + 'NOAA_GOES_17_MCMIPC.json' },
    { rel: 'child', title: 'NOAA_GOES_17_MCMIPF', href: base_url + 'NOAA_GOES_17_MCMIPF.json' },
    { rel: 'child', title: 'NOAA_GOES_17_MCMIPM', href: base_url + 'NOAA_GOES_17_MCMIPM.json' },
  ],
}
```
With `description.md` containing:
[GOES](https://www.goes.noaa.gov) satellites are geostationary weather
satellites run by NOAA.
Bands 1-6 are reflective. The dimensionless "reflectance factor" quantity is
normalized by the solar zenith angle. These bands support the characterization
of clouds, vegetation, snow/ice, and aerosols. Bands 7-16 are emissive. The
brightness temperature at the Top-Of-Atmosphere (TOA) is measured in
Kelvin. These bands support the characterization of the surface, clouds, water
vapor, ozone, volcanic ash, and dust based on emissive properties.
The ABI L2 product is used as the basis for all derived products such as the
Fire / Hotspot detection.
GOES file names use the Julian day of the year. Useful commands:
# Calendar date to Julian day
date -d "2019-11-15" +%j
# Julian day to calendar date:
year=2019; day=316; date -d "$day days $year-01-01" +%Y%m%d
Data types: (C: CONUS == Continental US, F: Full-disk, M: Mesoscale)
Product | Description
:------ | :----------------------------------------------
ACHA[CFM] | AWG Cloud Height Algorithm
ACM[CFM] | ABI Cloud Mask
ACTP[CFM] | Cloud Top Phase
CMIP[CFM] | Cloud and Moisture Imagery
COD[CF] | Cloud Optical Depth
DMW[CFM] | Derived Motion Winds (vectors)
FDC[CF] | Fire
LCFA | GLM Lightning Cluster-Filter Algorithm (points)
MCMIP[CFM] | Multichannel CMIP
Rad[CFM] | Radiance in 16 bands
Band, Center wavelen microns, Nickname, Classification, Function:
Channel | Wavelength | Description | Code | Use
:------ | :--------- | :------------------ | :------ | :------------------------------------
C01 | 0.47 | Blue | V | Aerosols
C02 | 0.64 | Red | V | Clouds
C03 | 0.87 | Veggie | Near-IR | Veg
C04 | 1.38 | Cirrus | Near-IR | Cirrus
C05 | 1.61 | Snow/Ice | Near-IR | Snow/ice discrim, cloud phase
C06 | 2.25 | Cloud Particle Size | Near-IR | Cloud particle size, snow cloud phase
C07 | 3.90 | Shortwave Window | IR | Fog, stratus, fire, volcanism
C08 | 6.19 | Upper Tropo Vapor | IR | Various atmospheric features
C09 | 6.95 | Mid Tropo Vapor | IR | Water vapor features
C10 | 7.34 | Lower Tropo Vapor | IR | Water vapor features
C11 | 8.50 | Cloud-Top Phase | IR | Cloud-top phase
C12 | 9.61 | Ozone | IR | Total column ozone
C13 | 10.35 | Clean IR | IR | Clouds
C14 | 11.20 | IR Longwave Window | IR | Clouds
C15 | 12.30 | Dirty IR | IR | Clouds
C16 | 13.30 | CO2 | IR | Air temperature, clouds
- https://www.ncdc.noaa.gov/data-access/satellite-data/goes-r-series-satellites
- https://www.goes-r.gov/users/docs/PUG-main-vol1.pdf
- https://www.weather.gov/media/crp/GOES_16_Guides_FINALBIS.pdf
- Wikipedia:
- [GOES](https://en.wikipedia.org/wiki/Geostationary_Operational_Environmental_Satellite)
- [GOES-R](https://en.wikipedia.org/wiki/GOES-16)
- [GOES-S](https://en.wikipedia.org/wiki/GOES-17)
- [GOES-T](https://en.wikipedia.org/wiki/GOES-T) Launch date: 2021-Dec-07 (planned)
- Google Cloud Public Data Pages
- https://console.cloud.google.com/marketplace/details/noaa-public/goes-16
- https://console.cloud.google.com/marketplace/details/noaa-public/goes-17
- Original data:
- https://console.cloud.google.com/storage/browser/gcp-public-data-goes-16/
- https://console.cloud.google.com/storage/browser/gcp-public-data-goes-17/
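Going back to the NOAA/GOES catalog jsonnet in this comment: the two ideas there (pull the description text in from a separate file, and build the links from shared constants) can be sketched in Python for comparison. This is a hedged illustration only. `CATALOG_BASE` is a placeholder because the real values live in `earthengine_const.libsonnet`, which isn't shown here, and the root link is assumed to be `catalog.json` under that base. The loop over satellites and products is also my own shortcut; the jsonnet above lists the ten child links by hand.

```python
from pathlib import Path

# Placeholder values; the real ones come from earthengine_const.libsonnet.
CATALOG_BASE = "https://example.com/catalog/"
STAC_VERSION = "1.0.0-rc.2"


def goes_catalog(description: str) -> dict:
    """Build the NOAA/GOES catalog dict with one child link per collection."""
    base_url = CATALOG_BASE + "NOAA/GOES/"
    names = [
        f"NOAA_GOES_{sat}_{product}"
        for sat in (16, 17)
        for product in ("FDCC", "FDCF", "MCMIPC", "MCMIPF", "MCMIPM")
    ]
    links = [
        # Assumption: the root catalog sits directly under CATALOG_BASE.
        {"rel": "root", "href": CATALOG_BASE + "catalog.json"},
        {"rel": "parent", "href": CATALOG_BASE + "NOAA/catalog.json"},
        {"rel": "self", "href": base_url + "catalog.json"},
    ]
    links += [
        {"rel": "child", "title": name, "href": base_url + name + ".json"}
        for name in names
    ]
    return {
        "stac_version": STAC_VERSION,
        "id": "NOAA/GOES",
        "title": "GOES",
        "description": description,
        "links": links,
    }


def load_catalog(description_path: str = "description.md") -> dict:
    """Read the markdown description from disk, like jsonnet's importstr."""
    text = Path(description_path).read_text(encoding="utf-8")
    return goes_catalog(text)
```

Keeping the description in its own markdown file means it can be previewed and edited as markdown, without json string escaping.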
-
Another similar system is https://cuelang.org/
-
https://github.com/google/earthengine-catalog now has >500 commits and about 40 of those are from people who do not work at Google. Unfortunately, I have made most of the commits, with over 300 of them. It's at a point where we can make some comments about Jsonnet for STAC, but a lot is still in the initial learning phase. Some initial thoughts.
-
Hi all,
This issue is for discussion on templating STAC json using Jsonnet. It would also be good to hear from folks who are doing templating to go from a source-of-truth (`sot`) to a STAC catalog.
Historically, the Earth Engine public data catalog has been driven by product yaml config files. The yaml files are instances of a protobuf product definition that is specific to Earth Engine. Those files give most of the metadata about an asset or collection that a user needs to understand when using that item in Earth Engine. From those files, we build that public data catalog and convert and export to a cloud storage STAC catalog. We are trying to move closer to STAC json as our sot, but have a lot of redundant information in the catalog that makes maintaining a strict STAC compliant catalog more difficult. With the yaml, we had our own inheritance based templating system that helped somewhat. However, I think we can do dramatically better without adding too much complexity using a lightweight setup of jsonnet. One excellent property of jsonnet files is that an existing STAC json file can just be renamed and `jsonnet foo.jsonnet` will generate the original `foo.json` modulo order differences in the dictionary entries (as is the case with most json writing libraries).
I've created a rough prototype to illustrate the ideas. It has both simple and more complicated cases along with a starter library of jsonnet functions that might (or might not!) make for files that are easier to read by humans.
https://gist.github.com/schwehr/b72a7c7dad9edc10ea9ebc987c5620f1
Features of a jsonnet sot:
stac_const.libsonnet
Key things that a jsonnet template sot doesn't do
There are lots of other ways that templating can be done. It would be good to collect examples of things done in other languages. I thought about writing python with pystac, but that seemed pretty heavyweight. I might do that to generate the original STAC json and then write a small python program to do the easy conversions to make a single STAC json file into a jsonnet file, e.g. set the stac version from a library, factor out common strings for our catalog, etc.
Some inline (and cut down) examples from my demo to give folks something to start on without having to go dig through my gist:
Part of stac_lib.libsonnet:
USGS_GAP_CONUS_2011.jsonnet: