This repository was archived by the owner on Oct 2, 2024. It is now read-only.

New Machine Learning Model Extension Version 2.0.alpha schema and (de)serialization, validation package #2

Merged
merged 113 commits into from
Apr 18, 2024
Commits
113 commits
4d3955f
python package, cli, and old ml model spec validation
rbavery Dec 10, 2023
0aa94fc
refactor models and replace data object with common metadata band object
rbavery Dec 15, 2023
d67ca11
basic pydantic models for refactored model extension
rbavery Dec 16, 2023
195a07b
readme updates
rbavery Jan 5, 2024
9b3ad07
start filling out base models with schema described in README and mai…
rbavery Jan 5, 2024
f2ccf4c
mostly finish filling out object models
rbavery Jan 6, 2024
e81bee9
some changes to fields and language edits
rbavery Jan 6, 2024
9645354
poetry run stac-model generates json example
rbavery Jan 6, 2024
cdc6cba
README updates
rbavery Jan 6, 2024
f746d6c
add to CHANGELOG
rbavery Jan 6, 2024
c68dd72
address comments
rbavery Jan 9, 2024
b5a1fc8
address more comments
rbavery Jan 9, 2024
ffa398e
address most first draft comments
rbavery Jan 10, 2024
0716ee3
add container instructions
rbavery Jan 10, 2024
ab74151
fix fields missing object, add accelerator constrained field to runtime
rbavery Jan 10, 2024
eb8b80a
account for models that take parameters with one or more of each inpu…
rbavery Jan 10, 2024
5378bae
properly escape or operators
rbavery Jan 10, 2024
0c9f1f2
language edits for model input, change stats to have type option for …
rbavery Jan 10, 2024
b2cc2f0
language edits down to Result Array Object, specify derived_from rel …
rbavery Jan 10, 2024
f72faaa
update Changelog, more language edits
rbavery Jan 10, 2024
1f1891b
update Changelog
rbavery Jan 10, 2024
71cac72
link lots of fields and objects to readme text within tables
rbavery Jan 10, 2024
03e334c
new precommit, align with draft 2 of spec
rbavery Jan 12, 2024
1e58c02
version update
rbavery Jan 12, 2024
f379125
precommit version update
rbavery Jan 12, 2024
cd628c1
use classification extension instead of custom class map object
rbavery Feb 14, 2024
3602800
flatten architecture object into top level fields, use classification…
rbavery Feb 15, 2024
c0946d1
add best practices doc referencing processing extension
rbavery Feb 15, 2024
2ba7483
refer to best practices in readme
rbavery Feb 15, 2024
d4c8f3f
add processing ex
rbavery Feb 15, 2024
d7d99be
make task enum searchable, add to top level, keep in output object
rbavery Feb 15, 2024
4310971
update Model Input object to account for normalization with clipping …
rbavery Feb 15, 2024
b95a696
remove superflous data type field, rely on data type in the array obj…
rbavery Feb 15, 2024
af98031
update stac_model and example
rbavery Feb 15, 2024
1aba5fc
rescale -> resize add super res task
rbavery Feb 15, 2024
f86a64b
update example
rbavery Feb 15, 2024
76c0ba9
best practices for processing ext, format and lint
rbavery Feb 23, 2024
b8efda0
address metadata and text comments
rbavery Feb 23, 2024
fb1de05
remove old model_metadata.py
rbavery Feb 23, 2024
b2318ad
correct mlm: prefix
rbavery Feb 23, 2024
43a7440
remove extra column
rbavery Feb 23, 2024
7a25d92
move stac_model up with gh actions, readme, templates
rbavery Feb 24, 2024
54a52fa
update test, make an examples module
rbavery Feb 26, 2024
41cf8ea
simplify test
rbavery Feb 26, 2024
21d1aa9
increment stac model version
rbavery Feb 26, 2024
2e07901
optional annotations, downgrade pydantic
rbavery Feb 26, 2024
3178b27
Merge branch 'crim-ca:main' into validate
rbavery Feb 27, 2024
2b62d7b
combine stac_model and pystac metadata
rbavery Feb 28, 2024
4590141
produce pystac item in example but can't serialize datetime
rbavery Feb 28, 2024
4fc2e8e
remove helper and use pystac.Item in example
rbavery Feb 28, 2024
30269d4
update cli example. still getting datetime serialization issue
rbavery Feb 28, 2024
9ddff24
export an example with stac common metadata, derived from link to dat…
rbavery Feb 28, 2024
0bed29b
remove mlm_prefix in pydantic models
rbavery Mar 7, 2024
bf3b07f
address comments
rbavery Mar 7, 2024
c52daa7
update poetry, remove s3Path since it fails with recent pydantic and …
rbavery Mar 8, 2024
c7c75ca
roles for asset objects
rbavery Mar 8, 2024
d091c2e
specify how to use commit hash and add to example
rbavery Mar 8, 2024
9e58fdf
fields reordered so datetimes are together
rbavery Mar 8, 2024
f9f66d6
remove geometry models
rbavery Mar 8, 2024
406279c
add roles
rbavery Mar 10, 2024
90a63d4
changelog updates
rbavery Mar 20, 2024
42bbebb
address feedback on formatting and descriptions
rbavery Mar 20, 2024
3429dea
reorg runtime fields upward and remove runtime object
rbavery Mar 20, 2024
2570e62
add asset descriptions
rbavery Mar 20, 2024
fbdb482
move some non-search info to assets
rbavery Mar 20, 2024
aa3bc9b
linking and formatting
rbavery Mar 20, 2024
2a2039b
remove parameters, add artifact type field
rbavery Mar 26, 2024
67b4688
[wip] address PR comments about tasks definitions
fmigneault-crim Mar 28, 2024
efe223b
apply PR recommendations
fmigneault-crim Mar 29, 2024
c79ea01
add best practice details
fmigneault-crim Mar 29, 2024
4d765c2
add yet again more best practices to integrate other STAC extensions
fmigneault-crim Mar 29, 2024
4db3b94
more best practices (relates to https://github.com/stac-extensions/cl…
fmigneault-crim Mar 30, 2024
669c9a3
adjustments from PR review
fmigneault-crim Mar 30, 2024
edcc8a2
add more mlm:accelerator details (relates to https://github.com/crim-…
fmigneault Mar 30, 2024
06ee0ef
add details about link releation types
fmigneault Mar 30, 2024
1a50057
add details about dimensions and tasks
fmigneault Mar 30, 2024
1faf4d9
more examples and details
fmigneault Apr 2, 2024
501971a
[wip] updating JSON-schema with MLM fields
fmigneault Apr 2, 2024
6ec1cd5
[wip] more updates to JSON schema for MLM definitions
fmigneault Apr 4, 2024
8aca9b3
more schema adjustments
fmigneault Apr 4, 2024
ab41765
more details about expected values for dim_order + pretrained flag
fmigneault Apr 4, 2024
be58e86
address incompatibility of 'end_datetime=null' with STAC Core (relate…
fmigneault Apr 4, 2024
8b46388
add mlm:hyperparameters defintion (fixes https://github.com/crim-ca/d…
fmigneault Apr 4, 2024
2b87297
add example bands and statitics details
fmigneault Apr 4, 2024
269bd73
update pydantic models with new json-schema fields
fmigneault Apr 4, 2024
03e7e06
add details & example with 'eo:bands' for special JSON schema conside…
fmigneault Apr 4, 2024
2d6c70b
update examples working against JSON schema (except check for cross-b…
fmigneault Apr 5, 2024
d111678
adjust pydantic eurosat_example with json-schema fields
fmigneault Apr 5, 2024
4d57e41
fix pydantic drop unset fields as intended
fmigneault Apr 5, 2024
4eb30da
add OmitIfNone reference code
fmigneault Apr 5, 2024
2155745
fix invalid raster/eo bands/statistics definitions in examples
fmigneault Apr 5, 2024
9d14ac6
update schema title and description
fmigneault Apr 9, 2024
afe0a9a
remove out of date items from changelog
rbavery Apr 9, 2024
1fb5f21
include PR recommended changes
fmigneault Apr 11, 2024
f1bee68
fix github ci command to instlal poetry
fmigneault Apr 17, 2024
5ad13fa
update ci commands
fmigneault Apr 17, 2024
a6bf8ee
update and fix markdown linting
fmigneault Apr 17, 2024
feb2ce0
fix missing jsonschema dependency
fmigneault Apr 17, 2024
8c62744
fix typing definitions
fmigneault Apr 18, 2024
5de6693
fix pydantic recursion error on JSON type
fmigneault Apr 18, 2024
7f7620c
more linting fixes
fmigneault Apr 18, 2024
1d7d17a
ignore for remark-lint
fmigneault Apr 18, 2024
a1872e8
add remark-lint ignore to npm scripts
fmigneault Apr 18, 2024
7f16176
downgrade remark-gfm
fmigneault Apr 18, 2024
1a5927e
drop remark-gfm causing issues
fmigneault Apr 18, 2024
a1192d3
update node in CI and reapply remark-gfm
fmigneault Apr 18, 2024
43b8691
fix STAC examples linting
fmigneault Apr 18, 2024
7d95cca
fix STAC MLM examples - remove old (invalid) DLM examples
fmigneault Apr 18, 2024
728dcba
fix STAC object self-references in python tests
fmigneault Apr 18, 2024
ead9833
add Python 3.12 to CI + rename CI to be more representative than 'build'
fmigneault Apr 18, 2024
7ef029d
fix incorrectly interpretation of pydoclint exclude dirs
fmigneault Apr 18, 2024
b1804fc
remove unnecessary package with dependency flagged by safety
fmigneault Apr 18, 2024
f2b7dc2
remove unnecessary package with dependency flagged by safety
fmigneault Apr 18, 2024
account for models that take parameters with one or more of each input or as separate inputs
rbavery committed Jan 10, 2024
commit eb8b80a3189c8544357c0c96edfd378432afa69a
34 changes: 21 additions & 13 deletions README.md
@@ -36,13 +36,13 @@ Check the original technical report for an earlier version of the Model Extensio

## Item Properties and Collection Fields

| Field Name | Type | Description |
|------------------|---------------------------------------------|-------------------------------------------------------------------------------------|
| mlm:input | [[Model Input Object](#model-input-object)] | **REQUIRED.** Describes the transformation between the EO data and the model input. |
| mlm:architecture | [Architecture Object](#architecture-object) | **REQUIRED.** Describes the model architecture. |
| mlm:runtime | [Runtime Object](#runtime-object) | **REQUIRED.** Describes the runtime environments to run the model (inference). |
| mlm:output | [Model Output Object](#model-output-object) | **REQUIRED.** Describes each model output and how to interpret it. |

| Field Name | Type | Description |
|------------------|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| mlm:input | [[Model Input Object](#model-input-object)] | **REQUIRED.** Describes the transformation between the EO data and the model input. |
| mlm:architecture | [Architecture Object](#architecture-object) | **REQUIRED.** Describes the model architecture. |
| mlm:runtime | [Runtime Object](#runtime-object) | **REQUIRED.** Describes the runtime environments to run the model (inference). |
| mlm:output | [Model Output Object](#model-output-object) | **REQUIRED.** Describes each model output and how to interpret it. |
| parameters | [Parameters Object](#params-object) | Mapping with names for the parameters and their values. Some models may take additional scalars, tuples, and other non-tensor inputs like text. |

In addition, fields from the following extensions must be imported in the item:
- [Scientific Extension Specification][stac-ext-sci] to describe relevant publications.
@@ -56,15 +56,23 @@ In addition, fields from the following extensions must be imported in the item:
| Field Name | Type | Description |
|-------------------------|-----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name | string | **REQUIRED.** Informative name of the input variable. Example "RGB Time Series" |
| bands | [string] | **REQUIRED.** Describes the EO bands used to train or fine-tune the model, which may be all or a subset of bands available in a STAc Item's [Band Object](#bands-and-statistics). |
| bands | [string] | **REQUIRED.** Describes the EO bands used to train or fine-tune the model, which may be all or a subset of bands available in a STAC Item's [Band Object](#bands-and-statistics). |
| input_feature | [Feature Array Object](#feature-array-object) | **REQUIRED.** The N-dimensional feature array object that describes the shape, dimension ordering, and data type. |
| params | dict | Dictionary with names for the parameters and their values. Some models may take multiple input arrays, scalars, other non-tensor inputs. |
| parameters | [Parameters Object](#params-object) | Mapping with names for the parameters and their values. Some models may take additional scalars, tuples, and other non-tensor inputs like text. |
| norm_by_channel | boolean | Whether to normalize each channel by channel-wise statistics or to normalize by dataset statistics. |
| norm_type | string | Normalization method. Select one option from "min_max", "z_score", "max_norm", "mean_norm", "unit_variance", "none" |
| rescale_type | string | High-level descriptor of the rescaling method to change image shape. Select one option from "crop", "pad", "interpolation", "none". If your rescaling method combines more than one of these operations, provide the name of the operation instead |
| statistics | [Statistics Object](stac-statistics) | Dataset statistics for the training dataset used to normalize the inputs. |
| pre_processing_function | string | A url to the preprocessing function where normalization and rescaling takes place, and any other significant operations. Or, instead, the function code path, for example: my_python_module_name:my_processing_function |

#### Parameters Object

| Field Name | Type | Description |
|-----------------------------------|---------|--------------------------------------------------------------------------|
| *parameter names depend on the model* | number | string | boolean | array | The field number and names depend on the model as do the values. Values should be not be n-dimensional array inputs. If the model input can be represented as an n-dimensional array, it should instead be supplied as another model input object. |

The parameters field can either be specified in the model input object if they are associated with a specific input or as an Item or Collection field if the parameters are supplied without relation to a specific model input.
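
To illustrate the two placements described above, here is a minimal Python sketch using plain dictionaries. The parameter names (`prompt`, `threshold`, `tile_size`), the band names, and the helper functions are hypothetical and are not part of the extension. Treating nested lists as n-dimensional arrays is an assumption made for this sketch, not a rule stated by the spec.

```python
from __future__ import annotations

from typing import Any

# Scalar types accepted by this sketch for a parameter value.
SCALAR_TYPES = (bool, int, float, str)


def is_allowed_value(value: Any) -> bool:
    """Accept scalars and flat sequences of scalars.

    Assumption: a nested list or tuple counts as an n-dimensional array
    and should be described as another Model Input Object instead.
    """
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, (list, tuple)):
        return all(isinstance(v, SCALAR_TYPES) for v in value)
    return False


def find_invalid_parameters(properties: dict) -> list[str]:
    """Return labels of parameters whose values look like n-dimensional arrays."""
    invalid = []
    # Parameters supplied without relation to a specific model input.
    for name, value in properties.get("parameters", {}).items():
        if not is_allowed_value(value):
            invalid.append(f"item:{name}")
    # Parameters associated with a specific model input.
    for model_input in properties.get("mlm:input", []):
        for name, value in model_input.get("parameters", {}).items():
            if not is_allowed_value(value):
                invalid.append(f"{model_input.get('name')}:{name}")
    return invalid


# Hypothetical item properties showing both placements.
item_properties = {
    "mlm:input": [
        {
            "name": "RGB Time Series",
            "bands": ["B04", "B03", "B02"],
            "input_feature": {
                "shape": [-1, 3, 64, 64],
                "dim_order": "bchw",
                "dtype": "float32",
            },
            # Tied to this input only.
            "parameters": {"prompt": "water", "threshold": 0.5},
        }
    ],
    # Not tied to any single input.
    "parameters": {"tile_size": (256, 256)},
}

print(find_invalid_parameters(item_properties))
```

A value such as `[[0, 1], [1, 0]]` would be reported by this helper, since under the assumption above it is better expressed as its own model input.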

#### Bands and Statistics

We use the [STAC 1.1 Bands Object](https://github.com/radiantearth/stac-spec/pull/1254) for representing bands information, including nodata value, data type, and common band names. Only bands used to train or fine tune the model should be included in this `bands` field.
@@ -75,10 +83,10 @@ A deviation from the [STAC 1.1 Bands Object](https://github.com/radiantearth/sta

#### Feature Array Object

| Field Name | Type | Description |
|------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Field Name | Type | Description |
|------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| shape | [integer] | **REQUIRED.** Shape of the input n-dimensional feature array ($N \times C \times H \times W$), including the batch size dimension. The batch size dimension must either be greater than 0 or -1 to indicate an unspecified batch dimension size. |
| dim_order | string | **REQUIRED.** How the above dimensions are ordered with the tensor. "bhw", "bchw", "bthw", "btchw" are valid orderings where b=batch, c=channel, t=time, h=height, w=width |
| dim_order | string | **REQUIRED.** How the above dimensions are ordered with the tensor. "bhw", "bchw", "bthw", "btchw" are valid orderings where b=batch, c=channel, t=time, h=height, w=width |
| dtype | string | **REQUIRED.** The data type of values in the feature array. Suggested to use [Numpy numerical types](https://numpy.org/devdocs/user/basics.types.html), omitting the numpy module, e.g. "float32" |
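
The constraints in the table above can be checked with a short Python sketch. The function name and error messages are hypothetical. Two points are assumptions of this sketch and are not stated in the table: that the batch dimension is the first entry of `shape`, and that the length of `shape` should equal the length of `dim_order`.

```python
from __future__ import annotations

# Orderings listed as valid in the Feature Array Object table.
VALID_DIM_ORDERS = {"bhw", "bchw", "bthw", "btchw"}


def feature_array_errors(feature: dict) -> list[str]:
    """Return a list of problems found in a Feature Array Object dict."""
    errors = []
    shape = feature.get("shape")
    dim_order = feature.get("dim_order")

    if dim_order not in VALID_DIM_ORDERS:
        errors.append(f"dim_order must be one of {sorted(VALID_DIM_ORDERS)}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int_list = (
        isinstance(shape, list)
        and len(shape) > 0
        and all(isinstance(d, int) and not isinstance(d, bool) for d in shape)
    )
    if not is_int_list:
        errors.append("shape must be a non-empty list of integers")
        return errors

    # Assumption: the batch dimension comes first, as in every listed ordering.
    batch = shape[0]
    if not (batch > 0 or batch == -1):
        errors.append("batch dimension must be greater than 0, or -1 if unspecified")

    # Assumption: one shape entry per letter in dim_order.
    if dim_order in VALID_DIM_ORDERS and len(shape) != len(dim_order):
        errors.append("shape length does not match dim_order length")

    return errors


example = {"shape": [-1, 3, 64, 64], "dim_order": "bchw", "dtype": "float32"}
print(feature_array_errors(example))
```

The sketch does not validate `dtype`, because the table only suggests NumPy type names rather than requiring them.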

### Architecture Object
@@ -90,7 +98,7 @@ A deviation from the [STAC 1.1 Bands Object](https://github.com/radiantearth/sta
| memory_size | integer | **REQUIRED.** The in-memory size of the model on the accelerator during inference (bytes). |
| summary | string | Summary of the layers, can be the output of `print(model)`. |
| pretrained_source | string | Indicates the source of the pretraining (ex: ImageNet). |
| total_parameters | integer | Total number of parameters. |
| total_parameters | integer | Total number of parameters.

### Runtime Object