diff --git a/README.md b/README.md index 8cdef69..66d01c2 100644 --- a/README.md +++ b/README.md @@ -43,21 +43,28 @@ Check the original technical report for an earlier version of the Model Extensio ## Item Properties and Collection Fields -| Field Name | Type | Description | -|-----------------------|-----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| mlm:name | string | **REQUIRED.** A unique name for the model. This should include but be distinct from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. | -| mlm:task | [Task Enum](#task-enum) | **REQUIRED.** Specifies the primary Machine Learning task for which the output can be used for. If there are multi-modal outputs, specify the primary task and specify each task in the [Model Output Object](#model-output-object). | -| mlm:framework | string | **REQUIRED.** Framework used to train the model (ex: PyTorch, TensorFlow). | -| mlm:framework_version | string | **REQUIRED.** The `framework` library version. Some models require a specific version of the machine learning `framework` to run. | -| mlm:file_size | integer | **REQUIRED.** The size on disk of the model artifact (bytes). | -| mlm:memory_size | integer | **REQUIRED.** The in-memory size of the model on the accelerator during inference (bytes). | -| mlm:input | [[Model Input Object](#model-input-object)] | **REQUIRED.** Describes the transformation between the EO data and the model input. | -| mlm:output | [[Model Output Object](#model-output-object)] | **REQUIRED.** Describes each model output and how to interpret it. 
| -| mlm:runtime | [[Runtime Object](#runtime-object)] | **REQUIRED.** Describes the runtime environment(s) to run inference with the model asset(s). | -| mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. | -| mlm:pretrained_source | string | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. | -| mlm:summary | string | Text summary of the model and it's purpose. | -| mlm:parameters | [Parameters Object](#parameters-object) | Mapping with names for the parameters and their values. Some models may take additional scalars, tuples, and other non-tensor inputs like text during inference (Segment Anything). The field should be specified here if parameters apply to all Model Input Objects. If each Model Input Object has parameters, specify parameters in that object. | +| Field Name | Type | Description | +| --------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| mlm:name | string | **REQUIRED.** A unique name for the model. This should include but be distinct from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. | +| mlm:task | [Task Enum](#task-enum) | **REQUIRED.** Specifies the primary Machine Learning task for which the output can be used for. If there are multi-modal outputs, specify the primary task and specify each task in the [Model Output Object](#model-output-object). 
| +| mlm:framework | string | **REQUIRED.** Framework used to train the model (ex: PyTorch, TensorFlow). | +| mlm:framework_version | string | **REQUIRED.** The `framework` library version. Some models require a specific version of the machine learning `framework` to run. | +| mlm:file_size | integer | **REQUIRED.** The size on disk of the model artifact (bytes). | +| mlm:memory_size | integer | **REQUIRED.** The in-memory size of the model on the accelerator during inference (bytes). | +| mlm:input | [[Model Input Object](#model-input-object)] | **REQUIRED.** Describes the transformation between the EO data and the model input. | +| mlm:output | [[Model Output Object](#model-output-object)] | **REQUIRED.** Describes each model output and how to interpret it. | +| mlm:accelerator | [Accelerator Enum](#accelerator-enum) | **REQUIRED.** The intended computational hardware that runs inference. | +| mlm:accelerator_constrained | boolean | **REQUIRED.** True if the intended `accelerator` is the only `accelerator` that can run inference. False if other accelerators, such as amd64 (CPU), can run inference. | +| mlm:hardware_summary | string | **REQUIRED.** A high level description of the number of accelerators, specific generation of the `accelerator`, or other relevant inference details. | +| mlm:model | [Asset Object](stac-asset) | **REQUIRED.** Asset object containing URI to the model file. Recommended asset `roles` include `weights` for model weights that need to be loaded by a model definition and `compiled` for models that can be loaded directly without an intermediate model definition. | +| mlm:source_code | [Asset Object](stac-asset) | **REQUIRED.** Source code description. Can describe a github repo, zip archive, etc. The `description` field in the Asset Object should reference the inference function, for example my_package.my_module.predict. 
Recommended asset `roles` include `code` and `metadata`, since the source code asset might also refer to more detailed metadata than this spec captures. | +| mlm:container | [Asset Object](stac-asset) | **RECOMMENDED.** Information to run the model in a container with URI to the container. | +| mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. | +| mlm:pretrained_source | string | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. | +| mlm:summary | string | Text summary of the model and it's purpose. | +| commit_hash | string | Hash value pointing to a specific version of the code used to run model inference. If this is supplied, `source code` should also be supplied and the commit hash must refer to a Git repository linked or described in the `source_code` [Asset Object](stac-asset). | +| batch_size_suggestion | number | A suggested batch size for the accelerator and summarized hardware. | +| mlm:parameters | [Parameters Object](#parameters-object) | Mapping with names for the parameters and their values. Some models may take additional scalars, tuples, and other non-tensor inputs like text during inference (Segment Anything). The field should be specified here if parameters apply to all Model Input Objects. If each Model Input Object has parameters, specify parameters in that object. | In addition, fields from the following extensions must be imported in the item: - [Scientific Extension Specification][stac-ext-sci] to describe relevant publications. 
@@ -66,21 +73,36 @@ In addition, fields from the following extensions must be imported in the item: [stac-ext-sci]: https://github.com/radiantearth/stac-spec/tree/v1.0.0-beta.2/extensions/scientific/README.md [stac-ext-ver]: https://github.com/radiantearth/stac-spec/tree/v1.0.0-beta.2/extensions/version/README.md +#### Accelerator Enum + +It is recommended to define `accelerator` with one of the following values: + +- `amd64` models compatible with AMD or Intel CPUs (no hardware specific optimizations) +- `cuda` models compatible with NVIDIA GPUs +- `xla` models compiled with XLA. models trained on TPUs are typically compiled with XLA. +- `amd-rocm` models trained on AMD GPUs +- `intel-ipex-cpu` for models optimized with IPEX for Intel CPUs +- `intel-ipex-gpu` for models optimized with IPEX for Intel GPUs +- `macos-arm` for models trained on Apple Silicon + +[stac-asset]: https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#asset-object + ### Model Input Object -| Field Name | Type | Description | | -|-------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---| -| name | string | **REQUIRED.** Informative name of the input variable. Example "RGB Time Series" | | -| bands | [string] | **REQUIRED.** The names of the raster bands used to train or fine-tune the model, which may be all or a subset of bands available in a STAC Item's [Band Object](#bands-and-statistics). | | -| input_array | [Array Object](#feature-array-object) | **REQUIRED.** The N-dimensional array object that describes the shape, dimension ordering, and data type. 
| | -| parameters | [Parameters Object](#parameters-object) | Mapping with names for the parameters and their values. Some models may take additional scalars, tuples, and other non-tensor inputs like text. | | -| norm_by_channel | boolean | Whether to normalize each channel by channel-wise statistics or to normalize by dataset statistics. If True, use an array of [Statistics Objects](#bands-and-statistics) that is ordered like the `bands` field in this object. | | -| norm_type | string | Normalization method. Select one option from `min_max`, `z_score`, `max_norm`, `mean_norm`, `unit_variance`, `norm_with_clip`, `none` | | -| resize_type | string | High-level descriptor of the rescaling method to change image shape. Select one option from `crop`, `pad`, `interpolation`, `none`. If your rescaling method combines more than one of these operations, provide the name of the operation instead | | -| statistics | [Statistics Object](stac-statistics) `\|` [[Statistics Object](stac-statistics)] | Dataset statistics for the training dataset used to normalize the inputs. | | -| norm_with_clip_values | [integer] | If `norm_type = "norm_with_clip"` this array supplies a value that is less than the band maximum. The array must be the same length as "bands", each value is used to divide each band before clipping values between 0 and 1. | -| pre_processing_function | string | A url to the preprocessing function where normalization and rescaling takes place, and any other significant operations. 
Or, instead, the function code path, for example: `my_python_module_name:my_processing_function` | | +| Field Name | Type | Description | | +| ----------------------- | -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --- | +| name | string | **REQUIRED.** Informative name of the input variable. Example "RGB Time Series" | | +| bands | [string] | **REQUIRED.** The names of the raster bands used to train or fine-tune the model, which may be all or a subset of bands available in a STAC Item's [Band Object](#bands-and-statistics). | | +| input_array | [Array Object](#feature-array-object) | **REQUIRED.** The N-dimensional array object that describes the shape, dimension ordering, and data type. | | +| parameters | [Parameters Object](#parameters-object) | Mapping with names for the parameters and their values. Some models may take additional scalars, tuples, and other non-tensor inputs like text. | | +| norm_by_channel | boolean | Whether to normalize each channel by channel-wise statistics or to normalize by dataset statistics. If True, use an array of [Statistics Objects](#bands-and-statistics) that is ordered like the `bands` field in this object. | | +| norm_type | string | Normalization method. Select one option from `min_max`, `z_score`, `max_norm`, `mean_norm`, `unit_variance`, `norm_with_clip`, `none` | | +| resize_type | string | High-level descriptor of the rescaling method to change image shape. Select one option from `crop`, `pad`, `interpolation`, `none`. 
If your rescaling method combines more than one of these operations, provide the name of the operation instead | | +| statistics | [Statistics Object](stac-statistics) `\|` [[Statistics Object](stac-statistics)] | Dataset statistics for the training dataset used to normalize the inputs. | | +| norm_with_clip_values | [integer] | If `norm_type = "norm_with_clip"` this array supplies a value that is less than the band maximum. The array must be the same length as "bands", each value is used to divide each band before clipping values between 0 and 1. | +| pre_processing_function | string | A url to the preprocessing function where normalization and rescaling takes place, and any other significant operations. Or, instead, the function code path, for example: `my_python_module_name:my_processing_function` | | + #### Parameters Object @@ -88,7 +110,7 @@ In addition, fields from the following extensions must be imported in the item: |---------------------------------------|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | *parameter names depend on the model* | number `\|` string `\|` boolean `\|` array | The number of fields and their names depend on the model. Values should not be n-dimensional array inputs. If the model input can be represented as an n-dimensional array, it should instead be supplied as another [model input object](#model-input-object). | -The `Parameters Object` is simply a user defined mapping of parameters to parameter values. This is meant to capture model inputs that can't be represented as n-dimensional arrays/tensors. This includes inputs like scalars, text, and booleans. 
The `parameters` field can either be specified in the [Model Input Object](#model-input-object) if they are associated with a specific input or as an [Item or Collection](#item-properties-and-collection-fields) field if the parameters are supplied without relation to a specific model input. For example: the [Segment Anything](https://ai.meta.com/blog/segment-anything-foundation-model-image-segmentation/) foundational model accepts a label integer for each image input. +The `Parameters Object` is a user defined mapping of parameters to parameter values. This is meant to capture model inputs that can't be represented as n-dimensional arrays/tensors. This includes inputs like scalars, text, and booleans. The `parameters` field can either be specified in the [Model Input Object](#model-input-object) if they are associated with a specific input or as an [Item or Collection](#item-properties-and-collection-fields) field if the parameters are supplied without relation to a specific model input. For example: the [Segment Anything](https://ai.meta.com/blog/segment-anything-foundation-model-image-segmentation/) foundational model accepts a label integer for each image input. #### Bands and Statistics @@ -100,51 +122,23 @@ A deviation from the [STAC 1.1 Bands Object](https://github.com/radiantearth/sta #### Array Object -| Field Name | Type | Description | | -|------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| -| shape | [integer] | **REQUIRED.** Shape of the input n-dimensional array ($N \times C \times H \times W$), including the batch size dimension. 
The batch size dimension must either be greater than 0 or -1 to indicate an unspecified batch dimension size. | | -| dim_order | string | **REQUIRED.** How the above dimensions are ordered within the `shape`. `bhw`, `bchw`, `bthw`, `btchw` are valid orderings where `b`=batch, `c`=channel, `t`=time, `h`=height, w=width. | | -| data_type | enum | **REQUIRED.** The data type of values in the n-dimensional array. For model inputs, this should be the data type of the processed input supplied to the model inference function, not the data type of the source bands. Use one of the [common metadata data types](https://github.com/stac-extensions/raster?tab=readme-ov-file#data-types). | | - +| Field Name | Type | Description | | +| ---------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --- | +| shape | [integer] | **REQUIRED.** Shape of the input n-dimensional array ($N \times C \times H \times W$), including the batch size dimension. The batch size dimension must either be greater than 0 or -1 to indicate an unspecified batch dimension size. | | +| dim_order | string | **REQUIRED.** How the above dimensions are ordered within the `shape`. `bhw`, `bchw`, `bthw`, `btchw` are valid orderings where `b`=batch, `c`=channel, `t`=time, `h`=height, w=width. | | +| data_type | enum | **REQUIRED.** The data type of values in the n-dimensional array. For model inputs, this should be the data type of the processed input supplied to the model inference function, not the data type of the source bands. Use one of the [common metadata data types](https://github.com/stac-extensions/raster?tab=readme-ov-file#data-types). 
| | Note: It is common in the machine learning, computer vision, and remote sensing communities to refer to rasters that are inputs to a model as arrays or tensors. Array Objects are distinct from the JSON array type used to represent lists of values. +#### Container Asset -### Runtime Object - -| Field Name | Type | Description | -| ----------------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| model_asset | [Asset Object](stac-asset) | **REQUIRED.** Asset object containing URI to the model file. Recommended asset `roles` include `weights` for model weights that need to be loaded by a model definition and `compiled` for models that can be loaded directly without an intermediate model definition. | -| source_code | [Asset Object](stac-asset) | **REQUIRED.** Source code description. Can describe a github repo, zip archive, etc. The `description` field in the Asset Object should reference the inference function, for example my_package.my_module.predict. Recommended asset `roles` include `code` and `metadata`, since the source code asset might also refer to more detailed metadata than this spec captures. | -| accelerator | [Accelerator Enum](#accelerator-enum) | **REQUIRED.** The intended computational hardware that runs inference. | -| accelerator_constrained | boolean | **REQUIRED.** True if the intended `accelerator` is the only `accelerator` that can run inference. False if other accelerators, such as amd64 (CPU), can run inference. 
| -| hardware_summary | string | **REQUIRED.** A high level description of the number of accelerators, specific generation of the `accelerator`, or other relevant inference details. | -| container | [Container](#container) | **RECOMMENDED.** Information to run the model in a container instance. | -| commit_hash | string | Hash value pointing to a specific version of the code used to run model inference. If this is supplied, `source code` should also be supplied and the commit hash must refer to a Git repository linked or described in the `source_code` [Asset Object](stac-asset). | -| batch_size_suggestion | number | A suggested batch size for the accelerator and summarized hardware. | - -#### Accelerator Enum - -It is recommended to define `accelerator` with one of the following values: - -- `amd64` models compatible with AMD or Intel CPUs (no hardware specific optimizations) -- `cuda` models compatible with NVIDIA GPUs -- `xla` models compiled with XLA. models trained on TPUs are typically compiled with XLA. -- `amd-rocm` models trained on AMD GPUs -- `intel-ipex-cpu` for models optimized with IPEX for Intel CPUs -- `intel-ipex-gpu` for models optimized with IPEX for Intel GPUs -- `macos-arm` for models trained on Apple Silicon - -[stac-asset]: https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#asset-object - -#### Container Object - -| Field Name | Type | Description | -|----------------|--------|-------------------------------------------------------| -| container_file | string | Url of the container file (Dockerfile). | -| image_name | string | Name of the container image. | -| tag | string | Tag of the image. | -| working_dir | string | Working directory in the instance that can be mapped. | -| run | string | Running command. | +| Field Name | Type | Description | +| ----------- | ------ | ----------------------------------------------------- | +| title | string | Description of the container. 
| +| href | string | Url of the container file (Dockerfile). | +| type | string | "application/vnd.oci.image.index.v1+json" | +| roles | string | ["runtime"] | +| working_dir | string | Working directory in the instance that can be mapped. | +| run | string | Running command. | If you're unsure how to containerize your model, we suggest starting from the latest official container image for your framework that works with your model and pinning the container version. @@ -173,12 +167,13 @@ You can also use other base images. Pytorch and Tensorflow offer docker images f ### Model Output Object | Field Name | Type | Description | -|--------------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| ------------------------ | --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | task | [Task Enum](#task-enum) | **REQUIRED.** Specifies the Machine Learning task for which the output can be used for. | -| result_array | [[Result Array Object](#result-array-object)] | The list of output arrays/tensors from the model. | -| classification:classes | [[Class Object](#class-object)] | A list of class objects adhering to the [Classification extension](https://github.com/stac-extensions/classification). | +| result_array | [[Result Array Object](#result-array-object)] | The list of output arrays/tensors from the model. | +| classification:classes | [[Class Object](#class-object)] | A list of class objects adhering to the [Classification extension](https://github.com/stac-extensions/classification). 
| | post_processing_function | string | A URL to the postprocessing function where normalization, rescaling, and other operations take place. Or, instead, the function code path, for example: `my_package.my_module.my_processing_function` | + While only `task` is a required field, all fields are recommended for supervised tasks that produce a fixed-shape tensor and have output classes. `image-captioning`, `multi-modal`, and `generative` tasks may not return fixed-shape tensors or classes.
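
As an illustration of the required versus recommended split described above, here is a minimal Python sketch that reports which Model Output Object fields are absent from a dictionary. The field names come from the table; the helper `check_model_output`, the `classification` task value, and the example class entries are assumptions made for this sketch and are not defined by the extension.

```python
# Field names taken from the Model Output Object table.
REQUIRED_FIELDS = ("task",)
RECOMMENDED_FIELDS = (
    "result_array",
    "classification:classes",
    "post_processing_function",
)


def check_model_output(output: dict) -> dict:
    """Report required and recommended fields that are absent from `output`.

    Hypothetical helper for illustration only; it checks for the presence
    of keys and does not validate the values themselves.
    """
    return {
        "missing_required": [f for f in REQUIRED_FIELDS if f not in output],
        "missing_recommended": [f for f in RECOMMENDED_FIELDS if f not in output],
    }


# A generative model may not return fixed-shape tensors or classes,
# so only `task` is supplied here.
generative_output = {"task": "generative"}

# A supervised model with a fixed-shape output and classes.
supervised_output = {
    "task": "classification",  # assumed Task Enum value
    "result_array": [],  # Result Array Objects left out of this sketch
    "classification:classes": [  # illustrative Class Objects
        {"value": 0, "name": "background"},
        {"value": 1, "name": "building"},
    ],
    "post_processing_function": "my_package.my_module.my_processing_function",
}

generative_report = check_model_output(generative_output)
supervised_report = check_model_output(supervised_output)
```

Both dictionaries pass the required-field check. `generative_report["missing_recommended"]` lists the three recommended fields, which is acceptable for tasks that do not produce fixed-shape tensors or classes.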