diff --git a/README.md b/README.md
index 6873633..034b916 100644
--- a/README.md
+++ b/README.md
@@ -70,7 +70,7 @@ extension to synthesize common use cases into a single reference for Machine Lea
 | mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. |
 | mlm:pretrained_source | string | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. |
 | mlm:summary | string | Text summary of the model and it's purpose. |
-| batch_size_suggestion | number | A suggested batch size for the accelerator and summarized hardware. |
+| mlm:batch_size_suggestion | number | A suggested batch size for the accelerator and summarized hardware. |
 
 In addition, fields from the following extensions must be imported in the item:
 - [Scientific Extension Specification][stac-ext-sci] to describe relevant publications.
@@ -233,6 +233,10 @@ Note that the URI including the specific commit hash, release number or target b
 other means of referring to checkout procedures, although this specification does not prohibit the use of
 additional properties to better describe the Asset.
 
+Since the source code of a model provides useful examples of how to use it, it is also recommended to define relevant
+references to documentation using the `example` extension.
+See the [Best Practices - Example Extension](best-practices.md#example-extension) section for more details.
+
 Recommended asset `roles` include `code` and `metadata`, since the source code asset might also refer to more detailed
 metadata than this specification captures.
 
diff --git a/best-practices.md b/best-practices.md
index a9bb661..691d7a9 100644
--- a/best-practices.md
+++ b/best-practices.md
@@ -13,6 +13,7 @@ models or creating tools to work with STAC.
   - [Classification Extension](#classification-extension)
   - [Scientific Extension](#scientific-extension)
   - [File Extension](#file-extension)
+  - [Example Extension](#example-extension)
   - [Version Extension](#version-extension)
 
 ## Using STAC Common Metadata Fields for the ML Model Extension
@@ -131,6 +132,9 @@ MLM definition to indicate which class values can be contained in the resulting
 
 For more details, see the [Model Output Object](README.md#model-output-object) definition.
 
+> [!NOTE]
+> Update according to https://github.com/stac-extensions/classification/issues/48
+
 ### Scientific Extension
 
 Provided that most models derive from previous scientific work, it is strongly recommended to employ the
@@ -142,19 +146,6 @@ lead to its creation.
 This extension can also be used for the purpose of publishing new models, by providing to users the necessary details
 regarding how they should cite its use (i.e.: `sci:citation` field and `cite-as` relation type).
 
-### Version Extension
-
-In the even that a model is retrained with gradually added annotations or improved training strategies leading to
-better performances, the existing model and newer models represented by STAC Items with MLM should also make use of
-the [Version Extension](https://github.com/stac-extensions/version). Using the fields and link relation types defined
-by this extension, the retraining cycle of the model can better be described, with a full history of the newer versions
-developed.
-
-Additionally, the `version:experimental` field should be considered for models being trained and still under evaluation
-before widespread deployment. This can be particularly useful for annotating models experiments during cross-validation
-training process to find the "best model". This field could also be used to indicate if a model is provided for
-educational purposes only.
-
 ### File Extension
 
 In order to provide a reliable and reproducible machine learning pipeline, external references to data required by the
@@ -187,3 +178,35 @@ that the model is properly instantiated from the expected weights, or that suffi
   }
 }
 ```
+
+### Example Extension
+
+In order to help users understand how to apply and run the described machine learning model,
+the [Example Extension](https://github.com/stac-extensions/example-links#fields) can be used to provide code examples
+demonstrating how it can be applied.
+
+For example, a [Model Card on Hugging Face](https://huggingface.co/docs/hub/en/model-cards)
+is often provided (see [Hugging Face Model examples](https://huggingface.co/models)) to describe the model, which
+can embed sample code and references to more details about the model. This kind of reference should be added under
+the `links` of the STAC Item using MLM.
+
+Typically, a STAC Item using the MLM extension to describe the training or
+inference strategies to apply a model should define the [Source Code Asset](README.md#source-code-asset).
+This code is in itself ideal to guide users on how to run it, and should therefore be replicated as an `example` link
+reference to offer more code samples to execute the model.
+
+> [!NOTE]
+> Update according to https://github.com/stac-extensions/example-links/issues/4
+
+### Version Extension
+
+In the event that a model is retrained with gradually added annotations or improved training strategies leading to
+better performance, the existing model and newer models represented by STAC Items with MLM should also make use of
+the [Version Extension](https://github.com/stac-extensions/version). Using the fields and link relation types defined
+by this extension, the retraining cycle of the model can be better described, with a full history of the newer versions
+developed.
+
+Additionally, the `version:experimental` field should be considered for models being trained and still under evaluation
+before widespread deployment. This can be particularly useful for annotating model experiments during cross-validation
+training to find the "best model". This field could also be used to indicate if a model is provided for
+educational purposes only.
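As an illustration of the `example` links recommended in the new Example Extension section, a fragment of a STAC Item could look as follows. This is a minimal sketch, not a complete or validated Item: required Item fields (`id`, `geometry`, `properties`, etc.) are omitted, the asset key `source_code` and all `<org>`/`<repo>`/`<model>` URLs are hypothetical placeholders, and only the `example` relation type and the `code`/`metadata` roles come from the text above. Any `example:*` link fields defined by the Example Extension are deliberately left out rather than guessed.

```json
{
  "assets": {
    "source_code": {
      "href": "https://github.com/<org>/<repo>/tree/<commit-hash>",
      "type": "text/html",
      "title": "Model source code (hypothetical URL)",
      "roles": ["code", "metadata"]
    }
  },
  "links": [
    {
      "rel": "example",
      "href": "https://github.com/<org>/<repo>/tree/<commit-hash>",
      "type": "text/html",
      "title": "Source code asset replicated as an example link (hypothetical URL)"
    },
    {
      "rel": "example",
      "href": "https://huggingface.co/<org>/<model>",
      "type": "text/html",
      "title": "Model card on Hugging Face (hypothetical URL)"
    }
  ]
}
```

Pointing the first `example` link at the same `href` as the source code asset mirrors the recommendation to replicate that asset as an `example` reference, while the second link covers the model card case.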