Contents
- 1 Introduction
- 2 Definition of Terms
- 3 General API Requirements and Conventions
- 4 Responses
- 5 API Endpoints
- 6 API Filtering Format Specification
- 6.1 Lexical Tokens
- 6.2 The Filter Language Syntax
- 6.2.1 Basic boolean operations
- 6.2.2 Numeric and String comparisons
- 6.2.3 Substring comparisons
- 6.2.4 Comparisons of boolean values
- 6.2.5 Comparisons of list properties
- 6.2.6 Nested property names
- 6.2.7 Filtering on relationships
- 6.2.8 Filtering on Properties with an unknown value
- 6.2.9 Precedence
- 6.2.10 Type handling and conversions in comparisons
- 6.2.11 Optional filter features
- 7 Property Definitions
- 8 Entry List
- 8.1 Properties Used by Multiple Entry Types
- 8.2 Structures Entries
- 8.2.1 elements
- 8.2.2 nelements
- 8.2.3 elements_ratios
- 8.2.4 chemical_formula_descriptive
- 8.2.5 chemical_formula_reduced
- 8.2.6 chemical_formula_hill
- 8.2.7 chemical_formula_anonymous
- 8.2.8 dimension_types
- 8.2.9 nperiodic_dimensions
- 8.2.10 lattice_vectors
- 8.2.11 space_group_symmetry_operations_xyz
- 8.2.12 space_group_symbol_hall
- 8.2.13 space_group_symbol_hermann_mauguin
- 8.2.14 space_group_symbol_hermann_mauguin_extended
- 8.2.15 space_group_it_number
- 8.2.16 cartesian_site_positions
- 8.2.17 nsites
- 8.2.18 species_at_sites
- 8.2.19 species
- 8.2.20 assemblies
- 8.2.21 structure_features
- 8.3 Calculations Entries
- 8.4 References Entries
- 8.5 Files Entries
- 8.6 Custom Entry Types
- 8.7 Relationships Used by Multiple Entry Types
- 9 Appendices
As researchers create independent materials databases, much can be gained from retrieving data from multiple databases. However, automating the retrieval of data is difficult if each database has a different application programming interface (API). This document specifies a standard API for retrieving data from materials databases. This API specification has been developed over a series of workshops entitled "Open Databases Integration for Materials Design", held at the Lorentz Center in Leiden, Netherlands and the CECAM headquarters in Lausanne, Switzerland.
The API specification described in this document builds on top of the JSON:API v1.1 specification. More specifically, it defines specific implementation semantics allowed by the JSON:API standard, but which go beyond the restrictions imposed on JSON:API profiles and extensions. The JSON:API specification is assumed to apply wherever it is stricter than what is formulated in this document. Exceptions to this rule are stated explicitly (e.g. non-compliant responses are tolerated if a non-standard response format is explicitly requested).
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
- Database provider
- A service that provides one or more databases with data desired to be made available using the OPTIMADE API.
- Database-provider-specific prefix
- Every database provider is designated a unique prefix. The prefix is used to separate the namespaces used by provider-specific extensions. The list of presently defined prefixes is maintained externally from this specification. For more information, see section Namespace Prefixes.
- Definition provider
- A service that provides one or more external or domain-specific property definitions that can be used by OPTIMADE API implementations.
- Definition provider prefix
- Every definition provider is designated a prefix that cannot clash with an existing database provider prefix. The prefix is used to separate the namespaces used by these collections of definitions. The list of presently defined prefixes is maintained externally from this specification. For more information, see section Namespace Prefixes.
- API implementation
- A realization of the OPTIMADE API that a database provider uses to serve data from one or more databases.
- Identifier
- Names that MUST start with a lowercase letter ([a-z]) or an underscore ("_") followed by any number of lowercase alphanumerics ([a-z0-9]) and underscores ("_").
- Base URL
- The topmost URL under which the API is served. See section Base URL.
- Versioned base URL
- A URL formed by the base URL plus a path segment indicating a version of the API. See section Base URL.
- Entry
- A single instance of a specific type of resource served by the API implementation.
For example, a structures entry comprises the data that belong to a single structure.
- Entry type
- Entries are categorized into types, e.g., structures, calculations, references. Entry types MUST be named according to the rules for identifiers.
- Entry property
- One data item which belongs to an entry, e.g., the chemical formula of a structure.
- Entry property name
- The name of an entry property. Entry property names MUST follow the rules for identifiers and MUST NOT have the same name as any of the entry types.
- Relationship
- Any entry can have one or more relationships with other entries. These are described in section Relationships. Relationships describe links between entries rather than data that belong to a single entry, and are thus regarded as distinct from the entry properties.
- Query filter
- An expression used to influence the entries returned in the response to a URL query.
The filter is specified via the URL query parameter filter, using the format described in the section API Filtering Format Specification.
- Queryable property
- An entry property that can be referred to in the filtering of results. See section API Filtering Format Specification for more information on formulating filters on properties. The section Entry List specifies the REQUIRED level of query support for different properties. If nothing is specified, any support for queries is OPTIONAL.
- ID
- The ID entry property is a unique string referencing a specific entry in the database. The following constraints and conventions apply to IDs:
  - Taken together, the ID and entry type MUST uniquely identify the entry.
  - Reasonably short IDs are encouraged and SHOULD NOT be longer than 255 characters.
  - IDs MAY change over time.
- Immutable ID
- A unique string that specifies a specific resource in a database. The string MUST NOT change over time.
- Response format
- The data format for the HTTP response, which can be selected using the response_format URL query parameter. For more info, see section Response Format.
- Field
- The key used in response formats that return data in associative-array-type data structures. This is particularly relevant for the default JSON-based response format. In this case, field refers to the name part of the name-value pairs of JSON objects.
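The naming rules in the definitions above are mechanical enough to check in code. The following is a minimal sketch, not part of the specification: the helper names and the ENTRY_TYPES set are hypothetical and illustrative only. The regular expression encodes the identifier rule (a lowercase letter or underscore, then any number of lowercase alphanumerics or underscores).

```python
import re

# Identifier rule: first character in [a-z_], then any number of [a-z0-9_].
IDENTIFIER_RE = re.compile(r"[a-z_][a-z0-9_]*")

# Hypothetical, non-exhaustive set of entry type names used for illustration.
ENTRY_TYPES = {"structures", "calculations", "references", "files"}


def is_identifier(name: str) -> bool:
    """Return True if `name` follows the identifier rules."""
    return IDENTIFIER_RE.fullmatch(name) is not None


def is_valid_property_name(name: str, entry_types=ENTRY_TYPES) -> bool:
    """Entry property names must be identifiers and must not equal an entry type name."""
    return is_identifier(name) and name not in entry_types


def is_reasonable_id(entry_id: str) -> bool:
    """IDs SHOULD NOT be longer than 255 characters (a recommendation, not a hard limit)."""
    return len(entry_id) <= 255
```

For example, is_valid_property_name("chemical_formula_reduced") holds, whereas "structures" (an entry type name) and "Elements" (uppercase letter) are rejected. Using fullmatch rather than match with a trailing "$" avoids accidentally accepting a name with a trailing newline.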
An API implementation handles data types and their representations in three different contexts:
- In the HTTP URL query filter, see section API Filtering Format Specification.
- In the HTTP response. The default response format is JSON-based and thus uses JSON data types. However, other response formats can use different data types. For more info, see section Responses.
- The underlying database backend(s) from which the implementation serves data.
Hence, entry properties are described in this proposal using context-independent types that are assumed to have some form of representation in all contexts. They are as follows:
- Basic types: string, integer, float, boolean, timestamp.
- list: an ordered collection of items, where all items are of the same type, unless they are unknown. A list can be empty, i.e., contain no items.
- dictionary: an associative array of keys and values, where keys are pre-determined strings, i.e., for the same entry property, the keys remain the same among different entries whereas the values change. The values of a dictionary can be any basic type, list, dictionary, or unknown.
An entry property value that is not present in the database is unknown.
This is equivalently expressed by the statement that the value of that entry property is null.
For more information, see section Properties with an unknown value.
The definition of a property of an entry type specifies a type. The value of that property MUST either have a value of that type, or be unknown.
This standard describes a communication protocol that, when implemented by a server, provides clients with an API for data access.
Released versions of the standard are versioned using semantic versioning v2 in reference to changes in that API (i.e., not in the server-side implementation of the protocol).
To clarify: semantic versioning mandates version numbers of the form MAJOR.MINOR.PATCH, where a "backwards incompatible API change" requires incrementing the MAJOR version number. A future version of the OPTIMADE standard can mandate servers to change their behavior to be compliant with the newer version. However, such changes are only considered "backwards incompatible API changes" if they have the potential to break clients that correctly use the API according to the earlier version.
Furthermore, the addition of new keys in key-value-formatted responses of the OPTIMADE API is not regarded as a "backwards incompatible API change." Hence, a client MUST disregard unrecognized keys when interpreting responses (but MAY issue warnings about them). On the other hand, a change of the OPTIMADE standard that fundamentally alters the interpretation of a response due to the presence of a new key will be regarded as a "backwards incompatible API change", since a client interpreting the response according to a prior version of the standard would misinterpret that response.
Working copies distributed as part of the development of the standard are marked with the version number for the release they are based on with an additional "~develop" suffix. These "versions" do not refer to a single specific instance of the text (i.e., the same "~develop" version string is retained until a release), nor is it clear to what degree they contain backwards incompatible API changes. Hence, the suffix is intentionally designed to make these version strings non-conformant with semantic versioning, which prevents incorrect comparisons to released versions under the ordering prescribed by semantic versioning. Version strings with a "~develop" suffix MAY be used by implementations during testing. However, a client that encounters them unexpectedly SHOULD NOT make any assumptions about the level of API compatibility.
In conclusion, the versioning policy of this standard is designed to allow clients using the OPTIMADE API according to a specific version of the standard to assume compatibility with servers implementing any future (non-development) version of the standard sharing the same MAJOR version number.
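The versioning policy above can be turned into a simple client-side check. This is a sketch under stated assumptions: the function name client_can_assume_compatibility is hypothetical, and pre-release and build suffixes (e.g., "-rc.2") are ignored when comparing versions, which is a simplification of full semantic-versioning precedence.

```python
import re

# MAJOR.MINOR.PATCH with optional pre-release ("-rc.2") and build ("+abc") parts.
# Only the three numeric components are captured; suffixes are not compared here.
SEMVER_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?")


def client_can_assume_compatibility(client_version: str, server_version: str) -> bool:
    """Hypothetical check: may a client written for `client_version` rely on this server?

    Compatibility is assumed only for released server versions with the same
    MAJOR number that are not older than the client's version. "~develop"
    strings (and anything that is not valid semver) give no guarantees.
    """
    if "~develop" in server_version:
        return False
    client = SEMVER_RE.fullmatch(client_version)
    server = SEMVER_RE.fullmatch(server_version)
    if client is None or server is None:
        return False
    client_tuple = tuple(int(g) for g in client.groups())
    server_tuple = tuple(int(g) for g in server.groups())
    return client_tuple[0] == server_tuple[0] and server_tuple >= client_tuple
```

A client built against 1.1.0 would thus accept a server reporting 1.2.0, but not 2.0.0 (different MAJOR) and not 1.2.0~develop (development version).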
Each database provider will publish one or more base URLs that serve the API, for example: http://example.com/optimade/. Every URL path segment that follows the base URL MUST behave as standardized in this API specification.
Access to the API is primarily provided under versioned base URLs.
An implementation MUST provide access to the API under a URL where the first path segment appended to the base URL is /vMAJOR, where MAJOR is one of the major version numbers of the API that the implementation supports.
This URL MUST serve the latest minor/patch version supported by the implementation.
For example, the latest minor and patch version of major version 1 of the API is served under /v1.
An implementation MAY also provide versioned base URLs of the forms /vMAJOR.MINOR and /vMAJOR.MINOR.PATCH.
Here, MINOR is the minor version number and PATCH is the patch version number of the API.
A URL of the form /vMAJOR.MINOR MUST serve the latest patch version supported by the implementation of this minor version.
API versions that are published with a suffix, e.g., -rc<number> to indicate a release candidate version, SHOULD be served on versioned base URLs without this suffix.
If a request is made to a versioned base URL that begins with /v and an integer followed by any other characters, indicating a version that the implementation does not recognize or support, the implementation SHOULD respond with the custom HTTP server error status code 553 Version Not Supported, preferably along with a user-friendly error message that directs the client to adapt the request to a version it provides.
It is the intent that future versions of this standard will not assign different meanings to URLs that begin with /v and an integer followed by other characters.
Hence, a client can safely attempt to access a specific version of the API via the corresponding versioned base URL.
For other forms of version negotiation, see section Version Negotiation.
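The rules above for versioned base URLs can be summarized as routing logic for the first path segment after the base URL. The following is a hypothetical server-side sketch: route_version_segment and its return values are illustrative names, and supported versions are assumed to be stored as (major, minor, patch) tuples with any release-candidate suffix already dropped.

```python
import re

# Accepted forms: vMAJOR, vMAJOR.MINOR, vMAJOR.MINOR.PATCH.
VERSION_SEGMENT_RE = re.compile(r"v(\d+)(?:\.(\d+)(?:\.(\d+))?)?")
# "v" + an integer + anything else: reserved for versions, never other endpoints.
VERSION_LIKE_RE = re.compile(r"v\d+.*")


def route_version_segment(segment, supported):
    """Hypothetical router for the first path segment after the base URL.

    `supported` is a list of (major, minor, patch) tuples. Returns
    ("serve", version) with the version to use, ("553", None) for a
    version-like segment the server cannot serve, or ("other", None).
    """
    match = VERSION_SEGMENT_RE.fullmatch(segment)
    if match:
        requested = [int(g) for g in match.groups() if g is not None]
        # Keep versions whose leading components equal the request and pick
        # the latest, so that /v1 serves the latest supported 1.x.y.
        candidates = [v for v in supported if list(v[: len(requested)]) == requested]
        if candidates:
            return ("serve", max(candidates))
        return ("553", None)
    if VERSION_LIKE_RE.fullmatch(segment):
        return ("553", None)
    return ("other", None)
```

With supported versions 1.0.0, 1.1.0 and 1.2.1, a request under /v1 is served with 1.2.1, /v1.1 with 1.1.0, and /v2 or /v1.2.3.4 yield 553. A segment such as versions does not start with "v" plus an integer and is therefore routed elsewhere.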
Examples of valid versioned base URLs:
Examples of invalid versioned base URLs:
Database providers SHOULD strive to implement the latest released version of this standard, as well as the latest patch version of any major and minor version they support.
Note: The base URLs and versioned base URLs themselves are not considered part of the API, and the standard does not specify the response for a request to them. However, it is RECOMMENDED that implementations serve a human-readable HTML document on base URLs and versioned base URLs, which explains that the URL is an OPTIMADE URL meant to be queried by an OPTIMADE client.
Implementations MAY also provide access to the API on the unversioned base URL as described in this subsection.
Access via the unversioned base URL is primarily intended (i) for convenience when manually interacting with the API, and (ii) to provide version-agnostic permanent links to resource objects. Clients that perform automated processing of responses SHOULD access the API via versioned base URLs.
Implementations serving the API on the unversioned base URL have a few alternative options:
- Direct access MAY be provided to the full API.
- Requests to endpoints under the unversioned base URL MAY be redirected using an HTTP 307 temporary redirect to the corresponding endpoints under a versioned base URL.
- Direct access MAY be limited to only single entry endpoints (see section Single Entry Endpoints), i.e., so that this form of access is only available for permanent links to resource objects.
Implementations MAY combine direct access to single entry endpoints with redirects for other API queries.
The client MAY provide a query parameter api_hint to indicate a preferred API version to the server.
When this parameter is provided, the request is to be handled as described in section Version Negotiation, which allows a "best suitable" version of the API to be selected to serve the request (or to forward the request to).
However, if api_hint is not provided, the implementation SHOULD serve (or redirect to) its preferred version of the API (i.e., the latest, most mature, and stable version).
In this case, that version MUST also be the first version in the response of the versions endpoint (see section Versions Endpoint).
For implementers: Before enabling access to the API on unversioned base URLs, implementers are advised to consider that an upgrade of the major version of the API served this way can change the behaviors of associated endpoints in ways that are not backward compatible.
The OPTIMADE API provides three concurrent mechanisms for version negotiation between client and server.
- The versions endpoint served directly under the unversioned base URL allows a client to discover all major API versions supported by a server in the order of preference (see section Versions Endpoint).
- A client can access the API under versioned base URLs. In this case, the server MUST respond according to the specified version or return an error if the version is not supported (see section Versioned Base URLs).
- When accessing the API under the unversioned base URL, clients are encouraged to append the OPTIONAL query parameter api_hint to indicate a preferred API version for the request to the server. This parameter is described in more detail below.
The api_hint query parameter MUST be accepted by all API endpoints.
However, for endpoints under a versioned base URL the request MUST be served as usual according to the version specified in the URL path segment, regardless of the value of api_hint.
In this case, the server MAY issue a warning if the value of api_hint suggests that the query may not be properly supported.
If the client provides the parameter, the value SHOULD have the format vMAJOR or vMAJOR.MINOR, where MAJOR is a major version and MINOR is a minor version of the API.
For example, if a client appends api_hint=v1.0 to the query string, the hint provided is for major version 1 and minor version 0.
If the server supports the major version indicated by the api_hint parameter at the same or a higher minor version (if provided), it SHOULD serve the request using this version.
If the server does not support the major version hinted, or if it supports the major version but only at a minor version below the one hinted, it MAY use the provided values to make a best-effort attempt at still serving the request, e.g., by invoking the closest supported version of the API.
If the hinted version is not supported by the server and the request is not served using an alternative version, the server SHOULD respond with the custom HTTP server error status code 553 Version Not Supported.
Note that the above protocol means that clients MUST NOT expect that a returned response is served according to the version that is hinted.
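The api_hint rules for the unversioned base URL can be sketched as a small negotiation function. The name negotiate_api_hint is hypothetical, versions are assumed to be (major, minor) tuples, and two behaviors are design choices of this sketch rather than requirements: a malformed hint is treated as absent, and the optional best-effort fallback to the closest supported version is not attempted.

```python
import re

API_HINT_RE = re.compile(r"v(\d+)(?:\.(\d+))?")


def negotiate_api_hint(api_hint, supported, preferred):
    """Pick the version used to serve a request on the unversioned base URL.

    `supported` is a list of (major, minor) tuples and `preferred` is the
    server's preferred version. Returns the version to serve, or None when the
    server declines (it SHOULD then answer 553 Version Not Supported).
    """
    if api_hint is None:
        return preferred
    match = API_HINT_RE.fullmatch(api_hint)
    if match is None:
        return preferred  # assumption: a malformed hint is treated as absent
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    # Same major version at the same or a higher minor version.
    candidates = [v for v in supported if v[0] == major and v[1] >= minor]
    return max(candidates) if candidates else None
```

For a server supporting versions 1.0 and 1.2, the hints v1 and v1.0 are both served with 1.2, while v1.3 and v2 lead to None, i.e., a 553 response unless the server chooses a best-effort fallback.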
For end users: Users are strongly encouraged to include the api_hint query parameter in URLs for queries on endpoints under the unversioned base URL that appear in, e.g., journal publications.
The version hint will make it possible to serve such queries in a reasonable way even after the server changes the major API version used for requests without version hints.
A database provider MAY publish a special Index Meta-Database base URL. The main purpose of this base URL is to allow for automatic discoverability of all databases of the provider. Thus, it acts as a meta-database for the database provider's implementation(s).
The index meta-database MUST only provide the info and links endpoints, see sections Info Endpoints and Links Endpoint.
It MUST NOT expose any entry listing endpoints (e.g., structures).
These endpoints do not need to be queryable, i.e., they MAY be provided as static JSON files. However, they MUST return correct and up-to-date information on all currently provided implementations.
The is_index field under attributes, as well as the relationships field, MUST be included in the info endpoint for the index meta-database (see section Base Info Endpoint).
The value for is_index MUST be true.
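The two requirements above on the info endpoint of an index meta-database can be verified with a few lines of code, e.g., before publishing it as a static file. This minimal sketch assumes the JSON:API layout {"data": {"attributes": {...}, "relationships": {...}}}; the function name check_index_info is hypothetical, and the check covers only the requirements stated here.

```python
def check_index_info(info_response):
    """Return a list of problems found in an index meta-database info response.

    Only the two requirements stated above are checked: attributes.is_index
    must be true, and a relationships field must be present.
    """
    problems = []
    data = info_response.get("data", {})
    if data.get("attributes", {}).get("is_index") is not True:
        problems.append("attributes.is_index must be true")
    if "relationships" not in data:
        problems.append("relationships field is missing")
    return problems
```

An empty list means the response passes these two checks; it says nothing about the other fields required of any info endpoint.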
A few suggestions and mandatory requirements of the OPTIMADE specification are specifically relaxed only for index meta-databases to make it possible to serve them in the form of static files on restricted third-party hosting platforms:
- When serving an index meta-database in the form of static files, it is RECOMMENDED that the response excludes the subfields in the top-level meta field that would need to be dynamically generated (as described in the section JSON Response Schema: Common Fields). The motivation is that static files cannot keep dynamic fields such as time_stamp up to date.
- The JSON:API specification requirements on content negotiation using the HTTP headers Content-Type and Accept are NOT mandatory for index meta-databases. Hence, API implementations MAY ignore the content of these headers and respond to all requests. The motivation is that static file hosting is typically not flexible enough to support these requirements on HTTP headers.
- API implementations SHOULD serve JSON content with either the JSON:API mandated HTTP header Content-Type: application/vnd.api+json or Content-Type: application/json. However, if the hosting platform does not allow this, JSON content MAY be served with Content-Type: text/plain.
Note: A list of database and definition providers acknowledged by the Open Databases Integration for Materials Design consortium is maintained externally from this specification and can be retrieved as described in section Namespace Prefixes. This list is also machine-readable, enabling the automatic discoverability of OPTIMADE API services.
There are two mechanisms by which a provider can serve properties that are not standardized by the OPTIMADE specification.
- By serving properties under a database-provider-specific namespace prefix. This is the preferred mechanism for serving properties that are specific to a particular database provider.
- By adopting a property definition external to the specification by a definition provider. This is the preferred mechanism in cases where a database-specific field aligns with a field that is already defined by a definition provider, and can be used to enable aggregated filtering over all OPTIMADE APIs that support this property.
A list of known database and definition providers and their assigned prefixes is published in the form of an OPTIMADE Index Meta-Database with base URL https://providers.optimade.org. Visiting this URL in a web browser gives a human-readable description of how to retrieve the information in the form of a JSON file, and specifies the procedure for registration of new prefixes. A human-readable dashboard is also hosted at https://www.optimade.org/providers-dashboard.
API implementations SHOULD NOT make up and use new prefixes without first getting them registered in the official list.
Examples:
- A database-provider-specific prefix: exmpl. Used as a field name in a response: _exmpl_custom_field.
- A definition-provider prefix: dft. Used as a field name in a response by multiple different providers: _dft_cell_volume (note: this is a hypothetical example).
The initial underscore indicates an identifier that is under a separate namespace under the ownership of that organization or definition provider. Identifiers prefixed with underscores will not be used for standardized names.
This standard refers to database-provider-specific prefixes and database providers.
Database-provider-specific fields only need to be consistent within the context of one particular database.
Providers that serve multiple databases MAY use the same provider-specific field names with different meanings in different databases.
For example, a provider may use the field _exmpl_band_gap to mean a computed band gap in one of their databases, and a measured band gap in another database.
Database-provider-specific fields SHOULD be fully described at the relevant /info/<entry_type> endpoint (see section Entry Listing Info Endpoints).
This standard refers to definition-provider-specific prefixes and definition providers.
Definition providers MUST provide a canonical property definition for all custom fields they define using the OPTIMADE Property Definitions format.
Definition providers MUST also list these definitions in the relevant /info/<entry_type> endpoint of the index meta-database for that provider.
They MAY also provide human-readable webpages for their definitions.
Definition-provider-specific fields MAY be fully described at the relevant /info/<entry_type> endpoint (see section Entry Listing Info Endpoints), but can also rely on the canonical definitions provided by the definition provider, provided they return an $id for the field that resolves to the relevant OPTIMADE property definition.
Clients SHOULD encode URLs according to RFC 3986. API implementations MUST decode URLs according to RFC 3986.
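On the client side, RFC 3986 percent-encoding of a query can be done with Python's standard library. The sketch below uses a hypothetical helper build_query_url and an illustrative endpoint URL. Note that urlencode defaults to quote_plus, which encodes spaces as "+", a form-encoding convention that RFC 3986 decoding does not turn back into spaces; passing quote_via=quote yields "%20" instead.

```python
from urllib.parse import quote, urlencode


def build_query_url(endpoint_url, params):
    """Append RFC 3986 percent-encoded query parameters to `endpoint_url`.

    quote_via=quote encodes spaces as %20 (not "+") and, with urlencode's
    default safe="", also encodes reserved characters such as "<", "=" and '"'.
    """
    return endpoint_url + "?" + urlencode(params, quote_via=quote)
```

For example, build_query_url("https://example.com/optimade/v1/structures", {"filter": 'nelements<3 AND elements HAS "Si"'}) produces a URL whose query string is filter=nelements%3C3%20AND%20elements%20HAS%20%22Si%22.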
The API implementation MAY describe many-to-many relationships between entries along with OPTIONAL human-readable descriptions that describe each relationship. These relationships can be to the same, or to different, entry types. Response formats have to encode these relationships in ways appropriate for each format.
In the default response format, relationships are encoded as JSON:API Relationships, see section Entry Listing JSON Response Schema.
For implementers: For database-specific response formats without a dedicated mechanism to indicate relationships, it is suggested that they are encoded alongside the entry properties. For each entry type, the relationships with entries of that type can then be encoded in a field with the name of the entry type, which are to contain a list of the IDs of the referenced entries alongside the respective human-readable description of the relationships. It is the intent that future versions of this standard uphold the viability of this encoding by not standardizing property names that overlap with the entry type names.
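The suggested encoding for formats without a relationship mechanism can be sketched as a conversion from a JSON:API resource object. This is a hypothetical helper (flatten_relationships), assuming that each resource identifier object may carry an optional human-readable description under meta.description; the output dictionary layout is illustrative, not a standardized format.

```python
def flatten_relationships(entry):
    """Copy JSON:API relationships into fields named after the related entry type.

    `entry` is a resource object such as
    {"id": ..., "attributes": {...}, "relationships": {"references": {"data": [...]}}}.
    Returns a flat dict with the entry properties plus, for each related entry
    type, a list of {"id": ..., "description": ...} items.
    """
    flat = {"id": entry["id"], **entry.get("attributes", {})}
    for rel in entry.get("relationships", {}).values():
        data = rel.get("data") or []
        if isinstance(data, dict):  # a to-one relationship holds a single object
            data = [data]
        for ident in data:
            flat.setdefault(ident["type"], []).append(
                {"id": ident["id"], "description": ident.get("meta", {}).get("description")}
            )
    return flat
```

Because entry property names MUST NOT coincide with entry type names, the fields added per entry type cannot overwrite a property copied from attributes.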
Many databases allow specific data values to exist for some of the entries, whereas for others, no data value is present.
This is referred to as the property having an unknown value, or equivalently, that the property value is null.
The text in this section describes how the API handles properties with the value null.
The use of null values inside nested property values (e.g., lists or dictionaries) is described in the definitions of those data structures elsewhere in the specification, see section Entry List.
For these properties, null MAY carry a special meaning.
REQUIRED properties with an unknown value MUST be included and returned in the response with the value null.
OPTIONAL properties with an unknown value, if requested explicitly via the response_fields query parameter, MUST be included and returned in the response with the value null.
(For more info on the response_fields query parameter, see section Entry Listing URL Query Parameters.)
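The two inclusion rules above can be sketched as follows. The helper select_attributes is hypothetical; it assumes that stored values live in a dictionary where a missing key means the value is unknown, and it emits Python None, which JSON serialization turns into null. Property names in the usage note are illustrative only.

```python
def select_attributes(stored, required, requested_optional):
    """Build the attributes of one entry for a response.

    `stored` maps property names to the values present in the database; a
    missing key means the value is unknown. REQUIRED properties and OPTIONAL
    properties requested via response_fields are always emitted, with None
    (serialized as JSON null) standing in for unknown values.
    """
    attributes = {}
    for name in list(required) + list(requested_optional):
        attributes[name] = stored.get(name)  # None when the value is unknown
    return attributes
```

For instance, with stored = {"nelements": 2}, required = ["nelements", "nsites"] and requested_optional = ["_exmpl_band_gap"], the result is {"nelements": 2, "nsites": None, "_exmpl_band_gap": None}.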
The interaction of properties with an unknown value with query filters is described in the section Filtering on Properties with an unknown value.
In particular, filters with IS UNKNOWN and IS KNOWN can be used to match entries whose values for some property are, or are not, unknown, respectively.
When an implementation receives a request with a query filter that refers to an unknown property name, it is handled differently depending on the database-specific prefix:
- If the property name has no database-specific prefix, or if it has the database-specific prefix that belongs to the implementation itself, the error 400 Bad Request MUST be returned with a message indicating the offending property name.
- If the property name has a database-specific prefix that does not belong to the implementation itself, it MUST NOT treat this as an error, but rather MUST evaluate the query with the property treated as unknown, i.e., comparisons are evaluated as if the property has the value null.
  - Furthermore, if the implementation does not recognize the prefix at all, it SHOULD return a warning that indicates that the property has been handled as unknown.
  - On the other hand, if the prefix is recognized, i.e., as belonging to a known database provider, the implementation SHOULD NOT issue a warning but MAY issue diagnostic output with a note explaining how the request was handled.
The rationale for treating properties from other databases as unknown rather than triggering an error is to allow a single OPTIMADE query that uses database-specific properties to be sent to multiple databases.
For example, the following query can be sent to API implementations exmpl1 and exmpl2 without generating any errors:
filter=_exmpl1_band_gap<2.0 OR _exmpl2_band_gap<2.5
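The decision rules above can be sketched as a function that an implementation calls for each property name in a filter that it does not recognize. The name handle_unknown_property and its return labels are hypothetical, and the sketch assumes that the prefix of a name is the text between its first two underscores.

```python
def handle_unknown_property(name, own_prefix, known_prefixes):
    """Decide how to treat a filter property name the implementation does not know.

    Returns one of:
      "error_400"          - unprefixed, or carries this implementation's own prefix
      "null"               - foreign prefix of a known provider: treat as unknown, no warning
      "null_with_warning"  - unrecognized prefix: treat as unknown and warn
    Assumption: the prefix is the text between the first two underscores.
    """
    if not name.startswith("_") or name.count("_") < 2:
        return "error_400"
    prefix = name[1:].split("_", 1)[0]
    if prefix == own_prefix:
        return "error_400"
    if prefix in known_prefixes:
        return "null"
    return "null_with_warning"
```

Applied by exmpl1 to the example query (assuming it does not serve a _exmpl1_band_gap property), _exmpl1_band_gap would yield a 400 error, whereas _exmpl2_band_gap is evaluated as null without a warning, provided exmpl2 is a known prefix.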
A property value may be too large to fit in a single response. OPTIMADE provides a mechanism for a client to handle such properties by fetching them in a separate series of requests. It is up to the implementation to decide which values are too large to represent in a single response, and this decision MAY change between responses.
In this case, the response to the initial query gives the value null for the property.
A list of one or more data URLs, together with their respective partial data formats, is given in the response.
How this list is provided is response-format-dependent.
For the JSON response format, see the description of the partial_data_links field, nested under data and then meta, in the section JSON Response Schema: Common Fields.
The default partial data format is named "jsonlines" and is described in the Appendix OPTIMADE JSON lines partial data format. An implementation SHOULD always include this format as one of the partial data formats provided for a property that has been omitted from the response to the initial query. Implementations MAY provide links to their own non-standard formats, but non-standard format names MUST be prefixed by a database-provider-specific prefix.
Below follows an example of the data and meta parts of a response using the JSON response format that communicates that the property value has been omitted from the response, with three different links for different partial data formats provided.
{
  // ...
  "data": {
    "type": "structures",
    "id": "2345678",
    "attributes": {
      "a": null
    },
    "meta": {
      "partial_data_links": {
        "a": [
          {
            "format": "jsonlines",
            "link": "https://example.org/optimade/v1.2/extensions/partial_data/structures/2345678/a/default_format"
          },
          {
            "format": "_exmpl_bzip2_jsonlines",
            "link": "https://db.example.org/assets/partial_values/structures/2345678/a/bzip2_format"
          },
          {
            "format": "_exmpl_hdf5",
            "link": "https://cloud.example.org/ACCHSORJGIHWOSJZG"
          }
        ]
      }
    }
  },
  // ...
}
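A client can detect an omitted value and pick a download link from a resource object like the one above. This is a minimal sketch with a hypothetical helper partial_data_link; it only selects the URL and does not implement fetching or parsing of the partial data format itself.

```python
def partial_data_link(resource, prop, preferred_format="jsonlines"):
    """Return the URL from which the omitted value of `prop` can be fetched, or None.

    `resource` is one resource object from the "data" part of a JSON response.
    A null attribute only signals omitted data if partial_data_links lists the
    property; otherwise the value is simply unknown.
    """
    if resource.get("attributes", {}).get(prop) is not None:
        return None  # the value was delivered inline
    links = resource.get("meta", {}).get("partial_data_links", {}).get(prop, [])
    for entry in links:
        if entry.get("format") == preferred_format:
            return entry.get("link")
    return None
```

Defaulting to "jsonlines" follows the recommendation that implementations always offer that format; a client that understands a provider-specific format such as _exmpl_hdf5 can request it explicitly.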
A metadata property represents entry and property-specific metadata for a given entry.
How these are communicated in the response depends on the response format.
For the JSON response format, the metadata properties are stored in the dictionary field property_metadata inside the resource object metadata field meta, with keys equal to the names of the respective properties for which metadata is available, see JSON Response Schema: Common Fields.
The format of the metadata property is specified by the field x-optimade-metadata-definition in the Property Definition of the field, see Property Definitions.
Database providers are allowed to define their own metadata properties in x-optimade-metadata-definition, but they MUST use the database-provider-specific prefix even for metadata of database-specific fields.
For example, the metadata property definition of the field _exmpl_example_field MUST NOT define a metadata field named, e.g., accuracy; the field rather needs to be named, e.g., _exmpl_accuracy.
The reason for this limitation is to avoid name collisions with metadata fields defined by the OPTIMADE standard in the future that apply also to database-specific data fields.
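The prefix requirement for provider-defined metadata fields can be checked directly against a metadata definition. This sketch uses a hypothetical helper misnamed_metadata_fields and assumes that every field under "properties" in the definition is defined by the database provider itself (fields standardized by OPTIMADE would legitimately be unprefixed and are not handled here).

```python
def misnamed_metadata_fields(metadata_definition, provider_prefix):
    """List metadata field names that lack the "_<provider_prefix>_" prefix.

    `metadata_definition` is an x-optimade-metadata-definition dictionary;
    all fields under its "properties" key are assumed to be provider-defined.
    """
    required_start = f"_{provider_prefix}_"
    return [
        name
        for name in metadata_definition.get("properties", {})
        if not name.startswith(required_start)
    ]
```

Applied to a definition for _exmpl_example_field with metadata fields accuracy and _exmpl_accuracy and the prefix exmpl, the function reports only accuracy.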
Implementation of the meta field is OPTIONAL.
However, when an implementation supports the property_metadata field, it SHOULD include metadata fields for all properties which have metadata and are present in the data part of the response.
Example of a response in the JSON response format with two structure entries that each include a metadata property for the attribute field elements_ratios and the database-specific per-entry metadata field _exmpl_originates_from_project:
{
  "data": [
    {
      "type": "structures",
      "id": "example.db:structs:0001",
      "attributes": {
        "elements_ratios": [0.33336, 0.22229, 0.44425]
      },
      "meta": {
        "property_metadata": {
          "elements_ratios": {
            "_exmpl_originates_from_project": "piezoelectric_perovskites"
          }
        }
      }
    },
    {
      "type": "structures",
      "id": "example.db:structs:1234",
      "attributes": {
        "elements_ratios": [0.5, 0.5]
      },
      "meta": {
        "property_metadata": {
          "elements_ratios": {
            "_exmpl_originates_from_project": "ferroelectric_binaries"
          }
        }
      }
    },
    // ...
  ],
  // ...
}
Example of the corresponding metadata property definition contained in the field x-optimade-metadata-definition, which is placed in the property definition of elements_ratios:
// ...
"x-optimade-metadata-definition": {
"title": "Metadata for the elements_ratios field",
"description": "This field contains the per-entry metadata for the elements_ratios field.",
"x-optimade-type": "dictionary",
"x-optimade-unit": "inapplicable",
"type": ["object", "null"],
"properties" : {
"_exmpl_originates_from_project": {
"$id": "https://properties.example.com/v1.2.0/elements_ratios_meta/_exmpl_originates_from_project",
"description" : "A string naming the internal example.com project id where this property was added to the database.",
"x-optimade-type": "string",
"x-optimade-unit" : "inapplicable",
"type": ["string", "null"]
}
}
}
// ...
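The prefix rule for metadata of database-specific fields can be checked mechanically. The sketch below assumes the provider prefix exmpl used throughout these examples; the helper name invalid_metadata_names is hypothetical, and it only covers metadata attached to database-specific properties, not every rule in this section.

```python
def invalid_metadata_names(property_metadata, prefix):
    """Return 'property.metadata_field' names that break the prefix rule.

    Hypothetical helper: only metadata attached to database-specific
    properties (names starting with "_<prefix>_") is checked.
    """
    required = f"_{prefix}_"
    offenders = []
    for prop_name, metadata in property_metadata.items():
        if not prop_name.startswith(required):
            continue  # standard property: not covered by this particular check
        for meta_name in metadata:
            if not meta_name.startswith(required):
                offenders.append(f"{prop_name}.{meta_name}")
    return offenders


example = {
    "_exmpl_example_field": {"accuracy": 0.01, "_exmpl_accuracy": 0.01},
    "elements_ratios": {"_exmpl_originates_from_project": "ferroelectric_binaries"},
}
print(invalid_metadata_names(example, "exmpl"))  # ['_exmpl_example_field.accuracy']
```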
This section defines a JSON response format that complies with the JSON:API v1.1 specification. All endpoints of an API implementation MUST be able to provide responses in the JSON format specified below and MUST respond in this format by default.
Each endpoint MAY support additional formats, and SHOULD declare these formats under the endpoint /info/<entry type> (see section Entry Listing Info Endpoints).
Clients can request these formats using the response_format URL query parameter.
Specifying a response_format different from json (e.g., response_format=xml) allows the API to break conformance not only with the JSON response format specification, but also, e.g., in terms of how content negotiation is implemented.
Database-provider-specific and definition-provider-specific response_format identifiers MUST include the corresponding prefix (see section Namespace Prefixes).
In the JSON response format, property types translate as follows:
- string, boolean, list are represented by their similarly named counterparts in JSON.
- integer, float are represented as the JSON number type.
- timestamp uses a string representation of date and time as defined in RFC 3339 Internet Date/Time Format.
- dictionary is represented by the JSON object type.
- unknown properties are represented by either omitting the property or by a JSON null value.
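A server written in Python could map its internal values onto these JSON types roughly as follows. This is a sketch assuming timestamps are held as timezone-aware datetime objects; the function name to_optimade_json is hypothetical, and fractional seconds are dropped for brevity even though RFC 3339 allows them.

```python
import json
from datetime import datetime, timezone


def to_optimade_json(value):
    """Convert a Python value to its OPTIMADE JSON representation (sketch)."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware for RFC 3339")
        # RFC 3339 date-time, normalised to UTC with a trailing 'Z'
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, dict):
        return {str(k): to_optimade_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_optimade_json(v) for v in value]
    # str, bool, int, float and None (an unknown value) map directly onto JSON
    return value


entry = {
    "last_modified": datetime(2007, 4, 5, 14, 30, 20, tzinfo=timezone.utc),
    "nsites": 3,
    "_exmpl_band_gap": None,
}
print(json.dumps(to_optimade_json(entry)))
```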
Every response SHOULD contain the following fields, and MUST contain at least meta:
meta: a JSON:API meta member that contains JSON:API meta objects of non-standard meta-information. It MUST be a dictionary with these fields:
- api_version: a string containing the full version of the API implementation.
The version number string MUST NOT be prefixed by, e.g., "v".
Examples: 1.0.0, 1.0.0-rc.2.
- query: information on the query that was requested.
It MUST be a dictionary with this field:
- representation: a string with the part of the URL following the versioned or unversioned base URL that serves the API.
Query parameters that have not been used in processing the request MAY be omitted.
In particular, if no query parameters have been involved in processing the request, the query part of the URL MAY be excluded.
Example: /structures?filter=nelements=2.
- more_data_available: false if the response contains all data for the request (e.g., a request issued to a single entry endpoint, or a filter query at the last page of a paginated response) and true if the response is incomplete in the sense that multiple objects match the request, and not all of them have been included in the response (e.g., a query with multiple pages that is not at the last page).
meta SHOULD also include these fields:
time_stamp: a timestamp containing the date and time at which the query was executed.
data_returned: an integer containing the total number of data resource objects returned for the current filter query, independent of pagination.
provider: information on the database provider of the implementation. It MUST be a dictionary with these fields:
- name: a short name for the database provider.
- description: a longer description of the database provider.
- prefix: database-provider-specific prefix (see section Database-Provider-Specific Namespace Prefixes).
provider MAY include these fields:
- homepage: a JSON API link, pointing to the homepage of the database provider, either directly as a string, or as an object which can contain the following fields:
- href: a string containing the homepage URL.
- meta: a meta object containing non-standard meta-information about the database provider's homepage.
meta MAY also include these fields:
data_available: an integer containing the total number of data resource objects available in the database for the endpoint.
last_id: a string containing the last ID returned.
response_message: response string from the server.
request_delay: a non-negative float giving time in seconds that the client is suggested to wait before issuing a subsequent request.
Implementation note: the functionality of this field overlaps to some degree with features provided by the HTTP error 429 Too Many Requests and the Retry-After HTTP header. Implementations are suggested to provide consistent handling of request overload through both mechanisms.
database: a dictionary describing the specific database accessible at this OPTIMADE API. If provided, the dictionary fields SHOULD match those provided in the corresponding links entry for the database in the provider's index meta-database, outlined in Links Endpoint JSON Response Schema. The dictionary can contain the following OPTIONAL fields:
- id: the identifier of this database within those served by this provider, i.e., the ID under which this database is served in this provider's index meta-database.
- name: a human-readable name for the database, e.g., for use in clients.
- version: a string describing the version of the database.
- description: a human-readable description of the database, e.g., for use in clients.
- homepage: a JSON API link, pointing to a homepage for the particular database.
- maintainer: a dictionary providing details about the maintainer of the database, which MUST contain the single field:
- email with the maintainer's email address.
implementation: a dictionary describing the server implementation, containing the OPTIONAL fields:
- name: name of the implementation.
- version: version string of the current implementation.
- homepage: a JSON API link, pointing to the homepage of the implementation.
- source_url: a JSON API link pointing to the implementation source, either downloadable archive or version control system.
- maintainer: a dictionary providing details about the maintainer of the implementation, MUST contain the single field:
- email with the maintainer's email address.
- issue_tracker: a JSON API link pointing to the implementation's issue tracker.
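One way for a client to honour request_delay consistently with an HTTP Retry-After header is to wait for whichever is longer. The helper name and the "take the maximum" policy are assumptions of this sketch; the text above only suggests consistent handling. Only the delay-seconds form of Retry-After is handled, and the header dictionary is assumed to use the exact key "Retry-After".

```python
def suggested_wait(meta, headers):
    """Seconds to wait before the next request (sketch, hypothetical helper)."""
    delay = float(meta.get("request_delay", 0.0) or 0.0)
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        # Retry-After may also be an HTTP-date; that form is ignored here
        delay = max(delay, float(retry_after))
    return max(delay, 0.0)
```

A client would typically call time.sleep(suggested_wait(body["meta"], response_headers)) before issuing its next query.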
warnings: a list of warning resource objects representing non-critical errors or warnings. A warning resource object is defined similarly to a JSON:API error object, but MUST also include the field type, which MUST have the value "warning". The field detail MUST be present and SHOULD contain a non-critical message, e.g., reporting unrecognized search attributes or deprecated features. The field status, representing an HTTP response status code, MUST NOT be present for a warning resource object. This is an exclusive field for error resource objects.
Example for a deprecation warning:
{
"id": "dep_chemical_formula_01",
"type": "warning",
"code": "_exmpl_dep_chemical_formula",
"title": "Deprecation Warning",
"detail": "chemical_formula is deprecated, use instead chemical_formula_hill"
}
Note: warning ids MUST NOT be trusted to identify the exceptional situations (i.e., they are not error codes); use instead the field code for this. Warning ids can only be trusted to be unique in the list of warning resource objects, i.e., together with the type.
General OPTIMADE warning codes are specified in section Warnings.
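The constraints on warning resource objects (type fixed to "warning", detail present, status absent) can be captured in a small builder and checker. Both helper names are hypothetical; the optional members mirror those in the deprecation example above.

```python
from typing import Optional


def make_warning(detail: str, code: Optional[str] = None,
                 warning_id: Optional[str] = None,
                 title: Optional[str] = None) -> dict:
    """Build a warning resource object (sketch; hypothetical helper)."""
    if not detail:
        raise ValueError("a warning MUST carry a detail message")
    warning = {"type": "warning", "detail": detail}
    # Optional JSON:API error-object members; 'status' is deliberately never set
    if warning_id is not None:
        warning["id"] = warning_id
    if code is not None:
        warning["code"] = code
    if title is not None:
        warning["title"] = title
    return warning


def is_valid_warning(obj: dict) -> bool:
    """Check the constraints stated above for a warning resource object."""
    return obj.get("type") == "warning" and bool(obj.get("detail")) and "status" not in obj
```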
Other OPTIONAL additional information global to the query that is not specified in this document MUST start with a database-provider-specific prefix (see section Database-Provider-Specific Namespace Prefixes).
Example for a request made to http://example.com/optimade/v1/structures/?filter=a=1 AND b=2:
{
"meta": {
"query": {
"representation": "/structures/?filter=a=1 AND b=2"
},
"api_version": "1.0.0",
"schema": "http://schemas.optimade.org/openapi/v1/optimade.json",
"time_stamp": "2007-04-05T14:30:20Z",
"data_returned": 10,
"data_available": 10,
"more_data_available": false,
"provider": {
"name": "Example provider",
"description": "Provider used for examples, not to be assigned to a real database",
"prefix": "exmpl",
"homepage": "http://example.com"
},
"implementation": {
"name": "exmpl-optimade",
"version": "0.1.0",
"source_url": "http://git.example.com/exmpl-optimade",
"maintainer": {
"email": "[email protected]"
},
"issue_tracker": "http://tracker.example.com/exmpl-optimade"
},
"database": {
"id": "example_db",
"name": "Example database 1 (of many)",
"description": "The first example database in a series hosted by the Example Provider.",
"homepage": "http://database_one.example.com",
"maintainer": {
"email": "[email protected]"
}
}
}
// ...
}
schema: a JSON:API links object that points to a schema for the response. If it is a string, or a dictionary containing no meta field, the provided URL MUST point at an OpenAPI schema. It is possible that future versions of this specification allow for alternative schema types. Hence, if the meta field of the JSON:API links object is provided and contains a field schema_type that is not equal to the string OpenAPI, the client MUST NOT handle failures to parse the schema or to validate the response against the schema as errors.
Note: The schema field was previously RECOMMENDED in all responses, but is now demoted to being OPTIONAL since there now is a standard way of specifying a response schema in JSON:API through the describedby subfield of the top-level links field.
data: The schema of this value varies by endpoint; it can be either a single JSON:API resource object or a list of JSON:API resource objects. Every resource object needs the type and id fields, and its attributes (described in section API Endpoints) need to be in a dictionary corresponding to the attributes field.
Every resource object MAY also contain a meta field which MAY contain the following keys:
- property_metadata: an object containing per-entry and per-property metadata. The keys are the names of the fields in attributes for which metadata is available. The values belonging to these keys are dictionaries containing the relevant metadata fields. See also Metadata properties.
- partial_data_links: an object used to list links which can be used to fetch data that has been omitted from the data part of the response. The keys are the names of the fields in attributes for which partial data links are available. Each value is a list of objects that MUST have the following keys:
- format: String. The name of the format provided via this link. For one of the objects this format field SHOULD have the value "jsonlines", which refers to the format in OPTIMADE JSON lines partial data format.
- link: String. A JSON API link that points to a location from which the omitted data can be fetched. There is no requirement on the syntax or format for the link URL.
For more information about the mechanism to transmit large property values, including an example of the format of partial_data_links, see Transmission of large property values.
The response MAY also return resources related to the primary data in the field:
links: a JSON API links object is REQUIRED for implementing pagination (see section Entry Listing URL Query Parameters). Each field of a links object, i.e., a "link", MUST be one of:
- null
- a string representing a URI, or
- a dictionary ("link object") with fields
- href: a string representing a URI
- meta: (OPTIONAL) a meta object containing non-standard meta-information about the link
Example links objects:
base_url: a links object representing the base URL of the implementation. Example:
{
"links": {
"base_url": {
"href": "http://example.com/optimade",
"meta": {
"_exmpl_db_version": "3.2.1"
}
}
// ...
}
// ...
}
The links field SHOULD include the following links objects:
describedby: a links object giving the URL for a schema that describes the response. The URL SHOULD resolve into a JSON formatted response returning a JSON object with top level $schema and/or $id fields that can be used by the client to identify the schema format.
Note: This field is the standard facility in JSON:API to communicate a response schema. It overlaps in function with the field schema in the top level meta field.
The following fields are REQUIRED for implementing pagination:
- next: represents a link to fetch the next set of results.
When the current response is the last page of data, this field MUST be either omitted or null-valued.
An implementation MAY also use the following reserved fields for pagination. They represent links in a similar way as for next.
- prev: the previous page of data. null or omitted when the current response is the first page of data.
- last: the last page of data.
- first: the first page of data.
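A client can walk a paginated result set by following links.next until it is omitted or null. In this sketch the fetch callable is an assumption standing in for an HTTP GET that returns the decoded JSON body, so the loop can be exercised without a network; iter_entries is a hypothetical name.

```python
from typing import Callable, Iterator


def iter_entries(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield resource objects from every page, following links.next (sketch)."""
    url = first_url
    while url:
        page = fetch(url)
        data = page.get("data") or []
        # A single-entry endpoint returns one object rather than a list
        yield from (data if isinstance(data, list) else [data])
        next_link = (page.get("links") or {}).get("next")
        # A link may be a plain URI string or a link object with an 'href'
        url = next_link.get("href") if isinstance(next_link, dict) else next_link
```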
Finally, the links field MAY also include the following links object:
- self: a links object giving the URL from which the response was obtained.
included: a list of JSON:API resource objects related to the primary data contained in data. Responses that contain related resources under included are known as compound documents in the JSON:API.
The definition of this field is found in the JSON:API specification. Specifically, if the query parameter include is included in the request, included MUST NOT include unrequested resource objects. For further information on the parameter include, see section Entry Listing URL Query Parameters.
This value MUST be either an empty array or an array of related resource objects.
If there were errors in producing the response, all other fields MAY be present, but the top-level data field MUST be skipped, and the following field MUST be present:
- errors: a list of JSON:API error objects, where the field detail MUST be present. All other fields are OPTIONAL.
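The hard top-level rules stated so far (meta always present; data skipped whenever errors is present; every error object carrying detail) can be checked on a decoded response. The function name and the list-of-problems return style are assumptions of this sketch; SHOULD-level recommendations are deliberately not checked.

```python
def response_problems(response: dict) -> list:
    """Return human-readable violations of the MUST-level top-level rules (sketch)."""
    problems = []
    if "meta" not in response:
        problems.append("top-level 'meta' is missing")
    if "errors" in response:
        if "data" in response:
            problems.append("'data' MUST be skipped when 'errors' is present")
        for i, error in enumerate(response["errors"]):
            if "detail" not in error:
                problems.append(f"errors[{i}] lacks 'detail'")
    return problems
```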
An example of a full response:
{
"links": {
"next": null,
"base_url": {
"href": "http://example.com/optimade",
"meta": {
"_exmpl_db_version": "3.2.1"
}
}
},
"meta": {
"query": {
"representation": "/structures?filter=a=1 AND b=2"
},
"api_version": "1.0.0",
"time_stamp": "2007-04-05T14:30:20Z",
"data_returned": 10,
"data_available": 10,
"last_id": "xy10",
"more_data_available": false,
"provider": {
"name": "Example provider",
"description": "Provider used for examples, not to be assigned to a real database",
"prefix": "exmpl",
"homepage": {
"href": "http://example.com",
"meta": {
"_exmpl_title": "This is an example site"
}
}
},
// <OPTIONAL implementation- or database-provider-specific metadata, global to the query>
},
"data": [
// ...
],
"included": [
// ...
]
}
@context: A JSON-LD context that enables interpretation of data in the response as linked data. If provided, it SHOULD be one of the following:
- An object conforming to a JSON-LD standard, which includes a @version field specifying the version of the standard.
- A string containing a URL that resolves to such an object.
jsonapi: A JSON:API object. The version subfield SHOULD be "1.1". The meta subfield SHOULD be included and contain the following subfields:
- api: A string with the value "OPTIMADE".
- api-version: A string with the full version of the OPTIMADE standard that the processing and response adheres to.
This MAY be the version indicated at the top of this document, but MAY also be another version if the client, e.g., has used the query parameter api_hint to request processing according to another version.
If the server is able to handle serialization in such a way that it can dictate the order of the top level object members in the response, it is RECOMMENDED to put the jsonapi member as the first top level member to simplify identification of the response.
All HTTP response status codes MUST conform to RFC 7231: HTTP Semantics. The code registry is maintained by IANA.
See also the JSON:API definitions of responses when fetching data, i.e., sending an HTTP GET request.
Important: If a client receives an unexpected 404 error when making a query to a base URL, and is aware of the index meta-database that belongs to the database provider (as described in section Index Meta-Database), the next course of action SHOULD be to fetch the resource objects under the links endpoint of the index meta-database and redirect the original query to the corresponding database ID that was originally queried, using the object's base_url value.
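The fallback amounts to a lookup in the index meta-database's links response: find the resource object whose id equals the database ID that was queried and read its base_url, which, like any link, may be a string or a link object. The helper name is hypothetical, and the response is assumed to be a JSON:API list under data with base_url among each object's attributes.

```python
from typing import Optional


def redirect_base_url(links_response: dict, database_id: str) -> Optional[str]:
    """Find the base URL for database_id in an index meta-database /links response."""
    for entry in links_response.get("data") or []:
        if entry.get("id") != database_id:
            continue
        base_url = (entry.get("attributes") or {}).get("base_url")
        # base_url may be a plain string, a link object with 'href', or null
        if isinstance(base_url, dict):
            return base_url.get("href")
        return base_url
    return None
```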
There are relevant use-cases for allowing data served via OPTIMADE to be accessed from in-browser JavaScript, e.g. to enable server-less data aggregation.
For such use, many browsers need the server to include the header Access-Control-Allow-Origin: * in its responses, which indicates that in-browser JavaScript access is allowed from any site.
Non-critical exceptional situations occurring in the implementation SHOULD be reported to the referrer as warnings. Warnings MUST be expressed as a human-readable message, OPTIONALLY coupled with a warning code.
Warning codes starting with an alphanumeric character are reserved for general OPTIMADE error codes (currently, none are specified).
Implementation-specific warning codes MUST start with _ and the database-provider-specific prefix of the implementation (see section Database-Provider-Specific Namespace Prefixes).
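The naming rules for warning codes can be expressed as a small classifier. This sketch assumes the provider prefix exmpl (so implementation-specific codes begin with _exmpl_, as in the deprecation example) and reads "alphanumeric" as ASCII letters and digits; the function name is hypothetical.

```python
def classify_warning_code(code: str, prefix: str) -> str:
    """Classify a warning code according to the naming rules above (sketch)."""
    first = code[:1]
    if first.isascii() and first.isalnum():
        return "reserved"  # general OPTIMADE codes (none specified yet)
    if code.startswith(f"_{prefix}_"):
        return "implementation-specific"
    return "invalid"
```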
Access to API endpoints as described in the subsections below is to be provided under the versioned and/or the unversioned base URL as explained in the section Base URL.
The endpoints are:
- a versions endpoint
- an "entry listing" endpoint
- a "single entry" endpoint
- an introspection info endpoint
- an "entry listing" introspection info endpoint
- a links endpoint to discover related implementations
- a custom extensions endpoint prefix
These endpoints are documented below.
Query parameters to the endpoints are documented in the respective subsections below.
However, in addition, all API endpoints MUST accept the api_hint parameter described under Version Negotiation.
The versions endpoint aims at providing a stable and future-proof way for a client to discover the major versions of the API that the implementation provides.
This endpoint is special in that it MUST be provided directly on the unversioned base URL at /versions and MUST NOT be provided under the versioned base URLs.
The response to a query to this endpoint is in a restricted subset of the RFC 4180 CSV (text/csv; header=present) format. The restrictions are: (i) field values and header names MUST NOT contain commas, newlines, or double quote characters; (ii) field values and header names MUST NOT be enclosed by double quotes; (iii) the first line MUST be a header line. These restrictions allow clients to parse the file line-by-line, where each line can be split on all occurrences of the comma ',' character to obtain the header names and field values.
In the present version of the API, the response contains only a single field that is used to list the major versions of the API that the implementation supports.
The CSV format header line MUST specify version as the name for this field.
However, clients MUST accept responses that include other fields that follow the version.
The major API versions in the response are to be ordered according to the preference of the API implementation. If a version of the API is served on the unversioned base URL as described in the section Base URL, that version MUST be the first value in the response (i.e., it MUST be on the second line of the response directly following the required CSV header).
It is the intent that all future versions of this specification retain this endpoint, its restricted CSV response format, and the meaning of the first field of the response.
Example response:
version
1
0
The above response means that the API versions 1 and 0 are served under the versioned base URLs /v1 and /v0, respectively.
The order of the versions indicates that the API implementation regards version 1 as preferred over version 0.
If the API implementation allows access to the API on the unversioned base URL, this access has to be to version 1, since the number 1 appears in the first (non-header) line.
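Because of the restrictions on the CSV format, a client can parse the /versions response by plain line splitting, without a full CSV parser. A minimal sketch (parse_versions is a hypothetical name) that returns the major versions as strings in the server's order of preference and ignores any extra fields after version:

```python
def parse_versions(body: str) -> list:
    """Parse a /versions response into major versions, in preference order (sketch)."""
    lines = [line for line in body.splitlines() if line]
    if not lines:
        raise ValueError("empty /versions response")
    header = lines[0].split(",")
    if header[0] != "version":
        raise ValueError("the first header field MUST be 'version'")
    # Fields following 'version' are allowed and simply ignored here
    return [line.split(",")[0] for line in lines[1:]]
```

For the example response above, parse_versions returns ["1", "0"]; the first element is the version that any unversioned base URL access would have to serve.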
Entry listing endpoints return a list of resource objects representing entries of a specific type. For example, a list of structures, or a list of calculations.
Each entry in the list includes a set of properties and their corresponding values. The section Entry list specifies properties as belonging to one of three categories:
- Properties marked as REQUIRED in the response. These properties MUST always be present for all entries in the response.
- Properties marked as REQUIRED only if the query parameter response_fields is not part of the request, or if they are explicitly requested in response_fields. Otherwise they MUST NOT be included. One can think of these properties as constituting a default value for response_fields when that parameter is omitted.
- Properties not marked as REQUIRED in any case MUST be included only if explicitly requested in the query parameter response_fields. Otherwise they SHOULD NOT be included.
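The three categories reduce to a small selection rule. The sketch assumes the server already knows, per entry type, which properties are always REQUIRED and which are REQUIRED by default; the function name and the property sets in the usage note are illustrative, not taken from the standard's property tables.

```python
from typing import Optional


def fields_to_return(always: set, default: set, requested: Optional[set]) -> set:
    """Select the properties to include for one entry (sketch).

    always    -- properties REQUIRED in every response
    default   -- properties REQUIRED only when response_fields is absent
    requested -- parsed response_fields, or None if the parameter was not given
    """
    if requested is None:
        return always | default
    # response_fields given: always-REQUIRED fields plus exactly what was asked for
    return always | requested
```

For instance, with always={"id", "type"} and default={"last_modified"}, a request with response_fields=nsites yields {"id", "type", "nsites"}.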
Examples of valid entry listing endpoint URLs:
There MAY be multiple entry listing endpoints, depending on how many types of entries an implementation provides. Specific standard entry types are specified in section Entry list.
The API implementation MAY provide other entry types than the ones standardized in this specification.
Such entry types MUST be prefixed by a database-provider-specific prefix (i.e., the resource objects' type value should start with the database-provider-specific prefix, e.g., type = _exmpl_workflows).
Each custom entry type SHOULD be served at a corresponding entry listing endpoint under the versioned or unversioned base URL that serves the API with the same name (i.e., equal to the resource objects' type value, e.g., /_exmpl_workflows).
It is RECOMMENDED to align with the OPTIMADE API specification practice of using a plural for entry resource types and entry type endpoints.
Any custom entry listing endpoint MUST also be added to the available_endpoints and entry_types_by_format attributes of the Base Info Endpoint.
For more on custom endpoints, see Custom Extension Endpoints.
The client MAY provide a set of URL query parameters in order to alter the response and provide usage information. While these URL query parameters are OPTIONAL for clients, API implementations MUST accept and handle them. To adhere to the requirement on implementation-specific URL query parameters of JSON:API v1.1, query parameters that are not standardized by that specification have been given names that consist of at least two words separated by an underscore (a LOW LINE character '_').
Standard OPTIONAL URL query parameters standardized by the JSON:API specification:
filter: a filter string, in the format described below in section API Filtering Format Specification.
page_limit: sets a numerical limit on the number of entries returned. See JSON:API 1.1. The API implementation MUST return no more than the number specified. It MAY return fewer. The database MAY have a maximum limit and not accept larger numbers (in which case the 403 Forbidden error code MUST be returned). The default limit value is up to the API implementation to decide. Example: http://example.com/optimade/v1/structures?page_limit=100
page_{offset, number, cursor, above, below}: A server MUST implement pagination in the case of no user-specified sort parameter (via the links response field, see section JSON Response Schema: Common Fields). A server MAY implement pagination in concert with sort. The following parameters, all prefixed by "page_", are RECOMMENDED for use with pagination. If an implementation chooses
- offset-based pagination: using page_offset and page_limit is RECOMMENDED.
- cursor-based pagination: using page_cursor and page_limit is RECOMMENDED.
- page-based pagination: using page_number and page_limit is RECOMMENDED. It is RECOMMENDED that the first page has number 1, i.e., that page_number is 1-based.
- value-based pagination: using page_above/page_below and page_limit is RECOMMENDED.
Examples (all OPTIONAL behavior a server MAY implement):
- skip 50 structures and fetch up to 100: /structures?page_offset=50&page_limit=100.
- fetch page 2 of up to 50 structures per page: /structures?page_number=2&page_limit=50.
- fetch up to 100 structures above sort-field value 4000 (in this example, server chooses to fetch results sorted by increasing id, so page_above value refers to an id value): /structures?page_above=4000&page_limit=100.
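For offset-based pagination, a server might compute the page slice and the next link as follows. The in-memory list, the base path and the helper name paginate are assumptions of this sketch; a real implementation would push the offset and limit down to its database query and translate the raised exceptions into HTTP responses.

```python
from urllib.parse import urlencode


def paginate(entries, base_path, page_offset=0, page_limit=20, max_limit=100):
    """One page of an in-memory result list with offset-based links (sketch)."""
    if page_offset < 0 or page_limit < 1:
        raise ValueError("page_offset must be >= 0 and page_limit >= 1")
    if page_limit > max_limit:
        # The text above requires a 403 Forbidden response in this case
        raise PermissionError("403 Forbidden: page_limit exceeds the server maximum")
    page = entries[page_offset:page_offset + page_limit]
    more = page_offset + page_limit < len(entries)
    next_link = None
    if more:
        query = urlencode({"page_offset": page_offset + page_limit, "page_limit": page_limit})
        next_link = f"{base_path}?{query}"
    return {
        "data": page,
        "links": {"next": next_link},  # null on the last page
        "meta": {"data_returned": len(entries), "more_data_available": more},
    }
```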
sort: If supporting sortable queries, an implementation MUST use the sort query parameter with format as specified by JSON:API 1.1.
An implementation MAY support multiple sort fields for a single query. If it does, it again MUST conform to the JSON:API 1.1 specification.
If an implementation supports sorting for an entry listing endpoint, then the /info/<entries> endpoint MUST include, for each field name <fieldname> in its data.properties.<fieldname> response value that can be used for sorting, the key sortable with value true. If a field name under an entry listing endpoint supporting sorting cannot be used for sorting, the server MUST either leave out the sortable key or set it equal to false for the specific field name. The field names with sortable equal to true are allowed to be used in the "sort fields" list according to its definition in the JSON:API 1.1 specification. The field sortable is in addition to each property description and other OPTIONAL fields. An example is shown in section Entry Listing Info Endpoints.
include: A server MAY implement the JSON:API concept of returning compound documents by utilizing the include query parameter as specified by JSON:API 1.0.
All related resource objects MUST be returned as part of an array value for the top-level included field, see section JSON Response Schema: Common Fields.
The value of include MUST be a comma-separated list of "relationship paths", as defined in the JSON:API. If relationship paths are not supported, or a server is unable to identify a relationship path, a 400 Bad Request response MUST be made.
The default value for include is references. This means references entries MUST always be included under the top-level field included as default, since a server assumes if include is not specified by a client in the request, it is still specified as include=references. Note, if a client explicitly specifies include and leaves out references, references resource objects MUST NOT be included under the top-level field included, as per the definition of included, see section JSON Response Schema: Common Fields.
Note: A query with the parameter include set to the empty string means no related resource objects are to be returned under the top-level field included.
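The default and empty-string semantics of include are easy to get wrong, because an absent parameter and an empty one mean different things. A minimal server-side sketch, with a hypothetical helper name and an assumed set of supported relationship paths:

```python
def included_types(include=None, supported=frozenset({"references"})):
    """Relationship paths to resolve for the 'include' query parameter (sketch).

    include is None -> parameter absent, so the default 'references' applies
    include == ''   -> explicitly empty, so nothing is placed under 'included'
    """
    if include is None:
        return ["references"]
    paths = [path.strip() for path in include.split(",") if path.strip()]
    unknown = [path for path in paths if path not in supported]
    if unknown:
        # Unsupported or unidentifiable relationship paths call for 400 Bad Request
        raise ValueError(f"400 Bad Request: unsupported relationship path(s) {unknown}")
    return paths
```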
Standard OPTIONAL URL query parameters not in the JSON:API specification:
- response_format: the output format requested (see section Response Format).
Defaults to the format string 'json', which specifies the standard output format described in this specification.
Example:
http://example.com/optimade/v1/structures?response_format=xml
- email_address: an email address of the user making the request.
The email SHOULD be that of a person and not an automatic system.
Example:
http://example.com/optimade/v1/[email protected]
- response_fields: a comma-delimited set of fields to be provided in the output.
If provided, these fields MUST be returned along with the REQUIRED fields.
Other OPTIONAL fields MUST NOT be returned when this parameter is present.
Example:
http://example.com/optimade/v1/structures?response_fields=last_modified,nsites
Additional OPTIONAL URL query parameters not described above are not considered to be part of this standard, and are instead considered to be "custom URL query parameters". These custom URL query parameters MUST be of the format "<database-provider-specific prefix><url_query_parameter_name>". These names adhere to the requirements on implementation-specific query parameters of JSON:API v1.1 since the database-provider-specific prefixes contain at least two underscores (a LOW LINE character '_').
Example uses of custom URL query parameters include providing an access token for the request, to tell the database to increase verbosity in error output, or providing a database-specific extended searching format.
Examples:
http://example.com/optimade/v1/structures?_exmpl_key=A3242DSFJFEJE
http://example.com/optimade/v1/structures?_exmpl_warning_verbosity=10
http://example.com/optimade/v1/structures?_exmpl_filter="elements all in [Al, Si, Ga]"
Note: the specification presently makes no attempt to standardize access control mechanisms. There are security concerns with access control based on URL tokens, and the above example is not to be taken as a recommendation for such a mechanism.
"Entry listing" endpoint response dictionaries MUST have a data key.
The value of this key MUST be a list containing dictionaries that represent individual entries.
In the default JSON response format every dictionary (resource object) MUST have the following fields:
type: field containing the Entry type as defined in section Definition of Terms
id: field containing the ID of entry as defined in section Definition of Terms. This can be the local database ID.
attributes: a dictionary, containing key-value pairs representing the entry's properties, except for type and id.
Database-provider-specific and definition-provider-specific properties MUST include the corresponding prefix (see section Namespace Prefixes).
OPTIONALLY it can also contain the following fields:
- links: a JSON:API links object can OPTIONALLY contain the field
- self: the entry's URL
- meta: a JSON API meta object that is used to communicate metadata. See JSON Response Schema: Common Fields for more information about this field.
- relationships: a dictionary containing references to other entries according to the description in section Relationships encoded as JSON:API Relationships.
The OPTIONAL human-readable description of the relationship MAY be provided in the description field inside the meta dictionary of the JSON:API resource identifier object. All relationships to entries of the same entry type MUST be grouped into the same JSON:API relationship object and placed in the relationships dictionary with the entry type name as key (e.g., structures).
Example:
{
"data": [
{
"type": "structures",
"id": "example.db:structs:0001",
"attributes": {
"chemical_formula_descriptive": "Es2 O3",
"url": "http://example.db/structs/0001",
"immutable_id": "http://example.db/structs/0001@123",
"last_modified": "2007-04-05T14:30:20Z"
}
},
{
"type": "structures",
"id": "example.db:structs:1234",
"attributes": {
"chemical_formula_descriptive": "Es2",
"url": "http://example.db/structs/1234",
"immutable_id": "http://example.db/structs/1234@123",
"last_modified": "2007-04-07T12:02:20Z"
}
}
// ...
]
// ...
}
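Building the relationships dictionary of an entry amounts to grouping resource identifier objects by entry type, with any human-readable description placed under the identifier's meta. The input format (tuples of type, id and optional description) and the helper name are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def build_relationships(related: Iterable[Tuple[str, str, Optional[str]]]) -> dict:
    """Group related entries into one JSON:API relationship object per entry type."""
    grouped = defaultdict(list)
    for entry_type, entry_id, description in related:
        identifier = {"type": entry_type, "id": entry_id}
        if description:
            # OPTIONAL human-readable description lives in the identifier's meta
            identifier["meta"] = {"description": description}
        grouped[entry_type].append(identifier)
    return {entry_type: {"data": ids} for entry_type, ids in grouped.items()}
```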
A client can request a specific entry by appending a URL-encoded ID path segment to the URL of an entry listing endpoint. This will return properties for the entry with that ID.
In the default JSON response format, the ID component MUST be the content of the id field.
Examples:
http://example.com/optimade/v1/structures/exmpl%3Astruct_3232823
http://example.com/optimade/v1/calculations/232132
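Because the ID is a path segment, characters such as ':' or '/' in an id must be percent-encoded. A sketch using urllib.parse.quote from the Python standard library with safe="" so that '/' is escaped as well; the helper name single_entry_url is hypothetical.

```python
from urllib.parse import quote


def single_entry_url(listing_url: str, entry_id: str) -> str:
    """Append a URL-encoded ID path segment to an entry listing endpoint URL."""
    return f"{listing_url.rstrip('/')}/{quote(entry_id, safe='')}"


print(single_entry_url("http://example.com/optimade/v1/structures", "exmpl:struct_3232823"))
# prints http://example.com/optimade/v1/structures/exmpl%3Astruct_3232823
```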
The rules for which properties are to be present for an entry in the response are the same as defined in section Entry Listing Endpoints.
The client MAY provide a set of additional URL query parameters for this endpoint type.
URL query parameters not recognized MUST be ignored.
While the following URL query parameters are OPTIONAL for clients, API implementations MUST accept and handle them: response_format, email_address, response_fields.
The URL query parameter include is OPTIONAL for both clients and API implementations.
The meaning of these URL query parameters is as defined above in section Entry Listing URL Query Parameters.
The response for a 'single entry' endpoint is the same as for 'entry listing' endpoint responses, except that the value of the data field MUST have only one or zero entries.
In the default JSON response format, this means the value of the data field MUST be a single response object or null if there is no response object to return.
Example:
{
"data": {
"type": "structures",
"id": "example.db:structs:1234",
"attributes": {
"chemical_formula_descriptive": "Es2",
"url": "http://example.db/structs/1234",
"immutable_id": "http://example.db/structs/1234@123",
"last_modified": "2007-04-07T12:02:20Z"
}
},
"meta": {
"query": {
"representation": "/structures/example.db:structs:1234?"
}
// ...
}
// ...
}
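On the client side, the rule that data holds a single object or null can be checked as in the following sketch. The function name single_entry_data is hypothetical, and raising ValueError on a list is one possible way to reject a response that looks like an entry listing.

```python
import json
from typing import Optional


def single_entry_data(response_text: str) -> Optional[dict]:
    """Return the single resource object of a 'single entry' response, or None."""
    data = json.loads(response_text)["data"]
    if data is None:
        return None  # the server had no entry to return
    if not isinstance(data, dict):
        # A list here would indicate an entry listing response, not a single entry.
        raise ValueError("single entry responses must have an object or null as data")
    return data
```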
Info endpoints provide introspective information, either about the API implementation itself, or about specific entry types.
There are two types of info endpoints:
- Base info endpoints: placed directly under the versioned or unversioned base URL that serves the API (e.g., http://example.com/optimade/v1/info or http://example.com/optimade/info)
- Entry listing info endpoints: placed under the endpoints belonging to specific entry types (e.g., http://example.com/optimade/v1/info/structures or http://example.com/optimade/info/structures)
The types and output content of these info endpoints are described in more detail in the subsections below.
Common to all of them is that the data field SHOULD contain only a single resource object.
If no resource object is provided, the value of the data field MUST be null.
The Info endpoint under a versioned or unversioned base URL serving the API (e.g., http://example.com/optimade/v1/info or http://example.com/optimade/info) returns information relating to the API implementation.
The single resource object's response dictionary MUST include the following fields:
- type: "info"
- id: "/"
- attributes: Dictionary containing the following fields:
  - api_version: Presently used full version of the OPTIMADE API. The version number string MUST NOT be prefixed by, e.g., "v". Examples: 1.0.0, 1.0.0-rc.2.
  - available_api_versions: MUST be a list of dictionaries, each containing the fields:
    - url: a string specifying a versioned base URL that MUST adhere to the rules in section Base URL.
    - version: a string containing the full version number of the API served at that versioned base URL. The version number string MUST NOT be prefixed by, e.g., "v". Examples: 1.0.0, 1.0.0-rc.2.
  - formats: List of available output formats.
  - entry_types_by_format: Available entry endpoints as a function of output formats.
  - available_endpoints: List of available endpoints (i.e., the string to be appended to the versioned or unversioned base URL serving the API).
  - license: A JSON API link giving a URL to a web page containing a human-readable text describing the license (or licensing options if there are multiple) covering all the data and metadata provided by this database. Clients are advised not to try automated parsing of this link or its content, but rather rely on the field available_licenses instead. Example: https://example.com/licenses/example_license.html.

attributes MAY also include the following OPTIONAL fields:
- is_index: if true, this is an index meta-database base URL (see section Index Meta-Database). If this member is not provided, the client MUST assume this is not an index meta-database base URL (i.e., the default is for is_index to be false).
- available_licenses: List of SPDX license identifiers specifying a set of alternative licenses available to the client for licensing the complete database, i.e., all the entries, metadata, and the content and structure of the database itself. If more than one license is available to the client, the identifier of each one SHOULD be included in the list. Inclusion of a license identifier in the list is a commitment of the database that the rights are in place to grant clients access to all the individual entries, all metadata, and the content and structure of the database itself according to the terms of any of these licenses (at the choice of the client). If the licensing information provided via the field license omits licensing options specified in available_licenses, or if it otherwise contradicts them, a client MUST still be allowed to interpret the inclusion of a license in available_licenses as a full commitment from the database without exceptions, under the respective licenses. If the database cannot make that commitment, e.g., if only part of the database is available under a license, the corresponding license identifier MUST NOT appear in available_licenses (but, rather, the field license is to be used to clarify the licensing situation). An empty list indicates that none of the SPDX licenses apply and that the licensing situation is clarified in human-readable form in the field license. An unknown value means that the database makes no commitment.
- available_licenses_for_entries: List of SPDX license identifiers specifying a set of additional alternative licenses available to the client for licensing individual, and non-substantial sets of, database entries, metadata, and extracts from the database that do not constitute substantial parts of the database. Note that the definition of the field available_licenses implies that licenses specified in that field are available also for the licensing specified by this field, even if they are not explicitly included in the field available_licenses_for_entries or if it is null (however, the opposite relationship does not hold). If available_licenses is unknown, only the licenses in available_licenses_for_entries apply.
If this is an index meta-database base URL (see section Index Meta-Database), then the response dictionary MUST also include the field:
- relationships: Dictionary that MAY contain a single JSON:API relationships object:
  - default: Reference to the links identifier object under the links endpoint that the provider has chosen as their "default" OPTIMADE API database. A client SHOULD present this database as the first choice when an end-user chooses this provider. This MUST include the field:
    - data: JSON:API resource linkage. It MUST be either null or contain a single links identifier object with the fields:
      - type: links
      - id: ID of the provider's chosen default OPTIMADE API database. MUST be equal to a valid child object's id under the links endpoint.

Lastly, is_index MUST also be included in attributes and be true.
Example:
{
"data": {
"type": "info",
"id": "/",
"attributes": {
"api_version": "1.0.0",
"available_api_versions": [
{"url": "http://db.example.com/optimade/v0/", "version": "0.9.5"},
{"url": "http://db.example.com/optimade/v0.9/", "version": "0.9.5"},
{"url": "http://db.example.com/optimade/v0.9.2/", "version": "0.9.2"},
{"url": "http://db.example.com/optimade/v0.9.5/", "version": "0.9.5"},
{"url": "http://db.example.com/optimade/v1/", "version": "1.0.0"},
{"url": "http://db.example.com/optimade/v1.0/", "version": "1.0.0"}
],
"formats": [
"json",
"xml"
],
"entry_types_by_format": {
"json": [
"structures",
"calculations"
],
"xml": [
"structures"
]
},
"available_endpoints": [
"structures",
"calculations",
"info",
"links"
],
"is_index": false
}
}
// ...
}
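The following sketch shows how a client might read two of the fields above from a parsed base info response. It applies the default for the OPTIONAL is_index member and looks up which versioned base URLs serve a given full API version. Both function names are hypothetical. The info dictionary restates the attributes of the example above as Python data.

```python
from typing import List


def is_index_meta_database(info: dict) -> bool:
    """is_index is OPTIONAL; when it is absent, clients MUST treat it as false."""
    return bool(info["data"]["attributes"].get("is_index", False))


def base_urls_for_version(info: dict, version: str) -> List[str]:
    """Return every versioned base URL that serves the given full API version."""
    versions = info["data"]["attributes"]["available_api_versions"]
    return [entry["url"] for entry in versions if entry["version"] == version]


info = {
    "data": {
        "type": "info",
        "id": "/",
        "attributes": {
            "api_version": "1.0.0",
            "available_api_versions": [
                {"url": "http://db.example.com/optimade/v0/", "version": "0.9.5"},
                {"url": "http://db.example.com/optimade/v0.9/", "version": "0.9.5"},
                {"url": "http://db.example.com/optimade/v0.9.2/", "version": "0.9.2"},
                {"url": "http://db.example.com/optimade/v0.9.5/", "version": "0.9.5"},
                {"url": "http://db.example.com/optimade/v1/", "version": "1.0.0"},
                {"url": "http://db.example.com/optimade/v1.0/", "version": "1.0.0"},
            ],
            "is_index": False,
        },
    }
}
```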
Example for an index meta-database:
{
"data": {
"type": "info",
"id": "/",
"attributes": {
"api_version": "1.0.0",
"available_api_versions": [
{"url": "http://db.example.com/optimade/v0/", "version": "0.9.5"},
{"url": "http://db.example.com/optimade/v0.9/", "version": "0.9.5"},
{"url": "http://db.example.com/optimade/v0.9.2/", "version": "0.9.2"},
{"url": "http://db.example.com/optimade/v1/", "version": "1.0.0"},
{"url": "http://db.example.com/optimade/v1.0/", "version": "1.0.0"}
],
"formats": [
"json",
"xml"
],
"entry_types_by_format": {
"json": [],
"xml": []
},
"available_endpoints": [
"info",
"links"
],
"is_index": true
},
"relationships": {
"default": {
"data": { "type": "links", "id": "perovskites" }
}
}
}
// ...
}
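For an index meta-database, a client might extract the provider's default database as sketched below. The function name default_database_id is hypothetical. The sketch treats a missing relationships dictionary, a missing default, and data equal to null as "no default chosen".

```python
from typing import Optional


def default_database_id(info: dict) -> Optional[str]:
    """Return the ID of the provider's default database, or None if none is given."""
    default = info["data"].get("relationships", {}).get("default")
    if default is None or default.get("data") is None:
        return None  # relationships MAY be empty, and data MAY be null
    return default["data"]["id"]


index_info = {
    "data": {
        "type": "info",
        "id": "/",
        "attributes": {"api_version": "1.0.0", "is_index": True},
        "relationships": {"default": {"data": {"type": "links", "id": "perovskites"}}},
    }
}
```

The returned ID can then be looked up among the child objects of the index meta-database's links endpoint.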
Entry listing info endpoints are accessed under the versioned or unversioned base URL serving the API as /info/<entry_type> (e.g., http://example.com/optimade/v1/info/structures or http://example.com/optimade/info/structures).
They return information related to the specific entry types served by the API.
The response for these endpoints MUST include the following information in the data field:
- type: "info".
- id: This MUST precisely match the entry type name, e.g., "structures" for /info/structures.
- description: Description of the entry.
- properties: A dictionary describing properties for this entry type, where each key is a property name and the value is an OPTIMADE Property Definition described in detail in the section Property Definitions.
- formats: List of output formats available for this type of entry (see section Response Format).
- output_fields_by_format: Dictionary of available output fields for this entry type, where the keys are the values of the formats list and the values are the keys of the properties dictionary.

Note: Future versions of the OPTIMADE API will deprecate this format and require all keys that are not type or id to be under the attributes key.
An example of the data part of the entry listing info endpoint response follows below. Note, however, that:
- The description strings have been wrapped for readability only (newline characters are not allowed inside JSON strings).
- The properties in the example, 'nelements' and 'lattice_vectors', mimic OPTIMADE standard properties, but are given here with simplified definitions compared to the standard definitions for these properties.
{
"data": {
"type": "info",
"id": "structures",
"description": "a structures entry",
"properties": {
"nelements": {
"$id": "urn:uuid:10a05e55-0c20-4f68-89ad-35a18eb7076f",
"title": "Number of elements",
"x-optimade-type": "integer",
"type": ["integer", "null"],
"description": "Number of different elements in the structure as an integer.\n
\n
- Note: queries on this property can equivalently be formulated using `elements LENGTH`.\n
- A filter that matches structures that have exactly 4 elements: `nelements=4`.\n
- A filter that matches structures that have between 2 and 7 elements: `nelements>=2 AND nelements<=7`.",
"examples": [
3
],
"x-optimade-property": {
"property-format": "1.2"
},
"x-optimade-unit": "dimensionless",
"x-optimade-implementation": {
"sortable": true,
"query-support": "all mandatory"
},
"x-optimade-requirements": {
"support": "should",
"sortable": false,
"query-support": "all mandatory"
}
},
"lattice_vectors": {
"$id": "urn:uuid:81edf372-7b1b-4518-9c14-7d482bd67834",
"title": "Lattice vectors",
"x-optimade-definition": {
"label": "lattice_vectors_optimade_structures",
"kind": "property",
"format": "1.2",
"version": "1.2.0",
"name": "lattice_vectors"
},
"x-optimade-type": "list",
"x-optimade-dimensions": {
"names": ["dim_lattice", "dim_spatial"],
"lengths": [3, 3]
},
"x-optimade-unit-definitions": [
{
"symbol": "angstrom",
"title": "ångström",
"description": "The ångström unit of length.",
"standard": {
"name": "gnu units",
"version": "3.09",
"symbol": "angstrom"
}
}
],
"x-optimade-unit": "inapplicable",
"x-optimade-implementation": {
"sortable": false,
"query-support": "none"
},
"x-optimade-requirements": {
"support": "should",
"sortable": false,
"query-support": "none"
},
"type": ["array", "null"],
"description": "The three lattice vectors in Cartesian coordinates, in ångström (Å).\n
\n
- MUST be a list of three vectors *a*, *b*, and *c*, where each of the vectors MUST be a
list of the vector's coordinates along the x, y, and z Cartesian coordinates.
",
"examples": [
[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 1.0, 4.0]]
],
"items": {
"type": "array",
"x-optimade-type": "list",
"x-optimade-unit": "inapplicable",
"x-optimade-dimensions": {
"names": ["dim_spatial"],
"lengths": [3]
},
"items": {
"type": "number",
"x-optimade-type": "float",
"x-optimade-unit": "angstrom",
"x-optimade-implementation": {
"sortable": true,
"query-support": "none"
},
"x-optimade-requirements": {
"sortable": false,
"query-support": "none"
}
}
}
}
// ... <other property descriptions>
},
"formats": ["json", "xml"],
"output_fields_by_format": {
"json": [
"nelements",
"lattice_vectors",
// ...
],
"xml": ["nelements"]
}
}
// ...
}
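A consistency check follows directly from the definition of output_fields_by_format: its keys should come from formats, and its values should be keys of properties. The sketch below applies that check to a trimmed, hypothetical copy of the data object above. The function name check_output_fields is illustrative only.

```python
from typing import List


def check_output_fields(info_data: dict) -> List[str]:
    """Return the problems found in output_fields_by_format of an info 'data' object."""
    problems = []
    formats = set(info_data["formats"])
    properties = set(info_data["properties"])
    for fmt, fields in info_data["output_fields_by_format"].items():
        if fmt not in formats:
            problems.append(f"format {fmt!r} is not listed in 'formats'")
        for field in fields:
            if field not in properties:
                problems.append(f"field {field!r} for {fmt!r} is not a key of 'properties'")
    return problems


info_data = {
    "type": "info",
    "id": "structures",
    "properties": {"nelements": {"title": "Number of elements"},
                   "lattice_vectors": {"title": "Lattice vectors"}},
    "formats": ["json", "xml"],
    "output_fields_by_format": {"json": ["nelements", "lattice_vectors"], "xml": ["nelements"]},
}
```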
This endpoint exposes information on other OPTIMADE API implementations that are related to the current implementation.
The links endpoint MUST be provided under the versioned or unversioned base URL serving the API at /links.
Each link has a link_type attribute that specifies the type of the linked relation.
The link_type MUST be one of the following values:
- child: a link to another OPTIMADE implementation that MUST be within the same provider. This allows the creation of a tree-like structure of databases by pointing to children sub-databases.
- root: a link to the root implementation within the same provider. This is RECOMMENDED to be an Index Meta-Database. There MUST be only one root implementation per provider and all implementations MUST have a link to this root implementation. If the provider only supplies a single implementation, the root link links to the implementation itself.
- external: a link to an external OPTIMADE implementation. This MAY be used to point to any other implementation, also in a different provider.
- providers: a link to a List of Providers Links implementation that includes the current implementation, e.g., providers.optimade.org.

Limiting to the root and child link types, links can be used as an introspective endpoint, similar to the Info Endpoints, but at a higher level: Info Endpoints provide information on the given implementation, while the /links endpoint provides information on the links between immediately related implementations (in particular, an array of zero or one object with link type root and zero or more objects with link type child, see section Internal Links: Root and Child Links).
For /links
endpoints, the API implementation MAY ignore any provided query parameters.
Alternatively, it MAY handle the parameters specified in section Entry Listing URL Query Parameters for entry listing endpoints.
The resource objects' response dictionaries MUST include the following fields:
- type: MUST be "links".
- id: MUST be unique.
- attributes: Dictionary that MUST contain the following fields:
  - name: Human-readable name for the OPTIMADE API implementation, e.g., for use in clients to show the name to the end-user.
  - description: Human-readable description for the OPTIMADE API implementation, e.g., for use in clients to show a description to the end-user.
  - base_url: JSON API link, pointing to the base URL for this implementation, either directly as a string, or as an object, which can contain the following fields:
    - href: a string containing the OPTIMADE base URL.
    - meta: a meta object containing non-standard meta-information about the implementation.
  - homepage: a JSON API link, pointing to a homepage URL for this implementation, either directly as a string, or as an object, which can contain the following fields:
    - href: a string containing the implementation homepage URL.
    - meta: a meta object containing non-standard meta-information about the homepage.
  - link_type: a string containing the link type. It MUST be one of the values listed above in section Link Types.

The attributes dictionary MAY also contain the following OPTIONAL fields:
- aggregate: a string indicating whether a client that is following links to aggregate results from different OPTIMADE implementations should follow this link or not. This flag SHOULD NOT be indicated for links where link_type is not child. If not specified, clients MAY assume that the value is ok. If specified, and the value is anything different from ok, the client MUST assume that the server is suggesting not to follow the link during aggregation by default (also if the value is not among the known ones, in case a future specification adds new accepted values). Specific values indicate the reason why the server is providing the suggestion. A client MAY follow the link anyway if it has reason to do so (e.g., if the client is looking for all test databases, it MAY follow the links where aggregate has value test). If specified, it MUST be one of the values listed in section Link Aggregate Options.
- no_aggregate_reason: an OPTIONAL human-readable string indicating the reason for suggesting not to aggregate results following the link. It SHOULD NOT be present if aggregate has value ok.
Example:
{
"data": [
{
"type": "links",
"id": "index",
"attributes": {
"name": "Index",
"description": "Index for example's OPTIMADE databases",
"base_url": "http://example.com/optimade",
"homepage": "http://example.com",
"link_type": "root"
}
},
{
"type": "links",
"id": "cat_zeo",
"attributes": {
"name": "Catalytic Zeolites",
"description": "Zeolites for deNOx catalysis",
"base_url": {
"href": "http://example.com/optimade/denox/zeolites",
"meta": {
"_exmpl_catalyst_group": "denox"
}
},
"homepage": "http://example.com",
"link_type": "child"
}
},
{
"type": "links",
"id": "frameworks",
"attributes": {
"name": "Zeolitic Frameworks",
"description": "",
"base_url": "http://example.com/zeo_frameworks/optimade",
"homepage": "http://example.com",
"link_type": "child"
}
},
{
"type": "links",
"id": "testdb",
"attributes": {
"name": "Test database",
"description": "A test database",
"base_url": "http://example.com/testdb/optimade",
"homepage": "http://example.com",
"link_type": "child",
"aggregate": "test"
}
},
{
"type": "links",
"id": "internaldb",
"attributes": {
"name": "Database for internal use",
"description": "An internal database",
"base_url": "http://example.com/internaldb/optimade",
"homepage": "http://example.com",
"link_type": "child",
"aggregate": "no",
"no_aggregate_reason": "This is a database for internal use and might contain nonsensical data"
}
},
{
"type": "links",
"id": "some_other_db",
"attributes": {
"name": "Some other DB",
"description": "A DB by the example2 provider",
"base_url": "http://example2.com/some_db/optimade",
"homepage": "http://example2.com",
"link_type": "external"
}
},
{
"type": "links",
"id": "optimade",
"attributes": {
"name": "Materials Consortia",
"description": "List of OPTIMADE providers maintained by the Materials Consortia organisation",
"base_url": "https://providers.optimade.org",
"homepage": "https://optimade.org",
"link_type": "providers"
}
}
]
}
Any number of resource objects with link_type equal to child MAY be present as part of the data list.
A child object represents a "link" to an OPTIMADE implementation within the same provider exactly one layer below the current implementation's layer.
Exactly one resource object with link_type equal to root MUST be present as part of the data list.
Note: the same implementation may of course be linked by other implementations via a /links endpoint with link_type equal to external.
The root resource object represents a link to the topmost OPTIMADE implementation of the current provider.
By following child links from the root object recursively, it MUST be possible to reach the current OPTIMADE implementation.
In practice, this forms a tree structure for the OPTIMADE implementations of a provider. Note: The RECOMMENDED number of layers is two.
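Some of the requirements above can be checked locally on a single /links response, as in the sketch below. The function name check_links is hypothetical. The sketch only checks that IDs are unique and that exactly one root object is present. It does not verify that the current implementation is reachable from the root, because that would require fetching the /links endpoints of other implementations.

```python
from collections import Counter
from typing import Dict, List


def check_links(links: List[Dict]) -> List[str]:
    """Return the problems found in the data list of a /links response."""
    problems = []
    # The id of every resource object MUST be unique.
    counts = Counter(link["id"] for link in links)
    problems.extend(f"duplicate id {link_id!r}" for link_id, n in counts.items() if n > 1)
    # Exactly one resource object with link_type "root" MUST be present.
    n_roots = sum(1 for link in links if link["attributes"]["link_type"] == "root")
    if n_roots != 1:
        problems.append(f"expected exactly one root link, found {n_roots}")
    return problems
```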
Resource objects with link_type equal to providers MUST point to an Index Meta-Database that supplies a list of OPTIMADE database providers.
The intention is to be able to auto-discover all providers of OPTIMADE implementations.
A list of known database providers can be retrieved as described in section Namespace Prefixes. This section also describes where to find information for how a provider can be added to this list.
If the provider implements an Index Meta-Database, it is RECOMMENDED to adopt a structure where the index meta-database is the root implementation of the provider.
This will make all OPTIMADE databases and implementations by the provider discoverable as links with child link type, under the links endpoint of the Index Meta-Database.
If specified, the aggregate attribute MUST have one of the following values:
- ok (default value, if unspecified): it is ok to follow this link when aggregating OPTIMADE results.
- test: the linked database is a test database, whose content might not be correct or might not represent physically meaningful data. Therefore, by default, the link should not be followed.
- staging: the linked database is almost production-ready, but final checks on its content are being performed, so the content might still contain errors. Therefore, by default, the link should not be followed.
- no: any other reason to suggest not to follow the link during aggregation of OPTIMADE results. The implementation MAY provide more details in a human-readable form via the attribute no_aggregate_reason.
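A client that aggregates results might select which links to follow by default as sketched below. The function name links_to_aggregate is hypothetical. The sketch assumes a traversal policy that only descends into child links. A missing aggregate value is treated as ok, and any other value, including values unknown to the client, is treated as a suggestion not to follow the link. The input restates the links example above in trimmed form.

```python
from typing import Dict, List


def links_to_aggregate(links: List[Dict]) -> List[str]:
    """IDs of child links that a client may follow by default when aggregating."""
    selected = []
    for link in links:
        attributes = link["attributes"]
        if attributes["link_type"] != "child":
            continue  # assumption: only child links are traversed for aggregation
        # A missing aggregate MAY be assumed to be "ok"; anything else means
        # "do not follow by default", even if the value is not a known one.
        if attributes.get("aggregate", "ok") == "ok":
            selected.append(link["id"])
    return selected


example_links = [
    {"id": "index", "attributes": {"link_type": "root"}},
    {"id": "cat_zeo", "attributes": {"link_type": "child"}},
    {"id": "frameworks", "attributes": {"link_type": "child"}},
    {"id": "testdb", "attributes": {"link_type": "child", "aggregate": "test"}},
    {"id": "internaldb", "attributes": {"link_type": "child", "aggregate": "no"}},
    {"id": "optimade", "attributes": {"link_type": "providers"}},
]
```

A client looking specifically for test databases could instead select links whose aggregate value is test, as permitted above.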
API implementations MAY provide custom endpoints under the Extensions endpoint.
Custom extension endpoints MUST be placed under the versioned or unversioned base URL serving the API at /extensions.
The API implementation is free to define the roles of further URL path segments under this URL.
An OPTIMADE filter expression is passed in the parameter filter as a URL query parameter as specified by JSON:API.
The filter expression allows desired properties to be compared against search values; several such comparisons can be combined using the logical conjunctions AND, OR, NOT, and parentheses, with their usual semantics.
All properties marked as REQUIRED in section Entry list MUST be queryable with all mandatory filter features. The level of query support REQUIRED for other properties is described in Entry list.
When provided as a URL query parameter, the contents of the filter parameter are URL-encoded by the client in the HTTP GET request, and then URL-decoded by the API implementation before any further parsing takes place.
In particular, this means the client MUST escape special characters in string values as described below for String values before the URL encoding, and the API implementation MUST first URL-decode the filter parameter before reversing the escaping of string tokens.
Examples of syntactically correct query strings embedded in queries:
http://example.org/optimade/v1/structures?filter=_exmpl_melting_point%3C300+AND+nelements=4+AND+chemical_formula_descriptive="SiO2"&response_format=xml
Or, fully URL encoded:
http://example.org/optimade/v1/structures?filter=_exmpl_melting_point%3C300+AND+nelements%3D4+AND+chemical_formula_descriptive%3D%22SiO2%22&response_format=xml
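The fully URL-encoded form above can be produced with the Python standard library, as in this minimal sketch. urllib.parse.urlencode encodes spaces as "+" and percent-encodes characters such as "<", "=" and the double quote. On the server side, parse_qs performs the URL decoding that MUST happen before any string-token unescaping.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

filter_expr = '_exmpl_melting_point<300 AND nelements=4 AND chemical_formula_descriptive="SiO2"'

# Client side: URL-encode the (already string-escaped) filter expression.
query = urlencode({"filter": filter_expr, "response_format": "xml"})
url = "http://example.org/optimade/v1/structures?" + query

# Server side: URL-decode the parameter first, before parsing the filter.
decoded = parse_qs(urlsplit(url).query)["filter"][0]
```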
The following tokens are used in the filter query component:
Property names: the first character MUST be a lowercase letter, the subsequent symbols MUST be composed of lowercase letters or digits; the underscore ("_", ASCII 95 dec (0x5F)) is considered to be a lower-case letter when defining identifiers. The length of the identifiers is not limited, except that when passed as a URL query parameter the whole query SHOULD NOT be longer than the limits imposed by the URI specification. This definition is similar to the one used in most widespread programming languages, except that OPTIMADE limits the allowed letter set to lowercase letters only. This makes it possible to tell OPTIMADE identifiers and operator keywords apart unambiguously without consulting a reserved word table, and to encode this distinction concisely in the EBNF Filter Language grammar.
Examples of valid property names:
band_gap
cell_length_a
cell_volume
Examples of incorrect property names:
0_kvak (starts with a number)
"foo bar" (contains space; contains quotes)
BadLuck (contains upper-case letters)
Identifiers that start with an underscore are specific to a database or definition provider, and MUST have the format of a namespace prefix (see section Namespace Prefixes).
Examples:
_exmpl_formula_sum (a property specific to that database)
_exmpl_band_gap
_exmpl_supercell
_exmpl_trajectory
_exmpl_workflow_id
Nested property names: a nested property name is composed of at least two identifiers separated by periods (.).
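The identifier rules above translate directly into regular expressions. The sketch below uses Python's re module, and its function names are hypothetical. It only checks the lexical form. It does not check that underscore-prefixed identifiers use a registered namespace prefix.

```python
import re

# An identifier: a lowercase letter or underscore, then lowercase letters, digits or underscores.
IDENTIFIER = re.compile(r"[a-z_][a-z_0-9]*")
# A nested property name: at least two identifiers separated by periods.
NESTED_NAME = re.compile(r"[a-z_][a-z_0-9]*(?:\.[a-z_][a-z_0-9]*)+")


def is_property_name(name: str) -> bool:
    return IDENTIFIER.fullmatch(name) is not None


def is_nested_property_name(name: str) -> bool:
    return NESTED_NAME.fullmatch(name) is not None
```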
String values MUST be surrounded by double quote characters (", ASCII symbol 34 dec, 0x22 hex). A double quote that is a part of the value, not a delimiter, MUST be escaped by prepending it with a backslash character (\, ASCII symbol 92 dec, 0x5C hex). A backslash character that is part of the value (i.e., not used to escape a double quote) MUST be escaped by prepending it with another backslash. An example of an escaped string value, including the enclosing double quotes, is given below:
- "A double quote character (\", ASCII symbol 34 dec) MUST be prepended by a backslash (\\, ASCII symbol 92 dec) when it is a part of the value and not a delimiter; the backslash character \"\\\" itself MUST be preceded by another backslash, forming a double backslash: \\\\"
(Note that at the end of the string value above the four final backslashes represent the two terminal backslashes in the value, and the final double quote is a terminator, it is not escaped.)
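The escaping rules can be sketched in Python as a pair of functions with hypothetical names. Backslashes are escaped before double quotes, so that the backslashes added in front of quotes are not doubled a second time. The unescaping sketch covers only the two escapes described here, and it takes the character after any backslash literally.

```python
def escape_filter_string(value: str) -> str:
    """Quote a Python string as a filter language string token."""
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def unescape_filter_string(token: str) -> str:
    """Inverse of escape_filter_string for a token including its enclosing quotes."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("string token must be enclosed in double quotes")
    body, result, i = token[1:-1], [], 0
    while i < len(body):
        char = body[i]
        if char == "\\":
            i += 1  # the character after a backslash is part of the value
            if i == len(body):
                raise ValueError("dangling backslash (the final quote is escaped)")
            char = body[i]
        elif char == '"':
            raise ValueError("unescaped double quote inside string token")
        result.append(char)
        i += 1
    return "".join(result)
```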
String value tokens are also used to represent timestamps in the form of the RFC 3339 Internet Date/Time Format.
Numeric values are represented as decimal integers or in scientific notation, using the usual programming language conventions. A regular expression giving the number syntax is given below as a POSIX Extended Regular Expression (ERE) or as a Perl-Compatible Regular Expression (PCRE):
- ERE:
[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)?
- PCRE:
[-+]?(?:\d+(\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?
An implementation of the search filter MAY reject numbers that are outside the machine representation of the underlying hardware; in such a case it MUST return the error 501 Not Implemented with an appropriate error message that indicates the cause of the error and an acceptable number range.
- Examples of valid numbers:
- 12345, +12, -34, 1.2, .2E7, -.2E+7, +10.01E-10, 6.03e23, .1E1, -.1e1, 1.e-12, -.1e-12, 1000000000.E1000000000, 1., .1
- Examples of invalid numbers (although they MAY contain correct numbers as substrings):
- 1.234D12, .e1, -.E1, +.E2, 1.23E+++, +-123
- Note: this number representation is more general than the number representation in JSON (for instance, 1. is a valid numeric value for the filtering language specified here, but is not a valid float number in JSON, where the correct format is 1.0 instead).
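As a quick check, the ERE above can be compiled with Python's re module and tested against the example lists. This sketch keeps [0-9] rather than \d, because in Python \d also matches non-ASCII Unicode digits. fullmatch ensures that a valid number occurring only as a substring does not count.

```python
import re

# The ERE from the text; [0-9] keeps the match restricted to ASCII digits.
NUMBER = re.compile(r"[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)?")

valid = ["12345", "+12", "-34", "1.2", ".2E7", "-.2E+7", "+10.01E-10", "6.03e23",
         ".1E1", "-.1e1", "1.e-12", "-.1e-12", "1000000000.E1000000000", "1.", ".1"]
invalid = ["1.234D12", ".e1", "-.E1", "+.E2", "1.23E+++", "+-123"]


def is_number_token(token: str) -> bool:
    return NUMBER.fullmatch(token) is not None
```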
While the filtering language supports tests for equality between properties of floating point type and decimal numbers given in the filter string, such comparisons come with the usual caveats for testing for equality of floating point numbers. Normally, a client cannot rely on a floating point number stored in a database having a representation that exactly matches the one obtained for a number given in the filtering string as a decimal number or as an integer. However, testing for equality to zero MUST be supported.
More examples of the number tokens and machine-readable definitions and tests can be found in the Materials-Consortia API Git repository (files integers.lst, not-numbers.lst, numbers.lst, and reals.lst).
Boolean values are represented with the tokens TRUE and FALSE.
Operator tokens are represented by usual mathematical relation symbols or by case-sensitive keywords. Currently the following operators are supported:
- =, !=, <=, >=, <, > for tests of number, string (lexicographical) or timestamp (temporal) equality, inequality, less-than-or-equal, greater-than-or-equal, less-than, and greater-than relations;
- AND, OR, NOT for logical conjunctions;
- a number of keyword operators discussed in the next section.
In future extensions, operator tokens that are words MUST contain only upper-case letters. This requirement guarantees that no operator token will ever clash with a property name.
All filtering expressions MUST follow the EBNF grammar of appendix The Filter Language EBNF Grammar of this specification.
The appendix contains a complete machine-readable EBNF, including the definition of the lexical tokens described above in section Lexical Tokens. The EBNF is enclosed in special strings constructed as BEGIN and END, both followed by EBNF GRAMMAR Filter, to enable automatic extraction.
The filter language supports conjunctions of comparisons using the boolean algebra operators "AND", "OR", and "NOT" and parentheses to group conjunctions. A comparison clause prefixed by NOT matches entries for which the comparison is false.
Examples:
NOT ( chemical_formula_hill = "Al" AND chemical_formula_anonymous = "A" OR chemical_formula_anonymous = "H2O" AND NOT chemical_formula_hill = "Ti" )
Comparisons involving Numeric and String properties can be expressed using the usual comparison operators: '<', '>', '<=', '>=', '=', '!='. Implementations MUST support comparisons in the forms:
identifier <operator> constant
constant <operator> identifier
Here, identifier is a property name and constant is either a numerical or string type constant.
Implementations MAY also support comparisons with identifiers on both sides, and comparisons with numerical type constants on both sides, i.e., in the forms:
identifier <operator> identifier
constant <operator> constant
However, for the latter form, constant <operator> constant, an implementation MUST return the error 501 Not Implemented when the constants are strings.
Note: The motivation to exclude the form constant <operator> constant for strings is that filter language strings can refer to data of different data types (e.g., strings and timestamps), and thus this construct is ambiguous.
The OPTIMADE specification will strive to address this issue in a future version.
Examples:
nelements > 3
chemical_formula_hill = "H2O" AND chemical_formula_anonymous != "AB"
_exmpl_aax <= +.1e8 OR nelements >= 10 AND NOT ( _exmpl_x != "Some string" OR NOT _exmpl_a = 7)
_exmpl_spacegroup="P2"
_exmpl_cell_volume<100.0
_exmpl_band_gap > 5.0 AND _exmpl_molecular_weight < 350
_exmpl_melting_point<300 AND nelements=4 AND chemical_formula_descriptive="SiO2"
_exmpl_some_string_property = 42 (This is syntactically allowed without putting 42 in quotation marks, see the notes about comparisons of values of different types below.)
5 < _exmpl_a
- OPTIONAL:
((NOT (_exmpl_a>_exmpl_b)) AND _exmpl_x>0)
- OPTIONAL:
5 < 7
In addition to the standard equality and inequality operators, matching of partial strings is provided by keyword operators:
- identifier CONTAINS x: Is true if the substring value x is found anywhere within the property.
- identifier STARTS WITH x: Is true if the property starts with the substring value x. The WITH keyword MAY be omitted.
- identifier ENDS WITH x: Is true if the property ends with the substring value x. The WITH keyword MAY be omitted.
OPTIONAL features:
- Support for x to be an identifier, rather than a string is OPTIONAL.
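The semantics of the three substring operators correspond to ordinary Python string operations, as this sketch shows. The dispatch table, including the shortened STARTS and ENDS spellings, is a hypothetical illustration and not a filter parser. The sample value is an arbitrary string chosen to satisfy the two example filters.

```python
from typing import Callable, Dict

SUBSTRING_OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "CONTAINS": lambda prop, x: x in prop,
    "STARTS WITH": lambda prop, x: prop.startswith(x),
    "STARTS": lambda prop, x: prop.startswith(x),  # the WITH keyword MAY be omitted
    "ENDS WITH": lambda prop, x: prop.endswith(x),
    "ENDS": lambda prop, x: prop.endswith(x),
}

value = "A2B2C2D1"  # hypothetical property value

# chemical_formula_anonymous CONTAINS "C2" AND chemical_formula_anonymous STARTS WITH "A2"
first = SUBSTRING_OPERATORS["CONTAINS"](value, "C2") and SUBSTRING_OPERATORS["STARTS WITH"](value, "A2")
# chemical_formula_anonymous STARTS "A2" AND chemical_formula_anonymous ENDS WITH "D1"
second = SUBSTRING_OPERATORS["STARTS"](value, "A2") and SUBSTRING_OPERATORS["ENDS WITH"](value, "D1")
```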
Examples:
chemical_formula_anonymous CONTAINS "C2" AND chemical_formula_anonymous STARTS WITH "A2"
chemical_formula_anonymous STARTS "A2" AND chemical_formula_anonymous ENDS WITH "D1"
Straightforward comparisons ('=' and '!=') MUST be supported for boolean values.
Other comparison operators ('<', '>', '<=', '>=') MUST NOT be supported.
Boolean values are only intended for direct comparisons with properties, not for compound comparisons.
For example, (nsites = 3 AND nelements = 3) = FALSE is not supported.
A boolean property named property MAY be compared with TRUE by omitting the = TRUE altogether: property.
Conversely, it MAY be compared with FALSE by negating the comparison with TRUE: NOT property.
Examples:
property = TRUE
property != FALSE
_exmpl_has_inversion_symmetry AND NOT _exmpl_is_primitive
In the following, list is a list-type property, and values is one or more values separated by commas (","), i.e., strings or numbers.
An implementation MAY also support property names and nested property names in values.
The following constructs MUST be supported:
- list HAS value: matches if at least one element in list is equal to value. (If list has no duplicate elements, this implements the set operator IN.)
- list HAS ALL values: matches if, for each value, there is at least one element in list equal to that value. (If both list and values do not contain duplicate values, this implements the set operator >=.)
- list HAS ANY values: matches if at least one element in list is equal to at least one value. (This is equivalent to a number of HAS statements separated by OR.)
- list LENGTH value: matches if the number of items in the list property is equal to value.
The HAS ONLY construct MAY be supported:
- OPTIONAL:
list HAS ONLY values
: Matches if all elements in list are equal to at least one value. (If both list and values do not contain duplicate values, this implements the <= set operator.)
This construct is OPTIONAL as it can be difficult to realize in some underlying database implementations.
However, if the desired search is over a property that can only take on a finite set of values (e.g., chemical elements), a client can formulate an equivalent search by inverting the list of values into inverse and expressing the filter as NOT list HAS ANY inverse.
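The list constructs can be read as the following Python predicates. This is an illustrative sketch over in-memory lists (names such as has_only are hypothetical, and null handling is omitted). The last lines check the client-side inversion of HAS ONLY on a small, truncated domain; the rewrite is written with HAS ANY because inverse generally holds more than one value.

```python
from typing import Any, Sequence


def has(lst: Sequence[Any], value: Any) -> bool:
    """list HAS value: at least one element of list equals value."""
    return any(item == value for item in lst)


def has_all(lst: Sequence[Any], values: Sequence[Any]) -> bool:
    """list HAS ALL values: each value equals at least one element of list."""
    return all(has(lst, v) for v in values)


def has_any(lst: Sequence[Any], values: Sequence[Any]) -> bool:
    """list HAS ANY values: equivalent to several HAS tests joined by OR."""
    return any(has(lst, v) for v in values)


def has_only(lst: Sequence[Any], values: Sequence[Any]) -> bool:
    """list HAS ONLY values (OPTIONAL): every element of list equals some value."""
    return all(has(values, item) for item in lst)


def length_is(lst: Sequence[Any], n: int) -> bool:
    """list LENGTH value: list has exactly n items."""
    return len(lst) == n


# Client-side rewrite of HAS ONLY over a finite domain of values.
# DOMAIN is a truncated, illustrative stand-in for the full list of elements.
DOMAIN = ["H", "He", "Li", "Be", "B", "C", "N", "O"]
elements = ["C", "H", "O"]
allowed = ["H", "C", "O", "N"]
inverse = [e for e in DOMAIN if e not in allowed]  # ["He", "Li", "Be", "B"]
assert has_only(elements, allowed) == (not has_any(elements, inverse))
```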
Furthermore, there is a set of OPTIONAL constructs that allows filters to be formulated over the values in correlated positions in multiple list properties. An implementation MAY support this syntax selectively only for specific properties. This type of filter is useful for, e.g., filtering on elements and correlated element counts available as two separate list properties.
list1:list2:... HAS val1:val2:...
list1:list2:... HAS ALL val1:val2:...
list1:list2:... HAS ANY val1:val2:...
list1:list2:... HAS ONLY val1:val2:...
Finally, all the above constructs that allow a value or lists of values on the right-hand side MAY allow <operator> value in each place a value can appear.
In that case, a match requires that the <operator> comparison is fulfilled instead of equality.
Strictly, the definitions of the HAS, HAS ALL, HAS ANY, HAS ONLY and LENGTH operators as written above apply, but with the word 'equal' replaced with the <operator> comparison.
For example:
- OPTIONAL:
list HAS < 3
: Matches all entries for which list contains at least one element that is less than three.
- OPTIONAL:
list HAS ALL < 3, > 3
: Matches only those entries for which list simultaneously contains at least one element less than three and one element greater than three.
An implementation MAY support combining the operator syntax with the syntax for correlated lists, in particular on a list correlated with itself. For example:
- OPTIONAL:
list:list HAS >=2:<=5
: Matches all entries for which list contains at least one element that is greater than or equal to 2 and less than or equal to 5.
Further examples of various comparisons of list properties:
- OPTIONAL:
elements HAS "H" AND elements HAS ALL "H","He","Ga","Ta" AND elements HAS ONLY "H","He","Ga","Ta" AND elements HAS ANY "H", "He", "Ga", "Ta"
- OPTIONAL:
elements HAS ONLY "H","He","Ga","Ta"
- OPTIONAL:
elements:_exmpl_element_counts HAS "H":6 AND elements:_exmpl_element_counts HAS ALL "H":6,"He":7 AND elements:_exmpl_element_counts HAS ONLY "H":6 AND elements:_exmpl_element_counts HAS ANY "H":6,"He":7 AND elements:_exmpl_element_counts HAS ONLY "H":6,"He":7
- OPTIONAL:
_exmpl_element_counts HAS < 3 AND _exmpl_element_counts HAS ANY > 3, = 6, 4, != 8
(Note: specifying the = operator after HAS ANY is redundant here; if no operator is given, the test is for equality.)
- OPTIONAL:
elements:_exmpl_element_counts:_exmpl_element_weights HAS ANY > 3:"He":>55.3 , = 6:>"Ti":<37.6 , 8:<"Ga":0
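The correlated-list and operator forms generalize the same predicates. In this Python sketch, each right-hand value is represented as an (operator, operand) pair and a bare value as ("=", value); this representation is an assumption of the example, not part of the specification. The correlated lists are also assumed to have equal length, since Python's zip silently truncates otherwise.

```python
import operator
from typing import Any, Sequence, Tuple

# Filter-language comparison operators mapped to Python functions.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

# A condition is an (operator, operand) pair; a bare value v is ("=", v).
Condition = Tuple[str, Any]


def position_matches(items: Tuple[Any, ...], conds: Sequence[Condition]) -> bool:
    """True if each item at one position satisfies its own condition."""
    return all(OPS[op](item, operand) for item, (op, operand) in zip(items, conds))


def has(lists: Sequence[Sequence[Any]], conds: Sequence[Condition]) -> bool:
    """list1:list2:... HAS c1:c2:... (a single list is just one entry in lists)."""
    return any(position_matches(items, conds) for items in zip(*lists))


def has_all(lists, cond_groups) -> bool:
    """HAS ALL: every group of correlated conditions is matched at some position."""
    return all(has(lists, conds) for conds in cond_groups)


def has_any(lists, cond_groups) -> bool:
    """HAS ANY: at least one group of conditions is matched at some position."""
    return any(has(lists, conds) for conds in cond_groups)


def has_only(lists, cond_groups) -> bool:
    """HAS ONLY: every position matches at least one group of conditions."""
    return all(
        any(position_matches(items, conds) for conds in cond_groups)
        for items in zip(*lists)
    )


elements, counts = ["H", "O"], [2, 1]  # counts: hypothetical _exmpl_element_counts
print(has([elements, counts], [("=", "H"), ("=", 2)]))  # True
print(has([elements, counts], [("=", "O"), ("=", 2)]))  # False: O is paired with 1
values = [1, 7, 4]
print(has([values, values], [(">=", 2), ("<=", 5)]))    # True, because of 4
print(has_all([values], [[("<", 3)], [(">", 3)]]))      # True: 1 < 3 and 7 > 3
```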
Everywhere in a filter string where a property name is accepted, the API implementation MAY accept nested property names as described in section Lexical Tokens, consisting of identifiers separated by periods ('.').
A filter on a nested property name consisting of two identifiers, identifier1.identifier2, matches if either one of these points is true:
- identifier1 references a dictionary-type property that contains identifier2 as a key, and the filter matches for the content of identifier2.
- identifier1 references a list of dictionaries that contain identifier2 as a key, and the filter matches for a flat list containing only the contents of identifier2 for every dictionary in the list. E.g., if identifier1 is the list [{"identifier2":42, "identifier3":36}, {"identifier2":96, "identifier3":66}], then identifier1.identifier2 is understood in the filter as the list [42, 96].
The API implementation MAY allow this notation to generalize to arbitrary depth. A nested property name that combines more than one list MUST, if accepted, be interpreted as a completely flattened list.
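One way to resolve nested property names over JSON-like data, including the arbitrary-depth generalization and the flattening of combined lists, is sketched below in Python. The helper name resolve is hypothetical, and mapping a missing key to None (i.e., to an unknown value) is a choice of this sketch rather than something the text above prescribes.

```python
from typing import Any


def resolve(value: Any, path: str) -> Any:
    """Resolve a nested property name such as 'identifier1.identifier2'.

    Dictionaries are entered by key. When a list is met, the remaining path is
    resolved for every item and the results are concatenated, so a name that
    passes through several lists yields one completely flattened list.
    A missing key resolves to None, which this sketch treats as unknown.
    """
    if isinstance(value, list):
        flat = []
        for item in value:
            result = resolve(item, path)
            if isinstance(result, list):
                flat.extend(result)
            else:
                flat.append(result)
        return flat
    head, _, rest = path.partition(".")
    if not isinstance(value, dict) or head not in value:
        return None
    child = value[head]
    return resolve(child, rest) if rest else child


entry = {"identifier1": [{"identifier2": 42, "identifier3": 36},
                         {"identifier2": 96, "identifier3": 66}]}
print(resolve(entry, "identifier1.identifier2"))  # [42, 96]
```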
As described in the section Relationships, it is possible for the API implementation to describe relationships between entries of the same, or different, entry types.
The API implementation MAY support queries on relationships with an entry type <entry type> by using special nested property names:
- <entry type>.id references a list of IDs of relationships with entries of the type <entry type>.
- <entry type>.description references a correlated list of the human-readable descriptions of these relationships.
Hence, the filter language acts as if, for every entry type, there is a property with that name which contains a list of dictionaries with two keys, id and description.
For example: a client queries the structures endpoint with a filter that references calculations.id.
For a specific structures entry, the nested property behaves as the list ["calc-id-43", "calc-id-96"] and would then, e.g., match the filter calculations.id HAS "calc-id-96".
This means that the structures entry has a relationship with the calculations entry of that ID.
Note: formulating queries on relationships with entries that have specific property values is a multi-step process. For example, finding all structures with bibliographic references where one of the authors has the last name "Schmidt" is performed by the following two steps:
- Query the references endpoint with a filter authors.lastname HAS "Schmidt" and store the id values of the returned entries.
- Query the structures endpoint with a filter references.id HAS ANY <list-of-IDs>, where <list-of-IDs> are the IDs retrieved from the first query, each written as a quoted string, separated by commas.
(Note: the type of query discussed here corresponds to a "join"-type operation in a relational data model.)
Properties can have an unknown value, see section Properties with an unknown value.
Filters that match when the property is known, or unknown, respectively, can be constructed using the following syntax:
identifier IS KNOWN
identifier IS UNKNOWN
Except for the above constructs, filters that use any form of comparison that involves properties of unknown values MUST NOT match.
Hence, by definition, an identifier of value null never matches equality (=), inequality (<, <=, >, >=, !=) or other comparison operators besides identifier IS UNKNOWN and NOT identifier IS KNOWN.
In particular, a filter that compares two properties that are both null for equality or inequality does not match.
Examples:
chemical_formula_hill IS KNOWN AND NOT chemical_formula_anonymous IS UNKNOWN
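Because Python's own None == None evaluates to True, an implementation built on Python values has to special-case unknown values explicitly. A minimal sketch of the rule, with hypothetical helper names, null represented as None, and a missing key in the example entry treated as unknown:

```python
import operator
from typing import Any, Optional

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def compare(left: Optional[Any], op: str, right: Optional[Any]) -> bool:
    """A comparison never matches if either operand is unknown (None)."""
    if left is None or right is None:
        return False
    return OPS[op](left, right)


def is_known(value: Optional[Any]) -> bool:
    return value is not None


def is_unknown(value: Optional[Any]) -> bool:
    return value is None


entry = {"chemical_formula_hill": "H2O", "chemical_formula_anonymous": None}
print(compare(None, "=", None))  # False, unlike Python's None == None
print(is_known(entry.get("chemical_formula_hill"))
      and not is_unknown(entry.get("chemical_formula_anonymous")))  # False
```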
The precedence (priority) of the operators MUST be as indicated in the list below:
- Comparison and keyword operators (<, <=, =, HAS, STARTS, etc.) -- highest priority;
- NOT;
- AND;
- OR -- lowest priority.
Examples:
NOT a > b OR c = 100 AND f = "C2 H6"
: This is interpreted as (NOT (a > b)) OR ( (c = 100) AND (f = "C2 H6") ) when fully braced.
a >= 0 AND NOT b < c OR c = 0
: This is interpreted as ((a >= 0) AND (NOT (b < c))) OR (c = 0) when fully braced.
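To make the precedence rules concrete, here is a small recursive-descent parser in Python that reproduces the fully braced forms of the two examples above. It is a deliberately reduced sketch, not the OPTIMADE grammar: a comparison is assumed to be exactly operand, operator, operand (so multi-word operators such as HAS ALL or STARTS WITH are not handled), numbers are unsigned, and the output wraps the whole expression in one extra pair of parentheses.

```python
import re
from typing import List, Optional

# Double-quoted strings (backslash escapes), two-character operators,
# one-character operators and parentheses, words, and unsigned numbers.
TOKEN_RE = re.compile(r'\s*("(?:[^"\\]|\\.)*"|<=|>=|!=|[<>=()]|[A-Za-z_][A-Za-z_0-9]*|[0-9.]+)')


def tokenize(text: str) -> List[str]:
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent: OR binds loosest, then AND, then NOT, then comparisons."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of filter")
        self.i += 1
        return token

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.parse_not())
        return self.parse_primary()

    def parse_primary(self):
        if self.peek() == "(":
            self.take()
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        # A comparison is atomic: operand, operator, operand.
        return ("CMP", self.take(), self.take(), self.take())


def brace(node) -> str:
    if node[0] == "CMP":
        return f"({node[1]} {node[2]} {node[3]})"
    if node[0] == "NOT":
        return f"(NOT {brace(node[1])})"
    return f"({brace(node[1])} {node[0]} {brace(node[2])})"


def fully_braced(text: str) -> str:
    parser = Parser(tokenize(text))
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return brace(tree)


print(fully_braced('NOT a > b OR c = 100 AND f = "C2 H6"'))
# ((NOT (a > b)) OR ((c = 100) AND (f = "C2 H6")))
```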
The definitions of specific properties in this standard define their types.
Similarly, for custom properties, the database provider decides their types.
In the syntactic constructs that can accommodate values of more than one type, types of all participating values are REQUIRED to match, with a single exception of timestamps (see below).
Comparisons between values of different types MUST be reported as 501 Not Implemented errors, meaning that type conversion is not implemented in the specification.
As the filter language syntax does not define a lexical token for timestamps, values of this type are expressed using string tokens in RFC 3339 Internet Date/Time Format.
In a comparison with a timestamp property, a string token represents the timestamp value that would result from parsing the string according to RFC 3339 Internet Date/Time Format.
Failures to interpret such a string as a timestamp MUST be reported with error 400 Bad Request.
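The sketch below shows how a Python implementation might apply these rules when building the operand of a comparison. The token kinds "string" and "number", the FilterError class, and the choice to treat numeric tokens as comparable with both integer and float properties are assumptions of this example. RFC 3339 parsing is done with a regular expression for the date-time production followed by validation in datetime.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 date-time: full-date "T" full-time with "Z" or a numeric offset.
# RFC 3339 also permits lowercase "t" and "z", so they are accepted here.
RFC3339_RE = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(\.[0-9]+)?([Zz]|[+-][0-9]{2}:[0-9]{2})"
)


class FilterError(Exception):
    """Hypothetical error type carrying the HTTP status code of the response."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def parse_timestamp_token(token: str) -> datetime:
    """Interpret a string token as an RFC 3339 timestamp; failures map to 400."""
    match = RFC3339_RE.fullmatch(token)
    if match is None:
        raise FilterError(400, f"not an RFC 3339 date-time: {token!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction, offset = match.group(7), match.group(8)
    micro = int((fraction[1:] + "000000")[:6]) if fraction else 0
    try:
        if offset in ("Z", "z"):
            tz = timezone.utc
        else:
            sign = 1 if offset[0] == "+" else -1
            tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
        # datetime() rejects out-of-range fields such as month 13; note that it
        # also rejects the leap second 60, which RFC 3339 itself permits.
        return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
    except ValueError as exc:
        raise FilterError(400, str(exc)) from exc


# Token kinds assumed to come from the filter parser, and the property types
# each kind may be compared with without any type conversion.
COMPATIBLE = {"string": {"string"}, "number": {"integer", "float"}}


def coerce_operand(property_type: str, token_kind: str, token_value):
    """Return the value to compare against a property of the given type."""
    if property_type == "timestamp" and token_kind == "string":
        return parse_timestamp_token(token_value)
    if property_type not in COMPATIBLE.get(token_kind, set()):
        raise FilterError(501, f"comparing {property_type} with a {token_kind} token is not implemented")
    return token_value
```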
Some features of the filtering language are marked OPTIONAL.
An implementation that encounters an OPTIONAL feature that it does not support MUST respond with error 501 Not Implemented with an explanation of which OPTIONAL construct the error refers to.
An OPTIMADE Property Definition defines a specific property, which will be referred to as the defined property throughout this section. The definition uses a dictionary-based construct that, when represented in the JSON output format, is compatible with the JSON Schema standard (for more information, see Property Definition keys from JSON Schema). The format of Property Definitions defined below allows nesting inner Property Definitions to define properties whose values are organized in lists and dictionaries to arbitrary depth.
To make a property definition expressible in any output format, the fields of the property definition below are specified using OPTIMADE data types. When a property definition is communicated using a specific data format (e.g., JSON), the property definition is implemented in that data format by mapping the OPTIMADE data types into the corresponding data types for that output format.
Clients are meant to be able to rely on the fact that properties with the same $id field represent equivalently defined properties.
Hence, when a Property Definition that has been published previously is updated, it is of major importance to decide whether the updates merely amend, annotate, or clarify the definition in a way that leaves it functionally the same, and thus can retain the $id, or whether the property is redefined.
An example of an update that does not functionally change the definition is the addition or modification of the examples given in the examples field.
If a property is redefined, the redefinition MUST change the $id.
The nature of an updated definition can also be reflected in the subfield version of x-optimade-definition, which allows definitions to be versioned using the semantic versioning v2 standard, where the update is categorized as a patch, minor, or major change.
A Property Definition MUST be composed according to the combination of the requirements in the subsection Property Definition keys from JSON Schema below and the following additional requirements:
REQUIRED keys for the outermost level of the Property Definition and OPTIONAL for other levels:
$id, $schema, title, and description
: String. See the subsection Property Definition keys from JSON Schema for the definitions of these fields. They are defined in that subsection as OPTIONAL on any level of the Property Definition, but are REQUIRED on the outermost level.
x-optimade-definition
: Dictionary. Additional information about the definition that is not covered by fields in the JSON Schema standard.
REQUIRED keys of x-optimade-definition:
format
: String. A string that declares the OPTIMADE definition format the definition adheres to. Currently, this is expressed as the minor version of the OPTIMADE specification that describes the property definition format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format in which this property definition is expressed. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly 1.2. A client MUST disregard the property definition if the field is not a string of the format MAJOR.MINOR or if the MAJOR version number is unrecognized. This field allows future versions of this standard to support implementations keeping definitions that adhere to older versions of the property definition format.
kind
: String. A string specifying what entity is being defined. For Property Definitions this MUST be the string "property".
name
: String. A short identifier (as defined in Definition of Terms) that provides a reasonably short, non-unique name for the entity being defined.
label
: String. An extended identifier (as defined in Definition of Terms) that describes the entity being defined in a way that is unique within a set of definitions provided together. The label SHOULD start with the name.
Implementation notes:
The name and label fields ensure implementations will be able to give meaningful names to definitions if they are translated into other formats with various requirements on human-readable names, e.g., as RDF data (see, e.g., rdfs:label).
OPTIONAL keys of x-optimade-definition:
version
: String. This string indicates the version of the definition. The string SHOULD be in the format described by the semantic versioning v2 standard. When a definition is changed in a way that constitutes a redefinition, it SHOULD indicate this by incrementing the MAJOR version number.
resources
: List. A list of dictionaries that reference remote resources that describe the property. Each dictionary has the following REQUIRED keys:
relation
: String. A human-readable description of the relationship between the property and the remote resource, e.g., a "natural language description".
resource-id
: String. An IRI of the external resource, which MAY be a resolvable URL.
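A client-side check of the format field could look like the following Python sketch. The function name and the recognized_majors parameter are hypothetical; following the rule above, a client that recognizes MAJOR version 1 accepts any 1.MINOR value.

```python
import re
from typing import AbstractSet

FORMAT_RE = re.compile(r"([0-9]+)\.([0-9]+)")


def should_disregard(definition: dict, recognized_majors: AbstractSet[int] = frozenset({1})) -> bool:
    """True if a client must disregard the definition because of its format field.

    recognized_majors is a hypothetical client setting: the MAJOR versions of the
    property definition format that the client was written to understand.
    """
    meta = definition.get("x-optimade-definition")
    fmt = meta.get("format") if isinstance(meta, dict) else None
    if not isinstance(fmt, str):
        return True
    match = FORMAT_RE.fullmatch(fmt)  # rejects e.g. "v1.2", "1", "1.2.0"
    if match is None:
        return True
    return int(match.group(1)) not in recognized_majors
```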
REQUIRED keys for all levels of the Property Definition:
x-optimade-type
: String. Specifies the OPTIMADE data type for this level of the defined property. MUST be one of "string", "integer", "float", "boolean", "timestamp", "list", or "dictionary".
x-optimade-unit
: String. A (compound) symbol for the physical unit in which the value of the defined property is given, or one of the strings dimensionless or inapplicable. See subsection Physical Units in Property Definitions for the details on how compound units are represented in OPTIMADE Property Definitions and the precise format of this string.
OPTIONAL keys at all nested levels of the Property Definition:
x-optimade-unit-definitions
: List. A list of definitions of the symbols used in the Property Definition (including its nested levels) for physical units given as values of the x-optimade-unit field. This field MUST be included at the outermost level of a property definition if the defined property, at any level, includes an x-optimade-unit with a value that is not dimensionless or inapplicable, and it MUST include definitions of all units used on all levels in the property definition. The field MAY also occur at deeper nesting levels (but this is not required). If it does, the unit definitions provided MUST be redundant with those provided at higher nesting levels. See subsection Physical Units in Property Definitions for the details on how units are represented in OPTIMADE Property Definitions and the precise format of this field.
x-optimade-dimensions
: Dictionary. Specification of the dimensions of one- or multi-dimensional data represented as multiple levels of lists. Each dimension is given a name and optionally a fixed size.
REQUIRED keys of x-optimade-dimensions:
names
: List of Strings. A list of names of the dimensions of the underlying one- or multi-dimensional data represented as multiple levels of lists. The first name applies to the outermost list, the next name to the lists embedded in that list, etc.
sizes
: List. Each item is an Integer or null. A list of fixed length requirements on the underlying one- or multi-dimensional data represented as multiple levels of lists. The first size applies to the outermost list, the next size to the lists embedded in that list, etc. The data only validates if the respective level consists of lists of exactly this length. A value of null allows arbitrary-length lists at the corresponding level.
Note: OPTIMADE Property Definitions use this field, and MUST NOT use the JSON Schema validating fields minItems and maxItems, since that would require reprocessing the schema to handle requests using the OPTIMADE features that request partial data in lists. Instead, the length of lists can be validated against the length information provided in the sizes subfield of x-optimade-dimensions (which, at this time, can only specify a fixed length requirement).
x-optimade-implementation
: Dictionary. A dictionary describing the level of OPTIMADE API functionality provided by the present implementation. If an implementation omits this field in its response, a client interacting with that implementation SHOULD NOT make any assumptions about the availability of these features. The dictionary has the following format:
OPTIONAL keys of x-optimade-implementation:
sortable
: Boolean. If TRUE, specifies that results can be sorted on this property (see Entry Listing URL Query Parameters for more information on this field). If FALSE, specifies that results cannot be sorted on this property. Omitting the field is equivalent to FALSE.
query-support
: String. Defines a required level of support in formulating queries on this field. The string MUST be one of the following:
- all mandatory: the defined property MUST be queryable using the OPTIMADE filter language with support for all mandatory filter features.
- equality only: the defined property MUST be queryable using the OPTIMADE filter language equality and inequality operators. Other filter language features do not need to be available.
- partial: the defined property MUST be queryable with support for a subset of the filter language operators as specified by the field query-support-operators.
- none: the defined property does not need to be queryable with any features of the filter language.
Omitting the field or setting it to null is equivalent to none.
query-support-operators
: List of Strings. Defines the filter language features supported on this property. MUST be present and not null if and only if query-support is partial. Each string in the list MUST be one of <, <=, >, >=, =, !=, CONTAINS, STARTS WITH, ENDS WITH, HAS, HAS ALL, HAS ANY, HAS ONLY, IS KNOWN, IS UNKNOWN, with the following meanings:
- <, <=, >, >=, =, !=: indicating support for filtering this property using the respective operator. If the property is of Boolean type, support for = also designates support for Boolean comparisons with the property being true that omit "= TRUE"; e.g., specifying that filtering for "is_yellow = TRUE" is supported also implies support for "is_yellow" (which means the same thing). Support for "NOT is_yellow" also follows.
- CONTAINS, STARTS WITH, ENDS WITH: indicating support for substring filtering of this property using the respective operator. MUST NOT appear if the property is not of type String.
- HAS, HAS ALL, HAS ANY: indicating support of the MANDATORY features for list property comparison using the respective operator. MUST NOT appear if the property is not of type List.
- HAS ONLY: indicating support for list property comparison with all or a subset of the OPTIONAL constructs using this operator. MUST NOT appear if the property is not of type List.
- IS KNOWN, IS UNKNOWN: indicating support for filtering this property on unknown values using the respective operator.
response-default
: Boolean. The value TRUE means the implementation includes the property in responses by default, i.e., when not specifically requested. The value FALSE means that the property is only included when requested. Omitting the field or setting it to null means the implementation does not declare whether the property will be included in responses by default.
x-optimade-requirements
: Dictionary. A dictionary describing the level of OPTIMADE API functionality required by all implementations of this property. Omitting this field means the corresponding functionality is OPTIONAL. The dictionary has the same format as x-optimade-implementation, except that the response-default field SHOULD NOT appear, and the following additional OPTIONAL fields are allowed:
support
: String. Describes the minimal required level of support for the Property by an implementation. This field only has meaning for the defined property when appearing in the x-optimade-requirements at the outermost level of the definition. Nevertheless, it MAY appear in other places, e.g., if a nested property definition has been inserted that references its own $id. The string MUST be one of the following:
- must: the defined property MUST be recognized by the implementation (e.g., in filter strings) and MUST NOT be null.
- should: the defined property MUST be recognized by the implementation (e.g., in filter strings) and SHOULD NOT be null.
- may: it is OPTIONAL for the implementation to recognize the defined property and it MAY be equal to null.
Omitting the field is equivalent to may.
Note: whether this field allows the defined property to be null MUST match the value of the type field. If null values are allowed, that field must be a list where the string "null" is the second element.
response-default-level
: String. Expresses whether an implementation of this property is required to include or exclude it in responses when not specifically requested. This field only has meaning for the defined property when appearing in the x-optimade-requirements at the outermost level of the definition. Nevertheless, it MAY appear in other places, e.g., if a nested property definition has been inserted that references its own $id. The string MUST be one of the following:
- always: the defined property MUST always be included in responses and cannot be excluded even by request via, e.g., the response_fields query parameter. This is primarily intended for the id and type fields, which are required for the JSON:API response format to be valid.
- must: the defined property MUST be included in responses unless specifically excluded.
- should: the defined property SHOULD be included in responses unless specifically excluded.
- may: it is OPTIONAL for the implementation to include the defined property in responses.
- should not: the defined property SHOULD NOT be included in responses unless specifically requested.
- must not: the defined property MUST NOT be included in responses unless specifically requested.
Omitting the field is equivalent to may.
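Two of the keys in this section lend themselves to mechanical checks, sketched here in Python with hypothetical function names: validating nested lists against the sizes subfield of x-optimade-dimensions, and checking that the support level in x-optimade-requirements agrees with the nullability declared in type. The second check encodes this sketch's reading that only support = must rules out null; it is meant for the outermost level, where the field has meaning.

```python
from typing import Any, Optional, Sequence


def check_sizes(data: Any, sizes: Sequence[Optional[int]]) -> bool:
    """Validate nested lists against the sizes subfield of x-optimade-dimensions.

    sizes[0] applies to the outermost list, sizes[1] to each list embedded in
    it, and so on; None allows any length. Null items are not handled here.
    """
    if not sizes:
        return True
    if not isinstance(data, list):
        return False
    if sizes[0] is not None and len(data) != sizes[0]:
        return False
    return all(check_sizes(item, sizes[1:]) for item in data)


def nullability_consistent(definition: dict) -> bool:
    """Check that x-optimade-requirements.support agrees with the type field.

    Reading used here: "must" forbids null, so type has one element;
    "should" and "may" (also the default) permit null, so type ends in "null".
    """
    requirements = definition.get("x-optimade-requirements") or {}
    support = requirements.get("support", "may")
    type_field = definition["type"]
    nullable = len(type_field) == 2 and type_field[1] == "null"
    return nullable == (support != "must")


# A 3x3 matrix, and a list of 3-vectors of any length.
print(check_sizes([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [3, 3]))  # True
print(check_sizes([[0, 0, 0], [0.5, 0.5]], [None, 3]))          # False
```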
In addition to the requirements on the format of a Property Definition in the previous section, it MUST also adhere to the OPTIONAL and REQUIRED keys described in this subsection. The format described in this subsection forms a subset of the JSON Schema Validation Draft 2020-12 and JSON Schema Core Draft 2020-12 standards.
REQUIRED keys:
type
: List. Specifies the corresponding JSON type for this level of the defined property and whether the property can be null or not. The value is directly correlated with x-optimade-type (cf. the definition of the x-optimade-type field). It MUST be a list of one or two elements, where the first element is a string correlated with x-optimade-type as follows:
- if x-optimade-type is "boolean", "string", or "integer", then the first element is the same string;
- if x-optimade-type is "dictionary", then the first element is "object";
- if x-optimade-type is "list", then the first element is "array";
- if x-optimade-type is "float", then the first element is "number";
- if x-optimade-type is "timestamp", then the first element is "string".
If the second element is included, it MUST be the string "null". This two-element form specifies that the defined property can be null.
The inclusion or not of "null" in the field type for a subfield defined at a nested level by a Property Definition declares whether that subfield is nullable. Property Definitions for which the nullability of a subfield differs MUST NOT share the same $id. However, the nullability of the subfield SHOULD NOT be taken into account when comparing the nested Property Definition for that subfield with other definitions, i.e., a nullable and a non-nullable subfield that are otherwise defined the same SHOULD share the same $id. Hence, formally, OPTIMADE Property Definitions regard the nullability of a subfield as belonging one level above where it appears in the JSON Schema definition.
Implementation notes:
- The field type can be derived from the field x-optimade-type, and its role is only to provide the JSON type names corresponding to x-optimade-type. The motivation to include these type names is that it makes the JSON representation of a Property Definition a fully valid standard JSON Schema. Nevertheless, for consistency across formats, these JSON type names MUST still be included when a property definition is represented in other output formats (i.e., the JSON names MUST NOT be translated into the type names of that output format).
- The allowed values of the type field are highly restricted compared to what is permitted using the full JSON Schema standard. Values can only be defined to be a single OPTIMADE data type or, optionally, null. This restriction is intended to reduce the complexity of possible data types that implementations have to handle in different formats and database backends.
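Since type is fully determined by x-optimade-type plus the nullability flag, it can be generated or checked mechanically. A Python sketch, with hypothetical function names:

```python
X_TO_JSON_TYPE = {
    "boolean": "boolean",
    "string": "string",
    "integer": "integer",
    "dictionary": "object",
    "list": "array",
    "float": "number",
    "timestamp": "string",
}


def json_type(x_optimade_type: str, nullable: bool = False) -> list:
    """Derive the value of the type field from x-optimade-type."""
    base = X_TO_JSON_TYPE[x_optimade_type]  # KeyError for an unknown OPTIMADE type
    return [base, "null"] if nullable else [base]


def type_is_consistent(definition: dict) -> bool:
    """Check that type is exactly what x-optimade-type (plus optional null) implies."""
    type_field = definition["type"]
    nullable = type_field[1:] == ["null"]
    return type_field == json_type(definition["x-optimade-type"], nullable)


print(json_type("float"))            # ['number']
print(json_type("timestamp", True))  # ['string', 'null']
```

The mapping is not invertible, since both "string" and "timestamp" map to "string"; it is x-optimade-type that distinguishes a timestamp from a plain string.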
Keys that are REQUIRED on the outermost level of a Property Definition, but otherwise OPTIONAL:
$schema
: String. A URL for a meta schema that describes the Property Definitions format. For Property Definitions adhering to the format described in this document, it should be set to https://schemas.optimade.org/meta/v1.2/optimade/property_definition.json.
$id
: String. A static IRI identifier that is a URN or URL representing the specific version of this level of the defined property. (If it is a URL, clients SHOULD NOT assign any interpretation to the response when resolving that URL.) It SHOULD NOT be changed as long as the property definition remains the same, and MUST be changed when the property definition changes.
title
: String. A short single-line human-readable explanation of the defined property appropriate to show as part of a user interface.
description
: String. A human-readable multi-line description that explains the purpose, requirements, and conventions of the defined property. The format SHOULD be a one-line description, followed by a new paragraph (two newlines), followed by a more detailed description of all the requirements and conventions of the defined property. Formatting in the text SHOULD use Markdown in the CommonMark v0.3 format, with mathematical expressions written to render correctly with the LaTeX mode of MathJax 3.2. When possible, it is preferable for mathematical expressions to use notation that is as straightforward as possible, so that they remain readable also when not rendered.
OPTIONAL keys:
$comment
: String. A human-readable comment relevant in the context of the raw definition data. These comments should normally not be shown to the end users. Comments pertaining to the Property Definition that are relevant to end users should go into the field description. Formatting in the text SHOULD use Markdown using the format described in the definition of the description field.
deprecated
: Boolean. If TRUE, implementations SHOULD NOT use the defined property, and it MAY be removed in the future. If FALSE, the defined property is not deprecated. The field not being present means FALSE. A Property Definition marked as deprecated is generally considered to be the same as its non-deprecated counterpart, i.e., it SHOULD retain its $id.
examples
: List. A list of example values that the defined property can have. These examples MUST all be of a data type that matches the type field and otherwise adhere to the rest of the Property Definition.
enum
: List. The defined property MUST take one of the values given in the provided list. The items in the list MUST all be of a data type that matches the type field and otherwise adhere to the rest of the Property Definition. If this key is given, for null to be a valid value of the defined property, the list MUST contain a null value and the type MUST be a list where the second value is the string "null".
Furthermore, depending on what string the type
is equal to, or contains as first element, the following additional requirements also apply:
"object"
:REQUIRED
properties
: Dictionary. Gives key-value pairs where each value is an inner Property Definition. The defined property is a dictionary that can only contain keys present in this dictionary, and, if so, the corresponding value is described by the respective inner Property Definition. (Or, if thetype
field is the list "object" and "null", it can also benull
.)
OPTIONAL
required
: List. The list MUST only contain strings. The defined property MUST have keys that match all the strings in this list. Other keys present in theproperties
field are OPTIONAL in the defined property. If not present or empty, all keys inproperties
are regarded as OPTIONAL.maxProperties
: Integer. The defined property is a dictionary where the number of keys MUST be less than or equal to the number given.minProperties
: Integer. The defined property is a dictionary where the number of keys MUST be greater than or equal to the number given.dependentRequired
: Dictionary. The dictionary keys are strings and the values are lists of unique strings. If the defined property has a key that is equal to a key in the given dictionary, the defined property MUST also have keys that match each of the corresponding values. No restriction is inferred from this field for keys in the defined property that do not match any key in the given dictionary.
"array"
:REQUIRED
items
: Dictionary. Specifies an inner Property Definition. The defined property is a list where each item MUST match this inner Property Definition.
OPTIONAL
uniqueItems
: Boolean. IfTRUE
, the defined property is an array that MUST only contain unique items. IfFALSE
, this field sets no limitation on the defined property.
Furthermore, despite being defined in the JSON Schema standard, the fields
minItems
andmaxItems
MUST NOT be used to indicate limits of the number of items of a list. See the definition of the x-optimade-dimensions field for more information.
"integer"
: OPTIONAL keys:
multipleOf
: Integer. An integer strictly greater than 0. The defined property MUST have an integer value that, when divided by the given integer, results in an integer (i.e., it must be evenly divisible by this integer without a fractional part).
maximum
: Integer. The defined property is an integer that MUST be less than or equal to this number.
exclusiveMaximum
: Integer. The defined property is an integer that MUST be strictly less than this number; it cannot be equal to the number.
minimum
: Integer. The defined property is an integer that MUST be greater than or equal to this number.
exclusiveMinimum
: Integer. The defined property is an integer that MUST be strictly greater than this number; it cannot be equal to the number.
"number"
: OPTIONAL keys:
multipleOf
: Float. A float strictly greater than 0. The defined property MUST have a value that, when divided by the given float, results in an integer (i.e., it must be evenly divisible by this float without a fractional part).
maximum
: Float. The defined property is a float that MUST be less than or equal to this number.
exclusiveMaximum
: Float. The defined property is a float that MUST be strictly less than this number; it cannot be equal to the number.
minimum
: Float. The defined property is a float that MUST be greater than or equal to this number.
exclusiveMinimum
: Float. The defined property is a float that MUST be strictly greater than this number; it cannot be equal to the number.
"string"
: OPTIONAL keys:
maxLength
: Integer. A non-negative integer. The defined property is a string that MUST have a length that is less than or equal to the given integer. (The length of the string is the number of individual Unicode characters it is composed of.)
minLength
: Integer. A non-negative integer. The defined property is a string that MUST have a length that is greater than or equal to the given integer. (The definition of the length of a string is the same as in the field maxLength.)
format
: String. Choose one of the following values to indicate that the defined property is a string that MUST adhere to the specified format:
"date-time"
: the date-time production in RFC 3339 section 5.6.
"date"
: the full-date production in RFC 3339 section 5.6.
"time"
: the full-time production in RFC 3339 section 5.6.
"duration"
: the duration production in RFC 3339 Appendix A.
"email"
: the "Mailbox" ABNF rule in RFC 5321 section 4.1.2.
"uri"
: a string instance is valid against this attribute if it is a valid URI according to RFC 3986.
"iri"
: a string instance is valid against this attribute if it is a valid IRI according to RFC 3987.
pattern
: String. This string SHOULD be a valid regular expression, according to the ECMA-262 regular expression dialect. A string instance is considered valid if the regular expression matches the instance successfully. The regular expression is not implicitly anchored, i.e., it can match the string at any position unless the expression contains a leading '^' or a trailing '$'.
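The keywords above mirror the JSON Schema validation vocabulary. The following minimal Python sketch shows how a client might apply the integer, number, and string keywords to a single value. The function name check_constraints is hypothetical, and the format keyword is not checked. Python's re module only approximates the ECMA-262 dialect required for pattern, so a production implementation would more likely rely on a JSON Schema validator library.

```python
import math
import re


def check_constraints(value, type_name, constraints):
    """Return the names of the validation keywords that value violates.

    type_name is "integer", "number", or "string", and constraints holds the
    keywords given for that type in a Property Definition. The "format"
    keyword is not checked here.
    """
    violations = []
    if type_name in ("integer", "number"):
        step = constraints.get("multipleOf")
        if step is not None:
            if type_name == "integer":
                divisible = value % step == 0
            else:
                # Floating-point division is inexact, so allow a small tolerance.
                quotient = value / step
                divisible = math.isclose(quotient, round(quotient))
            if not divisible:
                violations.append("multipleOf")
        bounds = {
            "maximum": lambda v, b: v <= b,
            "exclusiveMaximum": lambda v, b: v < b,
            "minimum": lambda v, b: v >= b,
            "exclusiveMinimum": lambda v, b: v > b,
        }
        for keyword, within in bounds.items():
            if keyword in constraints and not within(value, constraints[keyword]):
                violations.append(keyword)
    elif type_name == "string":
        length = len(value)  # Python counts Unicode code points
        if "maxLength" in constraints and length > constraints["maxLength"]:
            violations.append("maxLength")
        if "minLength" in constraints and length < constraints["minLength"]:
            violations.append("minLength")
        pattern = constraints.get("pattern")
        # re.search is unanchored, matching the rule that pattern is not
        # implicitly anchored; re only approximates ECMA-262 syntax.
        if pattern is not None and re.search(pattern, value) is None:
            violations.append("pattern")
    return violations
```

For example, check_constraints(7, "integer", {"multipleOf": 3}) reports ["multipleOf"]. The float case accepts 0.3 as a multiple of 0.1 even though 0.3 / 0.1 is not exactly 3 in binary floating point.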
A complete example of a Property Definition is found in the appendix Property Definition Example.
In OPTIMADE, there is no facility to allow a property to be represented in a choice of units, e.g., either ångström (Å) or meter (m). The unit is always permanently fixed by the Property Definition. Clients and servers that use other units internally thus have to do unit conversions as part of preparing and processing OPTIMADE responses.
The physical unit of a property, of the embedded items of a list, or of the values of a dictionary is defined with the field x-optimade-unit, with the following requirements:
- The field MUST be given with a non-null value both at the highest level in the OPTIMADE Property Definition and in all inner Property Definitions.
- If the property refers to a physical quantity that is dimensionless and unitless (often also referred to as having the dimension 1) or refers to a dimensionless and unitless count of something (e.g., the number of protons in a nucleus), the field MUST have the value dimensionless. However, quantities that use counting units, e.g., the mole, or quantities that use dimensionless units, e.g., the radian, MUST NOT set the field to dimensionless.
- If the property refers to an entity for which the assignment of a unit would not make sense, e.g., a string representing a chemical formula or a serial number, the field MUST have the value inapplicable.
- If the field does not take the value dimensionless or inapplicable, it MUST be set to a single unit symbol or to a Compound Unit Expression built from a set of unit symbols using the format described in Compound Unit Expressions.
- All unit symbols used in x-optimade-unit fields at any level in a Property Definition MUST be defined in the units field inside the x-optimade-property field in the outermost level of the Property Definition, or in the units field in the Entry info endpoint (the latter is only possible for Property Definitions embedded in such a response).
- The units field MUST be a list of dictionaries using the format for OPTIMADE Physical Unit Definitions described in Physical Unit Definitions.
A Compound Unit Expression is formed by a sequence of symbols for units or constants separated by a single multiplication * character.
Each symbol can also be suffixed by a single ^ character followed by a positive or negative integer to indicate the power of the preceding symbol, e.g., m^3 for cubic meter, m^-3 for inverse cubic meter.
(Positive integers MUST NOT be preceded by a plus sign.)
Each unit or constant symbol MAY be directly prefixed by a prefix symbol.
A prefix symbol MUST be directly followed by a unit symbol, i.e., it MUST NOT be used on its own, and MUST NOT be followed by ^ to indicate a power.
When defining prefix symbols it is important to ensure that they do not introduce ambiguity.
If there are ambiguous interpretations of a symbol as either having or not having a prefix, it MUST be interpreted as a unit without a prefix.
Furthermore:
- No whitespace, parentheses, or symbols other than those specified above are permitted.
- The (prefixed) unit and constant symbols MUST appear in alphabetical order.
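Here is a minimal Python sketch of a Compound Unit Expression parser. It assumes that the known unit/constant symbols and prefix symbols are available as sets. The text above leaves several details open, so the sketch makes its own choices for them: the character set allowed in a symbol, plain code-point comparison as the meaning of "alphabetical order", and trying longer prefixes first. The function names are hypothetical.

```python
import re

# Assumption: symbols are ASCII letters, digits, and underscores, starting with
# a letter or underscore. The optional power is a nonzero integer with no "+".
_TERM = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)(?:\^(-?[1-9][0-9]*))?")


def split_prefix(name, units, prefixes):
    """Return (prefix, unit) for name; the unprefixed reading wins if it exists."""
    if name in units:
        return None, name
    # Assumption: when several prefixes fit, try the longest first.
    for prefix in sorted(prefixes, key=lambda p: (-len(p), p)):
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest in units:
            return prefix, rest
    raise ValueError(f"unknown unit or constant symbol {name!r}")


def parse_compound_unit(expr, units, prefixes):
    """Parse expr into a list of (prefix, unit, power) tuples or raise ValueError."""
    parsed, names = [], []
    for term in expr.split("*"):
        match = _TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"malformed term {term!r} in {expr!r}")
        name, power = match.group(1), match.group(2)
        parsed.append((*split_prefix(name, units, prefixes), int(power) if power else 1))
        names.append(name)
    # Assumption: "alphabetical" is taken to mean plain code-point order.
    if names != sorted(names):
        raise ValueError(f"symbols in {expr!r} are not in alphabetical order")
    return parsed
```

Consider units {"g", "in", "m", "min", "s"} and prefixes {"k", "m"}. With these sets, kg*m^2*s^-2 parses as [("k", "g", 1), (None, "m", 2), (None, "s", -2)]. The expression min is read as the unit min rather than as the prefix m applied to in, following the rule that the unprefixed interpretation wins. Expressions such as s*m, m^+2, or a bare prefix k are rejected.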
An OPTIMADE Physical Unit Definition is a dictionary adhering to the following format:
REQUIRED keys:
$schema
: String. A URL for a meta schema that describes the Physical Unit Definitions format. For Property Definitions adhering to the format described in this document, it should be set to: https://schemas.optimade.org/meta/v1.2/optimade/physical_unit_definition.json.
x-optimade-definition
: Dictionary. The same field as defined in the definition of the x-optimade-definition field for Property Definitions, but where the kind subfield MUST be unit.
$id
: String. A static IRI identifier that is a URN or URL representing the specific version of the Physical Unit Definition. (If it is a URL, clients SHOULD NOT assign any interpretation to the response when resolving that URL.) It SHOULD NOT be changed as long as the Physical Unit Definition remains the same, and SHOULD be changed when the definition changes. Physical Unit Definitions SHOULD be regarded as the same if they only differ by:
- Additions of annotating notes to the end of the description field.
- Changes to the following specific fields at any level: deprecated and $comment.
symbol
: String. Specifies the symbol to be used in x-optimade-unit to reference this unit.
title
: String. A human-readable single-line string name for the unit.
description
: String. A human-readable multiple-line detailed description of the unit. Additions appended to the end of the description field that are clearly marked as notes that clarify the definition without changing it are viewed as annotations to the Physical Unit Definition rather than an integral part of it. Such annotations SHOULD only be added to the end of an otherwise unmodified description and MUST NOT change the meaning or interpretation of the text above them. The purpose is to provide a way to add explanations and clarifications to a definition without having to regard it as a new definition. For example, these annotations to the description MAY be used to explain why a definition has been deprecated.
OPTIONAL keys:
standard
: Dictionary. This field is used to express that the unit is part of a preexisting standard. The dictionary has the following format:
REQUIRED keys:
name
: String. The abbreviated name of the standard being referenced. One of the following:
"si"
: the symbol is defined as part of the SI standard of unit symbols and prefixes.
"codata"
: the symbol is defined as part of one of the CODATA series of publications.
"iso-iec-80000"
: the symbol is defined in the ISO/IEC 80000 standard.
"gnu units"
: the symbol is a (compound) unit expression based on the symbols in the file definitions.units distributed with the GNU Units software. A standard set of symbols for units and prefixes for OPTIMADE is taken from version 3.15 of the (separately versioned) unit database definitions.units included with the source distribution of GNU Units version 2.22. A prefix is indicated in the file by a trailing -, but that trailing character MUST NOT be included when using it as a prefix. If the unit is available in this database, or if it can be expressed as a Compound Unit Expression using these units and prefixes, the value of x-optimade-unit SHOULD use the (compound) string symbol. If there are multiple prefixes in the file with the same meaning, an implementation SHOULD use the shortest one consisting of only lowercase letters a-z and underscores, but no other symbols. If there are multiple ones with the same shortest length, then the first one of those SHOULD be used. For example, the GNU Units database defines the symbol "km" for kilometers by a combination of the k- SI kilo prefix and the m symbol for the SI meter unit.
"ucum"
: the symbol is defined in The Unified Code for Units of Measure (UCUM) standard.
"qudt"
: the symbol is defined in the QUDT standard. Not only symbols strictly defined within the standard are allowed, but also other compound unit expressions created according to the scheme for how new such symbols are formed in this standard.
symbol
: String. The symbol to use from the referenced standard, expressed according to that standard. The field MAY use mathematical expressions written the same way as described in the definition of the description field. This field MAY be different from the symbol being defined via the definition if the unit will be referenced in the x-optimade-unit field using a different symbol than the one used in the standard, or if the symbol is expressed in the standard in a way that requires mathematical notation. However, if possible, the symbol field SHOULD be the same.
OPTIONAL keys:
version
: String. The version string of the referenced standard.
year
: Integer. The year that the standard adopted the definition.
category
: The category of the definition, in case the standard uses categories to organize definitions.
alternate-symbols
: List of String. A list of other symbols that are commonly associated with the unit. The strings MAY use mathematical expressions written the same way as described in the definition of the description field.
property-format
: String. Specifies the minor version of the Property Definitions format that the Physical Unit Definition is expressed in. (The Physical Unit Definition format is not versioned independently.) The format is the same as described above for the definition of the property-format field in Property Definitions. This field MUST be included when Physical Unit Definitions are used standalone, i.e., when they are not embedded inside a Property Definition that already declares a property-format at the top level.
version
: String. This string indicates the version of the Physical Unit Definition. The string SHOULD be in the format described by the semantic versioning v2 standard.
resources
: List of Dictionaries. A list of dictionaries that reference remote resources that describe the unit. The format of each dictionary is:
REQUIRED keys:
relation
: String. A human-readable description of the relationship between the unit and the remote resource, e.g., a "natural language description".
resource-id
: String. An IRI of the external resource (which MAY be a resolvable URL).
defining-relation
: Dictionary. A dictionary that encodes a defining relation to another unit or set of units. Its primary intended use is to relate a unit to its definition in SI units, if such a relationship exists. Some units, e.g., the atomic mass unit (also known as dalton, commonly denoted u), only have an approximate relationship to SI units, in which case the defining-relation MUST be omitted or null. The dictionary MUST adhere to the following format:
OPTIONAL keys:
base-units
: List of Dictionaries. A list specifying the base IRIs and unit symbols for the units in which the dimensional formula for the defining relation is expressed. Each item MUST be a dictionary that adheres to the following format:
REQUIRED keys:
symbol
: String. The symbol used to reference this unit in the dimensional formula.
id
: String. The IRI of one of the units referenced in the dimensional formula for the defining relation.
base-units-expression
: String. A string expressing the base units part of the defining relation for the unit being defined. It MUST adhere to the format for Compound Unit Expressions described in Physical Units in Property Definitions. If the field is missing or null, the base-units-expression is taken to be equal to 1, i.e., the defining relation is dimensionless.
scale
: Dictionary. A dictionary specifying the scale in the defining relation, adhering to the following format:
OPTIONAL keys:
numerator
: Integer.
denominator
: Integer.
base
: Integer.
exponent
: Integer.
These four fields specify the value as the rational number numerator/denominator, multiplied by base to the power of exponent. If omitted or null, the defaults for the numerator, denominator, base, and exponent are respectively 1, 1, 10, and 0.
standard_uncertainty
: Float. The standard uncertainty of the value used in the defining relation. Some definitions define an entity (e.g., a constant) to a specific value along with an uncertainty of that value.
offset
: Dictionary. A dictionary specifying the offset value, adhering to the same format as scale. If omitted or null, the defaults for the numerator, denominator, base, and exponent are respectively 0, 1, 10, and 0.
Designate the fields in scale as sn, sd, sb, and se, the fields in offset as on, od, ob, and oe, and base-units-expression as b. These fields then state the following defining relation: a value v multiplied by the unit being defined is equal to the expression (v * (sn/sd) * sb**se + (on/od) * ob**oe) * b, where * designates multiplication and ** designates exponentiation. (With the default base of 10, sb**se and ob**oe reduce to 10**se and 10**oe.)
For example, the temperature unit Fahrenheit F relates to Celsius C by x F = (x - 32) * (5/9) C = (5/9 x + (-160/9)) C. This defining relation could be expressed as follows:
"defining-relation": {
  "base-units": [
    {
      "symbol": "C",
      "id": "https://units.example.com/celsius"
    }
  ],
  "base-units-expression": "C",
  "scale": {
    "numerator": 5,
    "denominator": 9
  },
  "offset": {
    "numerator": -160,
    "denominator": 9
  }
}
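The defining relation can be evaluated with exact rational arithmetic. The Python sketch below uses hypothetical helper names and the standard fractions module. It returns the number that multiplies base-units-expression. It assumes that the offset dictionary, like scale, may carry a base that defaults to 10. Applied to the Fahrenheit example, it maps 212 F to 100 C and 32 F to 0 C.

```python
from fractions import Fraction


def _field(d, key, default):
    """Read key from an optional dict, treating a missing key or null as default."""
    value = (d or {}).get(key)
    return default if value is None else value


def _rational(d, default_numerator):
    """Evaluate numerator/denominator * base**exponent exactly."""
    numerator = _field(d, "numerator", default_numerator)
    denominator = _field(d, "denominator", 1)
    base = _field(d, "base", 10)
    exponent = _field(d, "exponent", 0)
    return Fraction(numerator, denominator) * Fraction(base) ** exponent


def to_base_units(v, defining_relation):
    """Return the coefficient c such that v of the defined unit equals c * b."""
    scale = _rational(defining_relation.get("scale"), 1)
    offset = _rational(defining_relation.get("offset"), 0)
    return Fraction(v) * scale + offset


fahrenheit = {
    "base-units": [{"symbol": "C", "id": "https://units.example.com/celsius"}],
    "base-units-expression": "C",
    "scale": {"numerator": 5, "denominator": 9},
    "offset": {"numerator": -160, "denominator": 9},
}
print(to_base_units(212, fahrenheit))  # 100
print(to_base_units(32, fahrenheit))   # 0
```

Using Fraction rather than float keeps ratios such as 5/9 exact, which matches the integer numerator/denominator encoding of scale and offset.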
approximate-relations
: List of Dictionary. A list of dictionaries that encode approximate relations to another unit or set of units. The intended use is to express one or a few approximate relationships from the unit being defined to other unit systems (primarily intended to be SI). This field is useful for units that are not defined by an exact relationship to other units; units that are so defined would use the defining-relation field instead. For example, the atomic mass unit (also known as dalton, commonly denoted u) is defined as one twelfth of the mass of a free carbon-12 atom at rest and only has an approximate relationship to the SI kilogram. While this field allows expressing multiple relationships, the intent is only to provide the most relevant relationships (e.g., to an SI base unit) from which other relationships can be derived.
Each element in the list MUST be a dictionary adhering to the following format:
OPTIONAL keys:
base-units
: List of Dictionaries, and
base-units-expression
: String. These fields take the same format and roles as in the definition of defining-relation.
scale
: Dictionary. A dictionary specifying the scale in the approximate relation. It MUST adhere to the following format:
REQUIRED keys:
value
: Float. The value of the scale in the approximate relation.
OPTIONAL keys:
standard_uncertainty
: Float. The standard uncertainty of the value in the approximate relation.
relative_standard_uncertainty
: Float. The relative standard uncertainty of the value in the approximate relation.
offset
: Dictionary. A dictionary specifying the offset in the approximate relation. It MUST adhere to the same format as the scale field above.
The values for scale and offset take the same meaning as in the definition of defining-relation to express a relationship between the unit being defined and the compound unit expression in base-units-expression.
deprecated
: Boolean. If TRUE, implementations SHOULD NOT use the unit defined in this Physical Unit Definition. If FALSE, the unit defined in this Physical Unit Definition is not deprecated. The field not being present means FALSE.
$comment
: String. A human-readable comment relevant in the context of the raw definition data. These comments should normally not be shown to the end users. Comments pertaining to the Physical Unit Definition that are relevant to end users should go into the field description. Formatting in the text SHOULD use Markdown using the format described in the definition of the description field of Property Definitions.
An example of a Physical Unit Definition, including a defining relation that involves more than one other unit, is embedded in the example of a Property Definition in the appendix Property Definition Example.
Prefixes and constants are defined in OPTIMADE using nearly identical schemas to the one for units in Physical Unit Definitions. The only differences for prefixes are:
- The $schema SHOULD be set to: "https://schemas.optimade.org/meta/v1.2/optimade/prefix_definition.json".
- The subfield kind of the field x-optimade-definition MUST be prefix.
And for constants:
- The $schema SHOULD be set to: "https://schemas.optimade.org/meta/v1.2/optimade/constant_definition.json".
- The subfield kind of the field x-optimade-definition MUST be constant.
Implementations MAY add their own keys in Property Definitions, both inside and outside of the fields x-optimade-property, x-optimade-implementation, and x-optimade-requirements. Such keys take the form x-exmpl-name, where exmpl is the database-specific prefix (without underscore characters) and name is the part of the key chosen by the implementation.
Implementations MUST NOT add keys of any other form to Property Definitions.
Client and server implementations that interpret an OPTIMADE Property Definition and encounter unrecognized keys starting with x-exmpl-, where exmpl is a recognized database prefix, MAY issue errors or warnings.
Other unrecognized keys starting with x- MUST NOT trigger errors, SHOULD NOT trigger warnings, and MUST otherwise be ignored.
To allow forward compatibility with future versions of both OPTIMADE and the JSON Schema standards, unrecognized keys that do not start with x- SHOULD trigger a warning but MUST otherwise be ignored.
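The three cases for unrecognized keys can be encoded as a small classification function. In this Python sketch, the function name and its return labels are hypothetical. The parameter recognized_prefixes stands for whatever set of database prefixes (without underscores) the implementation knows about.

```python
def classify_unrecognized_key(key, recognized_prefixes):
    """Decide how to treat a Property Definition key that is not recognized.

    Returns "may_error" (errors or warnings are allowed), "ignore" (ignore
    silently), or "warn" (warn, then ignore).
    """
    if key.startswith("x-"):
        prefix, sep, _ = key[2:].partition("-")
        if sep and prefix in recognized_prefixes:
            return "may_error"
        return "ignore"
    return "warn"
```

With recognized_prefixes = {"exmpl"}, the key x-exmpl-flag may raise an error. The key x-other-flag is silently ignored. A bare key such as newKeyword, which could come from a future JSON Schema or OPTIMADE version, produces a warning.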
This section defines standard entry types and their properties.
- Description: An entry's ID as defined in section Definition of Terms.
- Type: string.
- Requirements/Conventions:
- Support: MUST be supported by all implementations, MUST NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- Response: REQUIRED in the response.
- See section Definition of Terms.
- Examples:
"db/1234567"
"cod/2000000"
"cod/2000000@1234567"
"nomad/L1234567890"
"42"
- Description: The name of the type of an entry.
- Type: string.
- Requirements/Conventions:
- Support: MUST be supported by all implementations, MUST NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- Response: REQUIRED in the response.
- MUST be an existing entry type.
- The entry of type <type> and ID <id> MUST be returned in response to a request for /<type>/<id> under the versioned or unversioned base URL serving the API.
- Examples:
"structures"
- Description: The entry's immutable ID (e.g., a UUID). This is important for databases having preferred IDs that point to "the latest version" of a record, but still offer access to older variants. This ID maps to the version-specific record, in case it changes in the future.
- Type: string.
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- Examples:
"8bd3e750-b477-41a0-9b11-3a799f21b44f"
"fjeiwoj,54;@=%<>#32"
(Strings that are not URL-safe are allowed.)
- Description: Date and time representing when the entry was last modified.
- Type: timestamp.
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- Response: REQUIRED in the response unless the query parameter response_fields is present and does not include this property.
- Examples:
- As part of JSON response format: "2007-04-05T14:30:20Z" (i.e., encoded as an RFC 3339 Internet Date/Time Format string.)
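Clients that need to compare or sort last_modified values can convert the RFC 3339 string into a timezone-aware datetime, as in the Python sketch below. The sketch relies on datetime.fromisoformat. Before Python 3.11, that function rejects a trailing Z and accepts only 3 or 6 fractional-second digits. The sketch therefore covers common timestamps rather than the full RFC 3339 grammar. The function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_last_modified(value):
    """Parse an RFC 3339 timestamp such as "2007-04-05T14:30:20Z" into UTC."""
    # Older datetime.fromisoformat() versions reject a trailing "Z", so
    # rewrite it as an explicit UTC offset first.
    if value[-1:] in ("Z", "z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("RFC 3339 timestamps must carry a UTC offset")
    return parsed.astimezone(timezone.utc)
```

Both "2007-04-05T14:30:20Z" and "2007-04-05T16:30:20+02:00" then parse to the same instant.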
- Description: Providers are able to add database-provider-specific and definition-provider-specific properties in the output of both standard entry types and custom entry types.
Similarly, an implementation MAY add keys with a namespace prefix to dictionary properties and their sub-dictionaries.
For example, the database-provider-specific property _exmpl_oxidation_state can be placed within the OPTIMADE property species.
- Type: Decided by the API implementation. MUST be one of the OPTIMADE Data types.
- Requirements/Conventions:
- Support: Support for database-provider-specific properties is fully OPTIONAL.
- Query: Support for queries on these properties is OPTIONAL. If supported, only a subset of the filter features MAY be supported.
- Response: API implementations are free to choose whether database-provider-specific properties are only included when requested using the query parameter response_fields, or are also included when response_fields is not present. Implementations are thus allowed to decide that some of these properties are part of what can be seen as the default value of response_fields when that query parameter is omitted. Implementations SHOULD NOT include database-provider-specific properties when the query parameter response_fields is present but does not list them.
- These MUST be prefixed by a database-provider-specific prefix (see appendix Namespace Prefixes).
- Implementations MUST add the properties to the list of properties under the respective entry listing info endpoint (see Entry Listing Info Endpoints).
- Examples: A few examples of valid database-provider-specific property names, for a predefined prefix _exmpl, are as follows:
_exmpl_formula_sum
_exmpl_band_gap
_exmpl_supercell
_exmpl_trajectory
_exmpl_workflow_id
structures entries (or objects) have the properties described above in section Properties Used by Multiple Entry Types, as well as the following properties:
- Description: The chemical symbols of the different elements present in the structure.
- Type: list of strings.
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- The strings are the chemical symbols, i.e., either a single uppercase letter or an uppercase letter followed by a number of lowercase letters.
- The order MUST be alphabetical.
- MUST refer to the same elements in the same order, and therefore be of the same length, as elements_ratios, if the latter is provided.
- Note: This property SHOULD NOT contain the string "X" to indicate non-chemical elements or "vacancy" to indicate vacancies (in contrast to the field chemical_symbols for the species property).
- Examples:
["Si"]
["Al","O","Si"]
- Query examples:
- A filter that matches all records of structures that contain Si, Al and O, and possibly other elements:
elements HAS ALL "Si", "Al", "O"
- To match structures with exactly these three elements, use:
elements HAS ALL "Si", "Al", "O" AND elements LENGTH 3
- Note: length queries on this property can be equivalently formulated by filtering on the nelements property directly.
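The format rules for elements, and the "exactly these elements" query pattern from the examples, can be sketched in Python as follows. Both function names are hypothetical. The symbol check only tests the capitalization pattern described above, not membership in the periodic table.

```python
import re

# One uppercase letter optionally followed by lowercase letters, as described
# above; this does not check the symbol against the periodic table.
_CHEMICAL_SYMBOL = re.compile(r"[A-Z][a-z]*")


def check_elements(elements):
    """Return a list of problems found in an elements list (empty if none)."""
    problems = []
    for symbol in elements:
        if _CHEMICAL_SYMBOL.fullmatch(symbol) is None:
            problems.append(f"{symbol!r} is not a chemical symbol")
        elif symbol == "X":
            problems.append('"X" SHOULD NOT appear in elements')
    if elements != sorted(elements):
        problems.append("elements are not in alphabetical order")
    if len(set(elements)) != len(elements):
        problems.append("elements contains duplicate symbols")
    return problems


def exact_elements_filter(symbols):
    """Build a filter matching structures with exactly the given elements."""
    quoted = ", ".join(f'"{s}"' for s in symbols)
    return f"elements HAS ALL {quoted} AND elements LENGTH {len(symbols)}"
```

For instance, exact_elements_filter(["Si", "Al", "O"]) reproduces the filter elements HAS ALL "Si", "Al", "O" AND elements LENGTH 3 from the query examples.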
- Description: Number of different elements in the structure as an integer.
- Type: integer
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- MUST be equal to the lengths of the list properties elements and elements_ratios, if they are provided.
- Examples:
3
- Querying:
- Note: queries on this property can equivalently be formulated using elements LENGTH.
- A filter that matches structures that have exactly 4 elements:
nelements=4
- A filter that matches structures that have between 2 and 7 elements:
nelements>=2 AND nelements<=7
- Description: Relative proportions of different elements in the structure.
- Type: list of floats
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- Composed of the proportions of elements in the structure as a list of floating point numbers.
- The sum of the numbers MUST be 1.0 (within floating point accuracy).
- MUST refer to the same elements in the same order, and therefore be of the same length, as elements, if the latter is provided.
- Examples:
[1.0]
[0.3333333333333333, 0.2222222222222222, 0.4444444444444444]
- Query examples:
- Note: Useful filters can be formulated using the set operator syntax for correlated values. However, since the values are floating point values, the use of equality comparisons is generally inadvisable.
- OPTIONAL: a filter that matches structures where approximately 1/3 of the atoms in the structure are the element Al is:
elements:elements_ratios HAS ALL "Al":>0.3333, "Al":<0.3334
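Because elements, nelements, and elements_ratios must agree with one another, a server might run a consistency check like the Python sketch below before serving an entry. The function names are hypothetical. The relative tolerance of 1e-9 is an assumed reading of "within floating point accuracy".

```python
import math


def check_composition(entry):
    """Cross-check elements, nelements, and elements_ratios of a structure entry.

    entry maps property names to values; absent or null properties are skipped.
    Returns a list of problems (empty if none were found).
    """
    elements = entry.get("elements")
    ratios = entry.get("elements_ratios")
    nelements = entry.get("nelements")
    problems = []
    if elements is not None and ratios is not None and len(elements) != len(ratios):
        problems.append("elements and elements_ratios differ in length")
    for name, values in (("elements", elements), ("elements_ratios", ratios)):
        if nelements is not None and values is not None and len(values) != nelements:
            problems.append(f"nelements does not equal the length of {name}")
    # Assumed tolerance for "within floating point accuracy".
    if ratios is not None and not math.isclose(math.fsum(ratios), 1.0, rel_tol=1e-9):
        problems.append("elements_ratios does not sum to 1.0")
    return problems


def ratios_from_counts(counts):
    """Turn per-element atom counts into (elements, elements_ratios)."""
    elements = sorted(counts)
    total = sum(counts.values())
    return elements, [counts[e] / total for e in elements]
```

For example, ratios_from_counts({"Si": 4, "Al": 3, "O": 2}) yields ["Al", "O", "Si"] together with ratios 3/9, 2/9, and 4/9, which corresponds to the second example list above.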
- Description: The chemical formula for a structure as a string in a form chosen by the API implementation.
- Type: string
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property with support for all mandatory filter features.
- The chemical formula is given as a string consisting of properly capitalized element symbols followed by integers or decimal numbers, balanced parentheses, square, and curly brackets ((, ), [, ], {, }), commas, and the +, -, : and = symbols. The parentheses are allowed to be followed by a number. Spaces are allowed anywhere except within chemical symbols. The order of elements and any groupings indicated by parentheses or brackets are chosen freely by the API implementation.
- The string SHOULD be arithmetically consistent with the element ratios in the chemical_formula_reduced property.
- It is RECOMMENDED, but not mandatory, that symbols, parentheses and brackets, if used, are used with the meanings prescribed by IUPAC's Nomenclature of Organic Chemistry.
- Examples:
"(H2O)2 Na"
"NaCl"
"CaCO3"
"CCaO3"
"(CH3)3N+ - [CH2]2-OH = Me3N+ - CH2 - CH2OH"
- Query examples:
- Note: the free-form nature of this property is likely to make queries on it across different databases inconsistent.
- A filter that matches an exactly given formula:
chemical_formula_descriptive="(H2O)2 Na"
- A filter that does a partial match:
chemical_formula_descriptive CONTAINS "H2O"
- Description: The reduced chemical formula for a structure as a string with element symbols and integer chemical proportion numbers. The proportion number MUST be omitted if it is 1.
- Type: string
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property.
However, support for filters using partial string matching with this property is OPTIONAL (i.e., BEGINS WITH, ENDS WITH, and CONTAINS).
Intricate queries on formula components are instead suggested to be formulated using set-type filter operators on the multi-valued elements and elements_ratios properties.
- Element symbols MUST have proper capitalization (e.g., "Si", not "SI" for "silicon").
- Elements MUST be placed in alphabetical order, followed by their integer chemical proportion number.
- For structures with no partial occupation, the chemical proportion numbers are the smallest integers for which the chemical proportion is exactly correct.
- For structures with partial occupation, the chemical proportion numbers are integers that within reasonable approximation indicate the correct chemical proportions. The precise details of how to perform the rounding are chosen by the API implementation.
- No spaces or separators are allowed.
- Examples:
"H2NaO"
"ClNa"
"CCaO3"
- Query examples:
- A filter that matches an exactly given formula is:
chemical_formula_reduced="H2NaO"
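For a structure without partial occupation, chemical_formula_reduced follows mechanically from the integer atom counts. The counts are divided by their greatest common divisor, the symbols are sorted alphabetically, and proportion numbers equal to 1 are omitted. Here is a Python sketch with a hypothetical function name:

```python
from functools import reduce
from math import gcd


def reduced_formula(counts):
    """Build chemical_formula_reduced from integer atom counts per element.

    Assumes a structure without partial occupation, so all counts are exact
    positive integers.
    """
    divisor = reduce(gcd, counts.values(), 0)
    parts = []
    for element in sorted(counts):  # alphabetical order of element symbols
        n = counts[element] // divisor
        parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)
```

For example, reduced_formula({"H": 4, "Na": 2, "O": 2}) gives "H2NaO", and {"Ca": 1, "C": 1, "O": 3} gives "CCaO3", because "C" sorts before "Ca".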
- Description: The chemical formula for a structure in Hill form with element symbols followed by integer chemical proportion numbers. The proportion number MUST be omitted if it is 1.
- Type: string
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL. If supported, only a subset of the filter features MAY be supported.
- The overall scale factor of the chemical proportions is chosen such that the resulting values are integers that indicate the most chemically relevant unit of which the system is composed.
For example, if the structure is a repeating unit cell with four hydrogens and four oxygens that represents two hydrogen peroxide molecules, chemical_formula_hill is "H2O2" (i.e., not "HO", nor "H4O4").
- If the chemical insight needed to ascribe a Hill formula to the system is not present, the property MUST be handled as unset.
- Element symbols MUST have proper capitalization (e.g., "Si", not "SI" for "silicon").
- Elements MUST be placed in Hill order, followed by their integer chemical proportion number. Hill order means: if carbon is present, it is placed first, and if also present, hydrogen is placed second. After that, all other elements are ordered alphabetically. If carbon is not present, all elements are ordered alphabetically.
- If the system has sites with partial occupation and the total occupations of each element do not all sum up to integers, then the Hill formula SHOULD be handled as unset.
- No spaces or separators are allowed.
- Examples:
"H2O2"
- Query examples:
- A filter that matches an exactly given formula is:
chemical_formula_hill="H2O2"
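Once the chemically relevant counts are known, producing the Hill ordering is mechanical. This Python sketch uses a hypothetical function name. It assumes the caller has already chosen the scale factor, which the text above says requires chemical insight.

```python
def hill_formula(counts):
    """Format integer element counts in Hill order.

    counts must already be scaled to the chemically relevant unit, e.g.
    {"H": 2, "O": 2} for hydrogen peroxide; choosing that scale is not
    attempted here.
    """
    if "C" in counts:
        # Carbon first, then hydrogen if present, then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)
```

For example, {"N": 1, "C": 2, "H": 7, "O": 1} becomes "C2H7NO". Without carbon, {"O": 1, "H": 2} is simply sorted alphabetically to "H2O".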
- Description: The anonymous formula is the chemical_formula_reduced, but where the elements are instead first ordered by their chemical proportion number (largest first, as in the examples below), and then, in order left to right, replaced by anonymous symbols A, B, C, ..., Z, Aa, Ba, ..., Za, Ab, Bb, ... and so on.
- Type: string
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: MUST be a queryable property. However, support for filters using partial string matching with this property is OPTIONAL (i.e., BEGINS WITH, ENDS WITH, and CONTAINS).
- Examples:
"A2B"
"A42B42C16D12E10F9G5"
- Querying:
- A filter that matches an exactly given formula is:
chemical_formula_anonymous="A2B"
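The following Python sketch builds the anonymous formula. It assumes, as the examples suggest, that proportions are ordered from largest to smallest. Only the sorted proportion numbers reach the output, so the order chosen among elements with equal proportions does not change the result. The function names are hypothetical.

```python
import string


def anonymous_symbol(index):
    """Map 0, 1, ..., 25, 26, 27, ... to "A", "B", ..., "Z", "Aa", "Ba", ..."""
    letter = string.ascii_uppercase[index % 26]
    cycle = index // 26
    return letter if cycle == 0 else letter + string.ascii_lowercase[cycle - 1]


def anonymous_formula(reduced_counts):
    """Build chemical_formula_anonymous from the proportion numbers of
    chemical_formula_reduced, e.g. {"H": 2, "O": 1} -> "A2B"."""
    proportions = sorted(reduced_counts.values(), reverse=True)
    return "".join(
        anonymous_symbol(i) + ("" if n == 1 else str(n))
        for i, n in enumerate(proportions)
    )
```

The symbol sequence wraps after Z: index 26 gives "Aa", index 51 gives "Za", and index 52 gives "Ab".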
- Description: List of three integers describing the periodicity of the boundaries of the unit cell.
For each direction indicated by the three lattice_vectors, this list indicates if the direction is periodic (value 1) or non-periodic (value 0). Note: the elements in this list each refer to the direction of the corresponding entry in lattice_vectors and not the Cartesian x, y, z directions.
- Type: list of integers.
- Requirements/Conventions:
- Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
- Query: Support for queries on this property is OPTIONAL.
- MUST be a list of length 3.
- Each integer element MUST assume only the value 0 or 1.
- Examples:
  - A nonperiodic structure, for example, a single molecule: [0, 0, 0]
  - A unit cell that is periodic in the direction of the third lattice vector, for example, a carbon nanotube: [0, 0, 1]
  - A 2D surface/slab, with a unit cell that is periodic in the directions of the first and third lattice vectors: [1, 0, 1]
  - A bulk 3D system with a unit cell that is periodic in all directions: [1, 1, 1]
8.2.9 nperiodic_dimensions
- Description: An integer specifying the number of periodic dimensions in the structure, equivalent to the number of non-zero entries in dimension_types.
- Type: integer
- Requirements/Conventions:
  - Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  - Query: MUST be a queryable property with support for all mandatory filter features.
  - The integer value MUST be between 0 and 3 inclusive and MUST be equal to the sum of the items in the dimension_types property.
  - This property only reflects the treatment of the lattice vectors provided for the structure, not any physical interpretation of the dimensionality of its contents.
- Examples:
  - 2 should be indicated in cases where dimension_types is any of [1, 1, 0], [1, 0, 1], or [0, 1, 1].
- Query examples:
  - Match only structures with exactly 3 periodic dimensions: nperiodic_dimensions=3
  - Match all structures with 2 or fewer periodic dimensions: nperiodic_dimensions<=2
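Since nperiodic_dimensions is fully determined by dimension_types, a provider might derive it instead of storing it separately. The following Python sketch does this under that assumption. The function name is hypothetical, and the function raises ValueError for lists that break the dimension_types conventions.

```python
from typing import List


def nperiodic_dimensions(dimension_types: List[int]) -> int:
    """Derive nperiodic_dimensions from dimension_types, enforcing its conventions."""
    if len(dimension_types) != 3:
        raise ValueError("dimension_types MUST be a list of length 3")
    # type() check rejects bools and floats such as True or 1.0
    if any(type(d) is not int or d not in (0, 1) for d in dimension_types):
        raise ValueError("each element of dimension_types MUST be 0 or 1")
    # Each periodic direction contributes a 1, so the sum counts them
    return sum(dimension_types)


print(nperiodic_dimensions([1, 0, 1]))  # 2, e.g. a slab
```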
8.2.10 lattice_vectors
- Description: The three lattice vectors in Cartesian coordinates, in ångström (Å).
- Type: list of list of floats or unknown values.
- Requirements/Conventions:
  - Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  - Query: Support for queries on this property is OPTIONAL. If supported, filters MAY support only a subset of comparison operators.
  - MUST be a list of three vectors a, b, and c, where each vector MUST be a list of its coordinates along the Cartesian x, y, and z axes. (Therefore, the first index runs over the three lattice vectors and the second index runs over the x, y, z Cartesian coordinates.)
  - For databases that do not define an absolute Cartesian system (e.g., those that only define the lengths of and angles between vectors), the first lattice vector SHOULD be set along x and the second in the xy-plane.
  - MUST always contain three vectors of three coordinates each, regardless of the elements of property dimension_types. By convention, the vectors SHOULD be chosen so that the determinant of the lattice_vectors matrix is different from zero. The vectors in the non-periodic directions have no significance beyond fulfilling these requirements.
  - The coordinates of the lattice vectors of non-periodic dimensions (i.e., those dimensions for which dimension_types is 0) MAY be given as a list of all null values. If a lattice vector contains the value null, all coordinates of that lattice vector MUST be null.
- Examples:
  - [[4.0,0.0,0.0],[0.0,4.0,0.0],[0.0,1.0,4.0]] represents a cell where the first vector is (4, 0, 0), i.e., a vector of length 4 Å aligned along the x axis; the second vector is (0, 4, 0); and the third vector is (0, 1, 4).
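For databases that store only cell lengths and angles, the convention above (first vector along x, second in the xy-plane) fixes a Cartesian setting. The Python sketch below uses the standard crystallographic construction for that setting. It also computes the determinant that the non-zero convention refers to. The function names and the angle convention (alpha between b and c, beta between a and c, gamma between a and b, all in degrees) are assumptions of this illustration. The sketch does not handle null-valued vectors for non-periodic directions.

```python
import math
from typing import List, Sequence


def lattice_vectors_from_parameters(a: float, b: float, c: float,
                                    alpha: float, beta: float, gamma: float) -> List[List[float]]:
    """Build lattice_vectors (Å) from cell lengths (Å) and angles (degrees).

    First vector along x, second vector in the xy-plane.
    """
    al, be, ga = (math.radians(t) for t in (alpha, beta, gamma))
    v1 = [a, 0.0, 0.0]
    v2 = [b * math.cos(ga), b * math.sin(ga), 0.0]
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    # math.sqrt raises ValueError for geometrically impossible angle combinations
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return [v1, v2, [cx, cy, cz]]


def determinant(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix; its absolute value is the cell volume."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


example = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 1.0, 4.0]]
print(determinant(example))  # 64.0, non-zero as the convention asks
```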
8.2.11 space_group_symmetry_operations_xyz
- Description: A list of symmetry operations given as general position x, y and z coordinates in algebraic form.
- Type: list of strings
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
    - The property is RECOMMENDED if coordinates are returned in a form to which these operations can or must be applied (e.g., fractional atom coordinates of an asymmetric unit).
    - The property is REQUIRED if symmetry operations are necessary to reconstruct the full model of the material and no other symmetry information (e.g., the Hall symbol) is provided that would allow the user to derive symmetry operations unambiguously.
  - Query: Support for queries on this property is not required and is in fact NOT RECOMMENDED.
  - MUST be null if nperiodic_dimensions is equal to 0.
  - Each symmetry operation is described by a string that gives that symmetry operation in Jones' faithful representation (Bradley & Cracknell, 1972: pp. 35-37), adapted for computer string notation.
  - The letters x, y and z that are typeset with overbars in printed text represent coordinate values multiplied by -1 and are encoded as -x, -y and -z, respectively.
  - The syntax of the strings representing symmetry operations MUST conform to the regular expressions given in the appendix The Symmetry Operation String Regular Expressions.
  - The interpretation of the strings MUST follow the conventions of the IUCr CIF core dictionary (IUCr, 2023). In particular, this property MUST explicitly provide all symmetry operations needed to generate all the atoms in the unit cell from the atoms in the asymmetric unit, for the setting used.
  - This symmetry operation set MUST always include the x,y,z identity operation.
  - The symmetry operations are to be applied to fractional atom coordinates. If only Cartesian coordinates are available, they must be converted to fractional coordinates before the provided symmetry operations are applied.
  - If the symmetry operation list is present, it MUST be compatible with any other space group specifications that are present (e.g., the ITC space group number, the Hall symbol, the Hermann-Mauguin symbol).
- Examples:
  - Space group operations for the space group with ITC number 3 (H-M symbol P 2, extended H-M symbol P 1 2 1, Hall symbol P 2y): ["x,y,z", "-x,y,-z"]
  - Space group operations for the space group with ITC number 5 (H-M symbol C 2, extended H-M symbol C 1 2 1, Hall symbol C 2y): ["x,y,z", "-x,y,-z", "x+1/2,y+1/2,z", "-x+1/2,y+1/2,-z"]
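To show how the operations act on fractional coordinates, the Python sketch below parses simple operation strings and applies them to a single site. The helpers are hypothetical and simplified. They accept terms such as x, -x and 2*x and constants such as 1/2, but they do not validate strings against the normative regular expressions of the appendix. They also do not remove duplicate positions generated at special positions.

```python
import re
from fractions import Fraction
from typing import Dict, Sequence, Tuple


def parse_component(expr: str) -> Tuple[Dict[str, Fraction], Fraction]:
    """Split one component such as '-x+1/2' into x/y/z coefficients and a constant shift."""
    coeffs = {"x": Fraction(0), "y": Fraction(0), "z": Fraction(0)}
    shift = Fraction(0)
    # Each term is an optional sign followed by everything up to the next sign
    for term in re.findall(r"[+-]?[^+-]+", expr.replace(" ", "").lower()):
        sign = -1 if term.startswith("-") else 1
        body = term.lstrip("+-")
        if body[-1] in coeffs:                      # e.g. 'x', '2*x', '1/2*x'
            factor = body[:-1].rstrip("*")
            coeffs[body[-1]] += sign * (Fraction(factor) if factor else 1)
        else:                                       # e.g. '1/2', '0.25'
            shift += sign * Fraction(body)
    return coeffs, shift


def apply_operation(op: str, frac: Sequence[float]) -> Tuple[float, ...]:
    """Apply an operation such as '-x+1/2,y+1/2,-z' to fractional coordinates."""
    x, y, z = frac
    result = []
    for component in op.split(","):
        coeffs, shift = parse_component(component)
        value = coeffs["x"] * x + coeffs["y"] * y + coeffs["z"] * z + shift
        result.append(float(value) % 1.0)           # wrap back into the cell, [0, 1)
    return tuple(result)


c2_ops = ["x,y,z", "-x,y,-z", "x+1/2,y+1/2,z", "-x+1/2,y+1/2,-z"]
site = (0.1, 0.2, 0.3)  # fractional coordinates of one asymmetric-unit atom
for op in c2_ops:
    print(op, apply_operation(op, site))
```

To obtain the atoms of the full unit cell for the given setting, apply every operation in the list to every site of the asymmetric unit and then discard coincident positions.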
- Notes: The list of space group symmetry operations applies to the whole periodic array of atoms. Together with the lattice translations given in the lattice_vectors property, it provides the information needed to reconstruct all atom site positions of the periodic material. Thus, the symmetry operations described in this property are only applicable to material models with at least one periodic dimension. This property is not meant to represent arbitrary symmetries of molecules, non-periodic (finite) collections of atoms, or non-crystallographic symmetry.
- Bibliographic References:
  - Bradley, C. J. and Cracknell, A. P. (1972) The Mathematical Theory of Symmetry in Solids. Oxford, Clarendon Press (paperback edition 2010) 745 p. ISBN 978-0-19-958258-7.
  - IUCr (2023) Core dictionary (coreCIF) version 2.4.5; data name _space_group_symop_operation_xyz. Available from: https://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Ispace_group_symop_operation_xyz.html [Accessed 2023-06-18T16:46+03:00].
8.2.12 space_group_symbol_hall
- Description: A Hall space group symbol representing the symmetry of the structure, as defined by Hall (1981, 1981a).
- Type: string
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL.
  - Change-of-basis operations are used as defined in the International Tables for Crystallography (ITC) Vol. B, Sect. 1.4, Appendix A1.4.2 (IUCr, 2001).
  - Each component of the Hall symbol MUST be separated by a single space character.
  - If a standard Hall symbol exists that represents the symmetry, it SHOULD be used.
  - MUST be null if nperiodic_dimensions is not equal to 3.
- Examples:
  - Space group symbols with explicit origin (the Hall symbols):
    - P 2c -2ac
    - -I 4bd 2ab 3
  - Space group symbols with change-of-basis operations:
    - P 2yb (-1/2*x+z,1/2*x,y)
    - -I 4 2 (1/2*x+1/2*y,-1/2*x+1/2*y,z)
- Bibliographic References:
  - Hall, S. R. (1981) Space-group notation with an explicit origin. Acta Crystallographica Section A, 37, 517-525, International Union of Crystallography (IUCr), DOI: https://doi.org/10.1107/s0567739481001228
  - Hall, S. R. (1981a) Space-group notation with an explicit origin; erratum. Acta Crystallographica Section A, 37, 921-921, International Union of Crystallography (IUCr), DOI: https://doi.org/10.1107/s0567739481001976
  - IUCr (2001) International Tables for Crystallography vol. B. Reciprocal Space. Ed. U. Shmueli. 2nd edition. Dordrecht/Boston/London, Kluwer Academic Publishers.
8.2.13 space_group_symbol_hermann_mauguin
- Description: A human- and machine-readable string containing the short Hermann-Mauguin (H-M) symbol that specifies the space group of the structure in the response.
- Type: string
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL.
  - The H-M symbol SHOULD aim to convey the closest representation of the symmetry information that can be specified using the short format of the International Tables for Crystallography vol. A (IUCr, 2005), Table 4.3.2.1, as described in the accompanying text.
  - The symbol MAY be a non-standard short H-M symbol.
  - The H-M symbol does not unambiguously communicate the axis, cell, and origin choice, and the given symbol SHOULD NOT be amended to convey this information.
  - To encode H-M symbols given in their typeset form as character strings, the following adaptations MUST be made:
    - the overbar above a number MUST be changed to a minus sign in front of the digit (e.g., '-2');
    - subscripts that denote screw axes are written as digits immediately after the axis designator, without a space (e.g., 'P 32');
    - the space group generators MUST be separated by a single space (e.g., 'P 21 21 2');
    - there MUST be no spaces in a space group generator designation (i.e., use 'P 21/m', not 'P 21 / m').
- Examples:
  - C 2
  - P 21 21 21
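The encoding adaptations above can be applied mechanically to a symbol that arrives in a loosely typed form. The Python sketch below assumes two input conventions: overbars are written as the Unicode combining overline (U+0305) following the digit, and screw-axis subscripts use Unicode subscript digits. Both conventions, and the helper name normalize_hm_symbol, are illustrative assumptions and are not part of the specification. The sketch does not check that the result is a valid space group symbol.

```python
import re

# Unicode subscript digits, as in a typeset screw axis such as 2₁, mapped to plain digits
SUBSCRIPT_DIGITS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def normalize_hm_symbol(symbol: str) -> str:
    """Apply the character-string adaptations to a typeset-style H-M symbol."""
    s = symbol.strip().translate(SUBSCRIPT_DIGITS)
    # An overbar written as a combining overline (U+0305) after a digit becomes a leading minus
    s = re.sub(r"(\d)\u0305", r"-\1", s)
    # No spaces inside a generator designation such as 21/m
    s = re.sub(r"\s*/\s*", "/", s)
    # Generators separated by exactly one space
    return re.sub(r"\s+", " ", s)


print(normalize_hm_symbol("P 2₁ / m"))      # P 21/m
print(normalize_hm_symbol("F m 3\u0305 m"))  # F m -3 m
```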
- Bibliographic References:
  - IUCr (2005) International Tables for Crystallography vol. A. Space-Group Symmetry. Ed. Theo Hahn. 5th edition. Dordrecht, Springer.
8.2.14 space_group_symbol_hermann_mauguin_extended
- Description: A human- and machine-readable string containing the extended Hermann-Mauguin (H-M) symbol that specifies the space group of the structure in the response.
- Type: string
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL.
  - The H-M symbols SHOULD be given as specified in the International Tables for Crystallography vol. A (IUCr, 2005), Table 4.3.2.1.
  - The change-of-basis operation SHOULD be provided for non-standard axis and cell choices.
  - The extended H-M symbol does not unambiguously communicate the origin choice, and the given symbol SHOULD NOT be amended to convey this information.
  - The description of the change-of-basis SHOULD follow the conventions of the ITC Vol. B, Sect. 1.4, Appendix A1.4.2 (IUCr, 2001).
  - The same character string encoding conventions as for the space_group_symbol_hermann_mauguin property MUST be used.
- Examples:
  - C 1 2 1
- Bibliographic References:
  - IUCr (2001) International Tables for Crystallography vol. B. Reciprocal Space. Ed. U. Shmueli. 2nd edition. Dordrecht/Boston/London, Kluwer Academic Publishers.
  - IUCr (2005) International Tables for Crystallography vol. A. Space-Group Symmetry. Ed. Theo Hahn. 5th edition. Dordrecht, Springer.
8.2.15 space_group_it_number
- Description: Space group number that specifies the space group of the structure, as defined in the International Tables for Crystallography Vol. A (IUCr, 2005).
- Type: integer
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL.
  - The integer value MUST be between 1 and 230 inclusive.
  - MUST be null if nperiodic_dimensions is not equal to 3.
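Several of the constraints above relate the space-group properties to nperiodic_dimensions. A consistency check over a structure's attributes might look like the following Python sketch. The function name and messages are hypothetical. The check covers only the rules stated in this section for space_group_symmetry_operations_xyz, space_group_symbol_hall, and space_group_it_number. It does not verify that the different space-group specifications agree with each other.

```python
from typing import Any, Dict, List


def space_group_problems(attrs: Dict[str, Any]) -> List[str]:
    """List violations of the space-group conventions for one structure's attributes."""
    problems = []
    nper = attrs.get("nperiodic_dimensions")
    it_number = attrs.get("space_group_it_number")
    if it_number is not None and not 1 <= it_number <= 230:
        problems.append("space_group_it_number must be between 1 and 230")
    # The Hall symbol and ITC number must be null unless there are three periodic dimensions
    if nper is not None and nper != 3:
        for key in ("space_group_symbol_hall", "space_group_it_number"):
            if attrs.get(key) is not None:
                problems.append(f"{key} must be null unless nperiodic_dimensions is 3")
    ops = attrs.get("space_group_symmetry_operations_xyz")
    if ops is not None:
        if nper == 0:
            problems.append("space_group_symmetry_operations_xyz must be null "
                            "when nperiodic_dimensions is 0")
        # Exact string comparison; whitespace variants are not normalized here
        if "x,y,z" not in ops:
            problems.append("space_group_symmetry_operations_xyz must include 'x,y,z'")
    return problems


entry = {"nperiodic_dimensions": 2, "space_group_it_number": 3,
         "space_group_symmetry_operations_xyz": ["-x,y,-z"]}
for problem in space_group_problems(entry):
    print(problem)
```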
8.2.16 cartesian_site_positions
- Description: Cartesian positions of each site in the structure. A site is usually used to describe the positions of atoms; which atoms can be encountered at a given site is conveyed by the species_at_sites property, and the species themselves are described in the species property.
- Type: list of list of floats
- Requirements/Conventions:
  - Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  - Query: Support for queries on this property is OPTIONAL. If supported, filters MAY support only a subset of comparison operators.
  - It MUST be a list whose length equals the number of sites in the structure, where every element is a list of the three Cartesian coordinates of a site, expressed as float values in ångström (Å).
  - An entry MAY have multiple sites at the same Cartesian position (for a relevant use of this, see, e.g., the property assemblies).
- Examples:
  - [[0,0,0],[0,0,2]] indicates a structure with two sites, one at the origin and one on the positive z-axis, 2 Å away from the origin.
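The requirements of space_group_symmetry_operations_xyz call for Cartesian coordinates to be converted to fractional ones before symmetry operations are applied. Each Cartesian position r is a combination of the rows of lattice_vectors, so the fractional coordinates follow from multiplying r by the inverse of the lattice matrix. The Python sketch below assumes that all three lattice vectors are numeric (no null entries) and avoids external libraries. The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def inverse_3x3(m: Sequence[Sequence[float]]) -> Matrix:
    """Invert a 3x3 matrix via its adjugate; raises if the lattice is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("lattice_vectors matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def cartesian_to_fractional(positions: Sequence[Sequence[float]],
                            lattice_vectors: Sequence[Sequence[float]]) -> Matrix:
    """Convert cartesian_site_positions (Å) to fractional coordinates."""
    inv = inverse_3x3(lattice_vectors)
    # Rows of lattice_vectors are a, b, c: r = f1*a + f2*b + f3*c, i.e. r = f L, so f = r L^-1
    return [[sum(r[k] * inv[k][j] for k in range(3)) for j in range(3)] for r in positions]


lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 1.0, 4.0]]
print(cartesian_to_fractional([[0, 0, 0], [0, 0, 2]], lattice))
```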
8.2.17 nsites
- Description: An integer specifying the length of the cartesian_site_positions property.
- Type: integer
- Requirements/Conventions:
  - Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  - Query: MUST be a queryable property with support for all mandatory filter features.
- Examples:
  - 42
- Query examples:
  - Match only structures with exactly 4 sites: nsites=4
  - Match structures that have between 2 and 7 sites: nsites>=2 AND nsites<=7
8.2.18 species_at_sites
- Description: Name of the species at each site (values are given in the same order as the sites in the property cartesian_site_positions). The properties of the species are found in the property species.
- Type: list of strings.
- Requirements/Conventions:
  - Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  - Query: Support for queries on this property is OPTIONAL. If supported, filters MAY support only a subset of comparison operators.
  - MUST have length equal to the number of sites in the structure (the first dimension of the list property cartesian_site_positions).
  - Each species name mentioned in the species_at_sites list MUST be described in the list property species (i.e., for each value in the species_at_sites list there MUST exist exactly one dictionary in the species list whose name attribute equals that species_at_sites value).
  - Each site MUST be associated with only a single species. Note: However, species can represent mixtures of atoms, and multiple species MAY be defined for the same chemical element. This latter case is useful when different atoms of the same type need to be grouped or distinguished, for instance, in simulation codes that assign different initial spin states.
- Examples:
  - ["Ti","O2"] indicates that the first site hosts a species labeled "Ti" and the second a species labeled "O2".
  - ["Ac", "Ac", "Ag", "Ir"] indicates that the first two sites contain the "Ac" species, while the third and fourth sites contain the "Ag" and "Ir" species, respectively.
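The length and naming rules that tie cartesian_site_positions, nsites, species_at_sites, and species together can be checked mechanically. The Python sketch below shows one way to do this. The function name and messages are hypothetical, and the input is assumed to be a plain dictionary of entry attributes.

```python
from collections import Counter
from typing import Any, Dict, List


def site_problems(attrs: Dict[str, Any]) -> List[str]:
    """Cross-check cartesian_site_positions, nsites, species_at_sites and species."""
    problems = []
    positions = attrs["cartesian_site_positions"]
    labels = attrs["species_at_sites"]
    if attrs.get("nsites") is not None and attrs["nsites"] != len(positions):
        problems.append("nsites must equal the length of cartesian_site_positions")
    if len(labels) != len(positions):
        problems.append("species_at_sites must have one entry per site")
    # Every label must match exactly one species dictionary by its name
    defined = Counter(sp["name"] for sp in attrs["species"])
    for label in sorted(set(labels)):
        if defined[label] != 1:
            problems.append(f"species {label!r} must be defined exactly once in species")
    return problems


structure = {
    "nsites": 2,
    "cartesian_site_positions": [[0, 0, 0], [0, 0, 2]],
    "species_at_sites": ["Ti", "O2"],
    "species": [{"name": "Ti", "chemical_symbols": ["Ti"], "concentration": [1.0]}],
}
print(site_problems(structure))  # ["species 'O2' must be defined exactly once in species"]
```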
8.2.19 species
- Description: A list describing the species of the sites of this structure. Species can represent pure chemical elements, virtual-crystal atoms representing a statistical occupation of a given site by multiple chemical elements, and/or a location to which atoms are attached, i.e., atoms whose precise location is unknown beyond the fact that they are attached to that position (frequently used to indicate hydrogen atoms attached to another element, e.g., a carbon with three attached hydrogens might represent a methyl group, -CH3).
- Type: list of dictionary with keys:
  - name: string (REQUIRED)
  - chemical_symbols: list of strings (REQUIRED)
  - concentration: list of float (REQUIRED)
  - attached: list of strings (OPTIONAL)
  - nattached: list of integers (OPTIONAL)
  - mass: list of floats (OPTIONAL)
  - original_name: string (OPTIONAL).
- Requirements/Conventions:
  - Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  - Query: Support for queries on this property is OPTIONAL. If supported, filters MAY support only a subset of comparison operators.
  - Each list member MUST be a dictionary with the following keys:
    - name: REQUIRED; gives the name of the species; the name value MUST be unique in the species list;
    - chemical_symbols: REQUIRED; MUST be a list of strings of all chemical elements composing this species. Each item of the list MUST be one of the following:
      - a valid chemical-element symbol, or
      - the special value "X" to represent a non-chemical element, or
      - the special value "vacancy" to represent that this site has a non-zero probability of having a vacancy (the respective probability is indicated in the concentration list, see below).
      If any entry in the species list has a chemical_symbols list that is longer than 1 element, the corresponding flag (disorder) MUST be set in the list structure_features (see property structure_features).
    - concentration: REQUIRED; MUST be a list of floats with the same length as chemical_symbols. The numbers represent the relative concentration of the corresponding chemical symbol in this species. The numbers SHOULD sum to one. Cases in which the numbers do not sum to one typically fall into only the following two categories:
      - Numerical errors when representing float numbers in fixed precision, e.g., for two chemical symbols with concentrations 1/3 and 2/3, the concentration might look something like [0.33333333333, 0.66666666666]. If the client is aware that the sum is not one because of numerical precision, it can renormalize the values so that the sum is exactly one.
      - Experimental errors in the data present in the database. In this case, it is the responsibility of the client to decide how to process the data.
      Note that concentrations are uncorrelated between different sites (even of the same species).
    - attached: OPTIONAL; if provided, MUST be a list of length 1 or more of strings of chemical symbols for the elements attached to this site, or "X" for a non-chemical element.
    - nattached: OPTIONAL; if provided, MUST be a list of length 1 or more of integers indicating the number of attached atoms of the kind specified in the value of the attached key.
      The implementation MUST include either both or none of the attached and nattached keys, and if they are provided, they MUST be of the same length. Furthermore, if they are provided, the structure_features property MUST include the string site_attachments.
    - mass: OPTIONAL. If present, MUST be a list of floats with the same length as chemical_symbols, providing element masses expressed in a.m.u. Elements denoting vacancies MUST have masses equal to 0.
    - original_name: OPTIONAL. Can be any valid Unicode string, and SHOULD contain (if specified) the name of the species that is used internally in the source database.
      Note: "source database" refers to the immediate source being queried via the OPTIMADE API implementation. The main use of this field is for source databases whose species names contain characters that are not allowed (see the description of the list property species_at_sites).
  - For systems that have only species formed by a single chemical symbol, and that have at most one species per chemical symbol, the chemical symbol SHOULD be used as the species name (e.g., "Ti" for titanium, "O" for oxygen, etc.). However, note that this is OPTIONAL, and client implementations MUST NOT assume that the key corresponds to a chemical symbol, nor that a species whose name is a valid chemical symbol represents a species with that chemical symbol. This means that a species {"name": "C", "chemical_symbols": ["Ti"], "concentration": [1.0]} is valid and represents a titanium species (and not a carbon species).
  - It is NOT RECOMMENDED that a structure include species that do not have at least one corresponding site.
- Examples:
  - [ {"name": "Ti", "chemical_symbols": ["Ti"], "concentration": [1.0]} ]: any site with this species is occupied by a Ti atom.
  - [ {"name": "Ti", "chemical_symbols": ["Ti", "vacancy"], "concentration": [0.9, 0.1]} ]: any site with this species is occupied by a Ti atom with 90 % probability, and has a vacancy with 10 % probability.
  - [ {"name": "BaCa", "chemical_symbols": ["vacancy", "Ba", "Ca"], "concentration": [0.05, 0.45, 0.5], "mass": [0.0, 137.327, 40.078]} ]: any site with this species is occupied by a Ba atom with 45 % probability, a Ca atom with 50 % probability, and by a vacancy with 5 % probability.
  - [ {"name": "C12", "chemical_symbols": ["C"], "concentration": [1.0], "mass": [12.0]} ]: any site with this species is occupied by a carbon isotope with mass 12.
  - [ {"name": "C13", "chemical_symbols": ["C"], "concentration": [1.0], "mass": [13.0]} ]: any site with this species is occupied by a carbon isotope with mass 13.
  - [ {"name": "CH3", "chemical_symbols": ["C"], "concentration": [1.0], "attached": ["H"], "nattached": [3]} ]: any site with this species is occupied by a methyl group, -CH3, which is represented without specifying precise positions of the hydrogen atoms.
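The per-species rules above translate into simple structural checks. The following Python sketch reports violations for a list of species dictionaries. It treats a key whose value is None as absent (an assumption), and it uses a hypothetical tolerance of 1e-6 for the SHOULD-level sum-to-one rule. It does not check chemical symbols against the periodic table. The function name is hypothetical.

```python
import math
from typing import Any, Dict, List


def species_problems(species: List[Dict[str, Any]], tol: float = 1e-6) -> List[str]:
    """Check the per-species conventions; keys whose value is None are treated as absent."""
    problems = []
    names = [sp["name"] for sp in species]
    if len(names) != len(set(names)):
        problems.append("species names must be unique")
    for sp in species:
        name, symbols, conc = sp["name"], sp["chemical_symbols"], sp["concentration"]
        if len(conc) != len(symbols):
            problems.append(f"{name}: concentration and chemical_symbols differ in length")
        elif not math.isclose(sum(conc), 1.0, abs_tol=tol):
            # SHOULD-level rule: report it, but the client decides how to handle it
            problems.append(f"{name}: concentrations sum to {sum(conc)}, not 1")
        attached, nattached = sp.get("attached"), sp.get("nattached")
        if (attached is None) != (nattached is None):
            problems.append(f"{name}: attached and nattached must be given together")
        elif attached is not None and len(attached) != len(nattached):
            problems.append(f"{name}: attached and nattached differ in length")
        mass = sp.get("mass")
        if mass is not None:
            if len(mass) != len(symbols):
                problems.append(f"{name}: mass and chemical_symbols differ in length")
            elif any(m != 0 for s, m in zip(symbols, mass) if s == "vacancy"):
                problems.append(f"{name}: vacancies must have mass 0")
    return problems


ba_ca = {"name": "BaCa", "chemical_symbols": ["vacancy", "Ba", "Ca"],
         "concentration": [0.05, 0.45, 0.5], "mass": [0.0, 137.327, 40.078]}
print(species_problems([ba_ca]))  # []
```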
8.2.20 assemblies
- Description: A description of groups of sites that are statistically correlated.
- Type: list of dictionary with keys:
  - sites_in_groups: list of list of integers (REQUIRED)
  - group_probabilities: list of floats (REQUIRED)
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL. If supported, filters MAY support only a subset of comparison operators.
  - The property SHOULD be null for entries that have no partial occupancies.
  - If present, the corresponding flag (assemblies) MUST be set in the list structure_features (see property structure_features).
  - Client implementations MUST check for its presence, because its presence changes the interpretation of the structure.
  - If present, it MUST be a list of dictionaries, each of which represents an assembly and MUST have the following two keys:
    - sites_in_groups: Index of the sites (0-based) that belong to each group for each assembly.
      Example: [[1], [2]]: two groups, one with the second site, one with the third.
      Example: [[1,2], [3]]: one group with the second and third sites, one with the fourth.
    - group_probabilities: Statistical probability of each group. It MUST have the same length as sites_in_groups. It SHOULD sum to one. See below for examples of how to specify the probability of the occurrence of a vacancy. The possible reasons for the values not summing to one are the same as those already specified above for the concentration of each species; see property species.
  - If a site is not present in any group, it is present with 100 % probability (as if no assembly was specified).
  - A site MUST NOT appear in more than one group.
- Examples (for each entry of the assemblies list):
  - {"sites_in_groups": [[0], [1]], "group_probabilities": [0.3, 0.7]}: the first site and the second site never occur at the same time in the unit cell. Statistically, the first site is present 30 % of the time, while the second site is present 70 % of the time.
  - {"sites_in_groups": [[1,2], [3]], "group_probabilities": [0.3, 0.7]}: the second and third sites are either present together or absent together; they form the first group of atoms for this assembly. The second group is formed by the fourth site. Sites of the first group (the second and the third) are never present at the same time as the fourth site. 30 % of the time sites 1 and 2 (0-based) are present (and site 3 is absent); 70 % of the time site 3 is present (and sites 1 and 2 are absent).
- Notes:
  - Assemblies are essential to represent, for instance, the situation where an atom can statistically occupy two different positions (sites).
  - By defining groups, it is possible to represent, e.g., the case where a functional molecule (and not just one atom) is either present or absent (or the case where it is present in two conformations).
  - Considerations on virtual alloys and on vacancies: In the special case of a virtual alloy, these specifications allow two different, equivalent ways of specifying it. For instance, for a site at the origin with 30 % probability of being occupied by Si, 50 % probability of being occupied by Ge, and 20 % probability of being a vacancy, the following two representations are possible:
    - Using a single species:
      {
        "cartesian_site_positions": [[0,0,0]],
        "species_at_sites": ["SiGe-vac"],
        "species": [
          {
            "name": "SiGe-vac",
            "chemical_symbols": ["Si", "Ge", "vacancy"],
            "concentration": [0.3, 0.5, 0.2]
          }
        ]
        // ...
      }
    - Using multiple species and the assemblies:
      {
        "cartesian_site_positions": [ [0,0,0], [0,0,0], [0,0,0] ],
        "species_at_sites": ["Si", "Ge", "vac"],
        "species": [
          { "name": "Si", "chemical_symbols": ["Si"], "concentration": [1.0] },
          { "name": "Ge", "chemical_symbols": ["Ge"], "concentration": [1.0] },
          { "name": "vac", "chemical_symbols": ["vacancy"], "concentration": [1.0] }
        ],
        "assemblies": [
          {
            "sites_in_groups": [ [0], [1], [2] ],
            "group_probabilities": [0.3, 0.5, 0.2]
          }
        ]
        // ...
      }
    It is up to the database provider to decide which representation to use, typically depending on the internal format in which the structure is stored. However, for a given structure identified by a unique ID, the API implementation MUST always provide the same representation.
  - The probabilities of occurrence of different assemblies are uncorrelated. For instance, consider the following case with two assemblies:
      {
        "assemblies": [
          { "sites_in_groups": [ [0], [1] ], "group_probabilities": [0.2, 0.8] },
          { "sites_in_groups": [ [2], [3] ], "group_probabilities": [0.3, 0.7] }
        ]
      }
    Site 0 is present with a probability of 20 % and site 1 with a probability of 80 %. These two sites are correlated (either site 0 or site 1 is present). Similarly, site 2 is present with a probability of 30 % and site 3 with a probability of 70 %. These two sites are correlated (either site 2 or site 3 is present). However, the presence or absence of sites 0 and 1 is not correlated with the presence or absence of sites 2 and 3 (in this example, the pair of sites (0, 2) occurs with 0.2*0.3 = 6 % probability; the pair (0, 3) with 0.2*0.7 = 14 % probability; the pair (1, 2) with 0.8*0.3 = 24 % probability; and the pair (1, 3) with 0.8*0.7 = 56 % probability).
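Because different assemblies are independent, the probability of a combination of groups is the product of the individual group probabilities. The Python sketch below enumerates those combinations and reproduces the 6 %, 14 %, 24 % and 56 % figures of the example above. It also rejects sites that appear in more than one group. The function name is hypothetical.

```python
from itertools import product
from typing import Any, Dict, List, Tuple


def assembly_configurations(assemblies: List[Dict[str, Any]]) -> Dict[Tuple[int, ...], float]:
    """Map each combination of chosen groups (one per assembly) to its probability.

    Keys are the sorted indices of the assembly sites present in that combination;
    sites not listed in any group are always present and are therefore omitted.
    """
    seen = set()
    for assembly in assemblies:
        for group in assembly["sites_in_groups"]:
            for site in group:
                if site in seen:
                    raise ValueError(f"site {site} appears in more than one group")
                seen.add(site)
    per_assembly = [list(zip(a["sites_in_groups"], a["group_probabilities"])) for a in assemblies]
    configurations = {}
    # Assemblies are uncorrelated, so a combination's probability is a product
    for combo in product(*per_assembly):
        sites = tuple(sorted(site for group, _ in combo for site in group))
        probability = 1.0
        for _, p in combo:
            probability *= p
        configurations[sites] = probability
    return configurations


example = [
    {"sites_in_groups": [[0], [1]], "group_probabilities": [0.2, 0.8]},
    {"sites_in_groups": [[2], [3]], "group_probabilities": [0.3, 0.7]},
]
for sites, p in assembly_configurations(example).items():
    print(sites, round(p, 4))
```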
8.2.21 structure_features
- Description: A list of strings that flag which special features are used by the structure.
- Type: list of strings
- Requirements/Conventions:
  - Support: MUST be supported by all implementations, MUST NOT be null.
  - Query: MUST be a queryable property. Filters on the list MUST support all mandatory HAS-type queries. Filter operators for comparisons on the string components MUST support equality; support for other comparison operators is OPTIONAL.
  - MUST be an empty list if no special features are used.
  - MUST be sorted alphabetically.
  - If a special feature listed below is used, the list MUST contain the corresponding string.
  - If a special feature listed below is not used, the list MUST NOT contain the corresponding string.
  - List of strings used to indicate special structure features:
    - disorder: this flag MUST be present if any entry in the species list has a chemical_symbols list that is longer than 1 element.
    - implicit_atoms: this flag MUST be present if the structure contains atoms that are not assigned to sites via the property species_at_sites (e.g., because their positions are unknown). When this flag is present, the properties related to the chemical formula will likely not match the type and count of atoms represented by the species_at_sites, species, and assemblies properties.
    - site_attachments: this flag MUST be present if any entry in the species list includes attached and nattached.
    - assemblies: this flag MUST be present if the property assemblies is present.
- Examples:
  - A structure having implicit atoms and using assemblies: ["assemblies", "implicit_atoms"]
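Three of the four flags can be inferred from other properties. The implicit_atoms flag depends on knowledge that the site list cannot encode. The Python sketch below derives the list based on that split. The function name and the implicit_atoms parameter are hypothetical.

```python
from typing import Any, Dict, List


def derive_structure_features(attrs: Dict[str, Any], implicit_atoms: bool = False) -> List[str]:
    """Compute structure_features from the other properties of a structure.

    implicit_atoms cannot be inferred from the listed sites, so the caller supplies it.
    """
    features = set()
    species = attrs.get("species") or []
    if any(len(sp["chemical_symbols"]) > 1 for sp in species):
        features.add("disorder")
    if any(sp.get("attached") is not None and sp.get("nattached") is not None for sp in species):
        features.add("site_attachments")
    if attrs.get("assemblies") is not None:
        features.add("assemblies")
    if implicit_atoms:
        features.add("implicit_atoms")
    # The list MUST be sorted alphabetically
    return sorted(features)
```

For example, the single-species Si/Ge/vacancy representation shown under assemblies yields ["disorder"], while the multi-species representation yields ["assemblies"].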
8.3 Calculations Entries
The calculations entries have the properties described above in section Properties Used by Multiple Entry Types.
8.4 References Entries
The references entries describe bibliographic references.
The following properties are used to provide the bibliographic details:
- address, annote, booktitle, chapter, crossref, edition, howpublished, institution, journal, key, month, note, number, organization, pages, publisher, school, series, title, volume, year: the meanings of these properties match the BibTeX specification; values are strings;
- bib_type: type of the reference, corresponding to the type property in the BibTeX specification; the value is a string;
- authors and editors: lists of person objects, which are dictionaries with the following keys:
  - name: Full name of the person, REQUIRED.
  - firstname, lastname: Parts of the person's name, OPTIONAL.
- doi and url: values are strings.
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., any of the properties MAY be null.
  - Query: Support for queries on any of these properties is OPTIONAL. If supported, filters MAY support only a subset of comparison operators.
  - Every references entry MUST contain at least one of the properties.
Example:
{
  "data": {
    "type": "references",
    "id": "Dijkstra1968",
    "attributes": {
      "authors": [
        {
          "name": "Edsger Dijkstra",
          "firstname": "Edsger",
          "lastname": "Dijkstra"
        }
      ],
      "year": "1968",
      "title": "Go To Statement Considered Harmful",
      "journal": "Communications of the ACM",
      "doi": "10.1145/362929.362947"
    }
  }
}
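Because the reference properties mirror BibTeX fields, an entry can be rendered as a BibTeX record. The Python sketch below is for illustration only. The function name is hypothetical. It falls back to the misc type when bib_type is absent, which is an assumption and not a rule of this specification. It joins author and editor names with "and" and does not escape characters that are special in BibTeX.

```python
from typing import Any, Dict

# BibTeX-style string fields listed above, plus doi and url
STRING_FIELDS = (
    "address", "annote", "booktitle", "chapter", "crossref", "edition", "howpublished",
    "institution", "journal", "key", "month", "note", "number", "organization", "pages",
    "publisher", "school", "series", "title", "volume", "year", "doi", "url",
)


def reference_to_bibtex(resource: Dict[str, Any]) -> str:
    """Render one references resource object (with "id" and "attributes") as BibTeX."""
    attrs = resource["attributes"]
    entry_type = attrs.get("bib_type") or "misc"  # assumed fallback
    lines = [f"@{entry_type}{{{resource['id']},"]
    # authors/editors are lists of person objects; BibTeX joins names with "and"
    for prop, field in (("authors", "author"), ("editors", "editor")):
        people = attrs.get(prop)
        if people:
            names = " and ".join(person["name"] for person in people)
            lines.append(f"  {field} = {{{names}}},")
    for field in STRING_FIELDS:
        value = attrs.get(field)
        if value is not None:
            lines.append(f"  {field} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


dijkstra = {
    "type": "references",
    "id": "Dijkstra1968",
    "attributes": {
        "authors": [{"name": "Edsger Dijkstra", "firstname": "Edsger", "lastname": "Dijkstra"}],
        "year": "1968",
        "title": "Go To Statement Considered Harmful",
        "journal": "Communications of the ACM",
        "doi": "10.1145/362929.362947",
    },
}
print(reference_to_bibtex(dijkstra))
```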
8.5 Files Entries
The files entries describe files.
The following properties are used to do so:
8.5.1 url
- Description: The URL to get the contents of a file.
- Type: string
- Requirements/Conventions:
  - Support: MUST be supported by all implementations, MUST NOT be null.
  - Query: Support for queries on this property is OPTIONAL.
  - Response: REQUIRED in the response.
  - The URL MUST point to the actual contents of a file (i.e., a byte stream), not an intermediate (preview) representation. For example, a link to a file on GitHub should point to the raw contents.
- Examples:
  - "https://example.org/files/cifs/1000000.cif"
8.5.2 url_stable_until
- Description: Point in time until which the URL in url is guaranteed to stay stable.
- Type: timestamp
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL.
  - A value of null means that there is no stability guarantee for the URL in url. Indefinite support could be communicated by providing a date sufficiently far in the future, for example, 9999-12-31.
8.5.3 name
- Description: Base name of a file.
- Type: string
- Requirements/Conventions:
  - Support: MUST be supported by all implementations, MUST NOT be null.
  - Query: Support for queries on this property is OPTIONAL.
  - The file name extension is an integral part of a file name and, if available, MUST be included.
- Examples:
  - "1000000.cif"
8.5.4 size
- Description: Size of a file in bytes.
- Type: integer
- Requirements/Conventions:
  - Support: OPTIONAL support in implementations, i.e., MAY be null.
  - Query: Support for queries on this property is OPTIONAL.
  - If provided, it MUST be guaranteed that the value is either the exact size of the file or an upper bound on it. This way, if a client reserves a static buffer or truncates the download stream after this many bytes, the whole file will still be received. This provision allows providers to serve files that are compressed on the fly.
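The size guarantee lets a client stop reading once the announced number of bytes has arrived. The Python sketch below reads from any binary file-like object, which stands in (by assumption) for an HTTP response body. The function name and default chunk size are hypothetical.

```python
import io
from typing import BinaryIO, Optional


def read_file_contents(stream: BinaryIO, size: Optional[int], chunk_size: int = 65536) -> bytes:
    """Read a downloaded file, stopping after `size` bytes when a size is provided.

    Because size is either exact or an upper bound, stopping there never loses data.
    """
    buffer = bytearray()
    while size is None or len(buffer) < size:
        want = chunk_size if size is None else min(chunk_size, size - len(buffer))
        data = stream.read(want)
        if not data:  # end of stream before the bound, e.g. a file compressed on the fly
            break
        buffer.extend(data)
    return bytes(buffer)


print(read_file_contents(io.BytesIO(b"data_1000000\n"), size=64))
```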
- Description: Media type identifier (also known as MIME type) of a file, as per RFC 6838 Media Type Specifications and Registration Procedures.
- Type: string
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Examples:
"chemical/x-cif"
- Description: Version information of a file (e.g. commit, revision, timestamp).
- Type: string
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- If provided, it MUST be guaranteed that file contents pertaining to the same combination of id and version are the same.
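Because contents are guaranteed identical for the same id and version, a client can use that pair as a cache key. The minimal Python sketch below illustrates this. FileContentCache and the fetch callable are hypothetical names. Entries without a version are deliberately never cached, because no such guarantee applies to them.

```python
class FileContentCache:
    """Client-side cache of file contents keyed by (id, version) of files entries."""

    def __init__(self):
        self._store = {}

    def get(self, file_id, version, fetch):
        """Return cached contents, calling `fetch()` only when they are not known yet."""
        if version is None:
            return fetch()  # no version: contents may have changed, so always refetch
        key = (file_id, version)
        if key not in self._store:
            self._store[key] = fetch()
        return self._store[key]
```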
- Description: Timestamp of the last modification of file contents. A modification is understood as an addition, change or deletion of one or more bytes, resulting in file contents different from the previous.
- Type: timestamp
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Timestamps of subsequent file modifications SHOULD be increasing (not earlier than previous timestamps).
- Description: Free-form description of a file.
- Type: string
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Examples:
"POSCAR format file"
- Description: Dictionary providing checksums of file contents.
- Type: dictionary with keys identifying checksum functions and values (strings) giving the actual checksums
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Supported dictionary keys: md5, sha1, sha224, sha256, sha384, sha512. Checksums outside this list MAY be used, but their names MUST be prefixed by a database-provider-specific namespace prefix (see appendix Namespace Prefixes).
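A client could verify downloaded contents against the standardized checksum keys using Python's hashlib, as in the sketch below. The sketch assumes the checksum strings are hexadecimal digests; the text above does not fix the encoding. It skips namespace-prefixed keys, whose meaning is provider-specific. The function name verify_checksums is hypothetical.

```python
import hashlib

# Keys with a standardized meaning in the files "checksums" dictionary.
STANDARD_ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def verify_checksums(content, checksums):
    """Check `content` (bytes) against the standardized entries of a checksums dictionary.

    Assumes hexadecimal digests. Returns the names of the verified algorithms and
    raises ValueError on the first mismatch.
    """
    verified = []
    for name, expected in checksums.items():
        if name not in STANDARD_ALGORITHMS:
            continue  # e.g. a hypothetical "_exmpl_crc32" key: provider-specific, skipped
        actual = hashlib.new(name, content).hexdigest()
        if actual != expected.lower():
            raise ValueError(f"{name} checksum mismatch: expected {expected}, got {actual}")
        verified.append(name)
    return verified
```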
- Description: Time of last access of a file as per the POSIX standard.
- Type: timestamp
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Description: Time of last status change of a file as per the POSIX standard.
- Type: timestamp
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Description: Time of last modification of a file as per the POSIX standard.
- Type: timestamp
- Requirements/Conventions:
- Support: OPTIONAL support in implementations, i.e., MAY be null.
- Query: Support for queries on this property is OPTIONAL.
- Note that the values of last_modified, modification_timestamp and mtime do not necessarily match. last_modified pertains to the modification of the OPTIMADE metadata, modification_timestamp pertains to the file contents, and mtime pertains to the modification of the file (which does not necessarily change its contents). For example, appending an empty string to a file would change mtime on some operating systems, but this would not be deemed a modification of its contents.
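On the provider side, the three POSIX times can be read from os.stat(). The Python sketch below shows one possible way to do this. The function names are hypothetical, and the sketch assumes timestamps are reported in UTC with one-second resolution. On Windows, st_ctime reports the creation time rather than the last status change, so the ctime value is only meaningful on POSIX systems.

```python
import os
from datetime import datetime, timezone


def _rfc3339_utc(seconds):
    """Format a POSIX time (seconds since the epoch) like "2007-04-07T12:02:20Z"."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def posix_time_attributes(path):
    """Build the atime, ctime and mtime attributes of a files entry from os.stat()."""
    st = os.stat(path)
    return {
        "atime": _rfc3339_utc(st.st_atime),  # last access
        "ctime": _rfc3339_utc(st.st_ctime),  # last status change (creation time on Windows)
        "mtime": _rfc3339_utc(st.st_mtime),  # last modification of the file
    }
```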
Database and definition providers can define custom entry types. The names of such entry types MUST start with the corresponding namespace prefix (see appendix Namespace Prefixes). Custom entry types MUST have all properties described above in section Properties Used by Multiple Entry Types.
- Requirements/Conventions for properties in custom entry types:
- Support: Support for any properties in database-provider-specific or definition-provider-specific entry types is fully OPTIONAL.
- Query: Support for queries on these properties is OPTIONAL. If supported, only a subset of the filter features MAY be supported.
In accordance with section Relationships, all entry types MAY use relationships to describe relations to other entries.
The references relationship is used to provide bibliographic references for any of the entry types.
It relates an entry to any number of references entries.
If the response format supports inclusion of entries of a different type in the response, then the response SHOULD include all references-type entries mentioned in the response.
For example, for the JSON response format, the top-level included field SHOULD be used as per the JSON:API 1.1 specification:
{
"data": {
"type": "structures",
"id": "example.db:structs:1234",
"attributes": {
"formula": "Es2",
"url": "http://example.db/structs/1234",
"immutable_id": "http://example.db/structs/1234@123",
"last_modified": "2007-04-07T12:02:20Z"
},
"relationships": {
"references": {
"data": [
{ "type": "references", "id": "Dijkstra1968" },
{
"type": "references",
"id": "1234",
"meta": {
"description": "Reference for the general crystal prototype."
}
}
]
}
}
},
"included": [
{
"type": "references",
"id": "Dijkstra1968",
"attributes": {
"authors": [
{
"name": "Edsger Dijkstra",
"firstname": "Edsger",
"lastname": "Dijkstra"
}
],
"year": "1968",
"title": "Go To Statement Considered Harmful",
"journal": "Communications of the ACM",
"doi": "10.1145/362929.362947"
}
},
{
"type": "references",
"id": "1234",
"attributes": {
"doi": "10.1234/1234"
}
}
]
}
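To use a compound document like the one above, a client typically indexes the included entries by their (type, id) pair and then follows the relationship links. The minimal Python sketch below works on the already-decoded JSON. resolve_relationship is a hypothetical helper. Links whose targets are absent from included, which the SHOULD above permits, resolve to None.

```python
def resolve_relationship(document, name):
    """Map each primary entry id to the included entries linked through relationship `name`.

    Linked entries that are missing from "included" are returned as None.
    """
    data = document["data"]
    primary = data if isinstance(data, list) else [data]
    included = {(item["type"], item["id"]): item for item in document.get("included", [])}
    resolved = {}
    for entry in primary:
        links = entry.get("relationships", {}).get(name, {}).get("data", [])
        resolved[entry["id"]] = [included.get((link["type"], link["id"])) for link in links]
    return resolved
```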
Relationships with calculations MAY be used to indicate provenance where a structure is either an input to or an output of calculations.
Note: We intend to implement in a future version of this API a standardized mechanism to differentiate these two cases, thus allowing databases a common way of exposing the full provenance tree with inputs and outputs between structures and calculations.
For now, database providers are encouraged to extend their API as they see fit, always using their database-provider-specific prefix in non-standardized fields.
Relationships with files may be used to relate an entry to any number of files entries.
{
"data": {
"type": "structures",
"id": "example.db:structs:1234",
"attributes": {
"chemical_formula_descriptive": "H2O"
},
"relationships": {
"files": {
"data": [
{ "type": "files", "id": "example.db:files:1234" }
]
}
}
},
"included": [
{
"type": "files",
"id": "example.db:files:1234",
"attributes": {
"media_type": "chemical/x-cif",
"url": "https://example.org/files/cifs/1234.cif"
}
}
]
}
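A client looking for, say, the CIF files of a structure could filter the included files entries by media_type, as in the Python sketch below. The helper name related_file_urls is hypothetical. Unlike references, the text above does not ask servers to include related files entries, so the sketch skips any link whose target is not in included rather than fetching it.

```python
def related_file_urls(document, media_type=None):
    """Collect URLs of files entries related to the primary entry of a JSON:API document."""
    included = {(item["type"], item["id"]): item for item in document.get("included", [])}
    links = document["data"].get("relationships", {}).get("files", {}).get("data", [])
    urls = []
    for link in links:
        entry = included.get((link["type"], link["id"]))
        if entry is None:
            continue  # not included in this response; it would have to be fetched separately
        attributes = entry["attributes"]
        if media_type is None or attributes.get("media_type") == media_type:
            urls.append(attributes["url"])
    return urls
```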
(* BEGIN EBNF GRAMMAR Filter *)
(* The top-level 'filter' rule: *)
Filter = [Spaces], Expression ;
(* Values *)
OrderedConstant = String | Number ;
UnorderedConstant = ( TRUE | FALSE ) ;
Value = ( UnorderedConstant | OrderedValue ) ;
OrderedValue = ( OrderedConstant | Property ) ;
(* Note: support for Property in OrderedValue is OPTIONAL *)
ValueListEntry = ( Value | ValueEqRhs | ValueRelCompRhs | FuzzyStringOpRhs ) ;
(* Note: support for ValueEqRhs, ValueRelCompRhs and FuzzyStringOpRhs in ValueListEntry are OPTIONAL *)
ValueList = ValueListEntry, { Comma, ValueListEntry } ;
ValueZip = ValueListEntry, Colon, ValueListEntry, { Colon, ValueListEntry } ;
ValueZipList = ValueZip, { Comma, ValueZip } ;
(* Expressions *)
Expression = ExpressionClause, [ OR, Expression ] ;
ExpressionClause = ExpressionPhrase, [ AND, ExpressionClause ] ;
ExpressionPhrase = [ NOT ], ( Comparison | OpeningBrace, Expression, ClosingBrace ) ;
Comparison = ConstantFirstComparison
| PropertyFirstComparison ;
(* Note: support for ConstantFirstComparison is OPTIONAL *)
ConstantFirstComparison = ( OrderedConstant, ValueOpRhs
| UnorderedConstant, ValueEqRhs ) ;
PropertyFirstComparison = Property, [ ValueOpRhs
| KnownOpRhs
| FuzzyStringOpRhs
| SetOpRhs
| SetZipOpRhs
| LengthOpRhs ] ;
(* Note: support for SetZipOpRhs in Comparison is OPTIONAL *)
ValueOpRhs = ( ValueEqRhs | ValueRelCompRhs ) ;
ValueEqRhs = EqualityOperator, Value ;
ValueRelCompRhs = RelativeComparisonOperator, OrderedValue ;
KnownOpRhs = IS, ( KNOWN | UNKNOWN ) ;
FuzzyStringOpRhs = CONTAINS, Value
| STARTS, [ WITH ], Value
| ENDS, [ WITH ], Value ;
SetOpRhs = HAS, ( ( Value | EqualityOperator, Value | RelativeComparisonOperator, OrderedValue | FuzzyStringOpRhs ) | ALL, ValueList | ANY, ValueList | ONLY, ValueList ) ;
(* Note: support for the alternatives with EqualityOperator, RelativeComparisonOperator, FuzzyStringOpRhs, and ONLY in SetOpRhs are OPTIONAL *)
SetZipOpRhs = PropertyZipAddon, HAS, ( ValueZip | ONLY, ValueZipList | ALL, ValueZipList | ANY, ValueZipList ) ;
PropertyZipAddon = Colon, Property, { Colon, Property } ;
LengthOpRhs = LENGTH, [ Operator ], Value ;
(* Note: support for [ Operator ] in LengthOpRhs is OPTIONAL *)
(* Property *)
Property = Identifier, { Dot, Identifier } ;
(* TOKENS *)
(* Separators: *)
OpeningBrace = '(', [Spaces] ;
ClosingBrace = ')', [Spaces] ;
Dot = '.', [Spaces] ;
Comma = ',', [Spaces] ;
Colon = ':', [Spaces] ;
(* Boolean relations: *)
AND = 'AND', [Spaces] ;
NOT = 'NOT', [Spaces] ;
OR = 'OR', [Spaces] ;
IS = 'IS', [Spaces] ;
KNOWN = 'KNOWN', [Spaces] ;
UNKNOWN = 'UNKNOWN', [Spaces] ;
CONTAINS = 'CONTAINS', [Spaces] ;
STARTS = 'STARTS', [Spaces] ;
ENDS = 'ENDS', [Spaces] ;
WITH = 'WITH', [Spaces] ;
LENGTH = 'LENGTH', [Spaces] ;
HAS = 'HAS', [Spaces] ;
ALL = 'ALL', [Spaces] ;
ONLY = 'ONLY', [Spaces] ;
ANY = 'ANY', [Spaces] ;
(* Comparison operator tokens: *)
Operator = ( EqualityOperator | RelativeComparisonOperator ) ;
EqualityOperator = [ '!' ], '=' , [Spaces] ;
RelativeComparisonOperator = ( '<' | '>' ), [ '=' ], [Spaces] ;
(* Boolean values *)
TRUE = 'TRUE', [Spaces] ;
FALSE = 'FALSE', [Spaces] ;
(* Property syntax *)
Identifier = LowercaseLetter, { LowercaseLetter | Digit }, [Spaces] ;
Letter = UppercaseLetter | LowercaseLetter ;
UppercaseLetter = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I'
| 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R'
| 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;
LowercaseLetter = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i'
| 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r'
| 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | '_' ;
(* Strings: *)
String = '"', { EscapedChar }, '"', [Spaces] ;
EscapedChar = UnescapedChar | '\', '"' | '\', '\' ;
UnescapedChar = Letter | Digit | Space | Punctuator | UnicodeHighChar ;
Punctuator = '!' | '#' | '$' | '%' | '&' | "'" | '(' | ')' | '*' | '+'
| ',' | '-' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?'
| '@' | '[' | ']' | '^' | '`' | '{' | '|' | '}' | '~' ;
(* BEGIN EBNF GRAMMAR Number *)
(* Number token syntax: *)
Number = [ Sign ] ,
( Digits, [ '.', [ Digits ] ] | '.' , Digits ),
[ Exponent ], [Spaces] ;
Exponent = ( 'e' | 'E' ) , [ Sign ] , Digits ;
Sign = '+' | '-' ;
Digits = Digit, { Digit } ;
Digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
(* White-space: *)
(* Special character tokens: *)
tab = ? \t ?;
nl = ? \n ?;
cr = ? \r ?;
vt = ? \v ?;
ff = ? \f ?;
Space = ' ' | tab | nl | cr | vt | ff ;
Spaces = Space, { Space } ;
(* The 'UnicodeHighChar' specifies any Unicode character above 0x7F.
It is specified in this grammar by an extension to EBNF that allows a
regular expression to specify terminal symbol ranges. *)
UnicodeHighChar = ? [^\x00-\x7F] ? ;
(* END EBNF GRAMMAR Number *)
(* END EBNF GRAMMAR Filter *)
Note: when implementing a parser according to this grammar, the implementers MAY choose to construct a lexer that ignores all whitespace (spaces, tabs, newlines, vertical tabulation and form feed characters, as described in the grammar 'Space' definition), and use such a lexer to recognize the language elements that are described in the (* TOKENS *) section of the grammar.
In that case, it can be beneficial to remove the '[Spaces]' element from the Filter = [Spaces], Expression definition as well and use the remaining grammar rules as a parser generator input (e.g., for yacc, bison, antlr).
The strings below contain Perl-Compatible Regular Expressions (PCREs) to recognize identifier, number, and string values as specified in this specification.
#BEGIN PCRE identifiers
[a-z_][a-z_0-9]*
#END PCRE identifiers
#BEGIN PCRE numbers
[-+]?(?:\d+(\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?
#END PCRE numbers
#BEGIN PCRE strings
"([^\\"]|\\.)*"
#END PCRE strings
The strings below contain Extended Regular Expressions (EREs) to recognize identifier, number, and string values as specified in this specification.
#BEGIN ERE identifiers
[a-z_][a-z_0-9]*
#END ERE identifiers
#BEGIN ERE numbers
[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)?
#END ERE numbers
#BEGIN ERE strings
"([^\"]|\\.)*"
#END ERE strings
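Following the lexer approach described in the note above, a whitespace-skipping tokenizer can be assembled from these patterns. The Python sketch below uses the PCRE identifier, number, and string expressions, with their groups made non-capturing, plus simple patterns for the operator and separator tokens of the grammar. The token kind names and the tokenize function are illustrative only. The sketch does not check token order, which is left to the parser.

```python
import re

# Identifier, number and string patterns adapted from the PCREs above
# (groups made non-capturing so that Match.lastgroup names the token kind).
TOKEN_PATTERNS = [
    ("SPACE", r"[ \t\n\r\v\f]+"),
    ("STRING", r'"(?:[^\\"]|\\.)*"'),
    ("NUMBER", r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"),
    ("IDENTIFIER", r"[a-z_][a-z_0-9]*"),
    ("KEYWORD", r"[A-Z]+"),
    ("OPERATOR", r"!=|<=|>=|[=<>]"),
    ("PUNCT", r"[().,:]"),
]
# re.ASCII keeps \d limited to the ASCII digits of the grammar.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS), re.ASCII)

KEYWORDS = {
    "AND", "NOT", "OR", "IS", "KNOWN", "UNKNOWN", "CONTAINS", "STARTS", "ENDS",
    "WITH", "LENGTH", "HAS", "ALL", "ONLY", "ANY", "TRUE", "FALSE",
}


def tokenize(text):
    """Split a filter string into (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, value = match.lastgroup, match.group()
        if kind == "KEYWORD" and value not in KEYWORDS:
            raise ValueError(f"unknown keyword {value!r} at position {pos}")
        if kind != "SPACE":
            tokens.append((kind, value))
        pos = match.end()
    return tokens
```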
Symmetry operation strings that comprise the space_group_symmetry_operations_xyz property MUST conform to the following regular expressions.
The regular expressions are recorded below in two forms, one in a more readable form using variables and the other as an explicit pattern compatible with the OPTIMADE Regular Expression Format.
Perl Compatible Regular Expression (PCRE) syntax, with Perl extensions used for readability and expressivity. The symop_definitions section defines several variables in Perl syntax that capture common parts of the regular expressions (REs) and need to be interpolated into the final REs used for matching. The symops section contains the REs themselves. The whitespace characters in these definitions are not significant; if used in Perl programs, these expressions MUST be processed with the /x RE modifier. A working example of these REs in action can be found in the tests/cases/pcre_symops_001.sh and other test cases.
#BEGIN PCRE symop_definitions
$translations = '1\/2|[12]\/3|[1-3]\/4|[1-5]\/6';
$symop_translation_appended = "[-+]? [xyz] ([-+][xyz])? ([-+] ($translations) )?";
$symop_translation_prepended = "[-+]? ($translations) ([-+] [xyz] ([-+][xyz])? )?";
$symop_re = "($symop_translation_appended|$symop_translation_prepended)";
#END PCRE symop_definitions
#BEGIN PCRE symops
^ # From the beginning of the string...
($symop_re)(,$symop_re){2}
$ # ... match to the very end of the string
#END PCRE symops
The regular expression is also provided in an expanded form as an OPTIMADE regex:
#BEGIN ECMA symops
^([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?),([-+]?[xyz]([-+][xyz])?([-+](1/2|[12]/3|[1-3]/4|[1-5]/6))?|[-+]?(1/2|[12]/3|[1-3]/4|[1-5]/6)([-+][xyz]([-+][xyz])?)?)$
#END ECMA symops
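The variable-based definitions above translate directly into Python's re module, as the sketch below shows. The Python names are hypothetical. The sketch uses fullmatch() instead of the ^ and $ anchors, because Python's $ also matches just before a trailing newline.

```python
import re

# Building blocks mirroring the PCRE symop_definitions above (non-capturing groups).
TRANSLATIONS = r"1/2|[12]/3|[1-3]/4|[1-5]/6"
APPENDED = rf"[-+]?[xyz](?:[-+][xyz])?(?:[-+](?:{TRANSLATIONS}))?"
PREPENDED = rf"[-+]?(?:{TRANSLATIONS})(?:[-+][xyz](?:[-+][xyz])?)?"
SYMOP = rf"(?:{APPENDED}|{PREPENDED})"

# Exactly three comma-separated components; fullmatch() anchors at both ends.
SYMOPS_RE = re.compile(rf"{SYMOP}(?:,{SYMOP}){{2}}")


def is_valid_symop(text):
    """Return True if `text` is an allowed symmetry operation string such as "-y,x-y,z+1/3"."""
    return SYMOPS_RE.fullmatch(text) is not None
```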
The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response. The format is based on JSON Lines, which enables streaming of JSON data. Note: since the below definition references both JSON fields and OPTIMADE properties, the data type names depend on context: for JSON they are, e.g., "array" and "object" and for OPTIMADE properties they are, e.g., "list" and "dictionary".
To aid the definition of the format below, we first define a "slice object" to be a JSON object describing slices of arrays. The object has the following OPTIONAL fields:
"start": Integer. The slice starts at the value with the given index (inclusive). The default is 0, i.e., the value at the start of the array.
"stop": Integer. The slice ends at the value with the given index (inclusive). If omitted, the end of the slice is the last index of the array.
"step": Integer. The absolute difference in index between two subsequent values that are included in the slice. The default is 1, i.e., every value in the range indicated by start and stop is included in the slice. Hence, a value of 2 denotes a slice of every second value in the array.
For example, for the array ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"] the slice object {"start": 1, "stop": 7, "step": 3} refers to the items ["b", "e", "h"].
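Because stop is inclusive, a client that expands a slice object into indices must add one when using Python's exclusive range(). The minimal sketch below does this. slice_indices is a hypothetical helper, and the sketch assumes the array length is known so that an omitted stop can default to the last index.

```python
def slice_indices(slice_object, length):
    """Expand an OPTIMADE partial data slice object into explicit indices of an array of `length` items."""
    start = slice_object.get("start", 0)
    stop = slice_object.get("stop", length - 1)  # inclusive, unlike Python's range() and slices
    step = slice_object.get("step", 1)
    return list(range(start, stop + 1, step))


items = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
print([items[i] for i in slice_indices({"start": 1, "stop": 7, "step": 3}, len(items))])
# prints ['b', 'e', 'h']
```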
Furthermore, we also define the following special markers:
- The end-of-data-marker is this exact JSON: ["PARTIAL-DATA-END", [""]].
- A reference-marker is this exact JSON: ["PARTIAL-DATA-REF", ["<url>"]], where "<url>" is to be replaced with a URL being referenced. A reference-marker MUST only occur in a place where the property being communicated could have an embedded list.
- A next-marker is this exact JSON: ["PARTIAL-DATA-NEXT", ["<url>"]], where "<url>" is to be replaced with the target URL for the next link.
There is no requirement on the syntax or format of the URLs provided in these markers.
When data is fetched from these URLs, the response MUST use the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats.
The markers have been deliberately designed to be valid JSON values but not valid OPTIMADE property values.
Since the OPTIMADE list data type is defined as a list of values of the same data type or null, and each marker is a list that mixes a string with a list, the above markers cannot be encountered inside the actual data of an OPTIMADE property.
Implementation note: the recognizable string values of the markers should make it possible to prescreen the raw text of the JSON data lines for the reference-marker string, and so identify the lines that can be excluded from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). The underlying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negligible compared to the time it takes to resolve and handle the large data they reference. Hence, the most relevant optimization is to avoid spending time processing data structures to find markers for lines where there are none.
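The prescreening idea in the implementation note can be sketched in Python as follows. The raw line is first searched for the reference-marker string, and only lines that contain it are decoded and walked. find_reference_urls is a hypothetical name. The sketch returns the referenced URLs without fetching them.

```python
import json

REF_MARKER = "PARTIAL-DATA-REF"


def find_reference_urls(raw_line):
    """Return the URLs of all reference-markers in one raw JSON data line, in order."""
    if REF_MARKER not in raw_line:
        return []  # cheap prescreen on the raw text: no marker, so no need to decode and walk
    urls = []
    stack = [json.loads(raw_line)]
    while stack:
        value = stack.pop()
        if isinstance(value, list):
            if len(value) == 2 and value[0] == REF_MARKER and isinstance(value[1], list):
                urls.append(value[1][0])  # ["PARTIAL-DATA-REF", ["<url>"]]
            else:
                stack.extend(reversed(value))  # reversed so items are visited in order
        elif isinstance(value, dict):
            stack.extend(reversed(list(value.values())))
    return urls
```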
The full response MUST be valid JSON Lines that adheres to the following format:
- The first line is a header object (defined below).
- The following lines are data lines adhering to the formats described below.
- The final line is either an end-of-data-marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL.
The first line MUST be a JSON object providing header information. The header object MUST contain the keys:
"optimade-partial-data": Object. An object identifying the response as being in the OPTIMADE partial data format. It MUST contain the following key:
"format": String. Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly "1.2". A client MUST NOT expect to be able to parse the "format" value if the field is not a string of the format MAJOR.MINOR or if the MAJOR version number is unrecognized.
"layout": String. Either "dense" or "sparse", indicating whether the response uses the dense or the sparse layout.
The following key is RECOMMENDED in the header object:
"returned_ranges": Array of Object. For the dense layout, and for the sparse layout of one-dimensional list properties, the array contains a single element, which is a slice object representing the range of data present in the response. In the specific case of a hierarchy of list properties represented as a sparse multidimensional array, if the field "returned_ranges" is given, it MUST contain one slice object per dimension of the multidimensional array, representing slices for each dimension that cover the data given in the response.
The header object MAY also contain the keys:
"property_name": String. The name of the property being provided.
"entry": Object. An object that MUST have the following two keys:
"id": String. The id of the entry of the property being provided.
"type": String. The type of the entry of the property being provided.
"has_references": Boolean. Indicates whether any of the data lines in the response contains a reference-marker. A value of false means that the client does not have to process any of the lines to detect reference-markers, which may speed up parsing.
"item_schema": Object. An object that represents a JSON Schema that validates the data lines of the response. The format SHOULD be the relevant partial extract of a valid property definition as described in Property Definitions. If a schema is provided, it MUST be a valid JSON Schema using the same version of JSON Schema as described in that section.
"links": Object. An object to provide relevant links for the property being provided. It MAY contain the following keys:
"base_url": String. The base URL of the implementation serving the database to which this property belongs.
"item_describedby": String. A URL to an external JSON Schema that validates the data lines of the response. The format and requirements on this schema are the same as for the inline schema field item_schema.
The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the layout as "dense" or "sparse".
- Dense layout: In the dense partial data layout, each data line reproduces one item from the OPTIMADE list property being transmitted in the JSON format.
If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker.
If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items.
If the field "returned_ranges" is omitted, then the client MUST assume that the data is a contiguous range of items starting at the beginning of the array (index 0), with one item per data line up to the end-of-data-marker or next-marker.
- Sparse layout for one-dimensional list: When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array of the format:
- The first item of the array is the zero-based index of the list property item being provided by this line.
- The second item of the array is the list property item located at the indicated index, represented using the same format as each line in the dense layout. In the same way as for the dense layout, reference-markers are allowed inside the item data for embedded lists that do not fit in the response (see example below).
- Sparse layout for multidimensional lists: the server MAY use a specific sparse layout for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array).
In this case, each data line contains a JSON array of the format:
- All array items except the last one are integer zero-based indices of the list property item being provided by this line; these indices refer to the aggregated dimensions in the order of outermost to innermost.
- The last item of the array is the list property item located at the indicated coordinates, represented using the same format as each line in the dense layout. In the same way as for the dense layout, reference-markers are allowed inside the item data for embedded lists that do not fit in the response (see example below).
If the final line of the response is a next-marker, the client MAY continue fetching the data for the property by retrieving another partial data response from the provided URL.
If the final line is an end-of-data-marker, any data not covered by any of the responses are to be assigned the value null.
If "returned_ranges" is included in the response and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to the corresponding items, i.e., this is not an error.
Since the remaining values are not assigned a value, they will be null if they are not assigned values by another response retrieved via a next link encountered before the final end-of-data-marker.
(Since there is no requirement that values are assigned in a specific order between responses, it is possible that the omitted values are already assigned.
In that case the values shall remain as assigned, i.e., they are not overwritten by null in this situation.)
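Putting these rules together, a client could parse one complete response as in the Python sketch below. The function parse_partial_response is hypothetical. It assumes the whole response is already in memory rather than streamed. It does not check the format version or resolve reference-markers, which are left in place as values. Merging several responses reached via next-markers is left to the caller.

```python
import json

END_MARKER = "PARTIAL-DATA-END"
NEXT_MARKER = "PARTIAL-DATA-NEXT"


def parse_partial_response(text):
    """Parse one JSON lines partial data response.

    Returns (header, items, next_url). `items` maps list indices (a tuple of indices
    for the multidimensional sparse layout) to values; `next_url` is None when the
    response ends with an end-of-data-marker.
    """
    lines = [json.loads(line) for line in text.splitlines() if line.strip()]
    header, data_lines, final = lines[0], lines[1:-1], lines[-1]
    if not (isinstance(final, list) and len(final) == 2 and final[0] in (END_MARKER, NEXT_MARKER)):
        raise ValueError("the last line must be an end-of-data-marker or a next-marker")
    next_url = final[1][0] if final[0] == NEXT_MARKER else None

    items = {}
    if header["layout"] == "dense":
        # Without "returned_ranges", the data starts at index 0 with step 1.
        ranges = header.get("returned_ranges") or [{}]
        start = ranges[0].get("start", 0)
        step = ranges[0].get("step", 1)
        for n, value in enumerate(data_lines):
            items[start + n * step] = value
    else:
        # Sparse layout: leading integers are indices, the last element is the value.
        for line in data_lines:
            *indices, value = line
            items[indices[0] if len(indices) == 1 else tuple(indices)] = value
    return header, items, next_url
```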
Below follows an example of a dense response for partial array data of numeric values. The response provides the first three items of the returned range (indices 10, 12, and 14) and a next-marker link to continue fetching data:
{"optimade-partial-data": {"format": "1.2"}, "layout": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]}
123
345
-12.6
["PARTIAL-DATA-NEXT", ["https://example.db.org/value4"]]
Below follows an example of a dense response for a list property as a partial array of multidimensional array values. The item with index 10 in the original list is provided explicitly in the response and is the first one provided, since start=10. The item with index 12 in the list, the second data item provided since start=10 and step=2, is not included but only referenced. The third provided item (index 14 in the original list) is only partially returned: it is a list of three items, of which the first and last are explicitly provided and the second is only referenced.
{"optimade-partial-data": {"format": "1.2"}, "layout": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]}
[[10,20,21], [30,40,50]]
["PARTIAL-DATA-REF", ["https://example.db.org/value2"]]
[[11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]]
["PARTIAL-DATA-NEXT", ["https://example.db.org/value4"]]
Below follows an example of the sparse layout for multidimensional lists with three aggregated dimensions. The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of arbitrary length of numbers. The only non-null items in the outer three dimensions are, say, [3,5,19], [30,15,9], and [42,54,17]. The response below communicates the first item explicitly; the second one by deferring the innermost list using a reference-marker; and the third item is not included in this response, but deferred to another page via a next-marker.
{"optimade-partial-data": {"format": "1.2"}, "layout": "sparse"}
[3,5,19, [10,20,21,30]]
[30,15,9, ["PARTIAL-DATA-REF", ["https://example.db.org/value1"]]]
["PARTIAL-DATA-NEXT", ["https://example.db.org/"]]
An example of the sparse layout for multidimensional lists with three aggregated dimensions and integer values:
{"optimade-partial-data": {"format": "1.2"}, "layout": "sparse"}
[3,5,19, 10]
[30,15,9, 31]
["PARTIAL-DATA-NEXT", ["https://example.db.org/"]]
An example of the sparse layout for multidimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths:
{"optimade-partial-data": {"format": "1.2"}, "layout": "sparse"}
[3,5,19, [ [10,20,21], [30,40,50] ] ]
[3,7,19, ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]]]
[4,5,19, [ [11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]]]
["PARTIAL-DATA-END", [""]]
This appendix provides a more complete example of a Property Definition in the format defined in Property Definitions. (Note: the description strings have been wrapped for readability only.)
{
"title": "Forces and atomic masses list",
"$id": "https://properties.example.com/v1.2.0/forces_and_masses",
"x-optimade-type": "list",
"x-optimade-property": {
"version": "1.2.0",
"property-format": "1.2",
"units": [
{
"title": "Newton",
"symbol": "N",
"$id": "https://units.example.com/v1.2.0/N",
"description": "The newton SI unit of force, defined as 1 kg m/s^2
using the 2019 redefinition of the SI base units.",
"standard": {
"name": "gnu units",
"version": "3.15",
"symbol": "newton"
},
"defining-relation": {
"base-units": [
{
"symbol": "kg",
"id": "https://units.example.com/v1.2.0/kg"
},
{
"symbol": "m",
"id": "https://units.example.com/v1.2.0/m"
},
{
"symbol": "s",
"id": "https://units.example.com/v1.2.0/s"
}
],
"base-units-expression": "kg*m*s^-2"
}
},
{
"title": "Dalton mass unit",
"symbol": "u",
"$id": "https://units.example.com/v1.2.0/u",
"description": "The dalton mass unit defined as 1/12 of the mass of an
unbound neutral atom of carbon-12 in its nuclear and
electronic ground state and at rest. Approximately
equal to $1.66053906660(50)*10^{-27}$ kg",
"standard": {
"name": "gnu units",
"version": "3.15",
"symbol": "atomicmassunit"
}
}
]
},
"type": ["array", "null"],
"description": "A list of forces and atomic masses",
"examples": [
[{"force": 42.0, "mass": 28.0855}, {"force": 44.2, "mass": 15.9994}],
[{"force": 12.0, "mass": 24.3050}]
],
"x-optimade-unit": "inapplicable",
"x-optimade-requirements": {
"support": "should",
"sortable": false,
"query-support": "none"
},
"items": {
"title": "Force and atomic mass pair",
"x-optimade-type": "dictionary",
"description": "A dictionary containing a force and mass value",
"x-optimade-unit": "inapplicable",
"type": ["object"],
"properties": {
"force": {
"title": "Force",
"description": "A force value",
"x-optimade-type": "float",
"x-optimade-unit": "N",
"type": ["number"],
"examples": [42.0]
},
"mass": {
"title": "Mass",
"description": "An atomic mass",
"x-optimade-type": "float",
"x-optimade-unit": "u",
"type": ["number"],
"examples": [15.9994]
}
}
}
}
This section defines a Unicode string representation of regular expressions (regexes) to be referenced from other parts of the specification. The format will be referred to as an "OPTIMADE regex".
Regexes are commonly embedded in a context where they need to be enclosed by delimiters (e.g., double quotes or slash characters).
If this is the case, some outer-level escape rules likely apply to allow the end delimiter to appear within the regex.
Such delimiters and escape rules are not included in the definition of the OPTIMADE regex format itself and need to be clarified when this format is referenced.
The format defined in this section applies after such outer escape rules have been applied (e.g., when all occurrences of \/ have been translated into / for a format where an unescaped slash character is the end delimiter).
Likewise, if an OPTIMADE regex is embedded in a serialized data format (e.g., JSON), this section documents the format of the Unicode string resulting from the deserialization of that format.
An OPTIMADE regex is a regular expression that adheres to ECMA-262, section 21.2.1, with the additional restrictions described below, which define a subset of the ECMA-262 format chosen to match features commonly available in different database backends. The regex is interpreted according to the ECMA-262 processing rules that apply to an expression where, of all the variables set by the RegExp internal slot described in ECMA-262, section 21.2.2.1, only the Unicode variable is set to true.
The subset includes only the following tokens and features:
- Individual Unicode characters matching themselves, as defined by the JSON specification (RFC 8259).
- The . character to match any one Unicode character except the line break characters LINE FEED (LF) (U+000A), CARRIAGE RETURN (CR) (U+000D), LINE SEPARATOR (U+2028), and PARAGRAPH SEPARATOR (U+2029) (see ECMA-262 section 2.2.2.7).
- A literal escape of one of the characters defined as syntax characters in the ECMA-262 standard, i.e., the escape character (\) followed by one of the characters ^ $ \ . * + ? ( ) [ ] { } | to represent that literal character. No other characters can be escaped. (This rule prevents other escapes that are interpreted differently depending on regex flavor.)
- Simple character classes (e.g., [abc]), complemented character classes (e.g., [^abc]), and their ranged versions (e.g., [a-z], [^a-z]) with the following constraints:
- The character - designates ranges, unless it is the first or last character of the class, in which case it represents a literal - character.
- If the first character is ^, then the expression matches all characters except the ones specified by the class as defined by the characters that follow.
- The characters \ [ ] can only appear escaped with a preceding backslash, e.g., \\ designates that the class includes a literal \ character. The other syntax characters may appear either escaped or unescaped to designate that the class includes them. (This rule prevents other escapes inside classes that are not the same across regex flavors, as well as expressions that, in some flavors, are interpreted as nested classes.)
- Except as specified above, all characters represent themselves literally (including syntax characters).
- Characters that represent themselves literally can appear at most once. (This rule prevents various kinds of extended character class syntax that differ between regex formats and assign special meaning to duplicated characters, such as POSIX character classes, e.g., [:alpha:], equivalence classes, e.g., [=a=], and set constructs, e.g., [A--B], [A&&B], etc.)
- Simple quantifiers: "+" (one or more), "*" (zero or more), and "?" (zero or one) that appear directly after a character, group, or character class. (This rule prevents expressions with special meaning in some regex flavors, e.g., "+?" and "(?...)".)
- The beginning-of-input ("^") and end-of-input ("$") anchors.
- Simple grouping ("(...)") and alternation ("|").
Note that lazy quantifiers ("+?", "*?", "??") are not included, nor are range quantifiers ("{x}", "{x,y}", "{x,}").
Furthermore, there is no support for escapes formed by "\" followed by a letter or number, such as the shorthand character class "\d". There is also no way to represent a Unicode character by specifying its code point as a number; a character can only be given as the Unicode character itself. (However, the regex can be embedded in a context that defines such escapes. For example, in serialized JSON, a string containing the characters "\u" followed by four hexadecimal digits is deserialized into the corresponding Unicode character.)
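Here is a minimal sketch of this. It again assumes Python's json module as the JSON layer and Python's re module as a stand-in backend. The six characters "\u00e9" in the serialized text become the single character "é" before the regex is used, so the regex itself never contains a code-point escape:

```python
import json
import re

# The raw string keeps the JSON escape "\u00e9" as six literal characters.
pattern = json.loads(r'"caf\u00e9"')
print(pattern)  # café

# The deserialized regex contains the character "é" itself, which matches literally.
print(re.search(pattern, "un café noir") is not None)  # True
```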
An OPTIMADE regex matches the string at any position unless it contains a leading beginning-of-input ("^") or trailing end-of-input ("$") anchor as listed above, i.e., the anchors are not implicitly assumed.
For example, the OPTIMADE regex "es" matches "expression".
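The sketch below uses Python's re module as an illustrative stand-in for a backend. In that module, this unanchored behavior corresponds to re.search, not re.match (anchored at the start) or re.fullmatch (anchored at both ends). The helper name optimade_matches is hypothetical:

```python
import re


def optimade_matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: unanchored matching, as for an OPTIMADE regex."""
    # re.search looks for a match at any position; re.match and re.fullmatch
    # would add implicit anchors that OPTIMADE does not assume.
    return re.search(pattern, value) is not None


print(optimade_matches("es", "expression"))     # True: matches inside the string
print(optimade_matches("^es", "expression"))    # False: explicit beginning-of-input anchor
print(optimade_matches("sion$", "expression"))  # True: explicit end-of-input anchor
```

One caveat applies to this particular stand-in. Python's "$" also matches just before a trailing newline, so "sion$" would match "expression\n". In contrast, ECMA-262 "$" without the multiline flag matches only at the end of input. A Python-based backend may therefore need to translate a trailing "$" into "\Z".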
Regexes that use tokens and features beyond the designated subset are allowed to have undefined behavior, i.e., they MAY match or not match any string, or MAY produce an error. Implementations that do not produce errors in this situation are RECOMMENDED to generate warnings if possible.
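The sketch below shows one way an implementation might detect regexes outside the subset and warn rather than fail. It is a hand-written, non-exhaustive checker, and the function names are hypothetical, not part of any OPTIMADE library. It covers the escape, character class, quantifier, and grouping rules listed above. The pattern is then handed unchanged to Python's re module, which stands in for a backend. The checker also treats chained ranges such as "[a-b-c]" conservatively as outside the subset, which is an assumption and not a rule stated above.

```python
import re
import warnings

# ECMA-262 syntax characters: the only characters the subset allows after "\".
SYNTAX_CHARS = set("^$\\.*+?()[]{}|")


def _check_class(pattern: str, start: int, problems: list[str]) -> int:
    """Check a class body starting at `start`; return the index of its closing ']'."""
    i = start
    if i < len(pattern) and pattern[i] == "^":
        i += 1  # a leading '^' complements the class
    items = []  # (character, was_escaped, position)
    while i < len(pattern) and pattern[i] != "]":
        if pattern[i] == "\\":
            nxt = pattern[i + 1:i + 2]
            if nxt not in SYNTAX_CHARS:
                problems.append(f"unsupported escape in class at position {i}")
            items.append((nxt, True, i))
            i += 2
        else:
            if pattern[i] == "[":
                problems.append(f"unescaped '[' in class at position {i}")
            items.append((pattern[i], False, i))
            i += 1
    if i >= len(pattern):
        problems.append("unterminated character class")
    n = len(items)
    # An unescaped '-' designates a range unless it is the first or last item.
    is_op = [c == "-" and not esc and 0 < j < n - 1 for j, (c, esc, _) in enumerate(items)]
    for j in range(n):
        if not is_op[j]:
            continue
        # Conservatively reject "[A--B]"-style operators and chained "[a-b-c]" ranges.
        if is_op[j - 1] or is_op[j + 1] or (j >= 2 and is_op[j - 2]):
            problems.append(f"malformed range at position {items[j][2]}")
        elif items[j - 1][0] > items[j + 1][0]:
            problems.append(f"reversed range at position {items[j][2]}")
    # Characters that represent themselves literally may appear at most once.
    literals = [c for j, (c, _, _) in enumerate(items) if not is_op[j]]
    repeated = sorted({c for c in literals if literals.count(c) > 1})
    if repeated:
        problems.append(f"characters repeated in class: {''.join(repeated)}")
    return i


def optimade_regex_problems(pattern: str) -> list[str]:
    """Return constructs outside the OPTIMADE subset found by this partial check."""
    problems: list[str] = []
    depth = 0       # group nesting level
    prev = "start"  # previous token kind: start, atom, quant, open, alt, anchor
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if pattern[i + 1:i + 2] not in SYNTAX_CHARS:
                problems.append(f"unsupported escape at position {i}")
            i, prev = i + 2, "atom"
            continue
        if c == "[":
            i, prev = _check_class(pattern, i + 1, problems) + 1, "atom"
            continue
        if c in "+*?":
            # Quantifiers must follow a character, group, or class (rules out "+?" and "(?").
            if prev != "atom":
                problems.append(f"misplaced quantifier {c!r} at position {i}")
            prev = "quant"
        elif c == "(":
            depth, prev = depth + 1, "open"
        elif c == ")":
            if depth == 0:
                problems.append(f"unbalanced ')' at position {i}")
            depth, prev = max(depth - 1, 0), "atom"
        elif c == "|":
            prev = "alt"
        elif c in "^$":
            prev = "anchor"
        elif c in "{}]":
            problems.append(f"unescaped {c!r} at position {i}")  # e.g. range quantifiers
            prev = "atom"
        else:
            prev = "atom"  # a literal character or "."
        i += 1
    if depth:
        problems.append("unbalanced '('")
    return problems


def compile_optimade_regex(pattern: str) -> re.Pattern:
    """Warn about constructs outside the subset, then compile with Python's re."""
    for problem in optimade_regex_problems(pattern):
        warnings.warn(f"regex {pattern!r} is outside the OPTIMADE subset: {problem}")
    # Behavior outside the subset is undefined; re.compile may still raise re.error.
    return re.compile(pattern)
```

Warning instead of rejecting keeps the pattern usable by a compatible backend, as the compatibility notes below suggest. A stricter implementation could raise an error instead, which the undefined-behavior rule also permits.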
Compatibility notes:
- The subset is intended to be compatible with, but even further restricted than, the subset recommended in the JSON Schema standard, see JSON Schema: A Media Type for Describing JSON Documents 2020-12, section 6.4. The compatibility with the JSON Schema standard is expressed here as "intended" since there is some room for interpretation of the precise features included in the recommendation given in that standard.
- The definition tolerates (with undefined behavior) regexes that use tokens and features beyond the defined subset. Hence, a regex can be directly handed over to a backend implementation compatible with the subset without needing validation or translation.
- Additional consideration of how the "." character operates in relation to line breaks may be required for multiline text. If the regex is applied to strings containing only the LINE FEED (U+000A) character and none of the other Unicode line break characters, most regex backend implementations are compatible with the defined behavior. If the regex is applied to string data containing arbitrary combinations of Unicode line break characters and the right behavior cannot be achieved via environmental settings and regex options, implementations can consider a translation step where other line break characters are translated into LINE FEED in the text operated on.
- Compatibility with different regex implementations may change depending on the environment, implementation programming language versions, and options, and has to be verified by implementations. However, as a general guide, we have used third-party sources, e.g., the Regular Expression Engine Comparison Chart, to collect the following information for compatibility when operating on text using LINE FEED as the line break character:
- ECMAScript (also known as JavaScript) and versions 1 and 2 of PCRE are meant to be compatible by design when used with appropriate options.
- The following regex formats appear generally compatible when operating in Unicode mode: Perl, Python, Ruby, Rust, Java, .NET, MySQL 8, MongoDB, Oracle, IBM Db2, Elasticsearch, and DuckDB (which uses the RE2 library).
- SQLite supports regexes via libraries and thus can use a compatible format (e.g., PCRE2).
- XML Schema appears to use a compatible regex format, except that it is implicitly anchored, i.e., the beginning-of-input "^" and end-of-input "$" anchors must be removed, and missing anchors replaced by ".*".
- POSIX Extended regexes (and their extended GNU implementations) are incompatible because "\" is not a special character in character classes. POSIX Basic regexes also have further differences, e.g., the meaning of some escaped syntax characters is reversed.
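The translation step suggested in the line-break note above can be sketched in Python as follows. The helper name is hypothetical. Python's re module stands in for a backend whose "." (without re.DOTALL) excludes only LINE FEED. Under this simple mapping, a CR LF pair becomes two LINE FEEDs. That does not affect what "." matches, but it does change the length of the text operated on.

```python
import re

# Hypothetical helper: map the other line breaks that the OPTIMADE "."
# excludes (CR, LINE SEPARATOR, PARAGRAPH SEPARATOR) to LINE FEED.
_TO_LINE_FEED = str.maketrans({"\r": "\n", "\u2028": "\n", "\u2029": "\n"})


def normalize_line_breaks(text: str) -> str:
    """Return a copy of `text` in which every line break is a LINE FEED."""
    return text.translate(_TO_LINE_FEED)


text = "Si\u2028O"
# Python's "." excludes only LINE FEED, so it matches U+2028 in the raw text ...
print(re.search("Si.O", text) is not None)                         # True
# ... but after translation "." no longer matches, as OPTIMADE requires.
print(re.search("Si.O", normalize_line_breaks(text)) is not None)  # False
```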