
SPEC.md


Specification

Introduction

This specification describes Medea, which is intended as a schema language for describing and validating the structure of JSON documents. In particular, this specification describes the following:

  • The human-readable representation of a Medea description of a JSON document (that is, the syntax);
  • The validation behaviour that is required from any given Medea construct (that is, the semantics); and
  • Any requirements or limitations, as precisely as possible.

Conventions

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY and OPTIONAL are to be interpreted as described in RFC 2119.

Compile time refers to the step when the human-readable representation of a Medea description is being converted to a machine-usable form. Validation time refers to the step when, given a JSON document to validate and a machine-usable form of a Medea description, validation of said document against said description is requested.

To indicate a unique error condition means that the Medea validator MUST:

  • Demonstrate that it failed; and
  • Have said demonstration be programmatically distinguishable from any demonstration of another unique error condition.

A Medea validator SHOULD use language-native means to indicate unique error conditions (such as a language-specific error or exception), as opposed to use of string error messages or output to the standard error stream. Unique error conditions can be indicated both at compile time and validation time; if not specified, they SHOULD be indicated at compile time.
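As an illustration of "language-native means", a validator written in Python might give each unique error condition its own exception class, so that callers can tell conditions apart by type rather than by message text. The class names below are hypothetical and are not defined by this specification; this is a minimal sketch only.

```python
class MedeaError(Exception):
    """Base class for every unique error condition (hypothetical name)."""


class CompileError(MedeaError):
    """Indicated at compile time, while reading a schema graph file."""


class DuplicateSchemaName(CompileError):
    """Two schemata in the same file share a naming identifier."""


class MissingStartSchema(CompileError):
    """No schema named $start is defined."""


class ValidationError(MedeaError):
    """Indicated at validation time, while checking a JSON document."""


def describe(error: MedeaError) -> str:
    # Callers branch on the exception class, never on the message string.
    if isinstance(error, DuplicateSchemaName):
        return "duplicate schema name"
    if isinstance(error, MissingStartSchema):
        return "missing $start"
    return "other Medea error"
```

Because each condition is a distinct class, one demonstration of failure is programmatically distinguishable from any other, which is the property the definition above asks for.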

A JSON value is taken to be defined: that is, undefined is not considered to be a valid JSON value. A Medea validator MUST NOT indicate that an undefined value is valid against any schema.

A newline is a platform-specific, non-empty sequence of bytes indicating the end of a line.

Identifiers

A Medea identifier is a non-empty sequence of UTF-8 scalar values (as defined by definition D76 (pdf) of the Unicode 5.2 standard), not exceeding 32 bytes in length, containing no symbols from categories Zs, Zl, Zp or Cc. If limited to text using the ASCII code points only, this means a limit of 32 symbols. A Medea validator MUST indicate a unique error condition if given an identifier that exceeds 32 bytes in length.
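The identifier rules can be sketched as a small check. The function and exception names are hypothetical, and the order in which the individual rules are tested is an assumption, since the text above does not say which error takes priority when several rules are broken at once.

```python
import unicodedata

FORBIDDEN_CATEGORIES = {"Zs", "Zl", "Zp", "Cc"}
MAX_IDENTIFIER_BYTES = 32


class MalformedIdentifier(ValueError):
    """Empty, not encodable as UTF-8, or containing a forbidden symbol."""


class IdentifierTooLong(ValueError):
    """Longer than 32 bytes once encoded as UTF-8."""


def check_identifier(text: str) -> str:
    if not text:
        raise MalformedIdentifier("identifier is empty")
    try:
        encoded = text.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates are not Unicode scalar values.
        raise MalformedIdentifier("identifier is not a scalar value sequence")
    for symbol in text:
        if unicodedata.category(symbol) in FORBIDDEN_CATEGORIES:
            raise MalformedIdentifier(f"forbidden symbol {symbol!r}")
    # The limit is counted in bytes, not in symbols.
    if len(encoded) > MAX_IDENTIFIER_BYTES:
        raise IdentifierTooLong(f"{len(encoded)} bytes")
    return text
```

Note that a non-ASCII identifier reaches the limit sooner: sixteen copies of "é" already occupy 32 bytes.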

Reserved identifiers

Any identifier starting with the "$" symbol is reserved. Users MUST NOT define identifiers starting with the "$" symbol, as they are used by Medea validators internally. A Medea validator MAY fail if given a user-defined identifier starting with the "$" symbol; if it does so, it MUST indicate a unique error condition.

Unless stated otherwise, a Medea identifier is non-reserved.

Strings

A Medea string is a non-empty sequence of UTF-8 scalar values (as defined by definition D76 (pdf) of the Unicode 5.2 standard), containing no symbols from categories Zs, Zl, Zp or Cc. Furthermore, the first and last symbols of a Medea string MUST both be the QUOTATION MARK Unicode character (hex code 0x22). A Medea validator MUST indicate a unique error condition if given a sequence of UTF-8 scalar values which has a QUOTATION MARK as its first and last symbol, but contains symbols from any of Zs, Zl, Zp or Cc.
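A sketch of the string rule follows. The names are hypothetical, and one point is an assumption rather than something stated above: a lone QUOTATION MARK is rejected here, on the reading that the opening and closing marks are two separate symbols.

```python
import unicodedata

FORBIDDEN_CATEGORIES = {"Zs", "Zl", "Zp", "Cc"}
QUOTATION_MARK = "\u0022"


class ForbiddenSymbolInString(ValueError):
    """A quoted sequence contains a symbol from Zs, Zl, Zp or Cc."""


def is_medea_string(text: str) -> bool:
    quoted = (
        len(text) >= 2  # assumption: the two quotation marks are distinct
        and text[0] == QUOTATION_MARK
        and text[-1] == QUOTATION_MARK
    )
    if not quoted:
        return False
    for symbol in text:
        if unicodedata.category(symbol) in FORBIDDEN_CATEGORIES:
            # Quoted but containing a forbidden symbol: a unique error.
            raise ForbiddenSymbolInString(repr(symbol))
    return True
```

One consequence worth noticing is that a Medea string cannot contain a space, since SPACE belongs to category Zs.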

Natural numbers

A Medea natural number is a non-empty sequence of UTF-8 scalar values (as defined by definition D76 (pdf) of the Unicode 5.2 standard), containing only symbols from DIGIT ZERO to DIGIT NINE inclusive (hex codes 0x30 to 0x39), and not starting with DIGIT ZERO (hex code 0x30). A Medea validator SHOULD indicate a unique error condition if it finds a sequence of DIGIT ZERO to DIGIT NINE starting with DIGIT ZERO.

The value of a Medea natural number is the natural number which it textually represents.
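The natural number rule can be sketched as below; the names are hypothetical. The digits are compared against the ASCII range explicitly, because Python's `str.isdigit` also accepts digits from other scripts, which fall outside 0x30 to 0x39.

```python
ASCII_DIGITS = "0123456789"


class NotANaturalNumber(ValueError):
    """Empty, or containing a symbol outside DIGIT ZERO to DIGIT NINE."""


class LeadingZero(ValueError):
    """A digit sequence that starts with DIGIT ZERO."""


def parse_natural(text: str) -> int:
    if not text or any(symbol not in ASCII_DIGITS for symbol in text):
        raise NotANaturalNumber(repr(text))
    if text[0] == "0":
        # This also rejects "0" itself, which starts with DIGIT ZERO.
        raise LeadingZero(repr(text))
    return int(text)
```

Under this reading, `parse_natural("42")` yields 42, while both `"007"` and `"0"` are rejected for starting with DIGIT ZERO.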

Primitive type

A Medea primitive type corresponds to a set of basic types, as provided by JSON. These are defined as follows:

  • null: The null value.
  • boolean: A non-null boolean (true or false).
  • object: A non-null JSON object.
  • array: A non-null JSON array.
  • number: A non-null JSON number.
  • string: A non-null JSON string.

A Medea validator MUST provide validation of JSON values of these types, and MUST provide the following primitive type identifiers:

  • $null
  • $boolean
  • $object
  • $array
  • $number
  • $string
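To make the six primitive type identifiers concrete, the sketch below maps each one to a check over values produced by Python's standard `json` module. The table and function names are hypothetical; the mapping from JSON types to Python types is the one `json.loads` uses.

```python
import json
from typing import Any, Callable, Dict


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


PRIMITIVE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "$null": lambda value: value is None,
    "$boolean": lambda value: isinstance(value, bool),
    "$object": lambda value: isinstance(value, dict),
    "$array": lambda value: isinstance(value, list),
    "$number": _is_number,
    "$string": lambda value: isinstance(value, str),
}


def primitive_type_of(value: Any) -> str:
    """Return the primitive type identifier matching a decoded JSON value."""
    for identifier, check in PRIMITIVE_CHECKS.items():
        if check(value):
            return identifier
    raise TypeError("not a JSON value")


if __name__ == "__main__":
    for text in ["null", "true", "{}", "[]", "1.5", '"x"']:
        print(text, primitive_type_of(json.loads(text)))
```

The explicit exclusion of `bool` in the number check matters: without it, `true` would also satisfy `$number`.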

Schema graph file format

A Medea schema graph file is a human-readable representation of a Medea specification. A Medea schema graph file MUST be encoded as UTF-8. A Medea validator MUST indicate a unique error condition if asked to parse a schema graph file which is not encoded as valid UTF-8.

A Medea schema graph file SHOULD have the extension .medea.

A Medea file is made up of one or more schemata. A schema (singular of 'schemata') MUST consist of the following, in this order:

  1. The reserved identifier "$schema";
  2. A single space symbol;
  3. A Medea identifier (called the name or naming identifier);
  4. A newline; and
  5. Zero or more specifications (defined fully in the subsequent section).

A Medea validator MUST indicate a unique error condition if a schema is defined with a name that has already been used as the naming identifier of an existing schema in the same file. Additionally, a Medea validator MUST indicate a unique error condition if the order, or formation rules, described above (or subsequent in the case of type specifications or additional specifications) are violated: each possible violation is distinct from any other.

Additionally, a Medea schema graph file MUST contain a schema named $start. A Medea validator MUST indicate a unique error condition if no such schema is defined.

Schemata MUST be separated by a single extra newline. Thus, a schema graph file containing schemata foo and $start is formed like this:

$schema foo

$schema $start
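The two naming requirements (no duplicate names, and a mandatory $start) can be sketched as a pass over the schema header lines. This is deliberately partial: it does not check separators, identifier formation, or specifications, and every name in it is hypothetical.

```python
from typing import List


class DuplicateSchemaName(ValueError):
    """A naming identifier is used by more than one schema."""


class MissingStartSchema(ValueError):
    """No schema named $start is defined."""


HEADER_PREFIX = "$schema "


def schema_names(source: str) -> List[str]:
    names: List[str] = []
    for line in source.splitlines():
        # Specification lines are indented, so only headers match here.
        if line.startswith(HEADER_PREFIX):
            name = line[len(HEADER_PREFIX):]
            if name in names:
                raise DuplicateSchemaName(name)
            names.append(name)
    if "$start" not in names:
        raise MissingStartSchema()
    return names


# The two-schema file shown above.
EXAMPLE = "$schema foo\n\n$schema $start\n"
```

Applied to `EXAMPLE`, the function returns the names in declaration order: `foo`, then `$start`.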

Specifications

Any schema can include any of the following specifications at most once, in any order. Some specifications are conditional on others (noted in their descriptions). A Medea validator MUST indicate a unique error condition if a specification is provided for a schema where its conditions are not met.

Each of the subsequent entries has the following format:

  • Description: An overview of the purpose and intended semantics of this specification.
  • Preconditions: Any requirements of the schema that MUST be met for this specification to be valid. A Medea validator MUST indicate a unique error condition when these are not met.
  • Syntax: Describes the rules of form for this specification type. A Medea validator MUST indicate a unique error condition if any of these are violated.
  • Semantics: Describes how this specification affects the validation behaviour of its schema.
  • Postconditions: Any requirements or restrictions of a non-syntactic nature on the use of this specification. A Medea validator MUST indicate a unique error condition if any of these are violated.
  • Default: Describes the validation behaviour of a schema missing this specification.

Certain combinations of specifications can produce contradictory requirements: for example, a schema may have a type specification which requires a JSON object with a property "foo" with a JSON string value, but then have an object property specification saying that property "foo" should have a value which is a JSON array. A Medea validator MUST indicate a unique error condition at compile time in such situations.

List specification

Description: A list specification describes the specifics of a JSON array meant to serve as a list; that is, a homogeneously-typed collection of varying length.

Preconditions: If the schema has a type specification, said type specification must contain the type specifier line $array. Additionally, the schema must not contain a tuple specification.

Syntax: A list specification MUST consist of one, or both, of the following, in any order:

  1. A length specification; and
  2. An element schema specification.

A length specification MUST consist of one, or both, of the following, in any order:

  1. A minimum length specification; and
  2. A maximum length specification.

An element schema specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $element-type;
  3. A space symbol;
  4. Either a Medea identifier, or one of $null, $boolean, $object, $array, $number, $string; and
  5. A newline.

A minimum length specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $min-length;
  3. A single space symbol;
  4. A Medea natural number; and
  5. A newline.

A maximum length specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $max-length;
  3. A single space symbol;
  4. A Medea natural number; and
  5. A newline.

Semantics: A JSON value is considered valid by this specifier if it is a JSON array. Additionally:

  • If an element schema specification is provided, every element of the array must be valid, as defined by the following validation rules:
    • $null: The value is null.
    • $boolean: The value is a JSON boolean.
    • $object: The value is a JSON object.
    • $array: The value is a JSON array.
    • $number: The value is a JSON number.
    • $string: The value is a JSON string.
    • Any other identifier: The value is valid according to the schema named by this identifier.
  • If a minimum length specification is provided, the array must have at least as many elements as the value of the Medea natural number in said specification.
  • If a maximum length specification is provided, the array must not have more elements than the value of the Medea natural number in said specification.

Postconditions: A Medea validator MUST indicate a unique error condition if the identifier in an element schema specification does not correspond to any schema defined in the current schema file.

If both a minimum length specification and a maximum length specification are provided, a Medea validator MUST indicate a unique error condition if the value of the Medea natural number in the minimum length specification is greater than the value of the Medea natural number in the maximum length specification.

Default: An array may have any length (no minimum or maximum), and its elements may be any JSON value.
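The list semantics can be sketched as follows. The function names are hypothetical, and the element check is passed in as a plain callable rather than resolved from a schema name, which keeps the sketch independent of any particular schema representation.

```python
from typing import Any, Callable, Optional

# A schema this sketch mirrors (illustrative, built from the syntax rules):
#
# $schema $start
#     $type
#         $array
#     $min-length 1
#     $max-length 3
#     $element-type $string


class MinimumExceedsMaximum(ValueError):
    """Compile-time error: $min-length is greater than $max-length."""


def check_bounds(min_length: Optional[int], max_length: Optional[int]) -> None:
    if min_length is not None and max_length is not None:
        if min_length > max_length:
            raise MinimumExceedsMaximum(f"{min_length} > {max_length}")


def valid_list(
    value: Any,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    element_ok: Optional[Callable[[Any], bool]] = None,
) -> bool:
    if not isinstance(value, list):
        return False
    if min_length is not None and len(value) < min_length:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    if element_ok is not None and not all(element_ok(e) for e in value):
        return False
    return True


def is_string(value: Any) -> bool:
    return isinstance(value, str)
```

With every optional argument left as `None`, `valid_list` accepts any array, matching the default behaviour described above.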

Object property specification

Description: An object property specification describes permitted properties for an object, what schemata they must validate against, whether the property is optional or required, and whether additional properties are allowed.

Preconditions: If the schema has a type specification, said type specification must contain the type specifier line $object.

Syntax: An object property specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $properties;
  3. A newline;
  4. Zero or more object property specifier sections; and
  5. An optional additional property declaration.

Each object property specifier section MUST consist of the following, in this order:

  1. A property name line;
  2. An optional property schema line; and
  3. An optional optional property declaration.

A property name line MUST consist of the following, in this order:

  1. Eight space symbols;
  2. The reserved identifier $property-name;
  3. A single space symbol;
  4. A Medea string; and
  5. A newline.

A property schema line MUST consist of the following, in this order:

  1. Eight space symbols;
  2. The reserved identifier $property-schema;
  3. A single space symbol;
  4. Either a Medea identifier, or one of $null, $boolean, $object, $array, $number, $string; and
  5. A newline.

An optional property declaration MUST consist of the following, in this order:

  1. Eight space symbols;
  2. The reserved identifier $optional-property; and
  3. A newline.

An additional property declaration MUST consist of the following, in this order:

  1. Eight space symbols;
  2. The reserved identifier $additional-properties-allowed;
  3. A newline; and
  4. An optional additional property schema line.

An additional property schema line MUST consist of the following, in this order:

  1. Eight space symbols;
  2. The reserved identifier $additional-property-schema;
  3. A single space symbol;
  4. Either a Medea identifier, or one of $null, $boolean, $object, $array, $number, $string; and
  5. A newline.

Semantics: A JSON value is considered valid by this specifier if it is a JSON object, and for each of its object property specifier sections, the following all hold:

  • The object has a property whose name is the same as the Medea string given in the property name line;
  • If a corresponding property schema line is provided, the value of said property is valid by the schema named by the identifier given in the property schema line; and
  • If a corresponding optional property declaration is not provided, said property is defined (that is, is not undefined).

Furthermore, if the additional property declaration is absent, no property is defined for the object other than those given by some object property specifier section. If the additional property declaration is present, any value of any property other than those given by some object property specifier section must be valid by the schema named in the identifier given in its additional property schema line (if present).

If no property schema line is given for a property, or no additional property schema line is given for additional properties, any value of such a property is valid. Otherwise, these validation rules apply, based on the naming identifier:

  • $null: The property value is null.
  • $boolean: The property value is a JSON boolean.
  • $object: The property value is a JSON object.
  • $array: The property value is a JSON array.
  • $number: The property value is a JSON number.
  • $string: The property value is a JSON string.
  • Any other identifier: The property value is valid according to the schema named by this identifier.

Postconditions: A Medea validator MUST indicate a unique error condition if an identifier in a property schema line or an additional property schema line does not correspond to any schema defined in the current schema file.

If multiple object property specifier sections have a property name line naming the same property, a Medea validator MUST indicate a unique error condition.

Default: If an object property specification is not present at all, a JSON object is considered valid regardless of its properties and their values.

If an object property specification is present, but provides no additional information (that is, no object property specifier sections and no additional property declaration), a JSON object is only considered valid if it is empty (that is, it defines no properties at all).

If an object property specifier contains a property name line, but no property schema line, then, provided that the named property is defined, any value for said property is considered valid.
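The object property semantics and defaults can be sketched as below. All names are hypothetical. One reading is assumed: a property carrying an optional property declaration may be absent from the object, and its schema is checked only when it is present.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Check = Callable[[Any], bool]


@dataclass
class Property:
    schema: Optional[Check] = None  # None: no $property-schema line
    optional: bool = False          # True: $optional-property is present


def valid_object(
    value: Any,
    properties: Dict[str, Property],
    additional_allowed: bool = False,
    additional_schema: Optional[Check] = None,
) -> bool:
    if not isinstance(value, dict):
        return False
    for name, prop in properties.items():
        if name not in value:
            if not prop.optional:
                return False  # a required property is undefined
            continue
        if prop.schema is not None and not prop.schema(value[name]):
            return False
    for name, extra in value.items():
        if name in properties:
            continue
        if not additional_allowed:
            return False  # no $additional-properties-allowed declaration
        if additional_schema is not None and not additional_schema(extra):
            return False
    return True


def is_string(value: Any) -> bool:
    return isinstance(value, str)


def is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```

Calling `valid_object(value, {})` models a $properties line with nothing under it, so only the empty object passes; the case where no object property specification exists at all would simply skip this check.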

String value specification

Description: A string value specification describes which values a JSON string is allowed to have.

Preconditions: If the schema has a type specification, said type specification must contain the type specifier line $string.

Syntax: A string value specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $string-values;
  3. A newline; and
  4. One or more string value lines.

Each string value line MUST consist of the following, in this order:

  1. Eight space symbols;
  2. A Medea string; and
  3. A newline.

Semantics: A JSON value is considered valid by this specifier if it is a JSON string. Additionally, the value must be equal to at least one of the Medea strings given in the string value lines.

Postconditions: A Medea validator MAY indicate a unique error condition at compile time if two or more string value lines for the same string value specifier are the same.

Default: The JSON string may have any value.
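A minimal sketch of the string value check follows. It assumes that the surrounding QUOTATION MARK symbols of a Medea string are delimiters and are not part of the value being compared; the text above does not say so explicitly. The function names are hypothetical.

```python
from typing import Any, List, Optional


def unquote(medea_string: str) -> str:
    # Assumption: the first and last QUOTATION MARK are delimiters only.
    return medea_string[1:-1]


def valid_string_value(value: Any, value_lines: Optional[List[str]]) -> bool:
    if not isinstance(value, str):
        return False
    if value_lines is None:
        # No string value specification: any string is acceptable.
        return True
    return value in {unquote(line) for line in value_lines}
```

For example, with value lines `"red"` and `"green"`, the JSON string `red` passes and `blue` does not.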

Tuple specification

Description: A tuple specification describes the specifics of a JSON array meant to serve as a tuple; that is, a heterogeneously-typed collection of fixed length.

Preconditions: If the schema has a type specification, said type specification must contain the type specifier line $array. Additionally, the schema must not contain a list specification.

Syntax: A tuple specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $tuple;
  3. A newline; and
  4. Zero or more positional schema specifications.

A positional schema specification MUST consist of the following, in this order:

  1. Eight space symbols;
  2. Either a Medea identifier, or one of $null, $boolean, $object, $array, $number, $string; and
  3. A newline.

Semantics: A JSON value is considered valid by this specifier if it is a JSON array. Additionally, let p(1), p(2), ..., p(N) be the positional schema specifications, in the order declared, where N is the total number of positional schema specifications. For each i in 1, 2, ..., N, the element of the array at index i - 1 (counting from zero) must be valid according to the following rules, based on the Medea identifier used in p(i):

  • $null: The value is null.
  • $boolean: The value is a JSON boolean.
  • $object: The value is a JSON object.
  • $array: The value is a JSON array.
  • $number: The value is a JSON number.
  • $string: The value is a JSON string.
  • Any other identifier: The value is valid according to the schema named by this identifier.

Lastly, the array must have a length of exactly N.

Postconditions: A Medea validator MUST indicate a unique error condition if an identifier in a positional schema specification does not correspond to any schema defined in the current schema file.

Default: An array may have any length, and its elements may be any JSON value.
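The tuple semantics reduce to a length check plus one check per position, as in the sketch below. The names are hypothetical, and the positional checks are passed in as callables rather than resolved from schema names.

```python
from typing import Any, Callable, List

Check = Callable[[Any], bool]


def valid_tuple(value: Any, positional: List[Check]) -> bool:
    if not isinstance(value, list):
        return False
    # The array must have exactly as many elements as there are positions.
    if len(value) != len(positional):
        return False
    # Position i (1-based) is checked against the element at index i - 1.
    return all(check(element) for check, element in zip(positional, value))


def is_string(value: Any) -> bool:
    return isinstance(value, str)


def is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```

A $tuple line with no positional schema specifications corresponds to `valid_tuple(value, [])`, which accepts only the empty array. That differs from having no tuple specification at all, where any array is acceptable.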

Type specification

Description: A type specification describes basic rules of form for JSON values.

Preconditions: None

Syntax: A type specification MUST consist of the following, in this order:

  1. Four space symbols;
  2. The reserved identifier $type;
  3. A newline; and
  4. One or more type specifier lines.

Each type specifier line MUST consist of the following, in this order:

  1. Eight space symbols;
  2. Either a Medea identifier, or one of $null, $boolean, $object, $array, $number, $string; and
  3. A newline.

Semantics: A JSON value is considered valid by this specifier if it is valid by at least one of the identifiers given in its type specifier lines. For each individual identifier, the following validation rules apply:

  • $null: The JSON value is null.
  • $boolean: The JSON value is a JSON boolean.
  • $object: The JSON value is a JSON object.
  • $array: The JSON value is a JSON array.
  • $number: The JSON value is a JSON number.
  • $string: The JSON value is a JSON string.
  • Any other identifier: The JSON value is valid according to the schema named by this identifier.

Postconditions: Let S, T be schemata in a single Medea schema graph file. We say that S types as T if:

  • S has a type specification; and
  • The type specification of S contains a type specifier line with the naming identifier of T.

For any schema S, the typing neighbourhood of S (denoted T(S)) is the transitive closure of the 'types as' relation for S. We say that schema S is circularly-typed if S is a member of T(S).

The type specifiers of a Medea schema graph file MUST NOT induce the circular typing of any schema within it. A Medea validator MUST indicate a unique error condition if a Medea schema graph file contains any schema S such that S is circularly-typed.

A Medea validator MUST indicate a unique error condition if an identifier in a type specifier line does not correspond to any schema defined in the current schema file.

Default: Any JSON value is considered valid by this specifier.
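The circular typing rule can be checked by computing the typing neighbourhood T(S) as a graph traversal over the 'types as' relation. In this sketch the relation is given as a dictionary from each schema name to the identifiers in its type specifier lines; that representation and the function names are hypothetical.

```python
from typing import Dict, List, Set

PRIMITIVES = {"$null", "$boolean", "$object", "$array", "$number", "$string"}


def typing_neighbourhood(types_as: Dict[str, List[str]], start: str) -> Set[str]:
    """Every schema reachable from `start` by one or more 'types as' steps."""
    reached: Set[str] = set()
    # Primitive type identifiers do not name schemata, so they are skipped.
    pending = [t for t in types_as.get(start, []) if t not in PRIMITIVES]
    while pending:
        current = pending.pop()
        if current in reached:
            continue
        reached.add(current)
        pending.extend(
            t for t in types_as.get(current, []) if t not in PRIMITIVES
        )
    return reached


def circularly_typed(types_as: Dict[str, List[str]]) -> List[str]:
    """Names of all schemata S such that S is a member of T(S)."""
    return sorted(
        name for name in types_as
        if name in typing_neighbourhood(types_as, name)
    )
```

For a file where `$start` types as `a`, `a` types as `b` or `$string`, and `b` types as `a`, the function reports `a` and `b` as circularly-typed. `$start` is not reported, since it reaches the cycle but is not part of it.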

Isolated schemata

We say that a schema S in a Medea schema graph file is isolated when it is not referred to by any specification in its Medea schema graph file. A Medea validator SHOULD indicate a unique error condition if it detects any isolated schemata.
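Detecting isolated schemata amounts to collecting every identifier that some specification refers to and reporting the schemata that never appear. The sketch below assumes that `$start` is exempt, since it is the entry point of validation; the definition above does not state this, so treat the exemption (and all names used) as an assumption.

```python
from typing import Dict, List, Set


def isolated_schemata(references: Dict[str, Set[str]]) -> List[str]:
    """`references` maps each schema name to the identifiers its
    specifications mention (hypothetical representation)."""
    referred: Set[str] = set().union(*references.values())
    return sorted(
        name for name in references
        # Assumption: $start is the entry point and is never isolated.
        if name != "$start" and name not in referred
    )
```

For a file in which `$start` refers to `a`, `c` refers to `b`, and nothing refers to `c`, the result is `["c"]`.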