Skip to content

Latest commit

 

History

History
376 lines (314 loc) · 12.4 KB

UBJSON-Specification.md

File metadata and controls

376 lines (314 loc) · 12.4 KB

Index

For the official, most up-to-date, more verbose specification (with discussion and examples) of Universal Binary JSON please visit ubjson.org. Since at the time of writing (6th August 2015) neither said website nor the community workspace git repository indicated what version the specification applies to, I made the decision to produce this minimal document to act as a reference in case the specification on ubjson.org changes. I contacted Riyad Kalla of The Buzz Media (and maintainer of ubjson.org) and he confirmed to me that (at said point in time) the current version was indeed Draft 12.

The UBJSON Specification is licensed under the Apache 2.0 License.

A single construct with two optional segments (length and data) is used for all types:

[type, 1 byte char]([integer numeric length])([data])

Each element in the tuple is defined as:

  • type - A 1 byte ASCII char (Marker) used to indicate the type of the data following it.

  • length (optional) - A positive, integer numeric type (uint8, int16, int32, int64) specifying the length of the following data payload.

  • data (optional) - A run of bytes representing the actual binary data for this type of value.

Notes

  • Some values are simple enough that just writing the 1 byte ASCII marker into the stream is enough to represent the value (e.g. null) while others have a type that is specific enough that no length is needed as the length is implied by the type (e.g. int32) while others still require both a type and a length to communicate their value (e.g. string). Additionally some values (e.g. array) have additional (optional) parameters to improve decoding efficiency and/or to reduce size of the encoded value even further.

  • The UBJSON specification requires that all numeric values be written in Big-Endian order.

  • To store binary data, use a strongly-typed array of uint8 values.

  • application/ubjson should be used as the mime type

  • .ubj should be used a the file extension when storing UBJSON-encoded data is saved to a file

Type Total size ASCII Marker(s) Length required Data (payload)
null 1 byte Z No No
no-op 1 byte N No No
true 1 byte T No No
false 1 byte F No No
int8 2 bytes i No Yes
uint8 2 bytes U No Yes
int16 3 bytes I (upper case i) No Yes
int32 5 bytes l (lower case L) No Yes
int64 9 bytes L No Yes
float32 5 bytes d No Yes
float64 9 bytes D No Yes
high-precision number 1 byte + int num val + string byte len H Yes Yes
char 2 bytes C No Yes
string 1 byte + int num val + string byte len S Yes Yes (if not empty)
array 2+ bytes [ and ] Optional Yes (if not empty)
object 2+ bytes { and } Optional Yes (if not empty)

The null value in is equivalent to the null value from the JSON specification.

Example

In JSON:

{
    "passcode": null
}

In UBJSON (using block-notation):

[{]
    [i][8][passcode][Z]
[}]

There is no equivalent to no-op value in the original JSON specification. When decoding, No-Op values should be skipped. Also, they can only occur as elements of a container.


A boolean type is is equivalent to the boolean value from the JSON specification.

Example

In JSON:

{
    "authorized": true,
    "verified": false
}

In UBJSON (using block-notation):

[{]
    [i][10][authorized][T]
    [i][8][verified][F]
[}]

Unlike in JSON whith has a single Number type (used for both integers and floating point numbers), UBJSON defines multiple types for integers. The minimum/maximum of values (inclusive) for each integer type are as follows:

Type Signed Minimum Maximum
int8 Yes -128 127
uint8 No 0 255
int16 Yes -32,768 32,767
int32 Yes -2,147,483,648 2,147,483,647
int64 Yes -9,223,372,036,854,775,808 9,223,372,036,854,775,807
float32 Yes See IEEE 754 Spec See IEEE 754 Spec
float64 Yes See IEEE 754 Spec See IEEE 754 Spec
high-precision number Yes Infinite Infinite

Notes:

  • Numeric values of infinity (and NaN) are to be encoded as a null in all cases
  • It is advisable to use the smallest applicable type when encoding a number.

Integer

All integer types are written in Big-Endian order.

Float

High-Precision

These are encoded as a string and thus are only limited by the maximum string size. Values must be written out in accordance with the original JSON number type specification. Infinity (and NaN) are to be encoded as a null value.

Examples

Numeric values in JSON:

{
    "int8": 16,
    "uint8": 255,
    "int16": 32767,
    "int32": 2147483647,
    "int64": 9223372036854775807,
    "float32": 3.14,
    "float64": 113243.7863123,
    "huge1": "3.14159265358979323846",
    "huge2": "-1.93+E190",
    "huge3": "719..."
}

In UBJSON (using block-notation):

[{]
    [i][4][int8][i][16]
    [i][5][uint8][U][255]
    [i][5][int16][I]32767]
    [i][5][int32][l][2147483647]
    [i][5][int64][L][9223372036854775807]
    [i][7][float32][d][3.14]
    [i][7][float64][D][113243.7863123]
    [i][5][huge1][H][i][22][3.14159265358979323846]
    [i][5][huge2][H][i][10][-1.93+E190]
    [i][5][huge3][H][U][200][719...]
[}]

The char type in UBJSON is an unsigned byte meant to represent a single printable ASCII character (decimal values 0-127). It must not have a decimal value larger than 127. It is functionally identical to the uint8 type, but semantically is meant to represent a character and not a numeric value.

Example

Char values in JSON:

{
    "rolecode": "a",
    "delim": ";",
}

UBJSON (using block-notation):

[{]
    [i][8][rolecode][C][a]
    [i][5][delim][C][;]
[}]

The string type in UBJSON is equivalent to the string type from the JSON specification apart from that the UBJSON string value requires UTF-8 encoding.

Example

String values in JSON:

{
    "username": "rkalla",
    "imagedata": "...huge string payload..."
}

UBJSON (using block-notation):

[{]
    [i][8][username][S][i][5][rkalla]
    [i][9][imagedata][S][l][2097152][...huge string payload...]
[}]

See also optimized format below.

The array type in UBJSON is equivalent to the array type from the JSON specification.

Example

Array in JSON:

[
    null,
    true,
    false,
    4782345193,
    153.132,
    "ham"
]

UBJSON (using block-notation):

[[]
    [Z]
    [T]
    [F]
    [l][4782345193]
    [d][153.132]
    [S][i][3][ham]
[]]

The object type in UBJSON is equivalent to the object type from the JSON specification. Since value names can only be strings, the S (string) marker must not be included since it is redundant.

Example

Object in JSON:

{
    "post": {
        "id": 1137,
        "author": "rkalla",
        "timestamp": 1364482090592,
        "body": "I totally agree!"
    }
}

UBJSON (using block-notation):

[{]
    [i][4][post][{]
        [i][2][id][I][1137]
        [i][6][author][S][i][5][rkalla]
        [i][9][timestamp][L][1364482090592]
        [i][4][body][S][i][16][I totally agree!]
    [}]
[}]

Both container types support optional parameters that can help optimize the container for better parsing performance and smaller size.

Type - $

When a type is specified, all value types stored in the container (either array or object) are considered to be of that singular type and as a result, type markers are omitted for each value within the container. This can be thought of providing the ability to create a strongly typed container in UBJSON.

  • If a type is specified, it must be done so before a count.
  • If a type is specified, a count must be specified as well. (Otherwise it is impossible to tell when a container is ending, e.g. did you just parse ] or the int8 value of 93?)

Example (string type):

[$][S]

Count - #

When a count is specified, the parser is able to know ahead of time how many child elements will be parsed. This allows the parser to pre-size any internal construct used for parsing, verify that the promised number of child values were found and avoid scanning for any terminating bytes while parsing.

  • A count can be specified without a type.

Example (count of 64):

[#][i][64]

Additional rules

  • A count must be >= 0.
  • A count can be specified by itself.
  • If a count is specified the container must not specify an end-marker.
  • A container that specifies a count must contain the specified number of child elements.
  • If a type is specified, it must be done so before count.
  • If a type is specified, a count must also be specified. A type cannot be specified by itself.
  • A container that specifies a type must not contain any additional type markers for any contained value.

Array Examples

Optimized with count

[[][#][i][5] // An array of 5 elements.
    [d][29.97]
    [d][31.13]
    [d][67.0]
    [d][2.113]
    [d][23.8889]
// No end marker since a count was specified.

Optimized with type & count

[[][$][d][#][i][5] // An array of 5 float32 elements.
    [29.97] // Value type is known, so type markers are omitted.
    [31.13]
    [67.0]
    [2.113]
    [23.8889]
// No end marker since a count was specified.

Object Examples

Optimized with count

[{][#][i][3] // An object of 3 name:value pairs.
    [i][3][lat][d][29.976]
    [i][4][long][d][31.131]
    [i][3][alt][d][67.0]
// No end marker since a count was specified.

Optimized with type & count

[{][$][d][#][i][3] // An object of 3 name:float32-value pairs.
    [i][3][lat][29.976] // Value type is known, so type markers are omitted.
    [i][4][long][31.131]
    [i][3][alt][67.0]
// No end marker since a count was specified.

Special case: Marker-only types (null, no-op & boolean)

If using both count and type optimisations, the marker itself represent the value thus saving repetition (since these types to not have a payload). Additional requirements are:

Strongly typed array of type true (boolean) and with a count of 512:

[[][$][T][#][I][512]

Strongly typed object of type null and with a count of 3:

[{][$][Z][#][i][3]
    [i][4][name] // name only, no value specified.
    [i][8][password]
    [i][5][email]