Skip to content

Latest commit

 

History

History
202 lines (147 loc) · 7.51 KB

spec_19.rst

File metadata and controls

202 lines (147 loc) · 7.51 KB
GitHub is NOT the preferred viewer for this file. Please visit
https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_19.html

19/Flux Locally Unique ID (FLUID)

This specification describes a scheme for a distributed, uncoordinated flux locally unique ID service that generates 64 bit k-ordered, unique identifiers that are a combination of timestamp since some epoch, generator id, and sequence number. The scheme is used to generate Flux job IDs.

Name github.com/flux-framework/rfc/spec_19.rst
Editor Mark Grondona <[email protected]>
State raw

Language

Background

Criteria

Very low probability of collision: Like 128 bit UUIDs on a global scale, FLUIDs should provide a reasonable guarantee against collisions on a smaller scale: within a Flux instance.

Distributed: Flux job ingest should be distributed to achieve a high job ingest rate, therefore the generation of ID’s should also be capable of being distributed to promote scalability.

Loosely ordered: Job ID’s generated by legacy resource managers are typically monotonically increasing integers that reflect job submission order. This property is of debatable utility, but following the principle of least astonishment, FLUIDs should retain it if possible.

Existing Solutions

The design of FLUIDs is patterned after Twitter Snowflake, and the derived implementation Boundary Flake. The basic scheme is to couple a timestamp, machine or generator id, and sequence number into a number of bits.

Placing the timestamp into the most significant bits of the ID allows independently generated IDs to be loosely sorted (k-ordered). Use of a separate machine ID per generator ensures uniqueness without coordination, and the sequence number ensures each generator can create a certain number of IDs per timestamp unit.

Implementation

FLUIDs are composed of [ timestamp | id | sequence ] similar to Snowflake, to allow distributed, uncoordinated ID generation across a Flux instance, with the allocation of bits customized for the unique use case in Flux:

  • 40 bits for timestamp since epoch in milliseconds, good for a 35 year long runtime with custom epoch set to job start time.
  • 14 bits for generator ID (up to 16K generators). By default, the generator ID could be set to the rank. For sessions greater than 16K ranks, some generators could be idled and forward requests up the tree to keep the max generators to 16K.
  • 10 bits for sequence number (1024 IDs per ms)

With this scheme it is theoretically possible to create a max of about 16B FLUIDs per second for 34 years.

This type of generator guarantees unique IDs, with probability of collision equal to zero, so no collision detection is required.

Representation

A FLUID is a 64-bit integer, e.g. 6731191091817518.

Representations other than decimal MAY be used where appropriate, for instance for compactness or ease of transcription over the phone.

The following sections describe the set of supported alternate representations for FLUIDs.

FLUID base58 (F58) Encoding

In order to create a compact, human readable representation of a FLUID, the main alternate encoding of a FLUID SHALL be Base58, using the alphabet

123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

To disambiguate this Base58 representation from decimal or other representations, FLUIDs with this encoding SHALL be prefixed with the Unicode character U+0192 ƒ, and the final result SHALL be termed "FLUID base58 Encoding" or F58 for short. For example, the FLUID 6731191091817518 from above SHALL be represented in F58 as ƒuZZybuNNy. As a fallback mechanism, a FLUID prefixed with an ASCII lowercase f will also be decoded as F58.

Examples: ƒZemgA8Bzf, ƒ278oEf7zGf

FLUID Hexadecimal (hex) Encoding

A hexadecimal encoding SHALL represent a FLUID in base16, including a 0x prefix to unambiguously differentiate the representation from other FLUID standard encodings.

Examples: 0x17e9fb8df16c2e, 0xedaf97d000000

FLUID Dotted-Hexadecimal (dothex) Encoding

In order to support indexing of FLUIDs in a hierarchical KVS namespace, a dotted-hexadecimal encoding SHALL represent a FLUID in base16, with each 4 hexadecimal digits separated by dots (.).

Examples: 0017.e9fb.8df1.6c2e, 000e.daf9.7d00.0000

FLUID Mnemonic (words) Encoding

In order to ease transferring of FLUIDs via human interaction, a mnemonic representation of FLUIDS SHALL be supported by a conformant implementation.

The mnemonicode implementation converts integers to strings of pronounceable words and back again. This encoding MAY be used when a FLUID must be conveyed by speaking, e.g. over the phone.

Examples: reform-remote-galileo--heart-package-academy, random-idea-yoyo--sugar-printer-academy

FLUID Emoji Encoding

In order to encode a FLUID using a minimal number of printable characters, and to increase visual appeal when displaying FLUIDs in various settings, a conformant implementation MAY supply an encoding of FLUIDs to a string of Unicode emoji characters.

The emoji used for encoding SHALL be selected to have the following properties:

  • For maximum compatibility, the selected emoji SHALL be taken from the standard Unicode 6.0 2010 emoji set here: https://unicode.org/emoji/charts/emoji-versions.html#2010,
  • The selected emoji SHALL be encoded as 4 bytes in UTF-8
  • The selected emoji SHALL start with the common bytes: F0 9F

The implementation SHALL store the emoji in the order specified by the CLDR collation rules as in the above array and as shown here:

https://unicode.org/emoji/charts-12.1/emoji-ordering.txt

The above specifications result in a selection of 576 emoji here:

.. literalinclude:: data/spec_19/basemoji.h
   :language: C

The above table of 576 emoji SHALL be used to encode a FLUID in base 576 with the position of an emoji in the array above denoting its value. That is, the first emoji (U+1F603 😃 grinning face with big eyes) SHALL represent decimal 0, the second (U+1F604 😄 grinning face with smiling eyes) SHALL represent 1, etc.

Examples: 🚹💂🙌😳💱🏃, 😄😹🎇📥🏧🙉🔞, 🚹💂🈳💰🎩🏃

Decoding Alternate FLUID Representations

The standard FLUID representations described in this RFC are unambiguous by design. That is, the type of FLUID encoding can be definitively determined by the string representation.

Implementations that take an encoded FLUID as a string argument SHALL use the following rules to decode the argument:

  • If a string contains ., then decode as "dothex"
  • Else if the string contains -, then decode as "words"
  • Else if the string starts with ƒ or f, decode as F58
  • Else if the string begins with bytes 0xf0 and 0x9f then decode as "emoji"
  • Else if the string starts with 0x decode as "hex"
  • Otherwise, decode as decimal

An implementation decoding FLUID string representations SHALL ignore leading and trailing whitespace.