apache · emkornfield · Dec 10, 2024 · Oct 29, 2024 · Nov 1, 2024 · Nov 24, 2024
diff --git a/VariantEncoding.md b/VariantEncoding.md
@@ -365,6 +365,7 @@ It is semantically identical to the "string" primitive type.
 The Decimal type contains a scale, but no precision. The implied precision of a decimal value is `floor(log_10(val)) + 1`.
 
 # Encoding types
+*Variant basic types*
 
 | Basic Type   | ID  | Description                                       |
 |--------------|-----|---------------------------------------------------|
@@ -373,25 +374,37 @@ The Decimal type contains a scale, but no precision. The implied precision of a
 | Object       | `2` | A collection of (string-key, variant-value) pairs |
 | Array        | `3` | An ordered sequence of variant values             |
 
-| Logical Type         | Physical Type               | Type ID | Equivalent Parquet Type     | Binary format                                                                                                       |
-|----------------------|-----------------------------|---------|-----------------------------|---------------------------------------------------------------------------------------------------------------------|
-| NullType             | null                        | `0`     | any                         | none                                                                                                                |
-| Boolean              | boolean (True)              | `1`     | BOOLEAN                     | none                                                                                                                |
-| Boolean              | boolean (False)             | `2`     | BOOLEAN                     | none                                                                                                                |
-| Exact Numeric        | int8                        | `3`     | INT(8, signed)              | 1 byte                                                                                                              |
-| Exact Numeric        | int16                       | `4`     | INT(16, signed)             | 2 byte little-endian                                                                                                |
-| Exact Numeric        | int32                       | `5`     | INT(32, signed)             | 4 byte little-endian                                                                                                |
-| Exact Numeric        | int64                       | `6`     | INT(64, signed)             | 8 byte little-endian                                                                                                |
-| Double               | double                      | `7`     | DOUBLE                      | IEEE little-endian                                                                                                  |
-| Exact Numeric        | decimal4                    | `8`     | DECIMAL(precision, scale)   | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table)                         |
-| Exact Numeric        | decimal8                    | `9`     | DECIMAL(precision, scale)   | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table)                         |
-| Exact Numeric        | decimal16                   | `10`    | DECIMAL(precision, scale)   | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table)                         |
-| Date                 | date                        | `11`    | DATE                        | 4 byte little-endian                                                                                                |
-| Timestamp            | timestamp                   | `12`    | TIMESTAMP(true, MICROS)     | 8-byte little-endian                                                                                                |
-| TimestampNTZ         | timestamp without time zone | `13`    | TIMESTAMP(false, MICROS)    | 8-byte little-endian                                                                                                |
-| Float                | float                       | `14`    | FLOAT                       | IEEE little-endian                                                                                                  |
-| Binary               | binary                      | `15`    | BINARY                      | 4 byte little-endian size, followed by bytes                                                                        |
-| String               | string                      | `16`    | STRING                      | 4 byte little-endian size, followed by UTF-8 encoded bytes                                                          |
+*Variant primitive types*
+
+| Type Equivalence Class | Physical Type               | Type ID | Equivalent Parquet Type                  | Binary format                                                                               |
+|------------------------|-----------------------------|---------|------------------------------------------|---------------------------------------------------------------------------------------------|
+| NullType               | null                        | `0`     | any                                      | none                                                                                        |
+| Boolean                | boolean (True)              | `1`     | BOOLEAN                                  | none                                                                                        |
+| Boolean                | boolean (False)             | `2`     | BOOLEAN                                  | none                                                                                        |
+| Exact Numeric          | int8                        | `3`     | INT(8, signed)                           | 1 byte                                                                                      |
+| Exact Numeric          | int16                       | `4`     | INT(16, signed)                          | 2 byte little-endian                                                                        |
+| Exact Numeric          | int32                       | `5`     | INT(32, signed)                          | 4 byte little-endian                                                                        |
+| Exact Numeric          | int64                       | `6`     | INT(64, signed)                          | 8 byte little-endian                                                                        |
+| Double                 | double                      | `7`     | DOUBLE                                   | IEEE little-endian                                                                          |
+| Exact Numeric          | decimal4                    | `8`     | DECIMAL(precision, scale)                | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| Exact Numeric          | decimal8                    | `9`     | DECIMAL(precision, scale)                | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| Exact Numeric          | decimal16                   | `10`    | DECIMAL(precision, scale)                | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| Date                   | date                        | `11`    | DATE                                     | 4 byte little-endian                                                                        |
+| Timestamp              | timestamp with time zone    | `12`    | TIMESTAMP(isAdjustedToUTC=true, MICROS)  | 8-byte little-endian                                                                        |
+| TimestampNTZ           | timestamp without time zone | `13`    | TIMESTAMP(isAdjustedToUTC=false, MICROS) | 8-byte little-endian                                                                        |
+| Float                  | float                       | `14`    | FLOAT                                    | IEEE little-endian                                                                          |
+| Binary                 | binary                      | `15`    | BINARY                                   | 4 byte little-endian size, followed by bytes                                                |
+| String                 | string                      | `16`    | STRING                                   | 4 byte little-endian size, followed by UTF-8 encoded bytes                                  |
+| TimeNTZ                | time without time zone      | `21`    | TIME(isAdjustedToUTC=false, MICROS)      | 8-byte little-endian                                                                        |
+| Timestamp              | timestamp with time zone    | `22`    | TIMESTAMP(isAdjustedToUTC=true, NANOS)   | 8-byte little-endian                                                                        |
+| TimestampNTZ           | timestamp without time zone | `23`    | TIMESTAMP(isAdjustedToUTC=false, NANOS)  | 8-byte little-endian                                                                        |
+| UUID                   | uuid                        | `24`    | UUID                                     | 16-byte big-endian                                                                          |
+
+The *Type Equivalence Class* column indicates which physically encoded types are logically equivalent.
+For example, a user expression operating on a string value containing "hello" should behave the same whether the value uses the short string optimization or the long string encoding.
+Similarly, a user expression operating on an *int8* value of 1 should behave the same as one operating on a *decimal16* value with scale 2 and unscaled value 100.
+
+*Decimal table*
 
 | Decimal Precision     | Decimal value type |
 |-----------------------|--------------------|
@@ -400,10 +413,6 @@ The Decimal type contains a scale, but no precision. The implied precision of a
 | 18 <= precision <= 38 | int128             |
 | > 38                  | Not supported      |
 
-The *Logical Type* column indicates logical equivalence of physically encoded types.
-For example, a user expression operating on a string value containing "hello" should behave the same, whether it is encoded with the short string optimization, or long string encoding.
-Similarly, user expressions operating on an *int8* value of 1 should behave the same as a decimal16 with scale 2 and unscaled value 100.
-
 # String values must be UTF-8 encoded
 
 All strings within the Variant binary format must be UTF-8 encoded.
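
As an illustration of the primitive-type table in this diff (not part of the patch), here is a minimal Python sketch of how a reader might decode a few of the listed binary formats and check the type-equivalence example. The helper `decode_primitive` and its constants are hypothetical names introduced here; the function takes only the bytes described in the *Binary format* column, and the decimal widths (4, 8, and 16 bytes for the unscaled value) are an assumption inferred from the type names `decimal4`, `decimal8`, and `decimal16`.

```python
import uuid
from datetime import time
from decimal import Decimal

# Type IDs copied from the "Variant primitive types" table in the diff.
INT8, DECIMAL4, DECIMAL8, DECIMAL16 = 3, 8, 9, 10
TIME_NTZ_MICROS, UUID_TYPE = 21, 24

# Assumed width in bytes of the unscaled value, inferred from the type names.
_DECIMAL_WIDTH = {DECIMAL4: 4, DECIMAL8: 8, DECIMAL16: 16}


def decode_primitive(type_id: int, payload: bytes):
    """Decode the bytes described in the 'Binary format' column (hypothetical helper)."""
    if type_id == INT8:
        return int.from_bytes(payload[:1], "little", signed=True)
    if type_id in _DECIMAL_WIDTH:
        scale = payload[0]  # 1 byte scale in range [0, 38]
        if scale > 38:
            raise ValueError(f"decimal scale {scale} is outside [0, 38]")
        width = _DECIMAL_WIDTH[type_id]
        unscaled = int.from_bytes(payload[1:1 + width], "little", signed=True)
        # Build the Decimal from (sign, digits, exponent) so that no context
        # rounding is applied, even for 38-digit values.
        sign = 1 if unscaled < 0 else 0
        digits = tuple(int(d) for d in str(abs(unscaled)))
        return Decimal((sign, digits, -scale))
    if type_id == TIME_NTZ_MICROS:
        # 8-byte little-endian count of microseconds since midnight.
        micros = int.from_bytes(payload[:8], "little", signed=True)
        seconds, micro = divmod(micros, 1_000_000)
        minutes, second = divmod(seconds, 60)
        hour, minute = divmod(minutes, 60)
        return time(hour, minute, second, micro)
    if type_id == UUID_TYPE:
        # 16-byte big-endian, which is the byte order uuid.UUID(bytes=...) expects.
        return uuid.UUID(bytes=payload[:16])
    raise ValueError(f"type ID {type_id} is not handled by this sketch")


# The equivalence example from the text: int8 1 vs. decimal16 (scale 2, unscaled 100).
one_as_int8 = decode_primitive(INT8, b"\x01")
one_as_decimal16 = decode_primitive(
    DECIMAL16, bytes([2]) + (100).to_bytes(16, "little", signed=True)
)
assert one_as_int8 == one_as_decimal16
```

The two values compare equal even though their encodings differ, which is the behavior the *Type Equivalence Class* column asks engines to preserve. Timestamp and string handling are left out to keep the sketch short.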