GH-465: Clarify backward-compatibility rules on LIST type #466

Merged (19 commits) on Dec 8, 2024
70 changes: 61 additions & 9 deletions LogicalTypes.md
@@ -670,6 +670,8 @@ optional group array_of_arrays (LIST) {

#### Backward-compatibility rules

+##### 3-level structure with different field names
+
It is required that the repeated group of elements is named `list` and that
its element field is named `element`. However, these names may not be used in
existing data and should not be enforced as errors when reading. For example,
@@ -684,49 +686,99 @@ optional group my_list (LIST) {
}
```

-Some existing data does not include the inner element layer. For
-backward-compatibility, the type of elements in `LIST`-annotated structures
-should always be determined by the following rules:
+##### 2-level structure
+
+Some existing data does not include the inner element layer, meaning that `LIST`
+annotates a 2-level structure. In contrast to 3-level structure, the repetition
+of 2-level structure can be `optional`, `required`, or `repeated`.
+
+```
+<list-repetition> group <name> (LIST) {
+repeated <element-type> <element-name>;
+}
+```
+
+For backward-compatibility, the type of elements in `LIST`-annotated 2-level
+structures should always be determined by the following rules:

1. If the repeated field is not a group, then its type is the element type and
elements are required.
2. If the repeated field is a group with multiple fields, then its type is the
element type and elements are required.
-3. If the repeated field is a group with one field and is named either `array`
+3. If the repeated field is a group with a `repeated` field, then the repeated
+field is the element type because the type cannot be a 3-level list.
+4. If the repeated field is a group with one field and is named either `array`
or uses the `LIST`-annotated group's name with `_tuple` appended then the
repeated type is the element type and elements are required.
-4. Otherwise, the repeated field's type is the element type with the repeated
+5. Otherwise, the repeated field's type is the element type with the repeated
field's repetition.
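
The rules above can be sketched in Python as a first-match-wins classifier. This is a minimal illustration under stated assumptions, not code from the Parquet project: the `Field` class and the `resolve_element` function are hypothetical names, the rules are assumed to be checked in the order listed, and rule 5 is read here as the standard 3-level case in which the single child of the repeated group is the element.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    """Hypothetical, simplified schema node (not a real Parquet library type)."""
    name: str
    repetition: str                 # "required", "optional", or "repeated"
    type: str = "group"             # primitive type name, or "group"
    children: List["Field"] = field(default_factory=list)


def resolve_element(list_name: str, repeated: Field) -> Tuple[int, Field]:
    """Return (rule number, element field) for the repeated child of a LIST group.

    Assumption: rules are tried in the order they are listed, first match wins.
    """
    if repeated.repetition != "repeated":
        raise ValueError("the child of a LIST-annotated group must be repeated")

    # Rule 1: a repeated primitive is itself the (required) element.
    if repeated.type != "group":
        return 1, repeated

    if not repeated.children:
        raise ValueError("a group must have at least one field")

    # Rule 2: a repeated group with multiple fields is the element.
    if len(repeated.children) > 1:
        return 2, repeated

    only_child = repeated.children[0]

    # Rule 3: a repeated group whose field is itself repeated is the element,
    # because it cannot be the middle layer of a 3-level list.
    if only_child.repetition == "repeated":
        return 3, repeated

    # Rule 4: legacy names `array` and `<list-name>_tuple` mark the group as the element.
    if repeated.name in ("array", f"{list_name}_tuple"):
        return 4, repeated

    # Rule 5 (assumed reading): the group is a wrapper layer and its single
    # field is the element, keeping that field's own repetition.
    return 5, only_child


# Usage: the 2-level example `repeated int32 element;` falls under rule 1.
rule, element = resolve_element("my_list", Field("element", "repeated", "int32"))
print(rule, element.name)  # prints: 1 element
```

Returning the rule number alongside the element keeps the sketch easy to compare against the numbered list; a real reader would more likely return only the resolved element type and its nullability.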

Examples that can be interpreted using these rules:

```
-// List<Integer> (nullable list, non-null elements)
+// Rule 1: List<Integer> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated int32 element;
}

-// List<Tuple<String, Integer>> (nullable list, non-null elements)
+// Rule 2: List<Tuple<String, Integer>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group element {
required binary str (STRING);
required int32 num;
};
}

-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 3: List<List<Integer>> (nullable outer list, non-null elements)
+optional group my_list (LIST) {
+repeated group array (LIST) {
+repeated int32 array;
+};
+}
+
+// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group array {
required binary str (STRING);
};
}

-// List<OneTuple<String>> (nullable list, non-null elements)
+// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group my_list_tuple {
required binary str (STRING);
};
}

+// Rule 5: List<OneTuple<List<Integer>>> (nullable outer list, non-null elements)
+optional group my_list (LIST) {
+repeated group foo {
+repeated int32 bar;
+};
+}
+```
+
+##### 1-level structure without `LIST` annotation
+
+Some existing data does not even have the `LIST` annotation and simply uses
+`repeated` repetition to annotate the element type. For backward-compatibility,
+both the list and elements are `required`.
+
+```
+// List<Integer> (non-null list, non-null elements)
+repeated int32 num;
+
+// Tuple<List<Integer>, List<String>> (non-null list, non-null elements)
+optional group my_list {
+repeated int32 num;
+repeated binary str (STRING);
+}
+
+// List<Tuple<Integer, String>> (non-null list, non-null elements)
+repeated group my_list {
+required int32 num;
+optional binary str (STRING);
+}
```
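
The 1-level case can be sketched the same way. The snippet below is a hedged illustration only: `ListInfo` and `unannotated_repeated_as_list` are hypothetical names, and the function assumes the caller has already established that the field is not inside a `LIST`-annotated group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListInfo:
    """Hypothetical result type: how a reader would interpret the field."""
    element_type: str
    list_nullable: bool
    elements_nullable: bool


def unannotated_repeated_as_list(type_name: str, repetition: str) -> ListInfo:
    """Interpret a bare `repeated` field (no LIST annotation) as a list.

    Per the backward-compatibility text, both the list and its elements
    are treated as required, so neither can be null.
    """
    if repetition != "repeated":
        raise ValueError("only `repeated` fields are read as 1-level lists")
    return ListInfo(
        element_type=type_name,
        list_nullable=False,
        elements_nullable=False,
    )


# Usage: `repeated int32 num;` is read as a non-null list of non-null int32.
info = unannotated_repeated_as_list("int32", "repeated")
```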

### Maps