[T2] Wide column metadata improvements
1. Make `ColumnMetaData.type` optional
2. Make `ColumnMetaData.path_in_schema` optional
3. Add `ColumnMetaData.schema_index`. This is the ordinal in `FileMetaData.schema` this column corresponds to. This allows sparse representation of columns in a rowgroup.
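
As a rough illustration of item 3, the sketch below shows how a reader might use `schema_index` to map the column chunks present in a row group back to schema ordinals and detect leaves that were left out. The dataclasses and the helpers `index_columns` and `absent_leaves` are hypothetical stand-ins written for this example, not part of any Parquet library, and treating an unset or zero `num_children` as a leaf is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SchemaElement:
    # Hypothetical stand-in for the Thrift struct; only the fields used here.
    name: str
    num_children: Optional[int] = None  # assumed unset (or 0) for leaf columns


@dataclass
class ColumnMetaData:
    # Hypothetical stand-in; schema_index mirrors the new optional field 17.
    schema_index: Optional[int] = None


def is_leaf(element: SchemaElement) -> bool:
    """Assumption: a schema element without children is a leaf column."""
    return not element.num_children


def index_columns(schema: List[SchemaElement],
                  columns: List[ColumnMetaData]) -> Dict[int, ColumnMetaData]:
    """Map schema ordinal -> metadata for the columns present in one row group."""
    present: Dict[int, ColumnMetaData] = {}
    for col in columns:
        idx = col.schema_index
        if idx is None or not 0 <= idx < len(schema):
            raise ValueError(f"missing or out-of-range schema_index: {idx!r}")
        if not is_leaf(schema[idx]):
            raise ValueError(f"schema_index {idx} points at a group, not a leaf")
        present[idx] = col
    return present


def absent_leaves(schema: List[SchemaElement],
                  present: Dict[int, ColumnMetaData]) -> List[int]:
    """Ordinals of leaf columns that have no metadata in this row group."""
    return [i for i, el in enumerate(schema) if is_leaf(el) and i not in present]


# A schema with a root group and three leaves; the row group omits column "b".
schema = [
    SchemaElement("root", num_children=3),
    SchemaElement("a"),
    SchemaElement("b"),
    SchemaElement("c"),
]
row_group_columns = [ColumnMetaData(schema_index=1), ColumnMetaData(schema_index=3)]

present = index_columns(schema, row_group_columns)
missing = absent_leaves(schema, present)
```

With a sparse row group like this one, a reader would need its own policy for the absent leaves (for example, treating them as all-null); the commit itself does not prescribe one.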
alkis committed May 30, 2024
1 parent 384bedd commit f0c75b9
Showing 1 changed file with 22 additions and 5 deletions.
27 changes: 22 additions & 5 deletions src/main/thrift/parquet.thrift
@@ -490,7 +490,7 @@ enum Encoding {
// GROUP_VAR_INT = 1;

/**
- * Deprecated: Dictionary encoding. The values in the dictionary are encoded in the
+ * DEPRECATED: Dictionary encoding. The values in the dictionary are encoded in the
* plain type.
* in a data page use RLE_DICTIONARY instead.
* in a Dictionary page use PLAIN instead
@@ -772,15 +772,25 @@ struct PageEncodingStats {
* Description for column metadata
*/
struct ColumnMetaData {
- /** Type of this column **/
- 1: required Type type
+ /**
+ * DEPRECATED: can be found in SchemaElement
+ *
+ * Writers MUST NOT omit this field until 2025-10-01.
+ * Readers MUST ignore this field before 2025-10-01.
+ */
+ 1: optional Type type

/** Set of all encodings used for this column. The purpose is to validate
* whether we can decode those pages. **/
2: required list<Encoding> encodings

- /** Path in schema **/
- 3: required list<string> path_in_schema
+ /**
+ * DEPRECATED: can be found in SchemaElement
+ *
+ * Writers MUST NOT omit this field until 2025-10-01.
+ * Readers MUST ignore this field before 2025-10-01.
+ */
+ 3: optional list<string> path_in_schema

/** Compression codec **/
4: required CompressionCodec codec
@@ -833,6 +843,13 @@ struct ColumnMetaData {
* filter pushdown.
*/
16: optional SizeStatistics size_statistics;

+ /**
+ * The index into FileMetadata.schema (list<SchemaElement>) for this column.
+ * This implies that ColumnMetaData can be sparse in a rowgroup, if for example
+ * a column does not have any data pages in a rowgroup.
+ */
+ 17: optional i32 schema_index;
}

struct EncryptionWithFooterKey {