PARQUET-2310: implementation status (#34)
Add outline of implementation status tables.
Co-authored-by: Andrew Lamb <[email protected]>
Showing 1 changed file with 124 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
--- | ||
title: "Implementation status" | ||
linkTitle: "Implementation status" | ||
weight: 8 | ||
--- | ||
|
||
This page summarizes the features supported by different Parquet | ||
implementations. | ||
|
||
*Note*: This is a work in progress and we would welcome help expanding its scope. | ||
|
||
### Legend | ||
The value in each box means: | ||
* ✅: supported | ||
* ❌: not supported | ||
* (blank): no data | ||
|
||
Implementations: | ||
* `C++`: [parquet-cpp](https://github.com/apache/arrow/tree/main/cpp/src/parquet) | ||
* `Java`: [parquet-java](https://github.com/apache/parquet-java) | ||
* `Go`: [parquet-go](https://github.com/apache/arrow/tree/main/go/parquet) | ||
* `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/master/parquet/README.md) | ||
|
||
|
||
|
||
### Physical types | ||
|
||
| Data type | C++ | Java | Go | Rust | | ||
| ----------------------------------------- | ----- | ------ | ----- | ----- | | ||
| BOOLEAN | | | | | | ||
| INT32 | | | | | | ||
| INT64 | | | | | | ||
| INT96 (1) | | | | | | ||
| FLOAT | | | | | | ||
| DOUBLE | | | | | | ||
| BYTE_ARRAY | | | | | | ||
| FIXED_LEN_BYTE_ARRAY | | | | | | ||
|
||
* \(1) This type is deprecated, but as of 2024 it is still common in newly produced Parquet files. | ||
|
||
|
||
### Logical types | ||
|
||
| Data type | C++ | Java | Go | Rust | | ||
| ----------------------------------------- | ----- | ------ | ----- | ----- | | ||
| STRING | | | | | | ||
| ENUM | | | | | | ||
| UUID | | | | | | ||
| 8, 16, 32, 64 bit signed and unsigned INT | | | | | | ||
| DECIMAL (INT32) | | | | | | ||
| DECIMAL (INT64) | | | | | | ||
| DECIMAL (BYTE_ARRAY) | | | | | | ||
| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | | ||
| DATE | | | | | | ||
| TIME (INT32) | | | | | | ||
| TIME (INT64) | | | | | | ||
| TIMESTAMP (INT64) | | | | | | ||
| INTERVAL | | | | | | ||
| JSON | | | | | | ||
| BSON | | | | | | ||
| LIST | | | | | | ||
| MAP | | | | | | ||
| UNKNOWN (always null) | | | | | | ||
| FLOAT16 | | | | | | ||
|
||
### Encodings | ||
|
||
| Encoding | C++ | Java | Go | Rust | | ||
| ----------------------------------------- | ----- | ------ | ----- | ----- | | ||
| PLAIN | | | | | | ||
| PLAIN_DICTIONARY | | | | | | ||
| RLE_DICTIONARY | | | | | | ||
| RLE | | | | | | ||
| BIT_PACKED (deprecated) | | | | | | ||
| DELTA_BINARY_PACKED | | | | | | ||
| DELTA_LENGTH_BYTE_ARRAY | | | | | | ||
| DELTA_BYTE_ARRAY | | | | | | ||
| BYTE_STREAM_SPLIT | | | | | | ||
|
||
### Compressions | ||
|
||
| Compression | C++ | Java | Go | Rust | | ||
| ----------------------------------------- | ----- | ------ | ----- | ----- | | ||
| UNCOMPRESSED | | | | | | ||
| BROTLI | | | | | | ||
| GZIP | | | | | | ||
| LZ4 (deprecated) | | | | | | ||
| LZ4_RAW | | | | | | ||
| LZO | | | | | | ||
| SNAPPY | | | | | | ||
| ZSTD | | | | | | ||
|
||
### Other format-level features | ||
|
||
| Feature                                   | C++   | Java   | Go    | Rust  | | ||
| ----------------------------------------- | ----- | ------ | ----- | ----- | | ||
| xxxHash-based bloom filters | | | | | | ||
| Bloom filter length (1) | | | | | | ||
| Statistics min_value, max_value | | | | | | ||
| Page index | | | | | | ||
| Page CRC32 checksum | | | | | | ||
| Modular encryption | | | | | | ||
| Size statistics (2) | | | | | | ||
|
||
|
||
* \(1) In parquet.thrift: ColumnMetaData->bloom_filter_length | ||
|
||
* \(2) In parquet.thrift: ColumnMetaData->size_statistics | ||
|
||
### High-level data APIs for Parquet feature usage | ||
|
||
| Feature                                      | C++   | Java   | Go    | Rust  | | ||
| -------------------------------------------- | ----- | ------ | ----- | ----- | | ||
| External column data (1) | | | | | | ||
| Row group "Sorting column" metadata (2) | | | | | | ||
| Row group pruning using statistics | | | | | | ||
| Reading select columns only | | | | | | ||
| Page pruning using statistics | | | | | | ||
| Page pruning using bloom filter | | | | | | ||
|
||
|
||
* \(1) In parquet.thrift: ColumnChunk->file_path | ||
|
||
* \(2) In parquet.thrift: RowGroup->sorting_columns |
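
To make the "Row group pruning using statistics" row concrete, here is a minimal sketch of the idea in plain Python. It does not use any Parquet library. The `RowGroupStats` class and `prune_row_groups` function are hypothetical names introduced only for illustration, and the statistics are assumed to be integer `min_value`/`max_value` pairs for a single column.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one row group's statistics on a single column."""
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Return the indices of row groups whose [min_value, max_value] range
    overlaps the query range [lo, hi]; all other row groups can be skipped."""
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


# Three row groups covering disjoint value ranges (made-up numbers).
groups = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]

# A filter such as `150 <= x <= 250` only needs row groups 1 and 2.
selected = prune_row_groups(groups, 150, 250)
```

A reader that supports this feature would apply a comparable overlap test to the statistics stored in the file metadata before decoding any pages. How each implementation exposes that test is not covered by this sketch.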