Home
Selfeer edited this page Nov 15, 2024 · 37 revisions
Versions | Releases |
---|---|
License | Apache-2.0 |
Parquetify is a lightweight tool that uses the parquet-java library to generate Apache Parquet files from a file definition provided in a JSON file.
Feature | Description |
---|---|
Physical Data Types: | All physical data types: INT32, INT64, BOOLEAN, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY. |
Logical Data Types: | Most logical types (except for FLOAT16): UTF8, DECIMAL, DATE, TIME_MILLIS, TIME_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, ENUM, NONE, MAP, LIST, STRING, MAP_KEY_VALUE, TIME, INTEGER, JSON, BSON, UUID, INTERVAL, UINT_8, UINT_16, UINT_32, UINT_64, INT_8, INT_16, INT_32, INT_64. |
Precision & Scale: | Precision and scale for DECIMAL types. |
Compression: | NONE, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD. |
Encodings: | Automatically set by the writer for a given column. |
Bloom Filter: | Apply a bloom filter to specific columns or all columns (including those within groups). |
Writer Version: | Specify writer version (1.0, 2.0). |
Customizable Sizes: | Row group and page sizes. |
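
To make the idea of a JSON file definition concrete, here is a minimal Python sketch that builds one and checks it against the physical types, compression codecs, and writer versions listed in the table above. Every key name (`fileName`, `options`, `schema`, `physicalType`, and so on) and the `check_definition` helper are assumptions made for illustration, not Parquetify's documented format; the pages linked below describe the actual layout.

```python
import json

# Values taken from the feature table above.
PHYSICAL_TYPES = {"INT32", "INT64", "BOOLEAN", "FLOAT", "DOUBLE",
                  "BINARY", "FIXED_LEN_BYTE_ARRAY"}
COMPRESSION = {"NONE", "SNAPPY", "GZIP", "LZO", "BROTLI", "LZ4", "ZSTD"}
WRITER_VERSIONS = {"1.0", "2.0"}

# Hypothetical definition: all key names here are illustrative assumptions,
# not the tool's real schema.
definition = {
    "fileName": "example.parquet",
    "options": {"compression": "SNAPPY", "writerVersion": "1.0"},
    "schema": [
        {"name": "id", "physicalType": "INT64", "logicalType": "NONE"},
        {"name": "price", "physicalType": "INT32", "logicalType": "DECIMAL",
         "precision": 9, "scale": 2},
    ],
}


def check_definition(defn: dict) -> None:
    """Raise ValueError if the definition uses a value outside the table."""
    options = defn["options"]
    if options["compression"] not in COMPRESSION:
        raise ValueError(f"unknown compression: {options['compression']}")
    if options["writerVersion"] not in WRITER_VERSIONS:
        raise ValueError(f"unknown writer version: {options['writerVersion']}")
    for column in defn["schema"]:
        if column["physicalType"] not in PHYSICAL_TYPES:
            raise ValueError(f"unknown physical type: {column['physicalType']}")
        # DECIMAL columns carry precision and scale; scale cannot exceed precision.
        if column.get("logicalType") == "DECIMAL":
            if not 0 <= column["scale"] <= column["precision"]:
                raise ValueError(f"bad precision/scale for {column['name']}")


def to_json(defn: dict) -> str:
    """Serialize the definition as indented JSON text."""
    return json.dumps(defn, indent=2)


if __name__ == "__main__":
    check_definition(definition)
    print(to_json(definition))
```

Checking the definition before handing it to a generator catches typos such as a misspelled codec name early, rather than at write time.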
- Specifying the Parquet File Name
- Specifying Options of the File
- Choosing File Compression
- Choosing the Writer Version
- Specifying Row and Page Size
- Enabling the Bloom Filter
- Configure with Hadoop
- Integer Columns
- Unsigned Integer Columns
- UTF8 Columns
- Decimal Columns
- Date Columns
- Time and Timestamp Columns
- JSON and BSON Columns
- String Columns
- Enum Columns
- UUID Columns
- Array Columns
- Nested Array Columns
- Tuple Columns
- Nested Tuple Columns
- Schema Types
Developed and maintained by the Altinity team.