An Apache Parquet implementation in Rust.
NOTE: this project has been merged into Apache Arrow, and development continues there. To report an issue or submit a pull request, please open a JIRA ticket in the Arrow project.
Add this to your Cargo.toml:
[dependencies]
parquet = "0.4"
and this to your crate root:
extern crate parquet;
Example usage of reading data:
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
println!("{}", record);
}
See the crate documentation for the available API.
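The reading example above calls unwrap() on the reader, so a file that is not a Parquet file causes a panic. As a minimal sketch using only the standard library (not part of this crate's API), the check below looks for the 4-byte "PAR1" magic that the Parquet format places at the start and end of a file. The function name has_parquet_magic is hypothetical, and the sketch assumes a file with an unencrypted footer.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Magic bytes found at the start and end of a Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: returns true if `input` starts and ends with "PAR1".
/// This is only a quick sanity check, not a full validation of the file.
fn has_parquet_magic<R: Read + Seek>(input: &mut R) -> io::Result<bool> {
    let len = input.seek(SeekFrom::End(0))?;
    // Smallest possible layout: header magic (4) + footer length (4) + footer magic (4).
    if len < 12 {
        return Ok(false);
    }
    let mut buf = [0u8; 4];

    // Check the leading magic.
    input.seek(SeekFrom::Start(0))?;
    input.read_exact(&mut buf)?;
    if &buf != PARQUET_MAGIC {
        return Ok(false);
    }

    // Check the trailing magic.
    input.seek(SeekFrom::End(-4))?;
    input.read_exact(&mut buf)?;
    Ok(&buf == PARQUET_MAGIC)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for a file; std::fs::File also implements Read + Seek.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"PAR1");
    bytes.extend_from_slice(&[0u8; 4]);
    bytes.extend_from_slice(b"PAR1");
    let mut input = Cursor::new(bytes);
    println!("looks like Parquet: {}", has_parquet_magic(&mut input)?);
    Ok(())
}
```

Because the helper is generic over Read + Seek, the same call should work on a std::fs::File before it is handed to the reader.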
- Parquet-format 2.4.0
To update the Parquet format to a newer version, check whether a newer
version of the parquet-format crate is available, then update the version
of the parquet-format crate in Cargo.toml.
- All encodings supported
- All compression codecs supported
- Read support
  - Primitive column value readers
  - Row record reader
  - Arrow record reader
- Statistics support
- Write support
  - Primitive column value writers
  - Row record writer
  - Arrow record writer
- Predicate pushdown
- Parquet format 2.5 support
- HDFS support
- Rust nightly
See "Working with nightly Rust" for how to install the nightly toolchain and set it as the default.
Run cargo build, or cargo build --release to build in release mode.
Some features take advantage of SSE4.2 instructions, which can be
enabled by adding RUSTFLAGS="-C target-feature=+sse4.2" before the cargo build command.
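To confirm whether the flag took effect, the following sketch (standard library only, not code from this crate) reports whether SSE4.2 was enabled at compile time and whether the running CPU supports it. The function names are hypothetical; cfg!(target_feature = ...) and is_x86_feature_detected! are standard Rust facilities.

```rust
/// True if this binary was compiled with SSE4.2 enabled,
/// for example via RUSTFLAGS="-C target-feature=+sse4.2".
fn sse42_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// True if the CPU running this program supports SSE4.2 (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sse42_available_at_runtime() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On other architectures SSE4.2 does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn sse42_available_at_runtime() -> bool {
    false
}

fn main() {
    println!("compiled with SSE4.2: {}", sse42_enabled_at_compile_time());
    println!("CPU supports SSE4.2:  {}", sse42_available_at_runtime());
}
```

Running this once with and once without the RUSTFLAGS setting should show the first line change while the second stays the same on a given machine.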
Run cargo test for unit tests.
The following binaries are provided (use cargo install to install them):
- parquet-schema for printing Parquet file schema and metadata.
  Usage: parquet-schema <file-path> [verbose], where file-path is the path to a Parquet file, and the optional verbose is a boolean flag that selects between printing full metadata and printing the schema only (when not specified, only the schema is printed).
- parquet-read for reading records from a Parquet file.
  Usage: parquet-read <file-path> [num-records], where file-path is the path to a Parquet file, and num-records is the number of records to read from the file (when not specified, all records are printed).
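The argument convention of parquet-read (a required path followed by an optional record count) can be sketched as below. This is an illustration of the usage line only, not the binary's actual source; ReadArgs and parse_read_args are hypothetical names.

```rust
use std::env;

/// Hypothetical parsed form of `parquet-read <file-path> [num-records]`.
#[derive(Debug, PartialEq)]
struct ReadArgs {
    file_path: String,
    /// None means "read all records".
    num_records: Option<usize>,
}

/// Parses the arguments that follow the program name.
fn parse_read_args(args: &[String]) -> Result<ReadArgs, String> {
    match args {
        // Only the path was given: read everything.
        [path] => Ok(ReadArgs {
            file_path: path.clone(),
            num_records: None,
        }),
        // Path plus a record count, which must be a non-negative integer.
        [path, n] => {
            let count = n
                .parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}", n, e))?;
            Ok(ReadArgs {
                file_path: path.clone(),
                num_records: Some(count),
            })
        }
        _ => Err("Usage: parquet-read <file-path> [num-records]".to_string()),
    }
}

fn main() {
    // Skip the program name, then parse what is left.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_read_args(&args) {
        Ok(parsed) => println!("{:?}", parsed),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

Modelling the optional count as Option<usize> keeps "not specified" distinct from a count of zero.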
If you see a "Library not loaded" error, please make sure LD_LIBRARY_PATH is set properly:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
Run cargo bench for benchmarks.
To build documentation, run cargo doc --no-deps.
To compile and view in the browser, run cargo doc --no-deps --open.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.