From 5f690a3d0b89d8680a3f610e9241b040f7e95a58 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 16 May 2024 10:20:10 -0400
Subject: [PATCH] PARQUET-2470: Update website with larger ecosystem emphasis (#59)

Co-authored-by: Ed Seidl
---
 content/en/_index.md               | 5 ++++-
 content/en/docs/Overview/_index.md | 6 +++---
 static/doap_Parquet.rdf            | 2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/content/en/_index.md b/content/en/_index.md
index fddd0239..1644d4cc 100644
--- a/content/en/_index.md
+++ b/content/en/_index.md
@@ -9,7 +9,10 @@ title: Parquet
 Download
-Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

+

+Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
+It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming language and analytics tools.
+

 {{< blocks/link-down color="info" >}}
 {{< /blocks/cover >}}
diff --git a/content/en/docs/Overview/_index.md b/content/en/docs/Overview/_index.md
index 58e9e1d4..a8bddb8b 100644
--- a/content/en/docs/Overview/_index.md
+++ b/content/en/docs/Overview/_index.md
@@ -6,11 +6,11 @@ description: >
   All about Parquet.
 ---
-Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
+Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
+It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming language and analytics tools.
 This documentation contains information about both the [parquet-mr](https://github.com/apache/parquet-mr) and [parquet-format](https://github.com/apache/parquet-format) repositories.
-
 ### parquet-format
 The parquet-format repository hosts the official specification of the Apache Parquet file format, defining how data is structured and stored. This specification, along with Thrift metadata definitions and other crucial components, is essential for developers to effectively read and write Parquet files. The parquet-format project specifically contains the format specifications needed to understand and properly utilize Parquet files.
@@ -43,4 +43,4 @@ Here is a non-exhaustive list of Parquet implementations:
 * [cuDF](https://github.com/rapidsai/cudf)
 * [Apache Impala](https://github.com/apache/impala)
 * [DuckDB](https://github.com/duckdb/duckdb)
-* [fastparquet, a Python implementation of the Apache Parquet format](https://github.com/dask/fastparquet)
\ No newline at end of file
+* [fastparquet, a Python implementation of the Apache Parquet format](https://github.com/dask/fastparquet)
diff --git a/static/doap_Parquet.rdf b/static/doap_Parquet.rdf
index ca14f2cf..939f29c6 100644
--- a/static/doap_Parquet.rdf
+++ b/static/doap_Parquet.rdf
@@ -28,7 +28,7 @@
 Apache Parquet is a general-purpose columnar storage format.
- Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet is available in multiple languages including Java, C++, and Python.
+ Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming language and analytics tools.