Commit c92bd41

Copilot, WenyXu, and fengjiachun authored
docs: add compression_type option for CSV/JSON import/export (#2222)
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: WenyXu <[email protected]>
Co-authored-by: Weny Xu <[email protected]>
Co-authored-by: jeremyhi <[email protected]>
1 parent 8da53c0 commit c92bd41

2 files changed: +68 -0 lines changed

docs/reference/sql/copy.md

Lines changed: 34 additions & 0 deletions
@@ -31,6 +31,19 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
+You can also export data to a compressed CSV or JSON file:
+
+```sql
+COPY tbl TO '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
+:::tip NOTE
+When using compression, ensure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
+:::
+
 #### `WITH` Option
 
 `WITH` adds options such as the file `FORMAT` which specifies the format of the exported file. In this example, the format is Parquet; it is a columnar storage format used for big data processing. Parquet efficiently compresses and encodes columnar data for big data analytics.
@@ -39,6 +52,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
 | `START_TIME`/`END_TIME`| The time range within which data should be exported. `START_TIME` is inclusive and `END_TIME` is exclusive. | Optional |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |
@@ -85,10 +99,20 @@ Specifically, if you only have one file to import, you can use the following syn
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
+You can also import data from a compressed CSV or JSON file:
+
+```sql
+COPY tbl FROM '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
 | Option | Description | Required |
 |---|---|---|
 | `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet, ORC | **Required** |
 | `PATTERN` | Use regex to match files. e.g., `*_today.parquet` | Optional |
+| `compression_type` | Compression algorithm for the imported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 
 :::tip NOTE
 The CSV file must have a header row to be imported correctly. The header row should contain the column names of the table.
@@ -158,6 +182,7 @@ COPY (<QUERY>) TO '<PATH>' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 | `QUERY` | The SQL SELECT statement to execute | **Required** |
 | `PATH` | The file path where the output will be written | **Required** |
 | `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
@@ -168,6 +193,15 @@ For example, the following statement exports query results to a CSV file:
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
+You can also export query results to a compressed file:
+
+```sql
+COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
+  FORMAT = 'json',
+  compression_type = 'gzip'
+);
+```
+
 You can also specify custom date and time formats when exporting to CSV:
 
 ```sql
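
The following sketch is not part of the commit. It illustrates the `compression_type` option documented in this file with one of the other listed algorithms, zstd, for both export and import. The table name `tbl` and the file path are placeholders reused from the documented examples, and the `.zst` extension follows the tip about matching the extension to the compression type.

```sql
-- Sketch only: `tbl` and the file path are placeholders.
-- Export the table to a zstd-compressed CSV file.
COPY tbl TO '/path/to/file.csv.zst' WITH (
  FORMAT = 'csv',
  compression_type = 'zstd'
);

-- Import the same file, declaring the same compression algorithm.
-- This assumes the file carries the header row that CSV import requires.
COPY tbl FROM '/path/to/file.csv.zst' WITH (
  FORMAT = 'csv',
  compression_type = 'zstd'
);
```

Per the option tables in this diff, `compression_type` applies only to CSV and JSON, so the same option should not be expected to work with `FORMAT = 'parquet'`.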

i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/copy.md

Lines changed: 34 additions & 0 deletions
@@ -27,6 +27,19 @@ COPY tbl TO '/path/to/file.csv' WITH (
 );
 ```
 
+You can also export data to a compressed CSV or JSON file:
+
+```sql
+COPY tbl TO '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
+:::tip NOTE
+When using compression, make sure the file extension matches the compression type: `.gz` for gzip, `.zst` for zstd, `.bz2` for bzip2, and `.xz` for xz.
+:::
+
 #### `WITH` Options
 
 `WITH` adds options such as the file `FORMAT`, which specifies the format of the exported file. In this example the format is Parquet, a columnar storage format used for big data processing. Parquet efficiently compresses and encodes columnar data for big data analytics.
@@ -35,6 +48,7 @@ COPY tbl TO '/path/to/file.csv' WITH (
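 |---|---|---|
 | `FORMAT` | Target file format, e.g., JSON, CSV, Parquet | **Required** |
 | `START_TIME`/`END_TIME`| The time range of the data to export; the start is inclusive and the end is exclusive | Optional |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d %H:%M:%S'`). Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%Y-%m-%d'`). Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers (e.g., `'%H:%M:%S'`). Only supported for CSV format. | Optional |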
@@ -78,10 +92,20 @@ COPY tbl FROM '/path/to/folder/' WITH (FORMAT = 'parquet', PATTERN = '.*parquet.
 COPY tbl FROM '/path/to/folder/xxx.parquet' WITH (FORMAT = 'parquet');
 ```
 
+You can also import data from a compressed CSV or JSON file:
+
+```sql
+COPY tbl FROM '/path/to/file.csv.gz' WITH (
+  FORMAT = 'csv',
+  compression_type = 'gzip'
+);
+```
+
 | Option | Description | Required |
 |---|---|---|
 | `FORMAT` | Target file format, e.g., JSON, CSV, Parquet, ORC | **Required** |
 | `PATTERN` | Use a regex to match files, e.g., `*_today.parquet` | Optional |
+| `compression_type` | Compression algorithm for the imported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 
 :::tip NOTE
 The CSV file must have a header row containing the table's column names.
@@ -151,6 +175,7 @@ COPY (<QUERY>) TO '<PATH>' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
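 | `QUERY` | The SQL SELECT statement to execute | **Required** |
 | `PATH` | The path of the output file | **Required** |
 | `FORMAT` | The output file format: 'CSV', 'JSON', or 'PARQUET' | **Required** |
+| `compression_type` | Compression algorithm for the exported file. Supported values: `gzip`, `zstd`, `bzip2`, `xz`. Only supported for CSV and JSON formats. | Optional |
 | `TIMESTAMP_FORMAT` | Custom format for timestamp columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `DATE_FORMAT` | Custom format for date columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |
 | `TIME_FORMAT` | Custom format for time columns when exporting to CSV format. Uses [strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format specifiers. Only supported for CSV format. | Optional |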
@@ -161,6 +186,15 @@ COPY (<QUERY>) TO '<PATH>' WITH (FORMAT = { 'CSV' | 'JSON' | 'PARQUET' });
 COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.csv' WITH (FORMAT = 'csv');
 ```
 
+You can also export query results to a compressed file:
+
+```sql
+COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.gz' WITH (
+  FORMAT = 'json',
+  compression_type = 'gzip'
+);
+```
+
 You can also specify custom date and time formats when exporting to CSV:
 
 ```sql
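
As a further hedged illustration (again not part of the commit), the query form of `COPY` can be combined with another of the listed algorithms. The sketch below uses xz with JSON output. The table `tbl`, the `host` filter, and the path are placeholders taken from the documented examples, and the `.xz` extension follows the tip about matching the extension to the compression type.

```sql
-- Sketch only: the query, table, and path are placeholders.
-- Export the rows for one host to an xz-compressed JSON file.
COPY (SELECT * FROM tbl WHERE host = 'host1') TO '/path/to/file.json.xz' WITH (
  FORMAT = 'json',
  compression_type = 'xz'
);
```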
