
Parquet wildcard writing #489

Open

Description

@alex-zaitsev

INSERT INTO s3('s3://<my_bucket>/myfiles*.parquet')

This would automatically split the data across multiple files, reusing the existing min_insert_block_size_rows / min_insert_block_size_bytes settings to decide where a new file starts. Should close ClickHouse#41537.
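A rough sketch of how this could look from the user's side. This is hypothetical proposed syntax, not something ClickHouse supports today; the table name and setting value are placeholders, and only the wildcard path and the min_insert_block_size_* settings come from the proposal above.

-- the single * in the path would be replaced by a file index for each flushed block
INSERT INTO FUNCTION s3('s3://<my_bucket>/myfiles*.parquet')
SELECT * FROM events
SETTINGS min_insert_block_size_rows = 10000000;

-- expected output, one file per written block:
--   myfiles0.parquet
--   myfiles1.parquet
--   ...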

For reference, other systems implement this as follows:

BigQuery

The path must contain exactly one wildcard * anywhere in the leaf directory of the path string, for example ../aa/*, ../aa/b*c, ../aa/*bc, and ../aa/bc*. BigQuery replaces * with 0000..N depending on the number of files exported. BigQuery determines the file count and sizes. If BigQuery decides to export two files, then * in the first file's filename is replaced by 000000000000, and * in the second file's filename is replaced by 000000000001.
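In BigQuery this is exposed through the EXPORT DATA statement; a minimal sketch, assuming a Parquet export (the bucket, dataset, and table names are placeholders):

EXPORT DATA
  OPTIONS (
    uri = 'gs://my_bucket/export/myfiles-*.parquet',  -- exactly one * wildcard
    format = 'PARQUET',
    overwrite = true)
AS
SELECT * FROM mydataset.mytable;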

Redshift

to 's3://amzn-s3-demo-bucket/unload/venue_pipe_'

By default, UNLOAD writes one or more files per slice. Assuming a two-node cluster with two slices per node, the previous example creates these files in amzn-s3-demo-bucket as follows:
venue_pipe_0000_part_00
venue_pipe_0001_part_00
venue_pipe_0002_part_00
venue_pipe_0003_part_00
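The statement that produces this layout is roughly the following, following the pattern in the Redshift documentation (the IAM role ARN is a placeholder):

UNLOAD ('select * from venue')
TO 's3://amzn-s3-demo-bucket/unload/venue_pipe_'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

Note that Redshift appends the slice and part suffixes to the prefix itself rather than substituting an explicit wildcard in the path.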
