HTTP range requests for remote parquet files. #2107

NoxDecima · 2025-09-26T10:40:54Z

NoxDecima
Sep 26, 2025

The main branch of DuckDB supports partial reads of remote Parquet files, so it only needs to download part of the data instead of the entire file. DuckDB WASM, on the other hand, seems to download the entire file for any query.

Example query:

Running this query downloads the entire 1 GB file on DuckDB WASM, but far less on, for example, the Python DuckDB instance.

SELECT * 
FROM read_parquet('https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/release/2025-09-24.0/theme=places/type=place/part-00000-e9dcc321-c94a-41a9-a475-8123482f8fab-c000.zstd.parquet')
LIMIT 10
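For context, a rough sketch of why partial reads are possible at all: a Parquet file ends with its metadata, followed by a 4-byte little-endian footer length and the magic bytes PAR1. A reader can therefore fetch the last 8 bytes with an HTTP Range request, then the footer, then only the column chunks it needs. The helper names below are hypothetical and this is not DuckDB's actual implementation, only an illustration of the Range header values involved (it also ignores encrypted footers).

```python
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic bytes of an unencrypted Parquet file


def footer_tail_range(file_size: int) -> str:
    """Range header value covering the last 8 bytes of the file.

    Those 8 bytes hold the 4-byte little-endian footer length
    followed by the 4-byte magic.
    """
    if file_size < 12:  # leading magic + footer length + trailing magic
        raise ValueError("file too small to be a Parquet file")
    return f"bytes={file_size - 8}-{file_size - 1}"


def footer_metadata_range(file_size: int, tail: bytes) -> str:
    """Range header value covering the footer metadata itself.

    `tail` is the 8-byte response body of the request built by
    footer_tail_range().
    """
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet footer")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - 8 - footer_len
    if start < 4:  # the footer cannot overlap the leading magic
        raise ValueError("footer length larger than file")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{file_size - 9}"
```

The file size would typically come from the Content-Length of a HEAD request, and each returned string would be sent as the value of a Range header on a GET.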

Side note

On a side note, DuckDB WASM also does not support reading split Parquet files from S3:

SET s3_region='us-west-2';
SELECT * FROM read_parquet('s3://overturemaps-us-west-2/release/2025-09-24.0/theme=places/type=place/*.parquet') LIMIT 10;

This query fails with the following error on DuckDB WASM, while it works with native DuckDB.

IO Error: No files found that match the pattern "s3://overturemaps-us-west-2/release/2025-09-24.0/theme=places/type=place/*.parquet"

carlopi · 2025-09-26T11:55:55Z

carlopi
Sep 26, 2025
Collaborator

Thanks. We are in the process of swapping out the HTTP backend; for now you should be able to experiment with the new one via:

SET builtin_httpfs = false;
LOAD httpfs;

Then running your query like:

CALL enable_logging('HTTP');
SELECT * FROM read_parquet('https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/release/2025-09-24.0/theme=places/type=place/part-00000-e9dcc321-c94a-41a9-a475-8123482f8fab-c000.zstd.parquet') LIMIT 10;
FROM duckdb_logs_parsed('HTTP') SELECT request.type, count(*) GROUP BY request.type;

should return:

┌─────────┬──────────────┐
│  type   │ count_star() │
│ varchar │    int64     │
├─────────┼──────────────┤
│ HEAD    │            1 │
│ GET     │            3 │
└─────────┴──────────────┘

(on both native and WASM)

In the second case, though, I do run into a weird error; I will have a look.

Note that, after some more rounds of testing, the behavior currently enabled by SET builtin_httpfs = false; LOAD httpfs; will become the default in the near future.

0 replies

carlopi · 2025-09-26T12:42:17Z

carlopi
Sep 26, 2025
Collaborator

The second example is a bit trickier. It's sort of solved by #2108, but it gets a CORS error on the listing request. Do you by any chance have control over the bucket?
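To illustrate what the "listing request" is: expanding a glob such as *.parquet on S3 needs an object listing first, and in a browser that request is subject to CORS just like the file reads. A minimal sketch, assuming the listing goes through the S3 ListObjectsV2 REST call with a virtual-hosted-style URL; the function name is hypothetical and the exact request DuckDB WASM issues may differ.

```python
from urllib.parse import urlencode


def s3_list_url(bucket: str, region: str, glob_path: str) -> str:
    """Build a ListObjectsV2 URL for the fixed prefix of a glob pattern.

    Assumption: everything before the first '*' is used as the key
    prefix, and matching against the pattern happens client-side.
    """
    prefix = glob_path.split("*", 1)[0]
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


url = s3_list_url(
    "overturemaps-us-west-2",
    "us-west-2",
    "release/2025-09-24.0/theme=places/type=place/*.parquet",
)
```

If the bucket's CORS configuration does not allow this request from the page's origin, the browser blocks the response before the caller can read it, so no files can be matched.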

4 replies

NoxDecima Sep 26, 2025
Author

Unfortunately not. At a later time I might be able to set one up for testing purposes, but I currently do not have access to an S3 bucket with Parquet files.

carlopi Sep 26, 2025
Collaborator

I raised an issue with Overture Maps: OvertureMaps/data#425 (comment)

carlopi Sep 26, 2025
Collaborator

The issue is solved, and both examples you raised should work correctly if you prepend SET builtin_httpfs = false; LOAD httpfs; to them. I'm now working on making that explicit step unnecessary.

NoxDecima Sep 29, 2025
Author

Awesome, I just had time to check and you are right, it all works! (I did have to make sure I had at least version 1.31.0 of DuckDB WASM; 1.30.0 did not work.)
