-
Notifications
You must be signed in to change notification settings - Fork 42
Open
Description
Hey!
I created a short example on how to read data from Hetzner Storage boxes here:
# Install dependencies
brew install curl pwgen hcloud jq
# Get api keys from https://console.hetzner.com/projects/YOUR-PROJECT-ID/security/tokens
hcloud context create my-hetzner-context --token YOUR_HETZNER_API_TOKEN
# Generate new password with '_' in the middle to suit for the special character requirement
export HETZNER_STORAGEBOX_PASSWORD=$(pwgen -1 -n 16)_$(pwgen -1 -n 16)
# Create the 1 terabyte 3.2€/month (+ VAT) storage box in Helsinki
hcloud storage-box create --enable-webdav --reachable-externally --name duckdb-test-box --location hel1 --type bx11 --password $HETZNER_STORAGEBOX_PASSWORD
# Get the username for the storage box
export HETZNER_STORAGEBOX_USER=$(hcloud storage-box describe duckdb-test-box -o json | jq -r .username)
# Get sample data
curl -o data.parquet https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet
# Upload the sample data
curl -T data.parquet --user "$HETZNER_STORAGEBOX_USER:$HETZNER_STORAGEBOX_PASSWORD" https://$HETZNER_STORAGEBOX_USER.your-storagebox.de/data.parquet
# Nice thing in webdav is that it supports http range requests, so we can read parts of the file without downloading the whole file.
# Query the data with duckdb
duckdb -c "
SET VARIABLE basic_auth = (
SELECT base64((getenv('HETZNER_STORAGEBOX_USER') || ':' || getenv('HETZNER_STORAGEBOX_PASSWORD'))::BLOB)
);
SET VARIABLE storagebox_parquet_url = format(
'https://{}.your-storagebox.de/data.parquet',
getenv('HETZNER_STORAGEBOX_USER')
);
SELECT getvariable('basic_auth') AS basic_auth;
CREATE SECRET IF NOT EXISTS http_auth (
TYPE HTTP,
EXTRA_HTTP_HEADERS MAP {
'Authorization': 'Basic ' || getvariable('basic_auth')
}
);
SELECT COUNT(*)
FROM read_parquet(getvariable('storagebox_parquet_url'));
"
# Remove the storage box after testing
hcloud storage-box delete duckdb-test-boxIt works but it's pretty convoluted and uploading files doesn't work.
Since httpfs already has curl in the dependencies would it be possible to add support on reading and writing from webdav based remote sources like Hetzner Storage box?
This is one of the cheapest remote storages available (2.18€+VAT when using 5TB or more and with unlimited but slow bandwidth) and thus it would lower the barriers on many data storage use cases.
Metadata
Metadata
Assignees
Labels
No labels