Describe the bug
While uploading data, I noticed that the version was being written as a float (e.g. 1.0) in the parquet file despite being defined as an int.
This only affects the parquet files, not the csv files. This may affect other code paths too, but I only happened to notice it with the version.
To Reproduce
Steps to reproduce the behavior:
1. Take a study that has a version defined like:
CREATE TABLE data_metrics__meta_version AS
SELECT 1 AS data_package_version;
(You can even add CAST(1 AS INTEGER) if you like.)
2. Build and export it.
3. Inspect the parquet file - it holds a float type there.
Expected behavior
I'd expect integer type.
Additional context
This seems to be due to DuckDB's col_pyarrow_types_from_sql (called by exporter.py) converting NUMBER to float64. However, this is not wrong per se:
>>> duckdb.sql("SELECT 1 AS i")
┌───────┐
│ i │
│ int32 │
├───────┤
│ 1 │
└───────┘
>>> duckdb.sql("SELECT 1 AS i").description
[('i', 'NUMBER', None, None, None, None, None)]
>>> duckdb.sql("SELECT CAST(1 AS FLOAT) AS i").description
[('i', 'NUMBER', None, None, None, None, None)]
So... ideally we'd ask duckdb for a better schema description - this one is super vague. (Note that you can't do CAST(1 as NUMBER) -- duckdb doesn't recognize it as a type -- don't know why it's using it in this description.)