Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filtering on timestamp column not working when tried with duckdb #2514

Open
rkunnamp opened this issue Jun 23, 2024 · 2 comments
Open

filtering on timestamp column not working when tried with duckdb #2514

rkunnamp opened this issue Jun 23, 2024 · 2 comments
Labels
bug Something isn't working duckdb rust Rust related tasks

Comments

@rkunnamp
Copy link

Tried filtering on a timestamp column using duckdb against lance, and it did not work.
The same query worked against a parquet file.

import pyarrow as pa
import lancedb
import duckdb
from datetime import datetime, timedelta

data = pa.table({
                  "c1": ["A", "B"],
                  "c2": ["a", "b"],
                  "c3": [1, 2],
                  "c4": [datetime(2021,6,16),datetime(2021,6,17)]
               })



db = lancedb.connect("lancedb_files")
db.create_table("lance_table", data)
table = db.open_table("lance_table")
arrow_dataset = table.to_lance()

duckdb.query("select * from arrow_dataset where c4='2021-06-16 00:00:00.000000'").df()
index c1 c2 c3 c4
pa.dataset.write_dataset(data, "parquet_files", format='parquet')
duckdb.query("select * from 'parquet_files/*.parquet' where c4='2021-06-16 00:00:00.000000'").df()
index c1 c2 c3 c4
0 A a 1 2021-06-16 00:00:00
arrow_dataset.schema

c1: string
c2: string
c3: int64
c4: timestamp[us]


Not sure if this is a bug that is to be reported here, or if it should be filed with duckdb .

@chebbyChefNEQ
Copy link
Contributor

chebbyChefNEQ commented Jun 25, 2024

I can confirm I can repro when querying lance using duckdb. I have also confirmed that lance timestamp filter is working correctly

In [11]: arrow_dataset.scanner(filter="c4=timestamp '2021-06-16 00:00:00.000000'").to_table().to_pandas()
Out[11]:
  c1 c2  c3         c4
0  A  a   1 2021-06-16

In [12]: duckdb.query("select * from arrow_dataset where c4=timestamp '2021-06-16 00:00:00.000000'").df()
Out[12]:
Empty DataFrame
Columns: [c1, c2, c3, c4]
Index: []

the issue is most likely somewhere in how we translate filters from duckdb into lance filters.

@wjones127
Copy link
Contributor

Yeah seems like an issue with how the substrait translation is happening @westonpace

@chebbyChefNEQ chebbyChefNEQ added bug Something isn't working duckdb rust Rust related tasks labels Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working duckdb rust Rust related tasks
Projects
None yet
Development

No branches or pull requests

3 participants