To reproduce:

import lance
import pyarrow as pa


def test_indexed_between(tmp_path):
    # Write a small dataset with a single uint32 column.
    dataset = lance.write_dataset(
        pa.table({"u32": pa.array(range(100), pa.uint32())}),
        tmp_path,
    )
    dataset.create_scalar_index("u32", index_type="BTREE")
    scanner = dataset.scanner(
        filter="u32 BETWEEN 10 AND 20",
        columns=[],
        with_row_id=True,
        prefilter=True,
    )
    # The plan should use the scalar index for the BETWEEN filter.
    assert "MaterializeIndex" in scanner.explain_plan()
The problem is that this ends up compiling down to the physical expression CAST(u32 as u64) >= 10_u64 AND CAST(u32 as u64) <= 20_u64, and the scalar index parser gets confused by the CAST expressions.
If we instead pass the filter as u32 >= 10 AND u32 <= 20, the SQL parser correctly infers the type of the literals to be u32.
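Until this is fixed, one caller-side way to apply that observation is to expand the range into two comparisons when building the filter string. This is only a minimal sketch: `between_as_comparisons` is a hypothetical helper, not a Lance API, and it assumes a plain column name and integer bounds.

```python
def between_as_comparisons(column: str, low: int, high: int) -> str:
    """Build an inclusive range filter without using BETWEEN.

    Hypothetical helper (not part of Lance). Assumes `column` is a plain
    column name and that both bounds are integers.
    """
    if low > high:
        raise ValueError(f"empty range: {low} > {high}")
    return f"{column} >= {low} AND {column} <= {high}"


filter_expr = between_as_comparisons("u32", 10, 20)
print(filter_expr)  # u32 >= 10 AND u32 <= 20
```

The resulting string can be passed as the filter argument to dataset.scanner in place of the BETWEEN form.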
Not sure if it's easier to fix the SQL parser, to insert some kind of optimizer rule in DataFusion (we should always be able to normalize CAST(column) BINARY_OP literal into column BINARY_OP CAST(literal)), or to put a workaround in Lance.
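To make the proposed normalization concrete, here is a rough sketch on a toy expression tree. It does not use DataFusion's real types: `Column`, `Literal`, `Cast`, `BinaryOp`, and `push_cast_to_literal` are hypothetical names made up for illustration. The sketch also assumes the rewrite is only safe when the cast is widening and the literal fits in the column's type, and it only handles the literal on the right-hand side.

```python
from dataclasses import dataclass
from typing import Union

# Inclusive value ranges for the unsigned types used in this sketch.
UINT_RANGES = {"u32": (0, 2**32 - 1), "u64": (0, 2**64 - 1)}
COMPARISONS = {"=", "!=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str


@dataclass(frozen=True)
class Literal:
    value: int
    dtype: str


@dataclass(frozen=True)
class Cast:
    expr: "Expr"
    to: str


@dataclass(frozen=True)
class BinaryOp:
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Column, Literal, Cast, BinaryOp]


def push_cast_to_literal(expr: Expr) -> Expr:
    """Rewrite CAST(column) <cmp> literal into column <cmp> literal."""
    if not isinstance(expr, BinaryOp):
        return expr
    # Recurse first so both sides of AND / OR get normalized.
    left = push_cast_to_literal(expr.left)
    right = push_cast_to_literal(expr.right)
    if (
        expr.op in COMPARISONS
        and isinstance(left, Cast)
        and isinstance(left.expr, Column)
        and isinstance(right, Literal)
    ):
        column = left.expr
        src = UINT_RANGES.get(column.dtype)
        dst = UINT_RANGES.get(left.to)
        # Rewrite only if the cast is widening and the literal is
        # representable in the column's own type.
        if (
            src is not None
            and dst is not None
            and dst[0] <= src[0]
            and src[1] <= dst[1]
            and src[0] <= right.value <= src[1]
        ):
            return BinaryOp(column, expr.op, Literal(right.value, column.dtype))
    return BinaryOp(left, expr.op, right)


u32 = Column("u32", "u32")
plan = BinaryOp(
    BinaryOp(Cast(u32, "u64"), ">=", Literal(10, "u64")),
    "AND",
    BinaryOp(Cast(u32, "u64"), "<=", Literal(20, "u64")),
)
normalized = push_cast_to_literal(plan)
```

After the rewrite, both comparisons reference the bare u32 column with u32 literals, which is the same shape the index parser already handles for u32 >= 10 AND u32 <= 20. Literals that do not fit in the column type are left untouched here; a real rule would need to decide how to fold those cases.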