Skip to content

[BUG] Joins fail when filtering on casted columns #1150

Open
@charlesbluca

Description

@charlesbluca

What happened:
When attempting to join tables using a WHERE clause containing casted columns, we sometimes fail on the Rust side due to an unexpected join condition, or on the Python side due to improper datatype handling.

What you expected to happen:
I would expect the joins to succeed, being filtered properly by the condition using the casted column.

Minimal Complete Verifiable Example:

If we attempt a join that would require a cast operation on the lefthand column, we fail at PyJoin.getCondition():

import pandas as pd

from dask_sql import Context

c = Context()

c.create_table("df1", pd.DataFrame({"a": [0, 1], "b": ["cat", "dog"]}))
c.create_table("df2", pd.DataFrame({"c": ["0", "1"], "d": ["cat", "dog"]}))

# Inner Join: CAST(df1.a AS Utf8) = df2.c
#   TableScan: df1 projection=[a, b], full_filters=[CAST(df1.a AS Utf8) IS NOT NULL]
#   TableScan: df2 projection=[c, d], full_filters=[df2.c IS NOT NULL]

c.sql("select * from df1, df2 where a = c")
# TypeError: "unsupported join condition. Left: CAST(df1.a AS Utf8) - Right: df2.c"

Otherwise, we don't properly handle the cast operation, which can potentially cause issues on the Python side:

c.create_table("df1", pd.DataFrame({"a": ["0", "1"], "b": ["cat", "dog"]}))
c.create_table("df2", pd.DataFrame({"c": [0, 1], "d": ["cat", "dog"]}))

# Inner Join: df1.a = CAST(df2.c AS Utf8)
#   TableScan: df1 projection=[a, b], full_filters=[df1.a IS NOT NULL]
#   TableScan: df2 projection=[c, d], full_filters=[CAST(df2.c AS Utf8) IS NOT NULL]

c.sql("select * from df1, df2 where a = c")
# ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Anything else we need to know?:
It looks like this case is part of the query failures currently blocking CI (example of a failing run).

Environment:

  • dask-sql version: latest main
  • Python version: 3.10
  • Operating System: ubuntu
  • Install method (conda, pip, source): source

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpythonAffects Python APIrustPull requests that update Rust code

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions