[SPARK-52580][PS] Avoid CAST_INVALID_INPUT of `replace` in ANSI mode #51297

xinrong-meng · 2025-06-26T23:58:32Z

What changes were proposed in this pull request?

Avoid the CAST_INVALID_INPUT error raised by `replace` in ANSI mode.

Specifically, under ANSI mode:

- use try_cast() to safely cast values
- for NaN checks, avoid calling F.isnan() on non-numeric types
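
The two changes can be illustrated with a small pure-Python model. This is only a sketch of the idea, not the code in this patch: `try_cast`, `replace_strict`, `replace_safe`, and `is_nan` are hypothetical helpers, and a Python `ValueError` stands in for Spark's CAST_INVALID_INPUT. It assumes the failure comes from coercing non-numeric strings to a numeric type during comparison.

```python
import math
from typing import Any, Callable, List, Optional


def try_cast(value: Any, target: Callable[[Any], Any]) -> Optional[Any]:
    """Convert value with target, returning None on failure (TRY_CAST-like)."""
    if value is None:
        return None
    try:
        return target(value)
    except (TypeError, ValueError):
        return None


def replace_strict(column: List[Any], to_replace: List[int], value: int) -> List[Any]:
    """Coerce every column value to int before comparing.

    A non-numeric string such as "a" raises ValueError, which stands in
    for CAST_INVALID_INPUT in this model.
    """
    return [value if v is not None and int(v) in to_replace else v for v in column]


def replace_safe(
    column: List[Any],
    to_replace: List[Any],
    value: Any,
    column_type: Callable[[Any], Any] = str,
) -> List[Any]:
    """Try-cast the literals to the column's type, then compare like with like."""
    candidates = {try_cast(x, column_type) for x in to_replace}
    candidates.discard(None)  # a failed cast never matches anything
    new_value = try_cast(value, column_type)
    return [new_value if v in candidates else v for v in column]


def is_nan(value: Any) -> bool:
    """Apply the NaN test only to floats; other types are never NaN."""
    return isinstance(value, float) and math.isnan(value)


column_c = ["a", "b", "c", "d", None]
print(replace_safe(column_c, [0, 1, 2, 3, 5, 6], 4))  # ['a', 'b', 'c', 'd', None]
print([is_nan(v) for v in ["a", None, float("nan"), 1.0]])  # [False, False, True, False]
```

In this model, `replace_strict(column_c, [0, 1, 2, 3, 5, 6], 4)` fails on the first element, while the safe variant leaves the string column untouched.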

An example of how the Spark plan differs with ANSI mode off and on:

# if the original column is of StringType
# ANSI off
Column<'CASE WHEN in(C, 0, 1, 2, 3, 5, 6) THEN 4 ELSE C END'>

# ANSI on
Column<'CASE WHEN in(C, TRY_CAST(0 AS STRING), TRY_CAST(1 AS STRING), TRY_CAST(2 AS STRING), TRY_CAST(3 AS STRING), TRY_CAST(5 AS STRING), TRY_CAST(6 AS STRING)) THEN TRY_CAST(4 AS STRING) ELSE TRY_CAST(C AS STRING) END'>

Why are the changes needed?

Ensure pandas on Spark works well with ANSI mode on.
Part of https://issues.apache.org/jira/browse/SPARK-52556.

Does this PR introduce any user-facing change?

Yes, `replace` now works in ANSI mode. For example:

>>> ps.set_option("compute.fail_on_ansi_mode", False)
>>> ps.set_option("compute.ansi_mode_support", True)
>>> pdf = pd.DataFrame(
...             {"A": [0, 1, 2, 3, np.nan], "B": [5, 6, 7, 8, np.nan], "C": ["a", "b", "c", "d", None]},
...             index=np.random.rand(5),
...         )
>>> psdf = ps.from_pandas(pdf)
>>> psdf["C"].replace([0, 1, 2, 3, 5, 6], 4)
0.458472       a
0.749773       b
0.222904       c
0.397280       d
0.293933    None
Name: C, dtype: object
>>> psdf.replace([0, 1, 2, 3, 5, 6], [6, 5, 4, 3, 2, 1])
            A    B     C
0.458472  6.0  2.0     a
0.749773  5.0  1.0     b
0.222904  4.0  7.0     c
0.397280  3.0  8.0     d
0.293933  NaN  NaN  None

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

xinrong-meng added 2 commits June 26, 2025 16:39

fix

19b5831

test

72d803c

github-actions bot added PYTHON PANDAS API ON SPARK labels Jun 26, 2025
