perf: Optimize CASE for any WHEN false #17835

petern48 · 2025-09-30T02:01:58Z

Which issue does this PR close?

Rationale for this change

Extends the existing CASE simplification to also simplify the expression when any of the WHEN conditions is the literal false.

What changes are included in this PR?

Implements the following simplifications:

CASE WHEN false THEN A END --> NULL
CASE WHEN false THEN A ELSE B END --> B
CASE WHEN X THEN A WHEN false THEN B END --> CASE WHEN X THEN A END
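
As a rough illustration of these rewrites, here is a minimal sketch on a toy expression type. The `Expr` enum, `col`, and `simplify_case` below are hypothetical stand-ins, not DataFusion's actual types or the code in this PR, and the toy ignores data types (the real rule appears to look up an output type via `get_data_type`). A `WHEN false` branch can never match, so the sketch drops it and leaves the remaining branches and any ELSE untouched.

```rust
/// Toy stand-in for an expression tree; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Null,
    Bool(bool),
    Column(String),
    Case {
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// Hypothetical helper to build a column reference.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Drop every `WHEN false` branch; collapse the CASE if no branch remains.
fn simplify_case(expr: Expr) -> Expr {
    match expr {
        Expr::Case { when_then, else_expr } => {
            // Keep only the branches whose condition is not the literal false.
            let kept: Vec<(Expr, Expr)> = when_then
                .into_iter()
                .filter(|(when, _)| *when != Expr::Bool(false))
                .collect();
            if kept.is_empty() {
                // No reachable branch: the result is ELSE, or NULL without one.
                else_expr.map(|e| *e).unwrap_or(Expr::Null)
            } else {
                Expr::Case { when_then: kept, else_expr }
            }
        }
        // Anything that is not a CASE is returned unchanged.
        other => other,
    }
}
```

In this toy model, a CASE with a single `WHEN false` branch and no ELSE collapses to `Expr::Null`, and the same CASE with an ELSE collapses to the ELSE expression.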

Are these changes tested?

Yes

Are there any user-facing changes?

No

petern48 · 2025-09-30T02:08:16Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

-                if i == 0 {
-                    return Ok(Transformed::yes(*then_));
+                let mut remove_indices = Vec::with_capacity(when_then_expr.len());
+                let out_type = info.get_data_type(&when_then_expr[0].1)?;


Introducing this get_data_type call made some of the existing tests fail, because it tried to get the data type of a column that didn't exist in the schema. I updated those tests to use actual column names (e.g. col("c1"), col("c3")) or string literals (e.g. lit("a")) instead of the invalid column names (e.g. col("a")), which is why there are so many seemingly unrelated changes in the old tests. When I ran queries in the CLI, DataFusion seemed to catch invalid column names before reaching this code, so I think this should be safe.

petern48 · 2025-09-30T03:02:12Z

cc @alamb

Jefffrey

Makes sense to me 👍

Jefffrey · 2025-09-30T05:54:05Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+                // Remove any CASE false statements
+                for i in remove_indices.iter().rev() {
+                    when_then_expr.remove(*i);
+                }


I wonder if we need to consider unusual cases, such as a very large CASE statement with lots of false conditions: could removing entries one by one like this become a performance issue, or shouldn't we worry about that too much? 🤔

I changed the logic to instead move all valid entries to a new new_when_then_expr vector. Do you think this is better? I figured it should at least be more predictable, since it's at most O(n) moves, whereas the old deletion logic can involve multiple O(n) left shifts, which I believe is technically O(n^2).
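
To make the complexity argument concrete, here is a small sketch of the two strategies on a generic vector. The function names `remove_by_index` and `keep_in_new_vec` are made up for illustration and are not taken from the PR. The first mirrors the original loop over `remove_indices` in reverse; the second moves the kept entries into a fresh vector in a single pass.

```rust
/// Remove the given ascending indices one at a time, back to front.
/// Each `Vec::remove` shifts the tail left, so the worst case is O(n^2).
fn remove_by_index<T>(mut items: Vec<T>, remove_indices: &[usize]) -> Vec<T> {
    for &i in remove_indices.iter().rev() {
        items.remove(i);
    }
    items
}

/// Move the kept elements into a new vector in one pass: at most O(n) moves.
fn keep_in_new_vec<T>(items: Vec<T>, is_removed: impl Fn(&T) -> bool) -> Vec<T> {
    let mut kept = Vec::with_capacity(items.len());
    for item in items {
        if !is_removed(&item) {
            kept.push(item);
        }
    }
    kept
}
```

Both functions produce the same result for the same set of removed entries; only the number of element moves differs. The standard library's `Vec::retain` is another option that filters in place in a single pass.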

alamb

Thank you @petern48 and @Jefffrey -- this looks great

alamb · 2025-10-01T13:40:50Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

-                if i == 0 {
-                    return Ok(Transformed::yes(*then_));
+                let mut remove_indices = Vec::with_capacity(when_then_expr.len());
+                let out_type = info.get_data_type(&when_then_expr[0].1)?;


It took me a moment to convince myself that we did not need to gate on !when_then_expr.is_empty() to ensure when_then_expr[0] doesn't panic. The reason is that .any() needs at least one expr to evaluate to true, so the guard already rules out an empty vector.

TLDR this is fine, I am just recording my thought process for anyone else who is interested
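
The point about `.any()` can be checked in isolation. The sketch below uses a hypothetical helper, `first_then_if_any_false`, which only mimics the guard-then-index shape of the code under review with plain booleans and strings. `Iterator::any` returns false on an empty iterator, so the indexing is never reached for an empty slice.

```rust
/// Hypothetical helper mirroring the guard-then-index pattern discussed above.
fn first_then_if_any_false(when_then: &[(bool, &'static str)]) -> Option<&'static str> {
    // `any` returns false for an empty iterator, so this guard fails on an
    // empty slice and the indexing below never runs.
    if when_then.iter().any(|pair| !pair.0) {
        // Safe: the guard proved that at least one element exists.
        Some(when_then[0].1)
    } else {
        None
    }
}
```

Calling the helper with an empty slice returns `None` rather than panicking.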

alamb · 2025-10-01T15:41:43Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

        );

-        // CASE WHEN true THEN col("a") ELSE col("b") END --> col("a")
+        // CASE WHEN true THEN col('a') ELSE col('b') END --> col('a')


I was worried about this change, as it seems to potentially change the intent of the test -- to use literals rather than column references.

However, I see that the issue is that the DataTypes need to match, and after spending some time rewriting these tests to use column references rather than literals, I think the literals are fine.

petern48 added 4 commits September 29, 2025 18:32

Implement WHEN false logic for case statements

95e6858

Fix tests to use valid column names (e.g c3 instead of a)

b0e3e9b

Add comments and add negative test case

e370969

clean up

befdc48

github-actions bot added the optimizer Optimizer rules label Sep 30, 2025

Fix negative test to avoid (case -> or/and) simplification

01e3b6e

petern48 marked this pull request as ready for review September 30, 2025 03:01

Jefffrey approved these changes Sep 30, 2025

petern48 added 2 commits September 30, 2025 07:16

Delete let guard comment (not useful anymore)

a0c4b2e

Modify logic to move all kept elements to a new vector instead of removing from the original vector

b25c38f

alamb approved these changes Oct 1, 2025

alamb mentioned this pull request Oct 1, 2025

Minor: reuse test schemas in simplify tests #17864

Open

alamb added this pull request to the merge queue Oct 1, 2025

Merged via the queue into apache:main with commit f1246a9 Oct 1, 2025
29 checks passed
