ES|QL: Improve field resolution for FORK #128501

ioanatia · 2025-05-27T10:02:12Z

fixes #128271
fixes #128272
related #121950

In #121950 we made some changes to how we do field resolution for FORK.
Previously we would ask for all fields whenever FORK was used in the query, which was suboptimal.

After I merged #121950, some tests in IndexResolverFieldNamesTests started failing. Other fixes to EsqlSession.fieldNames had been merged before #121950, and I had not updated my branch with those changes.

For now I am just reverting to requesting all fields for FORK.

ioanatia · 2025-05-27T10:04:05Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

@@ -1994,15 +1994,15 @@ public void testForkFieldsWithKeepAfterFork() {
                   (WHERE d > 1000 AND e == "aaa" | EVAL c = a + 200)
            | WHERE x > y
            | KEEP a, b, c, d, x
-            """, Set.of("a", "a.*", "c", "c.*", "d", "d.*", "e", "e.*", "x", "x.*", "y", "y.*"));
+            """, ALL_FIELDS);


Technically, what we had here before was better.
Now, if we detect that a FORK branch requires all fields, we don't look any further and we end up requesting all fields from field caps.
I think this is fine for now - we can look into improving this later - but at least the current implementation is less error prone than before.
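
A minimal sketch of the short-circuit described above, under stated assumptions: `ForkFieldUnionSketch` and `unionBranches` are hypothetical names that do not exist in EsqlSession, and `"*"` is assumed to stand in for the `ALL_FIELDS` sentinel used in the test.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the EsqlSession code: all names here are illustrative.
final class ForkFieldUnionSketch {
    // Assumption: "*" stands in for the ALL_FIELDS sentinel used by the tests.
    static final Set<String> ALL_FIELDS = Set.of("*");

    // Union the field names of every FORK branch.
    // As soon as one branch needs all fields, stop and request all fields.
    static Set<String> unionBranches(List<Set<String>> branchFieldNames) {
        Set<String> union = new LinkedHashSet<>();
        for (Set<String> branch : branchFieldNames) {
            if (branch.equals(ALL_FIELDS)) {
                return ALL_FIELDS; // one unbounded branch makes the whole request unbounded
            }
            union.addAll(branch);
        }
        return union;
    }
}
```

This is why the expected set in the test above collapses to `ALL_FIELDS`: the narrower per-branch sets are discarded once any branch is unbounded.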

ioanatia · 2025-05-27T10:04:52Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

+                   (STATS x = count(*), y=min(z))
+            | LOOKUP JOIN my_lookup_index ON xyz
+            | WHERE x > y OR _fork == "fork1"
+            """, Set.of("a", "c", "abc", "b", "def", "z", "xyz", "def.*", "xyz.*", "z.*", "abc.*", "a.*", "c.*", "b.*"));


This actually got better, because we stopped asking field caps for _fork and b (which is the result of an eval).

ioanatia · 2025-05-27T10:05:21Z

...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java

@@ -2089,7 +2065,7 @@ public void testForkWithStatsInAllBranches() {
                   (EVAL z = a * b | STATS m = max(z))
                   (STATS x = count(*), y=min(z))
            | WHERE x > y
-            """, Set.of("a", "a.*", "b", "b.*", "c", "c.*", "z", "z.*"));
+            """, Set.of("a", "a.*", "c", "c.*", "z", "z.*"));


We now correctly detect that we don't need to ask for b because it's the result of an eval.
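
To illustrate the idea (not the actual implementation), here is a simplified sketch of dropping eval-defined names from the set of fields requested from field caps. `EvalAliasSketch`, `Command` and `fieldsToRequest` are hypothetical, and the sketch ignores cases the real code has to handle, such as wildcards and the `.*` subfield patterns.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these types are illustrative and are not the EsqlSession classes.
final class EvalAliasSketch {
    // One command in a linear pipeline: the names it defines and the names it reads.
    record Command(Set<String> defines, Set<String> references) {}

    // Walk from the last command back to the source, as a top-down plan traversal would.
    static Set<String> fieldsToRequest(List<Command> lastToFirst) {
        Set<String> needed = new LinkedHashSet<>();
        for (Command command : lastToFirst) {
            // A name computed by this command does not have to come from the index...
            needed.removeAll(command.defines());
            // ...but whatever the command reads still has to be resolved further down.
            needed.addAll(command.references());
        }
        return needed;
    }
}
```

For a pipeline like `EVAL b = a + 1 | WHERE b > c`, the walk sees `WHERE` first (needs `b` and `c`), then `EVAL` (defines `b`, reads `a`), so only `a` and `c` remain to be requested.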

elasticsearchmachine · 2025-05-27T11:20:08Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

carlosdelest

LGTM, though I have some questions about legibility on the plan traversal

carlosdelest · 2025-05-27T11:51:34Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

        parsed.forEachDown(p -> {// go over each plan top-down
+            if (hasFork && seenFork.get() == false && p instanceof Fork == false) {


I'm a bit confused about this forEachDown loop:

Shouldn't we check first if we have seenFork? Otherwise makes no sense to step down on the individual plans?

Why do we have to check for seenFork in the inner loop?

Maybe adding some comments would help me understand the plan traversal here

> Shouldn't we check first if we have seenFork? Otherwise makes no sense to step down on the individual plans?

If there is no FORK command, this loop will execute exactly as before. That's why I am checking hasFork first, because the other conditions are relevant only in the context where FORK is being used.

> Why do we have to check for seenFork in the inner loop?

This has to do with how FORK is being modelled.
A query like:

    FROM test
    | WHERE id > 1 // common pre-filter
    | FORK ( WHERE content:"fox" ) ( WHERE content:"dog" )
    | SORT _fork
    | KEEP _fork, id, content

is parsed as:

If FORK is used we want to analyze the FORK branches separately - which is why we skip the plans in this forEachDown loop until we reach FORK. When we reach FORK we call fieldNames on each child and we try to union the field names.

This forEachDown loop assumes the plan is linear (except for some LookupJoin check).
If someone is making a change to fieldNames, I wanted to make sure they don't have to necessarily worry about FORK and whether the plan is linear or if it contains plans that have more than one child.
But I also didn't want us to introduce bugs with FORK because we did not consider it when we made changes to fieldNames.
Which is why I put the handling of FORK right at the beginning and decided to call fieldNames on each child.

I will add a comment in the code with some explanation.

... and I figured out the source of confusion here - we do traverse this with forEachDown, not forEachUp 🤦‍♀️ 🤦‍♀️ 🤦‍♀️

I am just reverting to requesting all fields when FORK is used, rather than trying to optimize this further, which seems so brittle atm

when we are out to tech preview and we have amazing test coverage for FORK we can go back and modify this more confidently.
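
For reference, a rough sketch of the fallback behaviour described above, using a toy plan tree. `ForkAllFieldsSketch`, `Plan`, `containsFork` and `collect` are hypothetical names, and `"*"` is an assumed stand-in for the all-fields sentinel; none of this is the actual EsqlSession code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "request all fields when FORK is used"; not the real classes.
final class ForkAllFieldsSketch {
    // Assumption: "*" stands in for the all-fields sentinel.
    static final Set<String> ALL_FIELDS = Set.of("*");

    // Toy plan node: a command name, the names it reads, and its child plans.
    // A FORK node has one child per branch, so the plan is no longer linear.
    record Plan(String command, Set<String> references, List<Plan> children) {}

    static boolean containsFork(Plan plan) {
        if (plan.command().equals("FORK")) {
            return true;
        }
        for (Plan child : plan.children()) {
            if (containsFork(child)) {
                return true;
            }
        }
        return false;
    }

    static Set<String> fieldNames(Plan plan) {
        if (containsFork(plan)) {
            return ALL_FIELDS; // no per-branch analysis: ask field caps for everything
        }
        Set<String> names = new LinkedHashSet<>();
        collect(plan, names);
        return names;
    }

    private static void collect(Plan plan, Set<String> out) {
        out.addAll(plan.references());
        for (Plan child : plan.children()) {
            collect(child, out);
        }
    }
}
```

The trade-off is the one discussed in this thread: the request is wider than necessary, but the logic for linear plans stays untouched and does not need to know about FORK.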

astefan · 2025-05-27T14:52:28Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+            if (hasFork && seenFork.get() == false && p instanceof Fork == false) {
+                return;
+            }


This condition seems to bypass all plan types other than fork until a fork is found. This seems brittle. I have the feeling it is like this because of the current limitations in fork, but I am not sure that's true. Still, if fork evolves in the future and no longer has those limitations, will this condition still stand?

The condition makes everything else in the query dependent on fork existence. Maybe some comments in code would explain why this logic makes sense here.

yes it is very brittle
I will just revert back to requesting all fields - we can improve this later and I'd argue it's not a must for tech preview

carlosdelest

LGTM - I think we can simplify some code

carlosdelest · 2025-05-28T07:31:26Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

@@ -637,6 +616,8 @@ static PreAnalysisResult fieldNames(LogicalPlan parsed, Set<String> enrichPolicy

        boolean[] canRemoveAliases = new boolean[] { true };

+        PreAnalysisResult initialResult = result;
+        projectAll.set(false);


Nit - I think this assignment is unnecessary

carlosdelest · 2025-05-28T07:33:58Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

@@ -711,6 +692,10 @@ static PreAnalysisResult fieldNames(LogicalPlan parsed, Set<String> enrichPolicy
            }
        });

+        if (projectAll.get()) {


This can't happen, right? I don't see projectAll being updated in the previous forEachDown() block 🤔

astefan

LGTM

Improve field resolution for FORK

f8f0478

ioanatia added >non-issue :Analytics/ES|QL v9.1.0 Team:Search - Relevance labels May 27, 2025

ioanatia marked this pull request as ready for review May 27, 2025 11:19

ioanatia requested review from ChrisHegarty, carlosdelest and astefan May 27, 2025 11:19

elasticsearchmachine added Team:Analytics and removed Team:Search - Relevance labels May 27, 2025

carlosdelest approved these changes May 27, 2025

astefan reviewed May 27, 2025

ioanatia and others added 2 commits May 27, 2025 18:04

Request all fields when FORK is used

1474874

Merge branch 'main' into fork_field_resolution

40b9c7e

ioanatia requested review from astefan and carlosdelest May 28, 2025 07:03

carlosdelest approved these changes May 28, 2025

Remove unused variable

c2fa62f

astefan approved these changes May 28, 2025

Merge branch 'main' into fork_field_resolution

2e83b97

ioanatia merged commit f275b71 into elastic:main May 28, 2025
18 checks passed

ioanatia mentioned this pull request May 28, 2025

Initial FORK with restrictions #121950

Open

26 tasks

astefan mentioned this pull request May 29, 2025

ES|QL: tests for FORK's evaluation of field names used in field_caps resolve calls #127208

Open
