-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Aggs: Fix significant terms not finding background docuemnts for nested fields #128472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Aggs: Fix significant terms not finding background docuemnts for nested fields #128472
Conversation
Hi @ivancea, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help a bit with the review, these tests have:
- Setup: Index with a type ("normal" and "outlier"), and a value (1 and 2). That value is replicated 6 times (integer and keyword, and then as a nested and doubly nested field. Every "value" field has the same value in each document, so testing against each of them should render the same results.
- A first "Checking" test to ensure all data ingested is correct
- Test cases for the 3 kinds of values (normal, nested, doubly nested). Each of them do a sig_terms expecting the same results (except for the background_filter one)
Pinging @elastic/es-analytical-engine (Team:Analytics) |
var nestedParentField = context.nestedLookup().getNestedParent(fieldType.name()); | ||
if (nestedParentField != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the best way to detect a nested field, but it worked well, and I didn't find any edge case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martijnvg, do you know the most right way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if buildQuery
should take the nested context into account? That sounds like a bigger change than I'd like to make to aggs at the moment.
Hi @ivancea, I've updated the changelog YAML for you. |
Closes #101163
Fixes the
significant_terms
aggregation not working correctly on nested fields.It was returning buckets with
bg_count: 0
, meaning it wasn't detecting any background document.The filter it used to check for background documents was a plain TermsQuery, which wouldn't work on Nested fields.
The PR adds an extra NestedQuery wrapping the Terms in case the field is inside a Nested parent.
SignificantText
was also checked, but it's explicitly documented that it doesn't work on text fields, as it needs to access the source.TBD: Backports