Skip to content

Aggs: Fix significant terms not finding background docuemnts for nested fields #128472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented May 26, 2025

Closes #101163

Fixes the significant_terms aggregation not working correctly on nested fields.

It was returning buckets with bg_count: 0, meaning it wasn't detecting any background document.
The filter it used to check for background documents was a plain TermsQuery, which wouldn't work on Nested fields.

The PR adds an extra NestedQuery wrapping the Terms in case the field is inside a Nested parent.

SignificantText was also checked, but it's explicitly documented that it doesn't work on text fields, as it needs to access the source.

TBD: Backports

@ivancea ivancea requested a review from nik9000 May 26, 2025 15:26
@ivancea ivancea added >bug :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0 labels May 26, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help a bit with the review, these tests have:

  • Setup: Index with a type ("normal" and "outlier"), and a value (1 and 2). That value is replicated 6 times (integer and keyword, and then as a nested and doubly nested field. Every "value" field has the same value in each document, so testing against each of them should render the same results.
  • A first "Checking" test to ensure all data ingested is correct
  • Test cases for the 3 kinds of values (normal, nested, doubly nested). Each of them do a sig_terms expecting the same results (except for the background_filter one)

@ivancea ivancea marked this pull request as ready for review May 27, 2025 12:07
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Comment on lines +225 to +226
var nestedParentField = context.nestedLookup().getNestedParent(fieldType.name());
if (nestedParentField != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the best way to detect a nested field, but it worked well, and I didn't find any edge case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martijnvg, do you know the most right way to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if buildQuery should take the nested context into account? That sounds like a bigger change than I'd like to make to aggs at the moment.

@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've updated the changelog YAML for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Background Count (bg_count) Remains Zero in Nested and Filtered significant_terms Aggregation
3 participants