Skip to content

Make query.size_limit only affect the final results #3623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
May 28, 2025

Conversation

qianheng-aws
Copy link
Collaborator

@qianheng-aws qianheng-aws commented May 14, 2025

Description

Make query.size_limit only affect the final results

Related Issues

Resolves
#3595,
#703,
#3607,
#3637,
#3638

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT),
this.createExprValueFactory(),
settings);
return new OpenSearchRequestBuilder(getMaxResultWindow(), createExprValueFactory(), settings);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSearchRequestBuilder.build() and buildRequestWithPit() already has maxResultWindow as paramater, do we still need in constructor? or consider remove it from build and buildRequestWithPit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another bug around #3607. Seems can be fixed in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSearchRequestBuilder.build() and buildRequestWithPit() already has maxResultWindow as paramater, do we still need in constructor? or consider remove it from build and buildRequestWithPit?

Both is required in the API, and they play as different roles.

The parameter in constructor is stored as field requestedTotalSize which decides how many rows will be returned, and it will be updated by the limit value in LIMIT(limit, offset). In initialization, we expect requestedTotalSize should be set by maxResultWindow.

The parameter in build or buildRequestWithPit is actually the real place used to restrict the max window of index.
So it has logic size = (startFrom + requestedTotalSize > maxResultWindow) ? maxResultWindow - startFrom) : requestedTotalSize.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct it, after looking into the issue #3607. Seems we should never pass in that parameter in the constructor to initialize requestedTotalSize. Without limit pushdown, we actually never know how many rows should be returned. It should be initialized with -1, which means no restriction on the total size.

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
…imit

# Conflicts:
#	integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.json
#	integ-test/src/test/resources/expectedOutput/calcite/explain_output.json
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
…imit

# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java
Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws requested a review from penghuo May 21, 2025 08:23
Comment on lines 14 to 17
@Getter private final Optional<Split> split;
private final Integer querySizeLimit;

public Integer getQuerySizeLimit() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using @Getter private final Optional<Integer> querySizeLimit; to keep style consistency

@@ -280,7 +280,7 @@ The Aggregation operator will merge into OpenSearch Aggregation::
{
"name": "OpenSearchIndexScan",
"description": {
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)"
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: does "size":0 mean fetching without limitation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For aggregation, size is meaningless here, so I don't set a new value to the size of sourceBuilder, then it keeps using the value when pushing down aggregation, which is 0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size define how many search hits will be returned alone with aggregation. size = 0 is fine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin set size=0, also solve #3528?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@penghuo no, the benchmark was already set size=0. The performance issue is not related to this. But yes for a general improving for aggregation.

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws requested a review from LantaoJin May 22, 2025 08:20
LantaoJin
LantaoJin previously approved these changes May 27, 2025
penghuo
penghuo previously approved these changes May 27, 2025
@@ -280,7 +280,7 @@ The Aggregation operator will merge into OpenSearch Aggregation::
{
"name": "OpenSearchIndexScan",
"description": {
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)"
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size define how many search hits will be returned alone with aggregation. size = 0 is fine.

@@ -280,7 +280,7 @@ The Aggregation operator will merge into OpenSearch Aggregation::
{
"name": "OpenSearchIndexScan",
"description": {
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)"
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LantaoJin set size=0, also solve #3528?

@LantaoJin
Copy link
Member

LantaoJin commented May 28, 2025

@qianheng-aws please update the doc https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/settings.rst#plugins-query-size-limit

The size configure the maximum amount of documents to be pull from OpenSearch. The default value is: 10000

Notes: This setting will impact the correctness of the aggregation operation, for example, there are 1000 docs in the index, if you change the value to 200, only 200 docs will be extract from index and do aggregation.

Sounds out of date.

Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws dismissed stale reviews from penghuo and LantaoJin via 48274ec May 28, 2025 02:40
@qianheng-aws
Copy link
Collaborator Author

@qianheng-aws please update the doc https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/settings.rst#plugins-query-size-limit

The size configure the maximum amount of documents to be pull from OpenSearch. The default value is: 10000
Notes: This setting will impact the correctness of the aggregation operation, for example, there are 1000 docs in the index, if you change the value to 200, only 200 docs will be extract from index and do aggregation.

Sounds out of date.

done

@qianheng-aws qianheng-aws requested a review from LantaoJin May 28, 2025 02:40
query: "source=test | where like(body, '%field 2%') | fields body"
- match: { root.children.0.description.request: "OpenSearchQueryRequest(indexName=test, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"query\":{\"wildcard\":{\"body\":{\"wildcard\":\"*field 2*\",\"case_insensitive\":true,\"boost\":1.0}}},\"_source\":{\"includes\":[\"body\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"}
# ---
# TODO: pitId is dynamic, try to match the request body more precisely by using regex match
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you open an issue for this temporary workaround?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3692, tracked here

@qianheng-aws qianheng-aws merged commit 66e230e into opensearch-project:main May 28, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants