-
Notifications
You must be signed in to change notification settings - Fork 154
Make query.size_limit only affect the final results #3623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make query.size_limit only affect the final results #3623
Conversation
Signed-off-by: Heng Qian <[email protected]>
settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), | ||
this.createExprValueFactory(), | ||
settings); | ||
return new OpenSearchRequestBuilder(getMaxResultWindow(), createExprValueFactory(), settings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenSearchRequestBuilder.build() and buildRequestWithPit() already has maxResultWindow as paramater, do we still need in constructor? or consider remove it from build and buildRequestWithPit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another bug around #3607. Seems can be fixed in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenSearchRequestBuilder.build() and buildRequestWithPit() already has maxResultWindow as paramater, do we still need in constructor? or consider remove it from build and buildRequestWithPit?
Both is required in the API, and they play as different roles.
The parameter in constructor is stored as field requestedTotalSize
which decides how many rows will be returned
, and it will be updated by the limit value in LIMIT(limit, offset)
. In initialization, we expect requestedTotalSize
should be set by maxResultWindow
.
The parameter in build or buildRequestWithPit is actually the real place used to restrict the max window of index.
So it has logic size = (startFrom + requestedTotalSize > maxResultWindow) ? maxResultWindow - startFrom) : requestedTotalSize
.
sql/opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Line 111 in 1329e02
if (startFrom + size > maxResultWindow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct it, after looking into the issue #3607. Seems we should never pass in that parameter in the constructor to initialize requestedTotalSize
. Without limit pushdown, we actually never know how many rows should be returned. It should be initialized with -1
, which means no restriction on the total size.
core/src/testFixtures/java/org/opensearch/sql/executor/DefaultExecutionEngine.java
Show resolved
Hide resolved
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
…imit # Conflicts: # integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.json # integ-test/src/test/resources/expectedOutput/calcite/explain_output.json
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
…imit # Conflicts: # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java
Signed-off-by: Heng Qian <[email protected]>
@Getter private final Optional<Split> split; | ||
private final Integer querySizeLimit; | ||
|
||
public Integer getQuerySizeLimit() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using @Getter private final Optional<Integer> querySizeLimit;
to keep style consistency
@@ -280,7 +280,7 @@ The Aggregation operator will merge into OpenSearch Aggregation:: | |||
{ | |||
"name": "OpenSearchIndexScan", | |||
"description": { | |||
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)" | |||
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: does "size":0
mean fetching without limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For aggregation, size is meaningless here, so I don't set a new value to the size of sourceBuilder, then it keeps using the value when pushing down aggregation, which is 0.
sql/opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Line 178 in 730f066
sourceBuilder.size(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size define how many search hits will be returned alone with aggregation. size = 0 is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LantaoJin set size=0, also solve #3528?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@penghuo no, the benchmark was already set size=0. The performance issue is not related to this. But yes for a general improving for aggregation.
integ-test/src/test/java/org/opensearch/sql/ppl/StandaloneIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/sql/PaginationIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/util/StandaloneModule.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Outdated
Show resolved
Hide resolved
.../java/org/opensearch/sql/opensearch/executor/protector/OpenSearchExecutionProtectorTest.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScan.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@@ -280,7 +280,7 @@ The Aggregation operator will merge into OpenSearch Aggregation:: | |||
{ | |||
"name": "OpenSearchIndexScan", | |||
"description": { | |||
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)" | |||
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size define how many search hits will be returned alone with aggregation. size = 0 is fine.
@@ -280,7 +280,7 @@ The Aggregation operator will merge into OpenSearch Aggregation:: | |||
{ | |||
"name": "OpenSearchIndexScan", | |||
"description": { | |||
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)" | |||
"request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"gender\":{\"terms\":{\"field\":\"gender.keyword\",\"missing_bucket\":true,\"missing_order\":\"first\",\"order\":\"asc\"}}}]},\"aggregations\":{\"avg(age)\":{\"avg\":{\"field\":\"age\"}}}}}}, searchDone=false)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LantaoJin set size=0, also solve #3528?
@qianheng-aws please update the doc https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/settings.rst#plugins-query-size-limit
Sounds out of date. |
Signed-off-by: Heng Qian <[email protected]>
done |
Signed-off-by: Heng Qian <[email protected]>
query: "source=test | where like(body, '%field 2%') | fields body" | ||
- match: { root.children.0.description.request: "OpenSearchQueryRequest(indexName=test, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"query\":{\"wildcard\":{\"body\":{\"wildcard\":\"*field 2*\",\"case_insensitive\":true,\"boost\":1.0}}},\"_source\":{\"includes\":[\"body\"],\"excludes\":[]},\"sort\":[{\"_doc\":{\"order\":\"asc\"}}]}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"} | ||
# --- | ||
# TODO: pitId is dynamic, try to match the request body more precisely by using regex match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you open an issue for this temporary workaround?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3692, tracked here
Description
Make query.size_limit only affect the final results
Related Issues
Resolves
#3595,
#703,
#3607,
#3637,
#3638
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.