Describe the bug
When an index has at least one search-only replica (SR) and all search replicas are unassigned, the search request fails with a 503 status and a generic SearchPhaseExecutionException. Instead, the request should be blocked with a clear and valid response explaining the reason for the failure.
Related component
No response
To Reproduce
Create a cluster with 3 nodes.
Create an index with 1 primary shard (1P), 1 replica shard (1R), and 1 search-only replica (1SR).
The search request fails and the following exception is logged:
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:775) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:815) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:548) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:290) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:373) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:994) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Expected behavior
The search request should be blocked with a valid response that clearly explains the failure reason.
Example: "Search request failed because all search-only replicas are in an unassigned state." The response should also include a hint telling the user to ensure that search-only replicas are properly assigned to dedicated search nodes.
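To make the expected behavior concrete, here is a minimal sketch of the kind of pre-check that could produce such a message. It is written in Python purely for illustration and is not OpenSearch code: `ShardCopy`, its `role` and `assigned` fields, and `search_block_reason` are all hypothetical names, and the message wording is only a suggestion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShardCopy:
    # Hypothetical model of one shard copy; not an OpenSearch class.
    role: str       # "primary", "replica", or "search_replica"
    assigned: bool  # True if the copy is allocated to a node


def search_block_reason(index: str, copies: List[ShardCopy]) -> Optional[str]:
    """Return a descriptive reason if every search-only replica is unassigned.

    Returns None when the index has no search-only replicas or when at
    least one of them is assigned.
    """
    search_replicas = [c for c in copies if c.role == "search_replica"]
    if search_replicas and not any(c.assigned for c in search_replicas):
        return (
            f"Search request on [{index}] failed because all "
            f"{len(search_replicas)} search-only replica(s) are unassigned. "
            "Ensure that search-only replicas are assigned to dedicated "
            "search nodes."
        )
    return None
```

For the 1P/1R/1SR index from the reproduction steps, the check would return a reason only while the single search replica is unassigned.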
@vinaykpud This is expected behavior: if all the shards capable of performing the search are unassigned, the request should return "all shards failed". With RW split and search replicas, if the index has at least one desired SR, searches are routed only to SR shards, even when those shards are unassigned.
However, we do need to cover the case where there are no SRs and someone issues a search. In this case I think we need to introduce an additional flag for strict/lenient enforcement, where lenient would fall back to querying a writer. The point of separation is isolation, though, and we don't want primaries flooded with requests if that is not intended.
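The routing rules described in this comment can be sketched roughly as follows. This is an illustrative Python model, not OpenSearch code: `Enforcement`, `ShardCopy`, `SearchRejected`, and `select_search_targets` are hypothetical names, and the sketch assumes that "no SRs" means the index has no search replicas configured at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Enforcement(Enum):
    # Hypothetical flag values for the proposed strict/lenient setting.
    STRICT = "strict"
    LENIENT = "lenient"


@dataclass
class ShardCopy:
    # Hypothetical model of one shard copy; not an OpenSearch class.
    role: str       # "primary", "replica", or "search_replica"
    assigned: bool  # True if the copy is allocated to a node


class SearchRejected(Exception):
    """Raised when strict enforcement refuses to search writer shards."""


def select_search_targets(copies: List[ShardCopy], mode: Enforcement) -> List[ShardCopy]:
    """Pick the shard copies a search may be routed to."""
    search_replicas = [c for c in copies if c.role == "search_replica"]
    if search_replicas:
        # At least one SR is desired: route only to SRs. An empty result
        # corresponds to today's "all shards failed" response.
        return [c for c in search_replicas if c.assigned]
    if mode is Enforcement.LENIENT:
        # No SRs configured: lenient mode falls back to writer copies.
        return [c for c in copies if c.role in ("primary", "replica") and c.assigned]
    # No SRs configured and strict mode: keep writers isolated from search.
    raise SearchRejected("index has no search replicas and enforcement is strict")
```

Under this model, lenient mode never overrides the isolation of an index that has search replicas configured. It only applies when none exist, which keeps primaries from being flooded unless the operator opts in.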