[flaky-test] SSB Integration Test Failed Due to Failure to Download Segment #14655

Open
ankitsultana opened this issue Dec 13, 2024 · 3 comments

Comments

@ankitsultana
Contributor

Error:

Error:    SSBQueryIntegrationTest.testSSBQueries:103->testQueriesValidateAgainstH2:109 » PinotClient Query had processing exceptions: 
[{"errorCode":235,"message":"ServerSegmentMissing:\nFound 1 unavailable segments for table dates: [dates_0 %]"}]
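The unavailable segment is named `dates_0 %` (with a space and a percent sign). Later in the log, the controller's SegmentDeletionManager refers to a file named `dates_0+%25`, which appears to be the same name in `application/x-www-form-urlencoded` form. The JDK's `java.net.URLEncoder` reproduces that string mapping. The class and method names below are illustrative, and whether Pinot builds the path with this exact call is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SegmentNameEncoding {
  // Form-encodes a segment name: unreserved characters such as '_' stay as-is,
  // ' ' becomes '+', and '%' becomes "%25".
  // Hypothetical helper; not claimed to be Pinot's own code path.
  public static String encodeSegmentName(String segmentName) {
    return URLEncoder.encode(segmentName, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Segment name reported as unavailable by the broker
    System.out.println(encodeSegmentName("dates_0 %")); // dates_0+%25
  }
}
```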

Cause:

00:22:08.522 WARN [BrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_40] Cannot enable SegmentPartitionMetadataManager. Expecting SegmentPartitionConfig with exact 1 partition column
00:22:09.811 WARN [BrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_42] Cannot enable SegmentPartitionMetadataManager. Expecting SegmentPartitionConfig with exact 1 partition column
00:22:10.251 WARN [BrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_44] Cannot enable SegmentPartitionMetadataManager. Expecting SegmentPartitionConfig with exact 1 partition column
00:22:10.901 WARN [BrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_46] Cannot enable SegmentPartitionMetadataManager. Expecting SegmentPartitionConfig with exact 1 partition column
00:22:11.252 WARN [BrokerRoutingManager] [HelixTaskExecutor-message_handle_thread_48] Cannot enable SegmentPartitionMetadataManager. Expecting SegmentPartitionConfig with exact 1 partition column
00:22:12.713 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread_43] Exception while executing a state transition task dates_0 %
java.lang.reflect.InvocationTargetException: null
	...
Caused by: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:65) ~[pinot-spi-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.common.utils.fetcher.BaseSegmentFetcher.fetchSegmentToLocal(BaseSegmentFetcher.java:74) ~[pinot-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:127) ~[pinot-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:135) ~[pinot-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocal(SegmentFetcherFactory.java:168) ~[pinot-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegmentFromDeepStore(BaseTableDataManager.java:811) ~[pinot-core-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegment(BaseTableDataManager.java:761) ~[pinot-core-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.downloadAndLoadSegment(BaseTableDataManager.java:405) ~[pinot-core-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.addNewOnlineSegment(BaseTableDataManager.java:376) ~[pinot-core-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.core.data.manager.offline.OfflineTableDataManager.doAddOnlineSegment(OfflineTableDataManager.java:54) ~[pinot-core-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.core.data.manager.BaseTableDataManager.addOnlineSegment(BaseTableDataManager.java:330) ~[pinot-core-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addOnlineSegment(HelixInstanceDataManager.java:259) ~[pinot-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:131) ~[pinot-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT-baac1e4b0beed3e81a80cbbe14f68595856f01cc]
	... 12 more
...

00:22:12.721 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread_43] Skip internal error. errCode: ERROR, errMsg: null
00:22:12.757 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting old segment: dates_0 % for table: dates_OFFLINE (all candidate instances: [] are disabled, counting segment as unavailable)
00:22:12.760 ERROR [MessageGenerationPhase] [HelixController-pipeline-default-SSBQueryIntegrationTest-(82906a8c_DEFAULT)] Event 82906a8c_DEFAULT : Unable to find a next state for resource: dates_OFFLINE partition: dates_0 % from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:ONLINE
00:22:12.775 ERROR [MessageGenerationPhase] [HelixController-pipeline-default-SSBQueryIntegrationTest-(724aaa09_DEFAULT)] Event 724aaa09_DEFAULT : Unable to find a next state for resource: dates_OFFLINE partition: dates_0 % from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:ONLINE
00:22:15.882 WARN [SegmentDeletionManager] [grizzly-http-server-4] Failed to find local segment file for segment file:/tmp/test-controller-data-dir1734049315170/dates/dates_0+%25
00:22:16.050 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting old segment: supplier_0 % for table: supplier_OFFLINE (all candidate instances: [Server_localhost_22001] are disabled, counting segment as unavailable)
Dec 13, 2024 12:22:16 AM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
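The root exception in the trace is `AttemptsExceededException: Operation failed after 3 attempts`, thrown from `BaseRetryPolicy.attempt` while `BaseSegmentFetcher.fetchSegmentToLocal` was downloading the segment from deep store. Here is a simplified, hypothetical fixed-delay retry loop that fails with the same kind of message. It is not Pinot's `BaseRetryPolicy`. The names `FixedRetry`, `attempt`, `maxAttempts`, `delayMs` and the local `AttemptsExceededException` stand-in are illustrative assumptions.

```java
import java.util.function.BooleanSupplier;

public class FixedRetry {
  // Hypothetical unchecked stand-in for Pinot's AttemptsExceededException
  public static class AttemptsExceededException extends RuntimeException {
    public AttemptsExceededException(String message) {
      super(message);
    }
  }

  // Calls op up to maxAttempts times, pausing delayMs between attempts.
  // op returns true on success and false when the attempt should be retried.
  // Returns the number of attempts that were needed.
  public static int attempt(BooleanSupplier op, int maxAttempts, long delayMs) {
    for (int i = 1; i <= maxAttempts; i++) {
      if (op.getAsBoolean()) {
        return i;
      }
      if (i < maxAttempts) {
        try {
          Thread.sleep(delayMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
      }
    }
    throw new AttemptsExceededException("Operation failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    try {
      // Simulates a download that never succeeds
      attempt(() -> false, 3, 100);
    } catch (AttemptsExceededException e) {
      System.out.println(e.getMessage()); // Operation failed after 3 attempts
    }
  }
}
```

In this sketch, the final exception does not carry the reason each individual attempt failed. The same is true of the excerpt above, which shows only that the attempts were exhausted.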

ankitsultana added the flaky-test and beginner-task (Small task for new contributors to ramp up) labels on Dec 13, 2024
@cutiepie-10
Contributor

Hi @ankitsultana,
How can I reproduce this error?
I have run the test many times, and it has passed every time.

@cutiepie-10
Contributor

Hello @ankitsultana,
This test passed on my machine multiple times.

@ankitsultana
Contributor Author

This test fails only in CI, and I am not sure of the root cause. On further thought, this might not be a beginner-friendly task after all.

@cutiepie-10 : could you consider attempting this instead: #14787 ?

ankitsultana removed the beginner-task (Small task for new contributors to ramp up) label on Jan 12, 2025