-
Couldn't load subscription status.
- Fork 2.3k
Fix a flaky test with Files.walk() #19758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #19758 +/- ##
============================================
+ Coverage 73.19% 73.24% +0.04%
- Complexity 70946 70983 +37
============================================
Files 5735 5735
Lines 324654 324654
Branches 46962 46962
============================================
+ Hits 237643 237789 +146
+ Misses 67875 67702 -173
- Partials 19136 19163 +27 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Fixed formatting issue with extra empty line Signed-off-by: Joe Liu <[email protected]>
|
❌ Gradle check result for 196fb16: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 196fb16: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| try { | ||
| if (path.endsWith(".lock") == false) { | ||
| Files.copy(path, newIndexDataPath.resolve(indexDataPath.relativize(path))); | ||
| Files.walkFileTree(indexDataPath, new SimpleFileVisitor<Path>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this same race condition was attempted to be fixed by this code:
OpenSearch/server/src/internalClusterTest/java/org/opensearch/index/shard/IndexShardIT.java
Lines 322 to 330 in f3632f1
| // race condition: async flush may cause translog file deletion resulting in an inconsistent stream from | |
| // Files.walk below during copy phase | |
| // temporarily disable refresh to avoid any flushes or syncs that may inadvertently cause the deletion | |
| assertAcked( | |
| client().admin() | |
| .indices() | |
| .prepareUpdateSettings(index) | |
| .setSettings(Settings.builder().put(IndexSettings.INDEX_REFRESH_INTERVAL_SETTING.getKey(), "-1").build()) | |
| ); |
Is it possible there is still an in-flight refresh that causes the inconsistency even after disabling refresh? If so, could we just force a refresh after disabling and that might ensure everything is quiescent?
My concern with the approach here is that if things are being mutated while traversing the files then we can't know that we've copied a consistent view of the flies for the shard, which seems like a requirement of this test.
Description
The original code was walking the source data path while it’s still being mutated by engine/shard cleanup, and
Files.walk(...)is lazy: as it iterates it callsreadAttributeson each entry to decide file/dir. One of those entries (translog-7.ckp) was deleted between directory listing and attribute read →NoSuchFileException, which then gets wrapped asUncheckedIOExceptionRelated Issues
Resolves #19731
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.