Skip to content

Fix WindowsFS onClose race condition#15951

Merged
dweiss merged 4 commits intoapache:mainfrom
szybia:fix-windows-fs-onClose-race
Apr 16, 2026
Merged

Fix WindowsFS onClose race condition#15951
dweiss merged 4 commits intoapache:mainfrom
szybia:fix-windows-fs-onClose-race

Conversation

@szybia
Copy link
Copy Markdown
Contributor

@szybia szybia commented Apr 10, 2026

  • If i create an InputStream on a target file, and then close it, if i concurrently (before onClose finishes) mv tmp target, i shouldn't get an AssertionError
  • Happens because with underlying unix FS, getKey will return current inode for the path, if concurrent move happens, the key within openFiles changes, and now it'll fail to remove it from openFiles
  • Spotted in ES test failure, where we atomically replace a file
  • Reproduced with (tested with my change, no failure):
    public void testAtomicMoveRaceDuringClose() throws Exception {
        Path rawDir = createTempDir();
        WindowsFS windowsFs = new WindowsFS(rawDir.getFileSystem());
        Path dir = windowsFs.wrapPath(rawDir);
        Path target = dir.resolve("target");
        Files.write(target, new byte[] { 42 });
        int writerCount = 16;
        int readerCount = 16;
        Thread[] threads = new Thread[writerCount + readerCount];
        AtomicReference<AssertionError> raceHit = new AtomicReference<>();
        for (int i = 0; i < writerCount; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                int counter = 0;
                while (Thread.currentThread().isInterrupted() == false) {
                    try {
                        Path tmp = dir.resolve("tmp-" + id + "-" + counter++);
                        Files.write(tmp, new byte[] { 42 });
                        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                    } catch (Exception ignored) {}
                }
            });
        }
        for (int i = 0; i < readerCount; i++) {
            threads[writerCount + i] = new Thread(() -> {
                while (Thread.currentThread().isInterrupted() == false) {
                    try (InputStream in = Files.newInputStream(target)) {
                        in.readAllBytes();
                    } catch (Exception ignored) {}
                }
            });
            threads[writerCount + i].setUncaughtExceptionHandler((t, e) -> { if (e instanceof AssertionError ae) raceHit.set(ae); });
        }
        for (Thread t : threads)
            t.start();
        Thread.sleep(120_000);
        for (Thread t : threads)
            t.interrupt();
        for (Thread t : threads)
            t.join(5_000);
        final var e = raceHit.get();
        if (e != null) {
            logger.info("Race reproduced: ", e);
        }
    }
[2026-04-10T20:43:31,537][INFO ][o.e.x.s.c.c.StatelessHeartbeatStoreTests][testAtomicMoveRaceDuringClose] Race reproduced: 
java.lang.AssertionError: null
	at org.apache.lucene.tests.mockfile.WindowsFS.onClose(WindowsFS.java:80) ~[lucene-test-framework-10.4.0.jar:10.4.0 9983b7ce7fdd04f4d357688fb85c14277c15ea8d - 2026-02-20 18:05:31]
	at org.apache.lucene.tests.mockfile.HandleTrackingFS$1.close(HandleTrackingFS.java:103) ~[lucene-test-framework-10.4.0.jar:10.4.0 9983b7ce7fdd04f4d357688fb85c14277c15ea8d - 2026-02-20 18:05:31]```

@szybia szybia marked this pull request as ready for review April 10, 2026 15:54
Copy link
Copy Markdown
Member

@rmuir rmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find, thank you

@szybia
Copy link
Copy Markdown
Contributor Author

szybia commented Apr 10, 2026

@rmuir thanks for the review!

  1. should i add a changelog? was scanning it and my best guess is under Other but not sure...
  2. i'll also fix the styling...

@rmuir
Copy link
Copy Markdown
Member

rmuir commented Apr 10, 2026

@rmuir thanks for the review!

  1. should i add a changelog? was scanning it and my best guess is under Other but not sure...
  2. i'll also fix the styling...

@szybia Changelog is always a plus I think. "Other" or "Tests" or any other best guess is fine. Thanks!

@github-actions github-actions bot added this to the 10.5.0 milestone Apr 10, 2026
@szybia
Copy link
Copy Markdown
Contributor Author

szybia commented Apr 10, 2026

i added changelog under 10.5.0, i'm assuming could go into a 10.4.1 patch but there's no changelog entry for it, i could add it...

pushed changes as a conversation starter

@rmuir
Copy link
Copy Markdown
Member

rmuir commented Apr 10, 2026

10.5 is good, i'll backport it. I don't know if there will be a 10.4.1, but if someone wants to make one we could then cherry-pick it back to that, too.

@szybia
Copy link
Copy Markdown
Contributor Author

szybia commented Apr 10, 2026

thank you!

i don't have write access so leaving up to you to merge 🚀

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Apr 16, 2026

I'll merge and backport.

@dweiss dweiss merged commit 181c3ad into apache:main Apr 16, 2026
13 checks passed
dweiss added a commit that referenced this pull request Apr 16, 2026
* Fix WindowsFS onClose race condition

* add changelog and fix styling

---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
@szybia szybia deleted the fix-windows-fs-onClose-race branch April 16, 2026 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants