8380103: Perfdata shared memory file flock failures by caspernorrbin · Pull Request #30537 · openjdk/jdk

caspernorrbin · 2026-04-01T13:01:16Z

Hi everyone,

The existing posix hsperfdata logic has a potential race between stale-file cleanup and new file creation. During startup, a JVM first scans the hsperfdata directory and cleans up old files before creating its own file. While one JVM is scanning the directory and probing pid files to decide whether they are stale, another JVM may already have created its own file but not yet acquired the flock on it. Because ownership is established with an open followed by a flock, the cleanup path can briefly win that race on a freshly created file and cause the creating JVM to fail its own flock. The same locking protocol is also used to distinguish stale files from live JVMs in other pid namespaces when multiple containers share the same /tmp.

To reduce this race window, I have made the cleanup path more conservative with file removals, and the startup path more tolerant of collisions. The cleanup sweep now checks file modification time and avoids touching files that were recently modified, so we do not try to flock and remove files that are likely still in use or have just been created. In addition, if create_sharedmem_file() loses the initial flock with EWOULDBLOCK, it now retries a small number of times with a short delay before giving up, which gives a concurrent cleanup JVM time to release the lock.

Together these changes make the race much less likely to affect startup. Files that were just created are left out of the cleanup sweep, and even if one were probed, the creating JVM would still eventually acquire the flock on it. If a stale file that should have been removed isn't, it will still be cleaned up at the next JVM start. I also updated the ShareTmpDir.java test, because with these changes we no longer try to remove the file, which results in different log output.

Testing:

Oracle tiers 1-5
Multiple Oracle tiers 1-5 on linux with extra asserts added to ensure we can always create and flock the hsperfdata file

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8380103: Perfdata shared memory file flock failures (Bug - P3)

Reviewers

David Holmes (@dholmes-ora - Reviewer)
Anton Artemov (@toxaart - Committer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30537/head:pull/30537
$ git checkout pull/30537

Update a local copy of the PR:
$ git checkout pull/30537
$ git pull https://git.openjdk.org/jdk.git pull/30537/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30537

View PR using the GUI difftool:
$ git pr show -t 30537

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30537.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2026-04-01T13:02:42Z

👋 Welcome back cnorrbin! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-04-01T13:05:44Z

@caspernorrbin This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8380103: Perfdata shared memory file flock failures

Reviewed-by: dholmes, aartemov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 86 new commits pushed to the master branch:

262f574: 8381487: Replace threadDump.schema.json with document to describe format
a506853: 8374020: Inconsistent handling of type updates in typeWithAnnotations
fa5ec62: 8378950: Repeated warnings when annotation processing is happening
... and 83 more: https://git.openjdk.org/jdk/compare/299452402551d5387eb41ad799ce6a05c05237b9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2026-04-01T13:07:01Z

@caspernorrbin The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2026-04-01T13:10:40Z

Webrevs

01: Full - Incremental (86354961)
00: Full (1b8aab95)

dholmes-ora

This seems quite reasonable to me. Thanks for fixing this nuisance issue.

One minor suggestion below, and one issue in the test.

src/hotspot/os/posix/perfMemory_posix.cpp

dholmes-ora · 2026-04-02T05:36:34Z

test/hotspot/jtreg/containers/docker/ShareTmpDir.java

+                               out2.getStdout().contains(s2) ||
+                               out2.getStdout().contains(s2));


Should one of those be out1?

Yes, it definitely should! Fixed now. It worked because out2 comes from the second process, which is almost always the one that contains the output.

toxaart · 2026-04-02T07:21:05Z

src/hotspot/os/posix/perfMemory_posix.cpp

+// still be starting up and are therefore not candidates for stale-file
+// cleanup. This avoids racing a concurrent JVM startup while scanning the
+// hsperfdata directory.
+static const time_t cleanup_grace_period_seconds = 5;


A drive-by comment, how did you determine this value?

Very arbitrarily. It could probably be a bit lower, but I wanted to give a bit of a buffer in case some weird behaviour happens when we start a lot of VMs concurrently.

I understand. If needed, the log message will let us tell whether it is sufficient in most cases. If nothing pops up, then yes.
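For reference, a grace-period check of the kind discussed in this thread might be sketched as follows. Only the constant `cleanup_grace_period_seconds` and its value come from the quoted diff; the helper names `is_recently_modified` and `should_skip_cleanup` are hypothetical, and the actual comparison in the PR may differ.

```cpp
#include <ctime>
#include <sys/stat.h>

// Value quoted in the review thread above.
static const time_t cleanup_grace_period_seconds = 5;

// Hypothetical helper: true if the file was modified less than the grace
// period before `now`. A modification time in the future (clock skew) is
// also treated as recent, which errs on the side of not removing the file.
static bool is_recently_modified(const struct stat& st, time_t now) {
  return (now - st.st_mtime) < cleanup_grace_period_seconds;
}

// Hypothetical wrapper for the cleanup sweep: skip the file when it is
// recent, or when its metadata cannot be read at all.
static bool should_skip_cleanup(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0) {
    return true;  // cannot tell, so be conservative and leave the file alone
  }
  return is_recently_modified(st, time(nullptr));
}
```

Skipping on an `fstat` failure is a design choice of this sketch: leaving a file behind is cheap, since a later JVM start gets another chance to clean it up.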

toxaart

LGTM

dholmes-ora

Thanks for the updates.

flock race fix

1b8aab9

openjdk bot changed the title ~~8380103~~ 8380103: Perfdata shared memory file flock failures Apr 1, 2026

openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Apr 1, 2026

openjdk bot added the rfr Pull request is ready for review label Apr 1, 2026

dholmes-ora reviewed Apr 2, 2026

toxaart reviewed Apr 2, 2026

feedback fixes

8635496

toxaart approved these changes Apr 2, 2026

dholmes-ora approved these changes Apr 2, 2026

openjdk bot added the ready Pull request is ready to be integrated label Apr 2, 2026

caspernorrbin wants to merge 2 commits into openjdk:master from caspernorrbin:perfdata-flock-race

3 participants
