Skip to content

fix: safe opcache_reset#2073

Open
AlliBalliBaba wants to merge 22 commits intomainfrom
fix/opcache-safe-reset
Open

fix: safe opcache_reset#2073
AlliBalliBaba wants to merge 22 commits intomainfrom
fix/opcache-safe-reset

Conversation

@AlliBalliBaba
Copy link
Contributor

@AlliBalliBaba AlliBalliBaba commented Dec 14, 2025

Idea to fix #1737

Just a WIP, to test in CI.

@alexandre-daubois
Copy link
Member

Hi @AlliBalliBaba, any news on this side? 🙂
I think I stumbled across the very same issue with #2265

@AlliBalliBaba
Copy link
Contributor Author

I experimented a lot with juggling thread states, but not sure it's possible to make a reset fully safe from our side alone.

There are 2 main race conditions:

  • A thread parses a script without having restarted after a opcache_reset has been triggered.
  • Multiple threads trigger an opcache_reset simultaneously.

The first race condition kind of gets mitigated by the changes in this PR. THe second one would need to somehow be fixed in php-src or requires a hook to reject opcache_resets.

@withinboredom
Copy link
Member

Can we try doing what we do with the environment functions and simply override/replace the existing one:

  1. take a global write lock (read lock taken before executing a php script)
  2. do the real reset
  3. release the lock

@AlliBalliBaba
Copy link
Contributor Author

Yeah I tried that as well. But an opcache_reset can also happen for other reasons, like after calling opcache_invalidate repeatedly because memory is filling up.

But maybe overwriting the user function would still be a good first step 👍

@henderkes
Copy link
Contributor

Can we try doing what we do with the environment functions and simply override/replace the existing one:

  1. take a global write lock (read lock taken before executing a php script)
  2. do the real reset
  3. release the lock

I tried this and it kind of works, but workers are an issue. Only got it working by locking new requests, draining all workers, doing the reset, restarting all workers and then unlocking.

@AlliBalliBaba
Copy link
Contributor Author

AlliBalliBaba commented Mar 13, 2026

Oh I just realized that overwriting opcache and restarting workers was also exactly what this branch does. Do you hard-link against the opcache extension in your solution @henderkes? When hard-linking and calling the reset directly, this can probably be done much more cleanly.
Simply calling the original reset function will make some Docker image builds still fail with segfaults.

@henderkes
Copy link
Contributor

Yes, I hard linked because opcache is always active in 8.5+ (which is where I test). I'll tidy up my changes and merge them on top of this branch when I get to it.

@AlliBalliBaba
Copy link
Contributor Author

Maybe we can make it a requirement also for other PHP versions, not sure how many installations there are without it.
Solution would still be the same, all threads need to stop completely, but we would force a flush instead of letting PHP schedule one.

@henderkes
Copy link
Contributor

henderkes commented Mar 13, 2026

You already had logic to check whether opcache exists in the branch so I only had to pick in a minor change here. Let's see if tests pass.

Edit: well, apparently not. I only tested with 8.5 locally and guess what's passing x)

Okay, I think I get the issue now. Hard linking to opcache won't help because the init chain for shared extensions is different than for static extensions, which is why it works for me locally, but not in CI in < 8.5. Need to wait with actually calling opcache reset until the extension is properly loaded.

@henderkes henderkes force-pushed the fix/opcache-safe-reset branch 2 times, most recently from e54cd96 to 5c8e340 Compare March 13, 2026 13:52
@henderkes henderkes force-pushed the fix/opcache-safe-reset branch from 5c8e340 to 1d75824 Compare March 13, 2026 13:56
@henderkes henderkes force-pushed the fix/opcache-safe-reset branch from e7bd25a to 0d87765 Compare March 13, 2026 15:44
@henderkes
Copy link
Contributor

Alright, we're down to php 8.2 segfaulting. I'm not terribly keep to find out why.

@henderkes
Copy link
Contributor

Down to only debian 8.2 failing...

@henderkes
Copy link
Contributor

Down to half the debian tests failing x)

@henderkes
Copy link
Contributor

At this point I have no idea what it could be, I'm out of ideas. I cannot reproduce the segfault on 8.2 locally.

@withinboredom
Copy link
Member

Maybe we just don't use this when building 8.2? 8.2 is already EOL, just security support at this point, and even that ends soon.

@henderkes henderkes force-pushed the fix/opcache-safe-reset branch from 15aae23 to 22c6ba6 Compare March 14, 2026 10:55
@henderkes henderkes force-pushed the fix/opcache-safe-reset branch from fc58c35 to e9533b8 Compare March 14, 2026 12:39
@henderkes
Copy link
Contributor

I've wasted too much time on this. I don't have the faintest idea why php 8.2 is failing on this branch. Behaviour should be identical to origin/main.

@henderkes henderkes force-pushed the fix/opcache-safe-reset branch 4 times, most recently from ecfb435 to eced1fd Compare March 14, 2026 15:34
@henderkes henderkes force-pushed the fix/opcache-safe-reset branch 2 times, most recently from 1092d25 to d88821a Compare March 14, 2026 16:10
@dunglas
Copy link
Member

dunglas commented Mar 14, 2026

@withinboredom's proposal looks the best forward to me.

If someone really needs 8.2, he can always provide a patch later 😅

@henderkes
Copy link
Contributor

I've been trying this, right now the branch should have zero behavioural changes compared to main, yet for some reason the 8.2 docker builds are segfaulting. I've guarded every piece of changed logic behind php >= 8.3, but that doesn't make a difference.

@henderkes
Copy link
Contributor

#2282 (comment)

So it actually seems completely unrelated to this branches changes...

@AlliBalliBaba
Copy link
Contributor Author

Ah sry @henderkes, thought you knew from this also happening in #2272. The opcache preload tests were always segfaulting in 8.2. The PR that added the tests (#2257) did not trigger the Docker Images build, so this went under the radar. Still haven't looked into why they are failing. Worst case scenario, we'll remove them again.

@AlliBalliBaba
Copy link
Contributor Author

AlliBalliBaba commented Mar 14, 2026

You can merge #2284 into this branch for test fixes.

Did I just spend 10 hours attempting to figure out why my other branch segfaults on 8.2 debian images and it's not actually related to my branch in the end?

I feel you, chasing race conditions in this branch also took me in a circle.

@henderkes
Copy link
Contributor

Ah sry @henderkes, thought you knew from this also happening in #2272. The opcache preload tests were always segfaulting in 8.2. The PR that added the tests (#2257) did not trigger the Docker Images build, so this went under the radar. Still haven't looked into why they are failing. Worst case scenario, we'll remove them again.

I looked at the last 3 runs on the main branch and they were green with the test passing, so I thought it had to be connected to this branch. Extremely annoying, but kind of my own fault for not remembering them.

I've removed the 8.2 specific workarounds here so it does the safe session reset for every php version. Only 9 months to go for 8.2 support anyway...

@henderkes henderkes marked this pull request as ready for review March 15, 2026 02:47
Comment on lines 175 to 177
func RestartWorkers() {
// disallow scaling threads while restarting workers
scalingMu.Lock()
defer scalingMu.Unlock()

threadsToRestart := drainWorkerThreads()

for _, thread := range threadsToRestart {
thread.drainChan = make(chan struct{})
thread.state.Set(state.Ready)
}
restartThreadsAndOpcacheReset(false)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if restartWorkers should also restart regular threads, I suspect that might be what's causing #2265

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also want to try under which conditions opcache_reset is not strictly necessary with restart_workers, something for a different PR.

Signed-off-by: Marc <m@pyc.ac>
@AlliBalliBaba
Copy link
Contributor Author

Final test for this branch would be the moodle installation from #1737 if you get around to try it. Otherwise I can try setting it up again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FrankenPHP crashes with zend_mm_heap corrupted while running some wordpress sites

5 participants