-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread conflict in Puppet::Pops::Loaders #9252
Comments
This was originally reported in https://puppet.atlassian.net/browse/PUP-11324. That issue tracker is now read-only, so please continue discussion here. |
Thanks for letting us know @jstange We currently use the environment object to protect against concurrent access, such as puppet/lib/puppet/pops/loaders.rb Line 28 in b04e822
Loaders instance methods that mutate state, like:
def [](loader_name)
environment.lock.synchronize do
loader = @loaders_by_name[loader_name]
if loader.nil?
# Unable to find the module private loader. Try resolving the module
loader = private_loader_for_module(loader_name[0..-9]) if loader_name.end_with?(' private')
raise Puppet::ParseError, _("Unable to find loader named '%{loader_name}'") % { loader_name: loader_name } if loader.nil?
end
loader
end Alternatively you could only lock when def [](loader_name)
loader = @loaders_by_name[loader_name]
if loader.nil?
environment.lock.synchronize do
if loader.nil?
... We don't have any immediate plans for working on this, but we welcome pull requests. I'll put the |
After some poking I might have some vague idea how threading works in here. Seems like it's JRuby-specific, and in the case I care about it's probably instances of
I'll spend some time with my stack traces and see if there's anywhere else that looks thread-dangerous and deadlock-safe, wrap those in |
Migrated issue to PUP-12038 |
Finally got my corporate overlords to sign off on the CLA for the #9299. Our fix has been deployed locally for a while now and seems to solve the issue. We've recently identified a similar, but distinct, threading issue, this one in |
Describe the Bug
We observe the following error across ~5% of all Puppet agent runs in our organization against our Puppet 7.x masters when we set
multithreaded: true
, with any value formax-active-instances
(even2
):The error disappears if we disable threading, or if we hand-insert thread synchronization into the offending method or its caller. It is not consistent to any particular client node, and nodes on which agent runs throw the error will work fine the other ~95% of the time.
The source of the error,
add_loader_by_name
inpuppet/lib/ruby/vendor_ruby/puppet/pops/loaders.rb
, is fairly straightforward, adding a new key to an instance variable and throwing the error we observe if the key is already present:Interestingly, adding
Puppet.info
orPuppet.warn
output before theif
statement, to debug the call stack and thread state, causes the issue to disappear. I'd theorize that whatever output synchronization the logging system uses adds sort of back-door thread safety. Adding logging inside theif
statement doesn't do this, which makes sense.The calling method in all instances of this error we've logged to date has been
[]
. If we add naïve Ruby-native synchronization to that method as below, our problem disappears:That's presumably not a correct-for-this-codebase fix by itself, but does point in the general direction of "it's threads stepping on each other, add a sempahore somewhere."
Environment
Additional Context
It's not clear why our ecosystem in particular triggers this issue. While we're a medium-large organization, our Puppet masters are spread around to various regional collections of nodes. The one on which we did the above debugging manages only about a dozen nodes. The issue manifests on all of them.
Our module ecosystem is old growth forest, with a lot of interwoven dependency. It's possible that the trigger here is multiple references to the same dependent object being resolved in parallel.
It's of note that we have a number of modules with names of the form
*_private
(in a search path distinct from our "main" module repo). But this seems to occur with any module and does not occur with threading disabled, so is unlikely to be some kind of namespace collision.The text was updated successfully, but these errors were encountered: