-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock in Hbc::SystemCommand
#21665
Conversation
The current implementation of `Hbc::SystemCommand` uses the `Open3` library in a way that causes a deadlock in cases where `STDERR` grows beyond a certain size. This commit fixes the issue by following the practice suggested in the Ruby documentation, i. e. choosing a stream selection method that avoids deadlocks.
Great finding, and if I might add - one of the best, quickest and most substantial bug analysis I've seen on GH in a long time! Great work, thanks for spending your time on this! 💯 |
Awesome to have you here, @claui. LGTM 👍 |
Amazing detective work, here, and so detailed! Big thumbs up to this. |
Thanks so much @vitorgalvao! Do we tolerate the MIT thing as well? I’ve seen comments of yours suggesting between the lines that this project is not exactly in love with that license. 😄 |
I’m guessing you’re referring to #14877. To clarify, I’m not at all opposed to using or including projects licensed under the MIT or other permissive licenses, I just find them a bit silly since their terms sound specific but are actually vague (“At which point is the code so modified that it is no longer the original?”) and they’re so close to the public domain they could very well just be public domain and benefit everyone on a greater scale. I’d also like to be clear those views were my own and not necessarily the views of the project as a whole (after all, the issue was open as a question). That said, every other maintainer supported the notion, so in that sense I guess those are indeed the views of the project. But just to reiterate, no problem at all in using other people’s less permissive stuff, I’d just like for our stuff to be as permissive as possible. |
* Write failing tests for issue Homebrew#18638 * Fix `Hbc::SystemCommand` to avoid deadlocks; ref Homebrew#18638 The current implementation of `Hbc::SystemCommand` uses the `Open3` library in a way that causes a deadlock in cases where `STDERR` grows beyond a certain size. This commit fixes the issue by following the practice suggested in the Ruby documentation, i. e. choosing a stream selection method that avoids deadlocks.
Summary
The current implementation of
Hbc::SystemCommand
uses theOpen3
library in a way that causes deadlocks. In particular, a deadlock occurs wheneverSTDERR
grows beyond a certain size (> 5000 lines) on the default OS X system Ruby.This PR fixes the issue; it also provides failing tests to make the deadlock reproducible.
@caskroom/maintainers Please review! Thanks in advance 😊
Analysis
Apparently, the bug was well concealed; it was brought to my attention only today when user @winkelsdorf filed issue #18638 (thanks again!). The deadlock is triggered when
brew cask install parallels-desktop --force
is run with the app already installed; this triggers a/bin/chmod -R -- u+rwx '/Applications/Parallels Desktop.app'
, producing an extremely long result on standard error, which in turn triggers the bug.Deadlock
The Ruby documentation, section Open3 summarizes quite well why the deadlock occurs:
Root cause
The offending code snippet from
system_command.rb
is:The deadlock occurs during the first
while
loop whenraw_stderr
happens to grow beyond a certain size.In that particular case, both
raw_stdout
andraw_stderr
will block until someone reads fromraw_stderr
. In the code above, this condition is not met until the secondwhile
loop.In other words: things go south because the first
while
loop waits onraw_stdout
, which waits onraw_stderr
to be consumed, which in turn waits for thewhile
loop to finish; a deadlock.Triggering the failure
When Parallels Desktop.app is installed, the bundle is owned by
root
and read-only to everyone else. Whenbrew cask install --force
is subsequently called, HBC needs to remove the existing app. It is actually quite smart in handling this; as a first gentle attempt before escalating tosudo
, it tries to make the app world-writeable:In general, this particular
chmod
is expected to fail; it is only a first attempt to avoidsudo
whenever possible. In our example however, the failingchmod
causes a lot of noise on STDERR (> 5000 lines) which triggers the deadlock inHbc::SystemCommand
.The fix
Following the suggestion from the Ruby documentation, I have replaced the hard-coded order of streams in
Hbc::SystemCommand
by pollingIO.select
, which guarantees deadlock-free stream selection.To clean up the code a bit, I also extracted a few methods and ivars while at it.
Testing
Failing specs
I have included a spec which should detect, and fail on, this particular deadlock when run with the default OS X system Ruby. For details, see
spec/cask/system_command_spec.rb
.The new specs are in a separate commit within this PR. They be run against either the old and new implementation.
Additional test dependencies
To make this case testable, and allow for the spec to be more expressive, I have added two new test dependencies:
rspec-its
andrspec-wait
. Both gems are released under the MIT license (1 2). I strongly believe that the former package is going to benefit our specs greatly; the latter is needed for deadlock detecting. In my opinion, either package is well worth being included in our testing toolchain.Running the tests
To install the added test dependencies, you might want need to run
bundle install
once.The quickest way to run the specs is:
bundle exec rake test:rspec
If you want to watch the spec fail, simply
git reset
one commit back in history, and execute the specs there.