Split the download and install process of a gem#9381
Split the download and install process of a gem#9381Edouard-chin merged 2 commits intoruby:masterfrom
Conversation
| build_args: options[:build_args], | ||
| bundler_extension_cache_path: extension_cache_path(spec) | ||
| ) | ||
| installer = rubygems_gem_installer(spec, options) |
There was a problem hiding this comment.
The Bundler::RubyGemsGemInstaller object is memoized so that the install part doesn't have to re-instantiate one. The reason is because creating a new Bundler::RubyGemsGemInstaller is expensive due to the underling Package that reads the tarball and get the metadata file (as well as computing the gem digest).
rubygems/lib/rubygems/package.rb
Lines 606 to 607 in 381c4e3
fe24401 to
45644cf
Compare
a57af9e to
d58f1df
Compare
| @gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin | ||
| path = fetch_gem_if_possible(spec, options[:previous_spec]) | ||
| raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path | ||
|
|
||
| REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" } | ||
|
|
||
| installer = Bundler::RubyGemsGemInstaller.at( | ||
| path, | ||
| security_policy: Bundler.rubygems.security_policies[Bundler.settings["trust-policy"]], | ||
| install_dir: rubygems_dir.to_s, | ||
| bin_dir: Bundler.system_bindir.to_s, | ||
| ignore_dependencies: true, | ||
| wrappers: true, | ||
| env_shebang: true, | ||
| build_args: options[:build_args], | ||
| bundler_extension_cache_path: extension_cache_path(spec) | ||
| ) | ||
| @gem_installers_mutex.synchronize { @gem_installers[spec.name] ||= installer } | ||
| end | ||
| end |
There was a problem hiding this comment.
Why @gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin ... end not the following?
@gem_installers_mutex.synchronize do
@gem_installers[spec.name] ||= begin
...
end
endI think that the latter is better. If we use the former, begin ... end may be executed multiple times.
There was a problem hiding this comment.
I opted to go with this because I needed to lock the mutex for the shortest amount of time.
I originally used your suggestion but saw a net speed regression because the mutex was locked by a thread until it finished downloading a gem.
There was a problem hiding this comment.
I see.
How about adding a comment why we don't use the standard mutex.synchronized {cached ||= build} memoization idiom here?
There was a problem hiding this comment.
That's a good idea, thanks for the suggestion I added it !
| path = fetch_gem_if_possible(spec, options[:previous_spec]) | ||
| raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path | ||
|
|
||
| REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" } |
There was a problem hiding this comment.
Can we require rubygems_gem_installer in the top-level instead?
There was a problem hiding this comment.
Unfortunately, I don't think I can. I tried in https://github.com/ruby/rubygems/actions/runs/22919157297 but too many failures. The rubygems_gem_installer file itself requires multiple other files/default gem which then may conflict with the version requested from the Gemfile.
This Ruby 3.2 bug is really annoying, especially for Bundler/RG, because this codebase has to require most things as late as possible, to not interfere with gem requested by the user in its Gemfile
- ### Problem In ruby#9381, I explained the issue about the "tail latency". TL;DR When a gem with a native extensions is downloaded, the sooner we starts compiling, the sooner it gets installed. If a gem with a native extensions ends up at the end of the queue, the longer `bundle install` becomes. ### Solution I'd like to introduce a simple queue with priority. When a gem is downloaded, we check whether that gem has a native extension and if its dependencies are installed. If both conditions are met, we add the gem in the priority queue to be picked up as quickly as possible.
- ### Problem In ruby#9381, I explained the issue about the "tail latency". TL;DR When a gem with a native extensions is downloaded, the sooner we starts compiling, the sooner it gets installed. If a gem with a native extensions ends up at the end of the queue, the longer `bundle install` becomes. ### Solution I'd like to introduce a simple queue with priority. When a gem is downloaded, we check whether that gem has a native extension and if its dependencies are installed. If both conditions are met, we add the gem in the priority queue to be picked up as quickly as possible.
- ### Problem In ruby#9381, I explained the issue about the "tail latency". TL;DR When a gem with a native extensions is downloaded, the sooner we starts compiling, the sooner it gets installed. If a gem with a native extensions ends up at the end of the queue, the longer `bundle install` becomes. ### Solution I'd like to introduce a simple queue with priority. When a gem is downloaded, we check whether that gem has a native extension and if its dependencies are installed. If both conditions are met, we add the gem in the priority queue to be picked up as quickly as possible.
I completely agree and think the performance gains from this direction make sense. I was just researching ways to speed up other package managers, and I found out that they parallelize not only installation but also downloads. I was thinking of trying it out, and it seems to have already been achieved with pull requests, so I'm really looking forward to it. |
- ### Problem In ruby#9381, I explained the issue about the "tail latency". TL;DR When a gem with a native extensions is downloaded, the sooner we starts compiling, the sooner it gets installed. If a gem with a native extensions ends up at the end of the queue, the longer `bundle install` becomes. ### Solution I'd like to introduce a simple queue with priority. When a gem is downloaded, we check whether that gem has a native extension and if its dependencies are installed. If both conditions are met, we add the gem in the priority queue to be picked up as quickly as possible.
- ### Problem In ruby#9381, I explained the issue about the "tail latency". TL;DR When a gem with a native extensions is downloaded, the sooner we starts compiling, the sooner it gets installed. If a gem with a native extensions ends up at the end of the queue, the longer `bundle install` becomes. ### Solution I'd like to introduce a simple queue with priority. When a gem is downloaded, we check whether that gem has a native extension and if its dependencies are installed. If both conditions are met, we add the gem in the priority queue to be picked up as quickly as possible.
| path = fetch_gem_if_possible(spec, options[:previous_spec]) | ||
| raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path | ||
|
|
||
| REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" } |
| @gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin | ||
| path = fetch_gem_if_possible(spec, options[:previous_spec]) | ||
| raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path | ||
|
|
||
| REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" } | ||
|
|
||
| installer = Bundler::RubyGemsGemInstaller.at( | ||
| path, | ||
| security_policy: Bundler.rubygems.security_policies[Bundler.settings["trust-policy"]], | ||
| install_dir: rubygems_dir.to_s, | ||
| bin_dir: Bundler.system_bindir.to_s, | ||
| ignore_dependencies: true, | ||
| wrappers: true, | ||
| env_shebang: true, | ||
| build_args: options[:build_args], | ||
| bundler_extension_cache_path: extension_cache_path(spec) | ||
| ) | ||
| @gem_installers_mutex.synchronize { @gem_installers[spec.name] ||= installer } | ||
| end | ||
| end |
There was a problem hiding this comment.
I see.
How about adding a comment why we don't use the standard mutex.synchronized {cached ||= build} memoization idiom here?
- ### Problem
Bundler awaits for the dependencies of a gem to be download and installed
before it proceeds to downloading and installing the dependency itself.
This creates a bottleneck at the end of the installation process and
block a thread unnecessarily.
### Details
The installation strategy in Bundler is to await for "leaf gems" to
be download/installed before the "root gem" is processed.
For instance, in this scenario:
- Gem "foo" has a dependency on "bar" (We can call this the "root gem")
- Gem "bar" has no dependency (We call cal this the "leaf gems")
- A Gemfile adds "gem 'foo'"
In this case, only a single thread will have work to do, that is
because Bundler will queue the gem "bar" to be downloaded and
installed, and only when "bar" is finished, then Bundler will queue
work for "foo".
For **pure ruby gems**, this strategy is a waste of time because
during the installation, a gem's code is not evaluated and no
"require" statement is evaluated.
For gems with native extensions, this strategy make sense. When the
`extconf.rb` of a gem is evaluated, it's possible that the extconf
requires a dependency and therefore Bundler needs to wait for those
dependencies to be installed before it can execute the extconf.
A typical example is a native extension that require 'mini_portile2'.
### Solution
From the explanation above, I'd like to split the download from
the installation of a gem.
The tricky aspect is that there is no RubyGems API to know whether
a gem has a native extension. The only way to know is after we
download the gem and read the `metadata` from the tarball.
So the solution in this patch is as follow:
1. We download all gems without doing any checks.
2. Once the gems are downloaded, we now know whether a gem has a
native extensions.
3. If a gem is a pure ruby gems, we install it without waiting.
4. If a gem has a native extension, we check whether its dependencies
are installed. If a dependency is not yet installed, we put back
the gem in the queue. It will be dequeued later on and we'll redo
this logic.
### Performance gain
The speed gain highly depends on how deep the dependency tree is.
E.g. bar depends on foo which depends on baz which depends on jane ...
Previously we'd proceed to installing gems just one by one and only
a single thread would be used.
In a freshly generated Rails application, the speed gain is not that
important. And the reason is because we are having a "tail latency"
issue. Around the end of the installation process, the remaining
gems to be installed are the one with native extensions, since we
had to wait for for their dependencies to be installed. Compiling
them is the slowest part, and since we are doing it at the end then
the speed gain is not that noticeable.
### Misc
Another advantage of this change is to be able to more easily
implement this kind of feature ruby#9138
to let users solely download gems, without installing them .
- There are failing tests on Ruby 3.2 due to an expecation that checks wheter stderr is empty. In this case, stderr outputs a warning from Ruby "warning: loading in progress, circular require considered harmful", but this is not true. This is a bug in Ruby 3.2 that I also encountered in ruby#9100 (comment) The problem is that Ruby 3.2 wrongly output this warning when multiple threads try to require a file at the same time. See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby 3.2.2. I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this warning output.
d58f1df to
713dd5c
Compare
- ### Problem In ruby#9381, I explained the issue about the "tail latency". TL;DR When a gem with a native extensions is downloaded, the sooner we starts compiling, the sooner it gets installed. If a gem with a native extensions ends up at the end of the queue, the longer `bundle install` becomes. ### Solution I'd like to introduce a simple queue with priority. When a gem is downloaded, we check whether that gem has a native extension and if its dependencies are installed. If both conditions are met, we add the gem in the priority queue to be picked up as quickly as possible.
What was the end-user or developer problem that led to this PR?
Bundler awaits for the dependencies of a gem to be download and installed before it proceeds to downloading and installing the dependency itself. This creates a bottleneck at the end of the installation process and block a thread unnecessarily.
Before
After
What is your fix for the problem, implemented in this PR?
Details
The installation strategy in Bundler is to await for "leaf gems" to be download/installed before the "root gem" is processed.
For instance, in this scenario:
In this case, only a single thread will have work to do, that is because Bundler will queue the gem "bar" to be downloaded and installed, and only when "bar" is finished, then Bundler will queue work for "foo".
For pure ruby gems, this strategy is a waste of time because during the installation, a gem's code is not evaluated and no "require" statement is evaluated.
For gems with native extensions, this strategy make sense. When the
extconf.rbof a gem is evaluated, it's possible that the extconf requires a dependency and therefore Bundler needs to wait for those dependencies to be installed before it can execute the extconf.A typical example is a native extension that require 'mini_portile2'.
Solution
From the explanation above, I'd like to split the download from the installation of a gem.
The tricky aspect is that there is no RubyGems API to know whether a gem has a native extension. The only way to know is after we download the gem and read the
metadatafrom the tarball.So the solution in this patch is as follow:
Performance gain
The speed gain highly depends on how deep the dependency tree is.
E.g. bar depends on foo which depends on baz which depends on jane ...
Previously we'd proceed to installing gems just one by one and only a single thread would be used.
In a freshly generated Rails application, the speed gain is not that important. And the reason is because we are having a "tail latency" issue. Around the end of the installation process, the remaining gems to be installed are the one with native extensions, since we
had to wait for for their dependencies to be installed. Compiling them is the slowest part, and since we are doing it at the end then the speed gain is not that noticeable.
You can see this in the second screenshot "after". All threads have finished their work and are just waiting for the last remaining thread that compiles the very last gem.
Misc
Another advantage of this change is to be able to more easily implement this kind of feature #9138 to let users solely download gems, without installing them .
Make sure the following tasks are checked