Skip to content

Split the download and install process of a gem#9381

Merged
Edouard-chin merged 2 commits intoruby:masterfrom
Shopify:ec-split-download-install
Mar 12, 2026
Merged

Split the download and install process of a gem#9381
Edouard-chin merged 2 commits intoruby:masterfrom
Shopify:ec-split-download-install

Conversation

@Edouard-chin
Copy link
Collaborator

@Edouard-chin Edouard-chin commented Mar 10, 2026

What was the end-user or developer problem that led to this PR?

Bundler awaits for the dependencies of a gem to be download and installed before it proceeds to downloading and installing the dependency itself. This creates a bottleneck at the end of the installation process and block a thread unnecessarily.

Before

Bundler-waiting

After

no-wait

What is your fix for the problem, implemented in this PR?

Details

The installation strategy in Bundler is to await for "leaf gems" to be download/installed before the "root gem" is processed.

For instance, in this scenario:

  • Gem "foo" has a dependency on "bar" (We can call this the "root gem")
  • Gem "bar" has no dependency (We call cal this the "leaf gems")
  • A Gemfile adds "gem 'foo'"

In this case, only a single thread will have work to do, that is because Bundler will queue the gem "bar" to be downloaded and installed, and only when "bar" is finished, then Bundler will queue work for "foo".

For pure ruby gems, this strategy is a waste of time because during the installation, a gem's code is not evaluated and no "require" statement is evaluated.

For gems with native extensions, this strategy make sense. When the extconf.rb of a gem is evaluated, it's possible that the extconf requires a dependency and therefore Bundler needs to wait for those dependencies to be installed before it can execute the extconf.
A typical example is a native extension that require 'mini_portile2'.

Solution

From the explanation above, I'd like to split the download from the installation of a gem.

The tricky aspect is that there is no RubyGems API to know whether a gem has a native extension. The only way to know is after we download the gem and read the metadata from the tarball.

So the solution in this patch is as follow:

  1. We download all gems without doing any checks.
  2. Once the gems are downloaded, we now know whether a gem has a native extensions.
  3. If a gem is a pure ruby gems, we install it without waiting.
  4. If a gem has a native extension, we check whether its dependencies are installed. If a dependency is not yet installed, we put back the gem in the queue. It will be dequeued later on and we'll redo this logic.

Performance gain

The speed gain highly depends on how deep the dependency tree is.
E.g. bar depends on foo which depends on baz which depends on jane ...
Previously we'd proceed to installing gems just one by one and only a single thread would be used.

In a freshly generated Rails application, the speed gain is not that important. And the reason is because we are having a "tail latency" issue. Around the end of the installation process, the remaining gems to be installed are the one with native extensions, since we
had to wait for for their dependencies to be installed. Compiling them is the slowest part, and since we are doing it at the end then the speed gain is not that noticeable.
You can see this in the second screenshot "after". All threads have finished their work and are just waiting for the last remaining thread that compiles the very last gem.

Misc

Another advantage of this change is to be able to more easily implement this kind of feature #9138 to let users solely download gems, without installing them .

Make sure the following tasks are checked

build_args: options[:build_args],
bundler_extension_cache_path: extension_cache_path(spec)
)
installer = rubygems_gem_installer(spec, options)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Bundler::RubyGemsGemInstaller object is memoized so that the install part doesn't have to re-instantiate one. The reason is because creating a new Bundler::RubyGemsGemInstaller is expensive due to the underling Package that reads the tarball and get the metadata file (as well as computing the gem digest).

def spec
verify unless @spec

Comment on lines +515 to +534
@gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin
path = fetch_gem_if_possible(spec, options[:previous_spec])
raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path

REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" }

installer = Bundler::RubyGemsGemInstaller.at(
path,
security_policy: Bundler.rubygems.security_policies[Bundler.settings["trust-policy"]],
install_dir: rubygems_dir.to_s,
bin_dir: Bundler.system_bindir.to_s,
ignore_dependencies: true,
wrappers: true,
env_shebang: true,
build_args: options[:build_args],
bundler_extension_cache_path: extension_cache_path(spec)
)
@gem_installers_mutex.synchronize { @gem_installers[spec.name] ||= installer }
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why @gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin ... end not the following?

@gem_installers_mutex.synchronize do
  @gem_installers[spec.name] ||= begin
    ...
  end
end

I think that the latter is better. If we use the former, begin ... end may be executed multiple times.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to go with this because I needed to lock the mutex for the shortest amount of time.

I originally used your suggestion but saw a net speed regression because the mutex was locked by a thread until it finished downloading a gem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

How about adding a comment why we don't use the standard mutex.synchronized {cached ||= build} memoization idiom here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, thanks for the suggestion I added it !

path = fetch_gem_if_possible(spec, options[:previous_spec])
raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path

REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we require rubygems_gem_installer in the top-level instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't think I can. I tried in https://github.com/ruby/rubygems/actions/runs/22919157297 but too many failures. The rubygems_gem_installer file itself requires multiple other files/default gem which then may conflict with the version requested from the Gemfile.

This Ruby 3.2 bug is really annoying, especially for Bundler/RG, because this codebase has to require most things as late as possible, to not interfere with gem requested by the user in its Gemfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand.

Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- ### Problem

  In ruby#9381, I explained the
  issue about the "tail latency".

  TL;DR When a gem with a native extensions is downloaded, the sooner
  we starts compiling, the sooner it gets installed.
  If a gem with a native extensions ends up at the end of the queue,
  the longer `bundle install` becomes.

  ### Solution

  I'd like to introduce a simple queue with priority. When a gem is
  downloaded, we check whether that gem has a native extension and
  if its dependencies are installed. If both conditions are met,
  we add the gem in the priority queue to be picked up as quickly
  as possible.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- ### Problem

  In ruby#9381, I explained the
  issue about the "tail latency".

  TL;DR When a gem with a native extensions is downloaded, the sooner
  we starts compiling, the sooner it gets installed.
  If a gem with a native extensions ends up at the end of the queue,
  the longer `bundle install` becomes.

  ### Solution

  I'd like to introduce a simple queue with priority. When a gem is
  downloaded, we check whether that gem has a native extension and
  if its dependencies are installed. If both conditions are met,
  we add the gem in the priority queue to be picked up as quickly
  as possible.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- ### Problem

  In ruby#9381, I explained the
  issue about the "tail latency".

  TL;DR When a gem with a native extensions is downloaded, the sooner
  we starts compiling, the sooner it gets installed.
  If a gem with a native extensions ends up at the end of the queue,
  the longer `bundle install` becomes.

  ### Solution

  I'd like to introduce a simple queue with priority. When a gem is
  downloaded, we check whether that gem has a native extension and
  if its dependencies are installed. If both conditions are met,
  we add the gem in the priority queue to be picked up as quickly
  as possible.
@hsbt
Copy link
Member

hsbt commented Mar 12, 2026

Another advantage of this change is to be able to more easily implement this kind of feature #9138 to let users solely download gems, without installing them .

I completely agree and think the performance gains from this direction make sense.

I was just researching ways to speed up other package managers, and I found out that they parallelize not only installation but also downloads. I was thinking of trying it out, and it seems to have already been achieved with pull requests, so I'm really looking forward to it.

Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- ### Problem

  In ruby#9381, I explained the
  issue about the "tail latency".

  TL;DR When a gem with a native extensions is downloaded, the sooner
  we starts compiling, the sooner it gets installed.
  If a gem with a native extensions ends up at the end of the queue,
  the longer `bundle install` becomes.

  ### Solution

  I'd like to introduce a simple queue with priority. When a gem is
  downloaded, we check whether that gem has a native extension and
  if its dependencies are installed. If both conditions are met,
  we add the gem in the priority queue to be picked up as quickly
  as possible.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- ### Problem

  In ruby#9381, I explained the
  issue about the "tail latency".

  TL;DR When a gem with a native extensions is downloaded, the sooner
  we starts compiling, the sooner it gets installed.
  If a gem with a native extensions ends up at the end of the queue,
  the longer `bundle install` becomes.

  ### Solution

  I'd like to introduce a simple queue with priority. When a gem is
  downloaded, we check whether that gem has a native extension and
  if its dependencies are installed. If both conditions are met,
  we add the gem in the priority queue to be picked up as quickly
  as possible.
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

path = fetch_gem_if_possible(spec, options[:previous_spec])
raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path

REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand.

Comment on lines +515 to +534
@gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin
path = fetch_gem_if_possible(spec, options[:previous_spec])
raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path

REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" }

installer = Bundler::RubyGemsGemInstaller.at(
path,
security_policy: Bundler.rubygems.security_policies[Bundler.settings["trust-policy"]],
install_dir: rubygems_dir.to_s,
bin_dir: Bundler.system_bindir.to_s,
ignore_dependencies: true,
wrappers: true,
env_shebang: true,
build_args: options[:build_args],
bundler_extension_cache_path: extension_cache_path(spec)
)
@gem_installers_mutex.synchronize { @gem_installers[spec.name] ||= installer }
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

How about adding a comment why we don't use the standard mutex.synchronized {cached ||= build} memoization idiom here?

- ### Problem

  Bundler awaits for the dependencies of a gem to be download and installed
  before it proceeds to downloading and installing the dependency itself.
  This creates a bottleneck at the end of the installation process and
  block a thread unnecessarily.

  ### Details

  The installation strategy in Bundler is to await for "leaf gems" to
  be download/installed before the "root gem" is processed.

  For instance, in this scenario:

  - Gem "foo" has a dependency on "bar" (We can call this the "root gem")
  - Gem "bar" has no dependency (We call cal this the "leaf gems")
  - A Gemfile adds "gem 'foo'"

  In this case, only a single thread will have work to do, that is
  because Bundler will queue the gem "bar" to be downloaded and
  installed, and only when "bar" is finished, then Bundler will queue
  work for "foo".

  For **pure ruby gems**, this strategy is a waste of time because
  during the installation, a gem's code is not evaluated and no
  "require" statement is evaluated.

  For gems with native extensions, this strategy make sense. When the
  `extconf.rb` of a gem is evaluated, it's possible that the extconf
  requires a dependency and therefore Bundler needs to wait for those
  dependencies to be installed before it can execute the extconf.
  A typical example is a native extension that require 'mini_portile2'.

  ### Solution

  From the explanation above, I'd like to split the download from
  the installation of a gem.

  The tricky aspect is that there is no RubyGems API to know whether
  a gem has a native extension. The only way to know is after we
  download the gem and read the `metadata` from the tarball.

  So the solution in this patch is as follow:

  1. We download all gems without doing any checks.
  2. Once the gems are downloaded, we now know whether a gem has a
     native extensions.
  3. If a gem is a pure ruby gems, we install it without waiting.
  4. If a gem has a native extension, we check whether its dependencies
     are installed. If a dependency is not yet installed, we put back
     the gem in the queue. It will be dequeued later on and we'll redo
     this logic.

  ### Performance gain

  The speed gain highly depends on how deep the dependency tree is.
  E.g. bar depends on foo which depends on baz which depends on jane ...
  Previously we'd proceed to installing gems just one by one and only
  a single thread would be used.

  In a freshly generated Rails application, the speed gain is not that
  important. And the reason is because we are having a "tail latency"
  issue. Around the end of the installation process, the remaining
  gems to be installed are the one with native extensions, since we
  had to wait for for their dependencies to be installed. Compiling
  them is the slowest part, and since we are doing it at the end then
  the speed gain is not that noticeable.

  ### Misc

  Another advantage of this change is to be able to more easily
  implement this kind of feature ruby#9138
  to let users solely download gems, without installing them .
- There are failing tests on Ruby 3.2 due to an expecation that
  checks wheter stderr is empty. In this case, stderr outputs a
  warning from Ruby "warning: loading in progress, circular require
  considered harmful", but this is not true.

  This is a bug in Ruby 3.2 that I also encountered in ruby#9100 (comment)
  The problem is that Ruby 3.2 wrongly output this warning when
  multiple threads try to require a file at the same time.
  See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby
  3.2.2.

  I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this
  warning output.
@Edouard-chin Edouard-chin force-pushed the ec-split-download-install branch from d58f1df to 713dd5c Compare March 12, 2026 11:13
@Edouard-chin Edouard-chin merged commit bcc4469 into ruby:master Mar 12, 2026
95 checks passed
@Edouard-chin Edouard-chin deleted the ec-split-download-install branch March 12, 2026 11:46
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- ### Problem

  In ruby#9381, I explained the
  issue about the "tail latency".

  TL;DR When a gem with a native extensions is downloaded, the sooner
  we starts compiling, the sooner it gets installed.
  If a gem with a native extensions ends up at the end of the queue,
  the longer `bundle install` becomes.

  ### Solution

  I'd like to introduce a simple queue with priority. When a gem is
  downloaded, we check whether that gem has a native extension and
  if its dependencies are installed. If both conditions are met,
  we add the gem in the priority queue to be picked up as quickly
  as possible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants