From e5e8c0a50700a397152643f050caa78eda898601 Mon Sep 17 00:00:00 2001 From: Edouard CHIN Date: Tue, 10 Mar 2026 02:13:37 +0100 Subject: [PATCH 1/2] Split the download and install process of a gem: - ### Problem Bundler awaits for the dependencies of a gem to be download and installed before it proceeds to downloading and installing the dependency itself. This creates a bottleneck at the end of the installation process and block a thread unnecessarily. ### Details The installation strategy in Bundler is to await for "leaf gems" to be download/installed before the "root gem" is processed. For instance, in this scenario: - Gem "foo" has a dependency on "bar" (We can call this the "root gem") - Gem "bar" has no dependency (We call cal this the "leaf gems") - A Gemfile adds "gem 'foo'" In this case, only a single thread will have work to do, that is because Bundler will queue the gem "bar" to be downloaded and installed, and only when "bar" is finished, then Bundler will queue work for "foo". For **pure ruby gems**, this strategy is a waste of time because during the installation, a gem's code is not evaluated and no "require" statement is evaluated. For gems with native extensions, this strategy make sense. When the `extconf.rb` of a gem is evaluated, it's possible that the extconf requires a dependency and therefore Bundler needs to wait for those dependencies to be installed before it can execute the extconf. A typical example is a native extension that require 'mini_portile2'. ### Solution From the explanation above, I'd like to split the download from the installation of a gem. The tricky aspect is that there is no RubyGems API to know whether a gem has a native extension. The only way to know is after we download the gem and read the `metadata` from the tarball. So the solution in this patch is as follow: 1. We download all gems without doing any checks. 2. Once the gems are downloaded, we now know whether a gem has a native extensions. 3. If a gem is a pure ruby gems, we install it without waiting. 4. If a gem has a native extension, we check whether its dependencies are installed. If a dependency is not yet installed, we put back the gem in the queue. It will be dequeued later on and we'll redo this logic. ### Performance gain The speed gain highly depends on how deep the dependency tree is. E.g. bar depends on foo which depends on baz which depends on jane ... Previously we'd proceed to installing gems just one by one and only a single thread would be used. In a freshly generated Rails application, the speed gain is not that important. And the reason is because we are having a "tail latency" issue. Around the end of the installation process, the remaining gems to be installed are the one with native extensions, since we had to wait for for their dependencies to be installed. Compiling them is the slowest part, and since we are doing it at the end then the speed gain is not that noticeable. ### Misc Another advantage of this change is to be able to more easily implement this kind of feature https://github.com/ruby/rubygems/issues/9138 to let users solely download gems, without installing them . --- .../lib/bundler/installer/gem_installer.rb | 14 ++++ .../bundler/installer/parallel_installer.rb | 76 ++++++++++++++----- bundler/lib/bundler/plugin/api/source.rb | 8 ++ bundler/lib/bundler/plugin/installer.rb | 3 +- bundler/lib/bundler/self_manager.rb | 1 + bundler/lib/bundler/source.rb | 2 + bundler/lib/bundler/source/rubygems.rb | 73 +++++++++++------- 7 files changed, 133 insertions(+), 44 deletions(-) diff --git a/bundler/lib/bundler/installer/gem_installer.rb b/bundler/lib/bundler/installer/gem_installer.rb index 5c4fa7825325..f3b43c31ee09 100644 --- a/bundler/lib/bundler/installer/gem_installer.rb +++ b/bundler/lib/bundler/installer/gem_installer.rb @@ -25,6 +25,20 @@ def install_from_spec [false, specific_failure_message(e)] end + def download + spec.source.download( + spec, + force: force, + local: local, + build_args: Array(spec_settings), + previous_spec: previous_spec, + ) + + [true, nil] + rescue Bundler::BundlerError => e + [false, specific_failure_message(e)] + end + private def specific_failure_message(e) diff --git a/bundler/lib/bundler/installer/parallel_installer.rb b/bundler/lib/bundler/installer/parallel_installer.rb index d10e5ec92403..79489a6a6c95 100644 --- a/bundler/lib/bundler/installer/parallel_installer.rb +++ b/bundler/lib/bundler/installer/parallel_installer.rb @@ -32,6 +32,12 @@ def ready_to_enqueue? state == :none end + def ready_to_install?(installed_specs) + return false unless state == :downloaded + + spec.extensions.none? || dependencies_installed?(installed_specs) + end + def has_post_install_message? !post_install_message.empty? end @@ -84,6 +90,7 @@ def initialize(installer, all_specs, size, standalone, force, local: false, skip def call if @rake + do_download(@rake, 0) do_install(@rake, 0) Gem::Specification.reset end @@ -107,26 +114,54 @@ def failed_specs end def install_with_worker - enqueue_specs - process_specs until finished_installing? + installed_specs = {} + enqueue_specs(installed_specs) + + process_specs(installed_specs) until finished_installing? end def install_serially until finished_installing? raise "failed to find a spec to enqueue while installing serially" unless spec_install = @specs.find(&:ready_to_enqueue?) spec_install.state = :enqueued + do_download(spec_install, 0) do_install(spec_install, 0) end end def worker_pool @worker_pool ||= Bundler::Worker.new @size, "Parallel Installer", lambda {|spec_install, worker_num| - do_install(spec_install, worker_num) + case spec_install.state + when :enqueued + do_download(spec_install, worker_num) + when :installable + do_install(spec_install, worker_num) + else + spec_install + end } end - def do_install(spec_install, worker_num) + def do_download(spec_install, worker_num) Plugin.hook(Plugin::Events::GEM_BEFORE_INSTALL, spec_install) + + gem_installer = Bundler::GemInstaller.new( + spec_install.spec, @installer, @standalone, worker_num, @force, @local + ) + + success, message = gem_installer.download + + if success + spec_install.state = :downloaded + else + spec_install.error = "#{message}\n\n#{require_tree_for_spec(spec_install.spec)}" + spec_install.state = :failed + end + + spec_install + end + + def do_install(spec_install, worker_num) gem_installer = Bundler::GemInstaller.new( spec_install.spec, @installer, @standalone, worker_num, @force, @local ) @@ -147,9 +182,19 @@ def do_install(spec_install, worker_num) # Some specs might've had to wait til this spec was installed to be # processed so the call to `enqueue_specs` is important after every # dequeue. - def process_specs - worker_pool.deq - enqueue_specs + def process_specs(installed_specs) + spec = worker_pool.deq + + if spec.installed? + installed_specs[spec.name] = true + return + elsif spec.failed? + return + elsif spec.ready_to_install?(installed_specs) + spec.state = :installable + end + + worker_pool.enq(spec) end def finished_installing? @@ -185,18 +230,15 @@ def require_tree_for_spec(spec) # Later we call this lambda again to install specs that depended on # previously installed specifications. We continue until all specs # are installed. - def enqueue_specs - installed_specs = {} - @specs.each do |spec| - next unless spec.installed? - installed_specs[spec.name] = true - end - + def enqueue_specs(installed_specs) @specs.each do |spec| - if spec.ready_to_enqueue? && spec.dependencies_installed?(installed_specs) - spec.state = :enqueued - worker_pool.enq spec + if spec.installed? + installed_specs[spec.name] = true + next end + + spec.state = :enqueued + worker_pool.enq spec end end end diff --git a/bundler/lib/bundler/plugin/api/source.rb b/bundler/lib/bundler/plugin/api/source.rb index 6c888d037350..798326673ad7 100644 --- a/bundler/lib/bundler/plugin/api/source.rb +++ b/bundler/lib/bundler/plugin/api/source.rb @@ -74,6 +74,14 @@ def options_to_lock {} end + # Download the gem specified by the spec at appropriate path. + # + # A source plugin can implement this method to split the download and the + # installation of a gem. + # + # @return [Boolean] Whether the download of the gem succeeded. + def download(spec, opts); end + # Install the gem specified by the spec at appropriate path. # `install_path` provides a sufficient default, if the source can only # satisfy one gem, but is not binding. diff --git a/bundler/lib/bundler/plugin/installer.rb b/bundler/lib/bundler/plugin/installer.rb index 853ad9edca4f..9be8b36843bb 100644 --- a/bundler/lib/bundler/plugin/installer.rb +++ b/bundler/lib/bundler/plugin/installer.rb @@ -110,7 +110,8 @@ def install_from_specs(specs) paths = {} specs.each do |spec| - spec.source.install spec + spec.source.download(spec) + spec.source.install(spec) paths[spec.name] = spec end diff --git a/bundler/lib/bundler/self_manager.rb b/bundler/lib/bundler/self_manager.rb index 1db77fd46b3f..82efbf56a4fb 100644 --- a/bundler/lib/bundler/self_manager.rb +++ b/bundler/lib/bundler/self_manager.rb @@ -63,6 +63,7 @@ def install_and_restart_with(version) end def install(spec) + spec.source.download(spec) spec.source.install(spec) end diff --git a/bundler/lib/bundler/source.rb b/bundler/lib/bundler/source.rb index 2b90a0eff1bf..cf71be880125 100644 --- a/bundler/lib/bundler/source.rb +++ b/bundler/lib/bundler/source.rb @@ -31,6 +31,8 @@ def version_message(spec, locked_spec = nil) message end + def download(*); end + def can_lock?(spec) spec.source == self end diff --git a/bundler/lib/bundler/source/rubygems.rb b/bundler/lib/bundler/source/rubygems.rb index e679010e97dc..d1882f6768cf 100644 --- a/bundler/lib/bundler/source/rubygems.rb +++ b/bundler/lib/bundler/source/rubygems.rb @@ -21,6 +21,8 @@ def initialize(options = {}) @allow_local = options["allow_local"] || false @prefer_local = false @checksum_store = Checksum::Store.new + @gem_installers = {} + @gem_installers_mutex = Mutex.new Array(options["remotes"]).reverse_each {|r| add_remote(r) } @@ -162,49 +164,40 @@ def specs end end - def install(spec, options = {}) + def download(spec, options = {}) if (spec.default_gem? && !cached_built_in_gem(spec, local: options[:local])) || (installed?(spec) && !options[:force]) - print_using_message "Using #{version_message(spec, options[:previous_spec])}" - return nil # no post-install message + return true end - path = fetch_gem_if_possible(spec, options[:previous_spec]) - raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path - - return if Bundler.settings[:no_install] - - install_path = rubygems_dir - bin_path = Bundler.system_bindir - - require_relative "../rubygems_gem_installer" - - installer = Bundler::RubyGemsGemInstaller.at( - path, - security_policy: Bundler.rubygems.security_policies[Bundler.settings["trust-policy"]], - install_dir: install_path.to_s, - bin_dir: bin_path.to_s, - ignore_dependencies: true, - wrappers: true, - env_shebang: true, - build_args: options[:build_args], - bundler_extension_cache_path: extension_cache_path(spec) - ) + installer = rubygems_gem_installer(spec, options) if spec.remote s = begin installer.spec rescue Gem::Package::FormatError - Bundler.rm_rf(path) + Bundler.rm_rf(installer.gem) raise rescue Gem::Security::Exception => e raise SecurityError, - "The gem #{File.basename(path, ".gem")} can't be installed because " \ + "The gem #{installer.gem} can't be installed because " \ "the security policy didn't allow it, with the message: #{e.message}" end spec.__swap__(s) end + spec + end + + def install(spec, options = {}) + if (spec.default_gem? && !cached_built_in_gem(spec, local: options[:local])) || (installed?(spec) && !options[:force]) + print_using_message "Using #{version_message(spec, options[:previous_spec])}" + return nil # no post-install message + end + + return if Bundler.settings[:no_install] + + installer = rubygems_gem_installer(spec, options) spec.source.checksum_store.register(spec, installer.gem_checksum) message = "Installing #{version_message(spec, options[:previous_spec])}" @@ -516,6 +509,34 @@ def extension_cache_slug(spec) return unless remote = spec.remote remote.cache_slug end + + # We are using a mutex to reaed and write from/to the hash. + # The reason this double synchronization was added is for performance + # and lock the mutex for the shortest possible amount of time. Otherwise, + # all threads are fighting over this mutex and when it gets acquired it gets locked + # until a threads finishes downloading a gem, leaving the other threads waiting + # doing nothing. + def rubygems_gem_installer(spec, options) + @gem_installers_mutex.synchronize { @gem_installers[spec.name] } || begin + path = fetch_gem_if_possible(spec, options[:previous_spec]) + raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path + + require_relative "../rubygems_gem_installer" + + installer = Bundler::RubyGemsGemInstaller.at( + path, + security_policy: Bundler.rubygems.security_policies[Bundler.settings["trust-policy"]], + install_dir: rubygems_dir.to_s, + bin_dir: Bundler.system_bindir.to_s, + ignore_dependencies: true, + wrappers: true, + env_shebang: true, + build_args: options[:build_args], + bundler_extension_cache_path: extension_cache_path(spec) + ) + @gem_installers_mutex.synchronize { @gem_installers[spec.name] ||= installer } + end + end end end end From 713dd5c0e1cd165020f774933de43d842890b8d0 Mon Sep 17 00:00:00 2001 From: Edouard CHIN Date: Tue, 10 Mar 2026 22:19:40 +0100 Subject: [PATCH 2/2] Add a Mutex to prevent a bug on Ruby 3.2: - There are failing tests on Ruby 3.2 due to an expecation that checks wheter stderr is empty. In this case, stderr outputs a warning from Ruby "warning: loading in progress, circular require considered harmful", but this is not true. This is a bug in Ruby 3.2 that I also encountered in https://github.com/ruby/rubygems/pull/9100#issuecomment-3555671184 The problem is that Ruby 3.2 wrongly output this warning when multiple threads try to require a file at the same time. See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby 3.2.2. I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this warning output. --- bundler/lib/bundler/source/rubygems.rb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/bundler/lib/bundler/source/rubygems.rb b/bundler/lib/bundler/source/rubygems.rb index d1882f6768cf..877343b5fb4d 100644 --- a/bundler/lib/bundler/source/rubygems.rb +++ b/bundler/lib/bundler/source/rubygems.rb @@ -9,6 +9,7 @@ class Rubygems < Source # Ask for X gems per API request API_REQUEST_SIZE = 100 + REQUIRE_MUTEX = Mutex.new attr_accessor :remotes @@ -521,7 +522,7 @@ def rubygems_gem_installer(spec, options) path = fetch_gem_if_possible(spec, options[:previous_spec]) raise GemNotFound, "Could not find #{spec.file_name} for installation" unless path - require_relative "../rubygems_gem_installer" + REQUIRE_MUTEX.synchronize { require_relative "../rubygems_gem_installer" } installer = Bundler::RubyGemsGemInstaller.at( path,