Description
Describe the bug
When running Nix with a chroot store, i.e. using Linux user namespaces, I occasionally get deadlocks just as the build is about to start.
I was able to capture stack traces by attaching gdb to the stuck process.
There are two nix processes, one that I had originally run and one fork of it.
The parent process is stuck waiting for the child:
#0 0x00000000015ff945 in __cp_end ()
#1 0x00000000015fcd39 in __syscall_cp_c ()
#2 0x00000000015f2a32 in waitpid ()
#3 0x0000000000895423 in nix::Pid::wait() ()
#4 0x0000000000863bbd in nix::userNamespacesSupported()::{lambda()#1}::operator()() const [clone .isra.0] ()
#5 0x0000000000863dcd in nix::userNamespacesSupported() ()
#6 0x0000000000863ea8 in nix::mountAndPidNamespacesSupported() ()
#7 0x0000000000f5c9eb in nix::LocalDerivationGoal::tryLocalBuild(nix::LocalDerivationGoal::tryLocalBuild()::_ZN3nix19LocalDerivationGoal13tryLocalBuildEv.Frame*) [clone .actor] ()
#8 0x0000000000e14f08 in nix::Goal::work() ()
#9 0x0000000000e24254 in nix::Worker::run(std::set<std::shared_ptr<nix::Goal>, nix::CompareGoalPtrs, std::allocator<std::shared_ptr<nix::Goal> > > const&) ()
#10 0x0000000000e0f52b in nix::Store::buildPathsWithResults(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, nix::BuildMode, std::shared_ptr<nix::Store>) ()
#11 0x00000000014ea379 in nix::Installable::build2(nix::ref<nix::Store>, nix::ref<nix::Store>, nix::Realise, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > > const&, nix::BuildMode) ()
#12 0x00000000014ec260 in nix::Installable::build(nix::ref<nix::Store>, nix::ref<nix::Store>, nix::Realise, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > > const&, nix::BuildMode) ()
#13 0x0000000000584337 in CmdBuild::run(nix::ref<nix::Store>, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > >&&) ()
#14 0x00000000014e80b2 in nix::InstallablesCommand::run(nix::ref<nix::Store>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&) ()
#15 0x00000000014dbcc2 in nix::RawInstallablesCommand::run(nix::ref<nix::Store>) ()
#16 0x00000000014bdff7 in nix::StoreCommand::run() ()
#17 0x000000000062fb67 in nix::mainWrapped(int, char**) ()
#18 0x000000000147b5bc in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) ()
#19 0x00000000004cf5d7 in main ()
And the child is stuck acquiring locks for memory allocation:
#0 0x00000000015fc86b in __lock ()
#1 0x00000000015e7472 in __libc_malloc_impl ()
#2 0x000000000152526c in operator new(unsigned long) ()
#3 0x0000000000864f34 in nix::makeSimpleLogger(bool) ()
#4 0x0000000000895d2a in std::_Function_handler<void (), nix::startProcess(std::function<void ()>, nix::ProcessOptions const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#5 0x0000000000893d2e in nix::childEntry(void*) ()
#6 0x00000000015ff912 in __clone ()
#7 0x0000000028744f18 in ?? ()
#8 0x0086e00000000000 in ?? ()
#9 0x0000000000000000 in ?? ()
My guess is that this is the classic problem of forking from a multithreaded process: thread A acquires a lock (here, the one inside malloc), thread B forks while the lock is held, and thread A later releases the lock in the parent. The child then tries to acquire the same lock, but thread A does not exist in the child process, so the lock is never released there.
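To illustrate the suspected mechanism, here is a minimal sketch that is not taken from the Nix code base. The function name `childSeesLockHeld` and the plain pthread mutex standing in for malloc's internal lock are assumptions made for the demo. The child uses `pthread_mutex_trylock` rather than a blocking lock so that the demo terminates instead of hanging; calling it after `fork()` in a multithreaded process is not formally async-signal-safe, but I'd expect it to behave as shown on common libcs.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <thread>

// Hypothetical demo: returns true if a forked child observes a mutex as
// locked because another thread of the parent held it at fork() time.
bool childSeesLockHeld()
{
    // Stand-in for malloc's internal lock (assumption for illustration).
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    std::atomic<bool> held{false};
    std::atomic<bool> release{false};

    // "Thread A": takes the lock and keeps it until told to release it.
    std::thread a([&] {
        pthread_mutex_lock(&lock);
        held = true;
        while (!release)
            std::this_thread::yield();
        pthread_mutex_unlock(&lock);
    });

    // "Thread B" (this thread): wait until A holds the lock, then fork.
    while (!held)
        std::this_thread::yield();

    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread exists here. The copied mutex is
        // still marked as locked and nobody is left to unlock it. A blocking
        // pthread_mutex_lock() would hang forever, so only probe it.
        int rc = pthread_mutex_trylock(&lock);
        _exit(rc == 0 ? 0 : 1);
    }

    // Parent: let thread A release its lock and finish.
    release = true;
    a.join();

    if (pid < 0)
        return false; // fork failed, nothing to observe

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;

    // Exit status 1 means the child could not take the lock.
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

Calling `childSeesLockHeld()` from a small `main` (built with `-pthread`) should return true if the guess above is right: unlocking in the parent has no effect on the child's copy of the lock. In the traces above, the equivalent blocking acquisition would be the `__lock` call under `__libc_malloc_impl`.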
Metadata
$ nix --version
nix (Nix) 2.24.10
I'm running the static build of the nix command, since my host does not have a Nix store.
The version I used to capture those stack traces is a little outdated, but the relevant code (libutil/unix/processes.cc and libutil/linux/namespaces.cc) has barely changed since that version.
I've had a look through the Git log and open issues and did not find any mention of this.
Add 👍 to issues you find important.