When running Nix with a chroot store, i.e. using Linux user namespaces, I occasionally get a deadlock just as the build is about to start.
I was able to capture stack traces by attaching gdb to the stuck process.
There are two nix processes, one that I had originally run and one fork of it.
The parent process is stuck waiting for the child:
#0 0x00000000015ff945 in __cp_end ()
#1 0x00000000015fcd39 in __syscall_cp_c ()
#2 0x00000000015f2a32 in waitpid ()
#3 0x0000000000895423 in nix::Pid::wait() ()
#4 0x0000000000863bbd in nix::userNamespacesSupported()::{lambda()#1}::operator()() const [clone .isra.0] ()
#5 0x0000000000863dcd in nix::userNamespacesSupported() ()
#6 0x0000000000863ea8 in nix::mountAndPidNamespacesSupported() ()
#7 0x0000000000f5c9eb in nix::LocalDerivationGoal::tryLocalBuild(nix::LocalDerivationGoal::tryLocalBuild()::_ZN3nix19LocalDerivationGoal13tryLocalBuildEv.Frame*) [clone .actor] ()
#8 0x0000000000e14f08 in nix::Goal::work() ()
#9 0x0000000000e24254 in nix::Worker::run(std::set<std::shared_ptr<nix::Goal>, nix::CompareGoalPtrs, std::allocator<std::shared_ptr<nix::Goal> > > const&) ()
#10 0x0000000000e0f52b in nix::Store::buildPathsWithResults(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, nix::BuildMode, std::shared_ptr<nix::Store>) ()
#11 0x00000000014ea379 in nix::Installable::build2(nix::ref<nix::Store>, nix::ref<nix::Store>, nix::Realise, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > > const&, nix::BuildMode) ()
#12 0x00000000014ec260 in nix::Installable::build(nix::ref<nix::Store>, nix::ref<nix::Store>, nix::Realise, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > > const&, nix::BuildMode) ()
#13 0x0000000000584337 in CmdBuild::run(nix::ref<nix::Store>, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > >&&) ()
#14 0x00000000014e80b2 in nix::InstallablesCommand::run(nix::ref<nix::Store>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&) ()
#15 0x00000000014dbcc2 in nix::RawInstallablesCommand::run(nix::ref<nix::Store>) ()
#16 0x00000000014bdff7 in nix::StoreCommand::run() ()
#17 0x000000000062fb67 in nix::mainWrapped(int, char**) ()
#18 0x000000000147b5bc in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) ()
#19 0x00000000004cf5d7 in main ()
And the child is stuck acquiring locks for memory allocation:
#0 0x00000000015fc86b in __lock ()
#1 0x00000000015e7472 in __libc_malloc_impl ()
#2 0x000000000152526c in operator new(unsigned long) ()
#3 0x0000000000864f34 in nix::makeSimpleLogger(bool) ()
#4 0x0000000000895d2a in std::_Function_handler<void (), nix::startProcess(std::function<void ()>, nix::ProcessOptions const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#5 0x0000000000893d2e in nix::childEntry(void*) ()
#6 0x00000000015ff912 in __clone ()
#7 0x0000000028744f18 in ?? ()
#8 0x0086e00000000000 in ?? ()
#9 0x0000000000000000 in ?? ()
My guess is that this is the classic fork-in-a-multithreaded-process problem: thread A acquires a lock (here, inside malloc), thread B forks, and thread A then releases the lock in the parent. The child inherits the lock in its held state and tries to acquire it, but thread A does not exist in the child, so the lock is never released.
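To make the hazard concrete, here is a minimal sketch of the usual way to sidestep it: the child does nothing that can allocate before it exits. This is not Nix's code. `runProbeInChild` is a hypothetical helper, and it uses plain `fork()` where the trace above shows `clone()`. It only illustrates that a child restricted to async-signal-safe calls never touches the malloc lock that the child in the trace is blocked on.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of Nix: runs `probe` in a forked child and
// returns the child's exit code, or -1 on failure.
// `probe` must restrict itself to async-signal-safe calls (no malloc,
// no operator new, no logger construction).
int runProbeInChild(int (*probe)())
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        // Child: only the forking thread exists here, so a lock that another
        // parent thread held at fork time would never be released.
        // _exit() skips atexit handlers and stdio flushing, which may lock too.
        _exit(probe());
    }

    // Parent: reap the child, retrying if a signal interrupts the wait.
    int status = 0;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as `runProbeInChild([]() -> int { return 0; })`, the parent gets the child's exit code back (truncated to 8 bits by the kernel). Anything that needs the heap, such as setting up a logger, would have to happen in the parent before the fork or be skipped in the child.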
Metadata
$ nix --version
nix (Nix) 2.24.10
I'm running the static build of the nix command, since my host does not have a Nix store.
The version I used to capture those stack traces is a little outdated, but the relevant code (libutil/unix/processes.cc and libutil/linux/namespaces.cc) has barely changed since that version.
I've had a look through the Git log and open issues and did not find any mention of this.
That being said, I was able to find the .drv for this particular build of my nix binary, and it was built with musl 1.2.5, so it should have the patch in it. I am therefore not sure what else might be causing this.