Deadlock when using user namespace #12514

Open
plietar opened this issue Feb 18, 2025 · 1 comment
Labels
bug derivation-build The process of building an individual derivation (see also sandbox label)

Comments

plietar commented Feb 18, 2025

Describe the bug

When running Nix with a chroot store, i.e. using Linux user namespaces, I occasionally get a deadlock just as the build is about to start.

I was able to capture stack traces by attaching gdb to the stuck process.
There are two nix processes, one that I had originally run and one fork of it.

The parent process is stuck waiting for the child:

#0  0x00000000015ff945 in __cp_end ()
#1  0x00000000015fcd39 in __syscall_cp_c ()
#2  0x00000000015f2a32 in waitpid ()
#3  0x0000000000895423 in nix::Pid::wait() ()
#4  0x0000000000863bbd in nix::userNamespacesSupported()::{lambda()#1}::operator()() const [clone .isra.0] ()
#5  0x0000000000863dcd in nix::userNamespacesSupported() ()
#6  0x0000000000863ea8 in nix::mountAndPidNamespacesSupported() ()
#7  0x0000000000f5c9eb in nix::LocalDerivationGoal::tryLocalBuild(nix::LocalDerivationGoal::tryLocalBuild()::_ZN3nix19LocalDerivationGoal13tryLocalBuildEv.Frame*) [clone .actor] ()
#8  0x0000000000e14f08 in nix::Goal::work() ()
#9  0x0000000000e24254 in nix::Worker::run(std::set<std::shared_ptr<nix::Goal>, nix::CompareGoalPtrs, std::allocator<std::shared_ptr<nix::Goal> > > const&) ()
#10 0x0000000000e0f52b in nix::Store::buildPathsWithResults(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, nix::BuildMode, std::shared_ptr<nix::Store>) ()
#11 0x00000000014ea379 in nix::Installable::build2(nix::ref<nix::Store>, nix::ref<nix::Store>, nix::Realise, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > > const&, nix::BuildMode) ()
#12 0x00000000014ec260 in nix::Installable::build(nix::ref<nix::Store>, nix::ref<nix::Store>, nix::Realise, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > > const&, nix::BuildMode) ()
#13 0x0000000000584337 in CmdBuild::run(nix::ref<nix::Store>, std::vector<nix::ref<nix::Installable>, std::allocator<nix::ref<nix::Installable> > >&&) ()
#14 0x00000000014e80b2 in nix::InstallablesCommand::run(nix::ref<nix::Store>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&) ()
#15 0x00000000014dbcc2 in nix::RawInstallablesCommand::run(nix::ref<nix::Store>) ()
#16 0x00000000014bdff7 in nix::StoreCommand::run() ()
#17 0x000000000062fb67 in nix::mainWrapped(int, char**) ()
#18 0x000000000147b5bc in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) ()
#19 0x00000000004cf5d7 in main ()

And the child is stuck acquiring locks for memory allocation:

#0  0x00000000015fc86b in __lock ()
#1  0x00000000015e7472 in __libc_malloc_impl ()
#2  0x000000000152526c in operator new(unsigned long) ()
#3  0x0000000000864f34 in nix::makeSimpleLogger(bool) ()
#4  0x0000000000895d2a in std::_Function_handler<void (), nix::startProcess(std::function<void ()>, nix::ProcessOptions const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#5  0x0000000000893d2e in nix::childEntry(void*) ()
#6  0x00000000015ff912 in __clone ()
#7  0x0000000028744f18 in ?? ()
#8  0x0086e00000000000 in ?? ()
#9  0x0000000000000000 in ?? ()

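My guess is that this is the classic deadlock from forking a multithreaded process: thread A acquires a lock (here, the one inside malloc), thread B forks while A still holds it, and thread A then releases the lock in the parent. In the child, the lock was copied in its locked state but thread A does not exist, so the lock is never released and the child blocks forever when it tries to acquire it.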
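To make the suspected sequence concrete, here is a minimal C++ sketch of it. It is an illustration under stated assumptions, not the Nix or musl code: `fake_malloc_lock` and `child_sees_lock_held` are hypothetical names, an ordinary `std::mutex` stands in for the allocator's internal lock, and plain `fork()` is used where the backtrace above shows `__clone`. The child uses `try_lock()` so that the demo reports the stuck lock instead of hanging.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the allocator's internal lock (not musl's real lock).
static std::mutex fake_malloc_lock;

// Returns true if a forked child finds the lock still held, even though the
// thread that took it does not exist in the child.
bool child_sees_lock_held() {
    std::atomic<bool> locked{false};
    std::atomic<bool> release{false};

    // "Thread A": takes the lock and keeps holding it across the fork.
    std::thread a([&] {
        std::lock_guard<std::mutex> guard(fake_malloc_lock);
        locked = true;
        while (!release) std::this_thread::yield();
    });
    while (!locked) std::this_thread::yield();

    // "Thread B" (the calling thread): forks while A holds the lock.
    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread exists here, but the mutex was
        // copied in its locked state. A plain lock() would block forever.
        bool got = fake_malloc_lock.try_lock();
        _exit(got ? 0 : 1);
    }

    // Parent: thread A releases its lock; the child's copy is unaffected.
    release = true;
    a.join();
    if (pid < 0) return false;  // fork failed, nothing to observe

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

Calling `child_sees_lock_held()` from a small `main` and printing the result should show the child failing to take the lock. Replacing `try_lock()` with `lock()` would turn the sketch into the hang described above, with the parent blocked in `waitpid`.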

Metadata

$ nix --version
nix (Nix) 2.24.10

I'm running the static build of the nix command, since my host does not have a Nix store.

The version I used to capture those stack traces is a little outdated, but the relevant code (libutil/unix/processes.cc and libutil/linux/namespaces.cc) has barely changed since that version.

I've had a look through the Git log and open issues and did not find any mention of this.



@plietar plietar added the bug label Feb 18, 2025

plietar commented Feb 18, 2025

There's a good chance this behaviour is specific to the musl build of the nix binary.

It would seem like glibc tries to prevent this kind of issue by acquiring the malloc lock just before forking (ensuring no other thread holds onto it) and releasing it in both the parent and child processes: https://github.com/bminor/glibc/blob/0242c9f9e606ade838651dadea13c251e3cc4ac2/malloc/arena.c#L159-L165
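The take-the-lock-around-the-fork pattern described here can be sketched with `pthread_atfork`. This is an analogy under stated assumptions rather than glibc's actual implementation, which may hook into `fork` differently: `guarded_lock`, `before_fork`, `after_fork` and `child_can_lock_with_atfork` are hypothetical names. Note also that `pthread_atfork` handlers run for `fork()`, not for a raw `clone()` call, so this pattern alone would not cover a child created through `clone()`.

```cpp
#include <atomic>
#include <thread>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for an allocator lock.
static pthread_mutex_t guarded_lock = PTHREAD_MUTEX_INITIALIZER;

// Prepare handler: take the lock so no other thread holds it during fork().
static void before_fork() { pthread_mutex_lock(&guarded_lock); }
// Parent and child handler: release the lock on both sides of the fork.
static void after_fork() { pthread_mutex_unlock(&guarded_lock); }

// Returns true if a child forked under contention can still take the lock.
bool child_can_lock_with_atfork() {
    // Register the handlers once for the lifetime of the process.
    static const bool registered =
        pthread_atfork(before_fork, after_fork, after_fork) == 0;
    if (!registered) return false;

    std::atomic<bool> stop{false};
    // Worker repeatedly takes and drops the lock, like a thread calling malloc.
    std::thread worker([&] {
        while (!stop) {
            pthread_mutex_lock(&guarded_lock);
            pthread_mutex_unlock(&guarded_lock);
            std::this_thread::yield();
        }
    });

    pid_t pid = fork();  // the registered handlers run around this call
    if (pid == 0) {
        // Child: the child handler has already unlocked our copy of the lock.
        bool got = pthread_mutex_trylock(&guarded_lock) == 0;
        _exit(got ? 0 : 1);
    }

    stop = true;
    worker.join();
    if (pid < 0) return false;  // fork failed

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the forking thread itself holds the lock at the moment of the fork, the child's copy is owned by the one thread that survives into the child, and that thread can release it.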

This thread about musl appears to be relevant: https://www.openwall.com/lists/musl/2020/08/14/3. It is not clear from the thread itself whether the problem was resolved, but it led to a commit, landed in musl 1.2.2, that is supposed to fix it: https://git.musl-libc.org/cgit/musl/commit/?id=167390f05564e0a4d3fcb4329377fd7743267560.

That being said, I was able to find the .drv for this particular build of my nix binary, and it was built with musl 1.2.5, so it should already include that patch. I am not sure what else might be causing this.

@roberth roberth added this to Nix team Feb 20, 2025
@github-project-automation github-project-automation bot moved this to To triage in Nix team Feb 20, 2025
@roberth roberth added the derivation-build The process of building an individual derivation (see also sandbox label) label Feb 20, 2025