Does the linker always resolve from the first library? #1314

fefe17 · 2024-07-22T15:16:30Z

fefe17
Jul 22, 2024

I have a C library with a libpthread in it, and the two export some of the same symbols. In particular, there is an internal initialization function for FILE* structs for fopen. The pthread variant will also initialize a mutex.

Building a test program that calls fopen without pthread will pull fdglue2.o from libc.a, which exports the function.
Building the same test program with -pthread tells the linker to link in libpthread.a first and then libc.a.

With GNU ld, the internal function is pulled from libc.a even though libpthread.a exports it and is named first on the command line.

I developed this code many years ago when there was no mold yet. I'm pretty sure I tested this and it worked with GNU ld, which was and still is my default system linker. Using -fuse-ld=lld or -fuse-ld=mold will pull the symbol from libpthread.a as expected (by me at least).

Now, is this a bug in GNU ld? Or have I been relying on undefined behavior for over a decade and now it bites me in the ass?

The .o file that pulls in the function is in libc.a. Maybe GNU ld tries to satisfy the reference from the same library first? That sounds vaguely plausible but breaks my assumptions. I opened a ticket with GNU binutils but I wonder whether there is a specification that actually describes how a linker is supposed to resolve symbols if there is ambiguity.

rui314 · 2024-07-22T20:55:39Z

rui314
Jul 22, 2024
Maintainer

A section in our manual page may answer your question. Quoted from https://github.com/rui314/mold/blob/main/docs/mold.md#archive-symbol-resolution

Archive symbol resolution

Traditionally, Unix linkers are sensitive to the order in which input files appear on the command line. They process input files from the first (leftmost) file to the last (rightmost) file one-by-one. While reading input files, they maintain sets of defined and undefined symbols. When visiting an archive file (.a files), they pull out object files to resolve as many undefined symbols as possible and move on to the next input file. Object files that weren't pulled out will never have a chance for a second look.

Due to this behavior, you usually have to add archive files at the end of a command line, so that when a linker reaches archive files, it knows what symbols remain as undefined.

If you put archive files at the beginning of a command line, a linker doesn't have any undefined symbols, and thus no object files will be pulled out from archives. You can change the processing order by using the --start-group and --end-group options, though they make a linker slower.
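The left-to-right behavior described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not GNU ld's actual implementation: the function name `link_left_to_right`, the tuple layout, and the member names `fopen.o` and `init_file_mt.o` are hypothetical. Only `fdglue2.o`, `fopen` and `__init_FILE` come from the thread.

```python
# Toy model of a traditional left-to-right linker. This is a sketch of the
# behavior described above, not GNU ld's actual implementation.
# A member is (file_name, defined_symbols, referenced_symbols).

def link_left_to_right(inputs):
    defined = {}       # symbol -> "origin(member)" that supplied it
    undefined = set()  # symbols referenced but not yet defined

    def add_member(origin, member):
        name, defs, refs = member
        for sym in defs:
            defined.setdefault(sym, f"{origin}({name})")
            undefined.discard(sym)
        for sym in refs:
            if sym not in defined:
                undefined.add(sym)

    for kind, origin, members in inputs:
        if kind == "object":
            add_member(origin, members[0])
            continue
        # Archive: keep pulling members while they resolve something that is
        # currently undefined, then move on and never come back.
        pulled = set()
        progress = True
        while progress:
            progress = False
            for member in members:
                name, defs, _ = member
                if name not in pulled and undefined.intersection(defs):
                    pulled.add(name)
                    add_member(origin, member)
                    progress = True
    return defined, undefined


# Hypothetical archive contents modelled on the thread; only fdglue2.o is a
# file name taken from the original post.
main_o = ("main.o", ["main"], ["fopen"])
libpthread = [("init_file_mt.o", ["__init_FILE"], [])]
libc = [
    ("fopen.o", ["fopen"], ["__init_FILE"]),
    ("fdglue2.o", ["__init_FILE"], []),
]

defined, undefined = link_left_to_right([
    ("object", "cmdline", [main_o]),
    ("archive", "libpthread.a", libpthread),
    ("archive", "libc.a", libc),
])
print(defined["__init_FILE"])  # prints libc.a(fdglue2.o)
```

In this model, when libpthread.a is visited nothing refers to `__init_FILE` yet, so its member is skipped. The reference only appears once `fopen.o` is pulled from libc.a, and by then libc.a is the only archive left to satisfy it.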

mold, as well as the LLVM lld(1) linker, takes a different approach. They remember which symbols can be resolved from archive files instead of forgetting them after processing each archive. Therefore, mold and lld(1) can "go back" in a command line to pull out object files from archives if they are needed to resolve remaining undefined symbols. They are not sensitive to the input file order.

--start-group and --end-group are still accepted by mold and lld(1) for compatibility with traditional linkers, but they are silently ignored.
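As a rough illustration of the "remember archive symbols" approach, here is a toy model. It is a sketch under simplifying assumptions and does not reflect mold's or lld's real code: the function name `link_with_lazy_symbols`, the tuple layout, and the member names `fopen.o` and `init_file_mt.o` are hypothetical. It also assumes that the first archive on the command line that defines a symbol is the one recorded for it.

```python
# Toy model of the "remember archive symbols" approach. A sketch only, not
# mold's or lld's real code. A member is (file_name, defs, refs).

def link_with_lazy_symbols(inputs):
    lazy = {}     # symbol -> (archive, member); first archive on the line wins
    defined = {}  # symbol -> "origin(member)" that supplied it
    pending = []  # referenced symbols still to be checked

    def load(origin, member):
        name, defs, refs = member
        for sym in defs:
            defined.setdefault(sym, f"{origin}({name})")
        pending.extend(refs)

    # Pass 1: load plain objects, and only record what archives could offer.
    for kind, origin, members in inputs:
        if kind == "object":
            load(origin, members[0])
        else:
            for member in members:
                for sym in member[1]:
                    lazy.setdefault(sym, (origin, member))

    # Pass 2: resolve references, going "back" to any archive as needed.
    undefined = set()
    while pending:
        sym = pending.pop()
        if sym in defined:
            continue
        if sym in lazy:
            load(*lazy[sym])
        else:
            undefined.add(sym)
    return defined, undefined


# Hypothetical archive contents; only fdglue2.o is named in the thread.
main_o = ("main.o", ["main"], ["fopen"])
libpthread = [("init_file_mt.o", ["__init_FILE"], [])]
libc = [
    ("fopen.o", ["fopen"], ["__init_FILE"]),
    ("fdglue2.o", ["__init_FILE"], []),
]

defined, undefined = link_with_lazy_symbols([
    ("object", "cmdline", [main_o]),
    ("archive", "libpthread.a", libpthread),
    ("archive", "libc.a", libc),
])
print(defined["__init_FILE"])  # prints libpthread.a(init_file_mt.o)

# Archives listed before main.o still resolve everything in this model.
defined_early, undefined_early = link_with_lazy_symbols([
    ("archive", "libpthread.a", libpthread),
    ("archive", "libc.a", libc),
    ("object", "cmdline", [main_o]),
])
```

Because every archive's symbols are recorded before any reference is resolved, the reference to `__init_FILE` that shows up after loading `fopen.o` can still be satisfied from libpthread.a, even though that archive appears earlier on the command line.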

2 replies

fefe17 Jul 23, 2024
Author

Let me be more clear about what my problem is.

Scenario 1: Main program calls fputc(), which is defined both in libc.a and libpthread.a. The libpthread.a one does locking, so it needs to get priority over the one in libc. Solution: gcc -o main main.c -lc -lpthread.

Scenario 2: Main program calls fopen(), which is defined in libc.a, but calls __init_FILE() which is defined in both libc.a and libpthread.a. The libpthread.a one mallocs additional space for the mutex and initializes it. We need the libpthread definition to always overrule the libc one. Solution: ?

I wrote a small test program to demonstrate the problem. You might be interested in looking at https://sourceware.org/bugzilla/show_bug.cgi?id=32006 for details. The resolution at least with GNU ld depends on whether the symbol is referenced from main() or from a file in one of the libraries. That doesn't work for me. I need the locking version from libpthread to always win, no matter who called it.

I bring this up here because the fine folks from GNU binutils told me to put the library last that I want to win, and that clearly does not work with lld or mold. But it also does not work with GNU ld as my example shows.

Is there a spec for how a linker is supposed to behave or is everybody just trying to emulate apocryphal linker versions from back in the day?

rui314 Jul 27, 2024
Maintainer

In mold, a library that appears at the beginning of the command line takes precedence over the later ones. So you want to try gcc -o main main.c -lpthread -lc instead of gcc -o main main.c -lc -lpthread. Note that -lc is included by the compiler by default, so I'm not sure if that would work. You want to run gcc -o main main.c -lpthread -lc -### to see if the command line order is as you expected.

There's no spec or anything like that about which library file takes precedence over the other. If two or more libraries define the same symbol with different implementations, it could arguably be considered a bug in your program, because it's a One Definition Rule violation.
