Binary linked with mold has a higher VmHWM value at runtime #1357

Open
aallrd opened this issue Oct 10, 2024 · 6 comments


aallrd commented Oct 10, 2024

Hello,
I am assessing mold to replace gold for linking a C++ binary.
While running benchmarks, I noticed that the VmHWM value reported by cat /proc/<PID>/status at startup is about 2.5% higher with the mold-produced binary than with the gold-produced one.
I am trying to understand what could cause this increase. Do you have any idea?
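
For readers who want to reproduce the measurement, here is a minimal Python sketch of how the VmHWM line (the peak resident set size, in kB) can be pulled out of /proc/<PID>/status and turned into a percentage difference. The function names are hypothetical and not part of mold or gold; `read_vm_hwm_kb` assumes a Linux system with /proc mounted.

```python
import re


def parse_vm_hwm_kb(status_text):
    """Return the VmHWM value in kB from the text of /proc/<PID>/status, or None."""
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vm_hwm_kb(pid="self"):
    """Read VmHWM for a live process (Linux only, assumes /proc is mounted)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_hwm_kb(f.read())


def percent_increase(baseline_kb, candidate_kb):
    """Relative increase of candidate over baseline, in percent."""
    return 100.0 * (candidate_kb - baseline_kb) / baseline_kb
```

Calling `percent_increase(gold_kb, mold_kb)` with the two values read at the same point of the benchmark gives the kind of figure quoted above.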

Owner

rui314 commented Oct 10, 2024

Can you paste the output of readelf -WSl <gold-linked-binary> <mold-linked-binary>?

Author

aallrd commented Oct 11, 2024

The binaries are built on Red Hat Enterprise Linux 8.6 using GCC 11.2 and gold 2.42 / mold 2.34.0.
The runtime system is Red Hat Enterprise Linux 9.2.

$ readelf -p .comment binary.mold

String dump of section '.comment':
  [     0]  GCC: (GNU) 8.5.0 20210514 (Red Hat 8.5.0-22)
  [    2e]  GCC: (GNU) 11.2.1 20220127 (Red Hat 11.2.1-9)
  [    5c]  mold 2.34.0 (ed7cc1b85aed2ba14fdcf868228e2f704d45ae6c; compatible with GNU ld)

$ readelf -p .comment binary.gold

String dump of section '.comment':
  [     1]  GCC: (GNU) 8.5.0 20210514 (Red Hat 8.5.0-22)
  [    2e]  GCC: (GNU) 11.2.1 20220127 (Red Hat 11.2.1-9)

$ readelf -WSl binary.gold binary.mold

File: binary.gold
There are 42 section headers, starting at offset 0x1d7555f8:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        0000000000400270 000270 00001c 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            000000000040028c 00028c 000020 00   A  0   0  4
  [ 3] .note.gnu.build-id NOTE            00000000004002ac 0002ac 000024 00   A  0   0  4
  [ 4] .dynsym           DYNSYM          00000000004002d0 0002d0 248a00 18   A  5   1  8
  [ 5] .dynstr           STRTAB          0000000000648cd0 248cd0 8e630e 00   A  0   0  1
  [ 6] .gnu.hash         GNU_HASH        0000000000f2efe0 b2efe0 04eb6c 00   A  4   0  8
  [ 7] .gnu.version      VERSYM          0000000000f7db4c b7db4c 030b80 02   A  4   0  2
  [ 8] .gnu.version_r    VERNEED         0000000000fae6cc bae6cc 0001f0 00   A  5   6  4
  [ 9] .rela.dyn         RELA            0000000000fae8c0 bae8c0 122fd0 18   A  4   0  8
  [10] .rela.plt         RELA            00000000010d1890 cd1890 150e28 18  AI  4  12  8
  [11] .init             PROGBITS        00000000012226b8 e226b8 00001b 00  AX  0   0  4
  [12] .plt              PROGBITS        00000000012226e0 e226e0 0e0980 10  AX  0   0 16
  [13] .text             PROGBITS        0000000001303060 f03060 f97b855 00  AX  0   0 16
  [14] .fini             PROGBITS        0000000010c7e8b8 1087e8b8 00000d 00  AX  0   0  4
  [15] .rodata           PROGBITS        0000000010c7e8e0 1087e8e0 1176561 00   A  0   0 32
  [16] .gcc_except_table PROGBITS        0000000011df4e44 119f4e44 b76236 00   A  0   0  4
  [17] .eh_frame         X86_64_UNWIND   000000001296b080 1256b080 1f90abc 00   A  0   0  8
  [18] .eh_frame_hdr     X86_64_UNWIND   00000000148fbb3c 144fbb3c 68045c 00   A  0   0  4
  [19] .tbss             NOBITS          0000000014f7d920 14b7c920 000008 00 WAT  0   0  8
  [20] .data.rel.ro.local PROGBITS        0000000014f7d920 14b7c920 04edc8 00  WA  0   0 32
  [21] .fini_array       FINI_ARRAY      0000000014fcc6e8 14bcb6e8 000008 08  WA  0   0  8
  [22] .init_array       INIT_ARRAY      0000000014fcc6f0 14bcb6f0 030ae8 08  WA  0   0  8
  [23] .data.rel.ro      PROGBITS        0000000014ffd1e0 14bfc1e0 443970 00  WA  0   0 32
  [24] .dynamic          DYNAMIC         0000000015440b50 1503fb50 0025e0 10  WA  5   0  8
  [25] .got              PROGBITS        0000000015443130 15042130 00deb8 00  WA  0   0  8
  [26] .got.plt          PROGBITS        0000000015450fe8 1504ffe8 0704d0 00  WA  0   0  8
  [27] .data             PROGBITS        00000000154c14c0 150c04c0 1c1048 00  WA  0   0 32
  [28] .tm_clone_table   PROGBITS        0000000015682508 15281508 000000 00  WA  0   0  8
  [29] .bss              NOBITS          0000000015682520 15281508 3e4d88 00  WA  0   0 32
  [30] .comment          PROGBITS        0000000000000000 15281508 00005c 01  MS  0   0  1
  [31] .gnu.build.attributes NOTE            0000000000000000 15281564 001df4 00      0   0  4
  [32] .debug_info       PROGBITS        0000000000000000 1d756078 1185585a 00   C  0   0  1
  [33] .debug_abbrev     PROGBITS        0000000000000000 2efab8d2 403cd1 00   C  0   0  1
  [34] .debug_ranges     PROGBITS        0000000000000000 2f3af5a3 158d235 00   C  0   0  1
  [35] .debug_line       PROGBITS        0000000000000000 3093c7d8 4891ef5 00   C  0   0  1
  [36] .debug_str        PROGBITS        0000000000000000 351ce6cd 4e94a2e 01 MSC  0   0  1
  [37] .gdb_index        PROGBITS        0000000000000000 3a0630fc 103cf8ab 00      0   0  4
  [38] .note.gnu.gold-version NOTE            0000000000000000 15283358 00001c 00      0   0  4
  [39] .symtab           SYMTAB          0000000000000000 15283378 1fbb550 18     40 679560  8
  [40] .strtab           STRTAB          0000000000000000 1723e8c8 6516d2d 00      0   0  1
  [41] .shstrtab         STRTAB          0000000000000000 4a4329a7 0001c2 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

Elf file type is EXEC (Executable file)
Entry point 0x13054b0
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x000230 0x000230 R   0x8
  INTERP         0x000270 0x0000000000400270 0x0000000000400270 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x14b7bf98 0x14b7bf98 R E 0x1000
  LOAD           0x14b7c920 0x0000000014f7d920 0x0000000014f7d920 0x704be8 0xae9988 RW  0x1000
  DYNAMIC        0x1503fb50 0x0000000015440b50 0x0000000015440b50 0x0025e0 0x0025e0 RW  0x8
  NOTE           0x00028c 0x000000000040028c 0x000000000040028c 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x144fbb3c 0x00000000148fbb3c 0x00000000148fbb3c 0x68045c 0x68045c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  TLS            0x14b7c920 0x0000000014f7d920 0x0000000014f7d920 0x000000 0x000008 R   0x8
  GNU_RELRO      0x14b7c920 0x0000000014f7d920 0x0000000014f7d920 0x4d36e0 0x4d36e0 RW  0x20

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .dynsym .dynstr .gnu.hash .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .gcc_except_table .eh_frame .eh_frame_hdr
   03     .data.rel.ro.local .fini_array .init_array .data.rel.ro .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .tbss
   09     .data.rel.ro.local .fini_array .init_array .data.rel.ro .dynamic .got

File: binary.mold
There are 52 section headers, starting at offset 0x44d529b8:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00000000002002e0 0002e0 00001c 00   A  0   0  1
  [ 2] .note.gnu.build-id NOTE            00000000002002fc 0002fc 000024 00   A  0   0  4
  [ 3] .note.ABI-tag     NOTE            0000000000200320 000320 000020 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        0000000000200340 000340 02e368 00   A  5   0  8
  [ 5] .dynsym           DYNSYM          000000000022e6a8 02e6a8 209340 18   A  6   1  8
  [ 6] .dynstr           STRTAB          00000000004379e8 2379e8 76b8d9 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          0000000000ba32c2 9a32c2 02b6f0 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         0000000000bce9b8 9ce9b8 0001f0 00   A  6   6  8
  [ 9] .rela.dyn         RELA            0000000000bceba8 9ceba8 231c18 18   A  5   0  8
  [10] .rela.plt         RELA            0000000000e007c0 c007c0 148500 18   A  5  33  8
  [11] .eh_frame         PROGBITS        0000000000f48cc0 d48cc0 1dd7494 00   A  0   0  8
  [12] .eh_frame_hdr     PROGBITS        0000000002d20154 2b20154 680454 00   A  0   0  4
  [13] .gcc_except_table PROGBITS        00000000033a05a8 31a05a8 b7623a 00   A  0   0  4
  [14] .rodata           PROGBITS        0000000003f16800 3d16800 686a81 00   A  0   0 32
  [15] .rodata.cst16     PROGBITS        000000000459d290 439d290 000090 10  AM  0   0 16
  [16] .rodata.cst4      PROGBITS        000000000459d320 439d320 00002c 04  AM  0   0  4
  [17] .rodata.cst8      PROGBITS        000000000459d350 439d350 001230 08  AM  0   0  8
  [18] .rodata.str1.1    PROGBITS        000000000459e580 439e580 2d2454 01 AMS  0   0  1
  [19] .rodata.str1.8    PROGBITS        00000000048709d8 46709d8 81c3d0 01 AMS  0   0  8
  [20] .fini             PROGBITS        000000000508dda8 4e8cda8 00000d 00  AX  0   0  4
  [21] .init             PROGBITS        000000000508ddb8 4e8cdb8 00001b 00  AX  0   0  4
  [22] .plt              PROGBITS        000000000508dde0 4e8cde0 0dae20 00  AX  0   0 16
  [23] .plt.got          PROGBITS        0000000005168c00 4f67c00 000668 00  AX  0   0 16
  [24] .text             PROGBITS        0000000005169270 4f68270 f97b745 00  AX  0   0 16
  [25] .tbss             NOBITS          0000000014ae59b8 148e39b5 000008 00 WAT  0   0  8
  [26] .data.rel.ro      PROGBITS        0000000014ae59c0 148e39c0 492738 00  WA  0   0 32
  [27] .dynamic          DYNAMIC         0000000014f780f8 14d760f8 0025f0 10  WA  6   0  8
  [28] .fini_array       FINI_ARRAY      0000000014f7a6e8 14d786e8 000008 00  WA  0   0  8
  [29] .init_array       INIT_ARRAY      0000000014f7a6f0 14d786f0 030ae8 00  WA  0   0  8
  [30] .got              PROGBITS        0000000014fab1d8 14da91d8 0c1a28 00  WA  0   0  8
  [31] .relro_padding    NOBITS          000000001506cc00 14e6ac00 000400 00  WA  0   0  1
  [32] .data             PROGBITS        000000001506dc00 14e6ac00 1c1048 00  WA  0   0 32
  [33] .got.plt          PROGBITS        000000001522ec48 1502bc48 06d718 00  WA  0   0  8
  [34] .tm_clone_table   PROGBITS        000000001529c360 15099360 000000 00  WA  0   0  8
  [35] .bss              NOBITS          000000001529c360 15099360 3e4da8 00  WA  0   0 32
  [36] .gnu.build.attributes NOTE            0000000000000000 15099360 000804 00      0   0  4
  [37] .gnu.build.attributes.exit NOTE            0000000000000000 15099b64 00057c 00      0   0  4
  [38] .gnu.build.attributes.hot NOTE            0000000000000000 1509a0e0 00057c 00      0   0  4
  [39] .gnu.build.attributes.startup NOTE            0000000000000000 1509a65c 00057c 00      0   0  4
  [40] .gnu.build.attributes.unlikely NOTE            0000000000000000 1509abd8 00057c 00      0   0  4
  [41] .comment          PROGBITS        0000000000000000 1509b154 0000ab 01  MS  0   0  1
  [42] .debug_abbrev     PROGBITS        0000000000000000 1509b1ff 406298 00   C  0   0  1
  [43] .debug_aranges    PROGBITS        0000000000000000 154a1497 1e2e39 00   C  0   0  1
  [44] .debug_info       PROGBITS        0000000000000000 156842d0 11b600b3 00   C  0   0  1
  [45] .debug_line       PROGBITS        0000000000000000 271e4383 4878368 00   C  0   0  1
  [46] .debug_ranges     PROGBITS        0000000000000000 2ba5c6eb fe1b6b 00   C  0   0  1
  [47] .debug_str        PROGBITS        0000000000000000 2ca3e256 bcbe80b 01 MSC  0   0  1
  [48] .shstrtab         STRTAB          0000000000000000 386fca61 00027c 00      0   0  1
  [49] .strtab           STRTAB          0000000000000000 386fccdd 8c758f0 00      0   0  1
  [50] .symtab           SYMTAB          0000000000000000 413725d0 39e03e8 18     49 2439688  8
  [51] .gdb_index        PROGBITS        0000000000000000 44d536b8 4a4dbc 00      0   0  4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

Elf file type is EXEC (Executable file)
Entry point 0x5169270
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0002a0 0x0002a0 R   0x8
  INTERP         0x0002e0 0x00000000002002e0 0x00000000002002e0 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  NOTE           0x0002fc 0x00000000002002fc 0x00000000002002fc 0x000044 0x000044 R   0x4
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x4e8cda8 0x4e8cda8 R   0x1000
  LOAD           0x4e8cda8 0x000000000508dda8 0x000000000508dda8 0xfa56c0d 0xfa56c0d R E 0x1000
  LOAD           0x148e39c0 0x0000000014ae59c0 0x0000000014ae59c0 0x587240 0x587640 RW  0x1000
  LOAD           0x14e6ac00 0x000000001506dc00 0x000000001506dc00 0x22e760 0x613508 RW  0x1000
  TLS            0x0009b8 0x0000000014ae59b8 0x0000000014ae59b8 0x000000 0x000008 R   0x8
  DYNAMIC        0x14d760f8 0x0000000014f780f8 0x0000000014f780f8 0x0025f0 0x0025f0 RW  0x8
  GNU_EH_FRAME   0x2b20154 0x0000000002d20154 0x0000000002d20154 0x680454 0x680454 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x1
  GNU_RELRO      0x148e39c0 0x0000000014ae59c0 0x0000000014ae59c0 0x587240 0x587640 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .note.gnu.build-id .note.ABI-tag
   03     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .eh_frame .eh_frame_hdr .gcc_except_table .rodata .rodata.cst16 .rodata.cst4 .rodata.cst8 .rodata.str1.1 .rodata.str1.8
   04     .fini .init .plt .plt.got .text
   05     .data.rel.ro .dynamic .fini_array .init_array .got .relro_padding
   06     .data .got.plt .bss
   07     .tbss
   08     .dynamic
   09     .eh_frame_hdr
   10
   11     .data.rel.ro .dynamic .fini_array .init_array .got .relro_padding

Owner

rui314 commented Oct 11, 2024

Nothing seems to be particularly wrong to me, so it may be just due to noise caused by a layout difference. I'd relink the program with -Wl,--shuffle-sections to randomize section order and get the memory usage number again to see if it makes a difference. If it's different every time you relink the binary, it's just noise and not something you need to worry about.
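
One possible way to turn that check into numbers is sketched below in Python. This is only an illustration of the idea, not a method prescribed by the maintainer: it compares the gold-versus-mold gap against the spread observed across several relinks with shuffled sections. The function names are hypothetical, and the caller supplies the measured VmHWM values.

```python
def spread_kb(samples_kb):
    """Difference between the largest and smallest VmHWM sample, in kB."""
    return max(samples_kb) - min(samples_kb)


def gap_within_spread(gold_kb, mold_kb, shuffled_kb):
    """True if the gold-vs-mold gap is no larger than the variation
    seen across relinks with shuffled section order."""
    return abs(mold_kb - gold_kb) <= spread_kb(shuffled_kb)
```

If `gap_within_spread` returns True for a reasonable number of shuffled relinks, the difference is plausibly layout noise; if the shuffled builds cluster tightly and the gap is much larger, something more systematic may be involved.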

I also want to note that you generally do not have to worry too much about VmHWM after process startup, as what matters is the memory usage while your process is running.

Author

aallrd commented Oct 11, 2024

Using binaries relinked with mold and -Wl,--shuffle-sections, I got a larger VmHWM regression for the same test, about 5%.
I'll run more tests with and without this linker option to see if this behavior is consistent; so far it seems to be.
I did not expect this runtime impact, so I am a bit puzzled, and I would like to understand why it happens.

Owner

rui314 commented Oct 12, 2024

Generally, the linker doesn't affect a linked program's performance or memory usage. However, there are some random factors, such as file layout. For example, if one linker happens to place the initializer functions' machine code in a single page while another linker lets it span two pages, the resident set size of the binary produced by the latter linker will be one page larger after process startup. That does not mean the former linker works better than the latter; it's just randomness.
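
The page-spanning effect can be illustrated with a small Python sketch. It assumes a 4 KiB page size, which matches the 0x1000 alignment of the LOAD segments in the readelf output above; the addresses in the usage note are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, matching the 0x1000 LOAD alignment


def pages_touched(start, size, page_size=PAGE_SIZE):
    """Number of pages covered by the byte range [start, start + size)."""
    if size <= 0:
        return 0
    first_page = start // page_size
    last_page = (start + size - 1) // page_size
    return last_page - first_page + 1
```

The same 0x800 bytes of code occupy one page when placed at 0x1000 (`pages_touched(0x1000, 0x800)` is 1) and two pages when placed at 0x1c00 (`pages_touched(0x1c00, 0x800)` is 2), even though nothing about the code itself changed.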

That randomness appears larger immediately after process startup, as the program doesn't use that memory after all. A small difference at that point shouldn't matter much.

What matters is memory usage while the program is actually being used. Did you observe any difference in that situation?

Author

aallrd commented Oct 14, 2024

Thank you for your answer. I think this is what I observe for my program:

$ pmap -x MOLD_PID
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000200000   80444   31548       0 r---- binary.mold
000000000508f000  256352  247836       0 r-x-- binary.mold
0000000014ae7000    5664    5624    4680 r---- binary.mold
000000001506f000    2240    2052     992 rw--- binary.mold
[...]
---------------- ------- ------- -------
binary (kB)              287060
total kB         2693068 1072656  381656

$ pmap -x GOLD_PID
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000  339440  269240       0 r-x-- binary.gold
0000000014f7d000    4944    4428    3744 r---- binary.gold
0000000015451000    2248    1932     900 rw--- binary.gold
[...]
---------------- ------- ------- -------
binary (kB)              275600
total kB         2701128 1030044  372748

binary diff: 11MB
total diff: 41MB
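
The "binary (kB)" figures above can be recomputed with a short Python sketch that sums the RSS column of pmap -x per mapping name. The helper name `rss_by_mapping` is hypothetical, and the parsing assumes the six-column row layout shown in the output above; only the rows for the binaries themselves are included here.

```python
MOLD_PMAP = """\
0000000000200000   80444   31548       0 r---- binary.mold
000000000508f000  256352  247836       0 r-x-- binary.mold
0000000014ae7000    5664    5624    4680 r---- binary.mold
000000001506f000    2240    2052     992 rw--- binary.mold
"""

GOLD_PMAP = """\
0000000000400000  339440  269240       0 r-x-- binary.gold
0000000014f7d000    4944    4428    3744 r---- binary.gold
0000000015451000    2248    1932     900 rw--- binary.gold
"""


def rss_by_mapping(pmap_text):
    """Sum the RSS column (kB) of `pmap -x` output per mapping name."""
    totals = {}
    for line in pmap_text.splitlines():
        fields = line.split()
        # Data rows: address, Kbytes, RSS, Dirty, mode, mapping name.
        # Header, separator, and total rows fail this check and are skipped.
        if len(fields) < 6 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        name = " ".join(fields[5:])
        totals[name] = totals.get(name, 0) + int(fields[2])
    return totals


mold_rss = rss_by_mapping(MOLD_PMAP)["binary.mold"]  # 287060 kB
gold_rss = rss_by_mapping(GOLD_PMAP)["binary.gold"]  # 275600 kB
print(mold_rss - gold_rss)  # 11460 kB, roughly 11 MB
```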

The memory usage while the program is actually being used does not change, but there is this fixed memory increase that shows as a regression on all of the steps of my benchmarks.

Do you know if there is a way to optimize the linker's output to reduce the number of pages used? Maybe even from the code itself?
