Add the extra statistics of relative relocations in large binaries #1375
+144
−0
Motivation
I am using the `mold` linker to quickly link a large monolithic application (over 20 GB with debug information). The main challenge with my binary is the constantly growing, and sometimes uncontrolled, volume of business logic. The application's dependencies are also highly heterogeneous, and I cannot control how they were compiled: with `PIE` or without, with `-mcmodel=large` or not. This leads to unpredictable link failures. For example, some relocations (e.g., PC-relative ones) cannot be resolved because the sections containing business logic have grown too large (e.g., `R_X86_64_32S` requires the value to fit in a signed 32-bit field, i.e. within ±2 GB).
Using `-mcmodel=large` and producing only absolute relocations for every component of the binary is not feasible in my case. Therefore, I need a way to detect the relocations that are closest to overflowing. Given your design principles of determinism and build reproducibility, I can rely on the layout of the resulting binary not changing significantly from one build to the next.
Solution
For each architecture, `InputSection` has `apply_reloc_alloc` and `apply_reloc_nonalloc` methods, in which the `check` routine verifies that a relocation value lies within the range allowed by its relocation type. We can extend the `check` routine to record the minimum distance to the upper and lower bounds of that range for the current section. After all relocation entries of a section have been processed, we update the global minimums in the context.
As a result, we obtain two new metrics, `relative_relocations_offset_infimum` and `relative_relocations_offset_supremum`, which can be read as indicators of how much space is still available in the large binary. Although these are not universal indicators, they may be extremely helpful for managing large monolithic applications.
Impact
`--stats` option is enabled.
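The bookkeeping described under Solution can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SectionHeadroom`, `GlobalHeadroom`, and their members are hypothetical names, and the sketch assumes that `check` tests a value against a half-open range `[lo, hi)` and that sections may be processed in parallel, so the global minimums are updated atomically.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <limits>

// Hypothetical per-section tracker (illustrative; not mold's actual type).
struct SectionHeadroom {
  int64_t to_lower = std::numeric_limits<int64_t>::max();
  int64_t to_upper = std::numeric_limits<int64_t>::max();

  // Assumes the allowed range is half-open, [lo, hi), and narrow enough
  // (e.g. 32-bit) that the subtractions below cannot overflow int64_t.
  // Returns false when val is out of range, i.e. the relocation overflows.
  bool check(int64_t val, int64_t lo, int64_t hi) {
    if (val < lo || hi <= val)
      return false;
    to_lower = std::min(to_lower, val - lo);      // distance to lower bound
    to_upper = std::min(to_upper, hi - 1 - val);  // distance to upper bound
    return true;
  }
};

// Hypothetical stand-in for the two counters kept on the linker context.
struct GlobalHeadroom {
  std::atomic<int64_t> to_lower{std::numeric_limits<int64_t>::max()};
  std::atomic<int64_t> to_upper{std::numeric_limits<int64_t>::max()};

  // Called once per section, after all of its relocations were applied.
  void merge(const SectionHeadroom &s) {
    update_min(to_lower, s.to_lower);
    update_min(to_upper, s.to_upper);
  }

private:
  // Lock-free "store if smaller" using a compare-exchange loop.
  static void update_min(std::atomic<int64_t> &a, int64_t v) {
    int64_t cur = a.load(std::memory_order_relaxed);
    while (v < cur &&
           !a.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {
    }
  }
};
```

For a signed 32-bit range (`lo = -2^31`, `hi = 2^31`), checking the values `0x7fff0000` and `-0x1000` in one section and then merging leaves 65535 as the smallest distance to the upper bound and 2147479552 as the smallest distance to the lower bound. Keeping the per-section minimums local and merging once per section keeps contention on the shared counters low.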