Commit
We are considering three use cases here.

1. A true large code model needs to support more than 2 GiB of text; data accesses are out of scope for this change but jumps and calls across a range of more than 2 GiB are needed. Most users of a large model will have more than 2 GiB of data but small text, or text with a highly local call pattern, so we want most calls to be able to use the auipc+jalr sequence. This would normally call for relaxation, but relaxation requires object files to contain the longest possible sequence, of which several are possible. Instead, keep the sequences the same and allow thunk insertion.

2. There have been requests from developers of linkers other than ld.bfd to avoid the use of length-changing relaxation, since it is a unique linking step for RISC-V among common large systems architectures, can be computationally expensive, and requires a substantial number of additional relocations in object files. In an environment which makes limited use of global data, most of the code size benefits of relaxation come from call->jal relaxation. If the compiler is modified to generate jal instructions instead of call instructions, the code size benefits can be achieved without relaxation at all, but this requires JAL_THUNK to avoid relocation errors at a 1 MiB limit.

3. If a function has many static call sites in a large binary but is known to be dynamically cold, due to a function attribute or PGO, the call sites can be replaced with jal instructions, sharing a single thunk between all call sites within a 2 MiB text region. This saves code size at small runtime cost.

Restricting the register usage of the thunks is an intentional feature copied from the Go 1.15 toolchain, where every non-leaf function requires a conditional call to runtime.morestack in the prologue; since ra cannot be saved before the stack frame is allocated, the call is performed using t0 as the return register.

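The reach limits behind the three use cases can be sketched as a small range check. This is an illustrative sketch only, not part of the change: the function names and the "jal"/"call" labels are hypothetical, and the auipc+jalr bounds assume RV64, where the sign-extended 20-bit upper immediate and the 12-bit jalr immediate together give roughly +/-2 GiB.

```python
# Hypothetical helper names; bounds follow the base ISA immediate encodings.

# jal: 21-bit signed offset, always a multiple of 2 (about +/-1 MiB).
JAL_MIN = -(1 << 20)
JAL_MAX = (1 << 20) - 2

# auipc+jalr on RV64 (assumption): (imm20 << 12) plus a signed 12-bit immediate.
AUIPC_JALR_MIN = -(1 << 31) - (1 << 11)
AUIPC_JALR_MAX = (1 << 31) - (1 << 11) - 1


def jal_reaches(site: int, target: int) -> bool:
    """True if a single jal at `site` can jump directly to `target`."""
    offset = target - site
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX


def auipc_jalr_reaches(site: int, target: int) -> bool:
    """True if an auipc+jalr pair starting at `site` can reach `target`."""
    offset = target - site
    return AUIPC_JALR_MIN <= offset <= AUIPC_JALR_MAX


def needs_thunk(kind: str, site: int, target: int) -> bool:
    """Decide whether a linker would have to route this jump via a thunk.

    `kind` is "jal" for a bare jal or "call" for an auipc+jalr sequence.
    """
    if kind == "jal":
        return not jal_reaches(site, target)
    if kind == "call":
        return not auipc_jalr_reaches(site, target)
    raise ValueError(f"unknown jump kind: {kind}")
```

Under these assumptions, a jal whose target lies exactly 1 MiB ahead is already out of range and would need a thunk, while an auipc+jalr sequence only needs one as the distance approaches 2 GiB.
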
Range extension thunks use t1 and t2 as temporary registers, while traditional practice in PLT entries is to use t1 and t3. This is a necessary change to support the use of software-guarded branches when Zicfilp support is added.

Thunk insertion is forbidden inside a single section, both because relaxation assumes that sections cannot grow and so that intra-function branches do not impact register allocation. Thunks may be inserted on any cross-section jump; while linkers should endeavor to minimize the number of thunks used, this is a complex optimization problem with a quality/time tradeoff and mandating any algorithm would be inappropriate.

Since this change redefines the existing relocation types, it is a **backwards incompatible ABI change** if object files expect t1 and t2 to retain their values across jumps that cross section boundaries. In practice, cross-section JAL relocations are not generated by current compilers and rarely appear in handwritten assembly, and linkers are not expected to generate thunks for CALL and CALL_PLT until the output approaches the 2 GiB limit of auipc+jalr sequences.

Signed-off-by: Stefan O'Rear <[email protected]>