From 9ee88051801c8bfcab5ae581da1129a843d5b7d6 Mon Sep 17 00:00:00 2001
From: Stefan O'Rear
Date: Tue, 20 Feb 2024 14:35:53 -0500
Subject: [PATCH] Range extension thunks

We are considering three use cases here.

1. A true large code model needs to support more than 2 GiB of text;
   data accesses are out of scope for this change but jumps and calls
   across a range of more than 2 GiB are needed. Most users of a large
   model will have more than 2 GiB of data but small text, or text with
   a highly local call pattern, so we want most calls to be able to use
   the auipc+jalr sequence. This would normally call for relaxation,
   but relaxation requires object files to contain the longest possible
   sequence, of which several are possible. Instead, keep the sequences
   the same and allow thunk insertion.

2. There have been requests from developers of linkers other than
   ld.bfd to avoid the use of length-changing relaxation, since it is a
   unique linking step for RISC-V among common large systems
   architectures, can be computationally expensive, and requires a
   substantial number of additional relocations in object files. In an
   environment which makes limited use of global data, most of the code
   size benefits of relaxation come from call->jal relaxation. If the
   compiler is modified to generate jal instructions instead of call
   instructions, the code size benefits can be achieved without
   relaxation at all, but this requires JAL_THUNK to avoid relocation
   errors at a 1 MiB limit.

3. If a function has many static call sites in a large binary but is
   known to be dynamically cold, due to a function attribute or PGO,
   the call sites can be replaced with jal instructions, sharing a
   single thunk between all call sites within a 2 MiB text region. This
   saves code size at small runtime cost.

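As an illustration of the range limits these use cases refer to, the
following Python sketch classifies a displacement by the shortest
suggested thunk form that could reach it. The function names are
hypothetical and do not come from any linker, and the auipc+jalr bounds
assume RV64 with a sign-extended 20-bit upper immediate plus a signed
12-bit lower immediate. It is a sketch only, not a thunk placement
algorithm.

```python
def fits_jal(offset: int) -> bool:
    """Even signed 21-bit offset: -1 MiB to +1 MiB - 2."""
    return offset % 2 == 0 and -(1 << 20) <= offset <= (1 << 20) - 2


def fits_auipc_jalr(offset: int) -> bool:
    """Reachable by auipc (hi20 << 12, sign-extended) plus jalr (signed lo12).

    Assumes RV64; the window is shifted down by 0x800 because lo12 is signed.
    """
    return -(1 << 31) - 0x800 <= offset <= (1 << 31) - 0x800 - 1


def shortest_thunk_form(thunk_addr: int, target_addr: int) -> str:
    """Name the shortest suggested thunk form that reaches target_addr."""
    offset = target_addr - thunk_addr
    if fits_jal(offset):
        return "20-bit"   # single jal
    if fits_auipc_jalr(offset):
        return "32-bit"   # auipc + jalr
    return "64-bit"       # address loaded from a literal
```

For example, a thunk placed 4 KiB before its target could use the jal
form, while a thunk 3 GiB away from its target would need one of the
64-bit forms.
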
Restricting the register usage of the thunks is an intentional feature
copied from the Go 1.15 toolchain, where every non-leaf function
requires a conditional call to runtime.morestack in the prologue; since
ra cannot be saved before the stack frame is allocated, the call is
performed using t0 as the return register.

Range extension thunks use t1 and t2 as temporary registers, while
traditional practice in PLT entries is to use t1 and t3. This is a
necessary change to support the use of software-guarded branches when
Zicfilp support is added.

Thunk insertion is forbidden inside a single section, since relaxation
assumes that sections cannot grow and to ensure that intra-function
branches do not impact register allocation. Thunks may be inserted on
any cross-section jump; while linkers should endeavor to minimize the
number of thunks used, this is a complex optimization problem with a
quality/time tradeoff and mandating any algorithm would be
inappropriate.

Since this change redefines the existing relocation types, it is a
**backwards incompatible ABI change** if object files expect t1 and t2
to retain their values across jumps that cross section boundaries. In
practice, cross-section JAL relocations are not generated by current
compilers and rarely appear in handwritten assembly, and linkers are
not expected to generate thunks for CALL and CALL_PLT until the output
approaches the 2 GiB limit of auipc+jalr sequences.

Signed-off-by: Stefan O'Rear
---
 riscv-elf.adoc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/riscv-elf.adoc b/riscv-elf.adoc
index d5e560a7..2587dda8 100644
--- a/riscv-elf.adoc
+++ b/riscv-elf.adoc
@@ -735,6 +735,56 @@ that can represent an even signed 21-bit offset (-1MiB to +1MiB-2).
 Branch (SB-Type) instructions have a `R_RISCV_BRANCH` relocation that can
 represent an even signed 13-bit offset (-4096 to +4094).
 
+==== Range Extension Thunks
+
+`R_RISCV_JAL`, `R_RISCV_CALL`, and `R_RISCV_CALL_PLT` relocations to targets in
+other input sections may be resolved by the linker to point to a range
+extension thunk instead of the target symbol. Range extension thunks will
+eventually transfer control to the target symbol, and preserve the contents of
+memory and all registers except for `t1` and `t2`.
+
+[NOTE]
+.Suggested forms of range extension thunks
+====
+20-bit range:
+
+[,asm]
+----
+    jal zero,
+----
+
+32-bit range:
+
+[,asm]
+----
+    auipc t2,
+    jalr zero, t2,
+----
+
+64-bit range, position dependent:
+
+[,asm]
+----
+    auipc t2,
+    ld t2, (t2)
+    jalr zero, t2, 0 OR c.jr t2
+    ...
+    .quad 0
+----
+
+64-bit range, position independent:
+
+[,asm]
+----
+    auipc t1,
+    ld t2, (t1)
+    add t2, t2, t1 OR c.add t2, t1
+    jalr zero, t2, 0 OR c.jr t2
+    ...
+    .quad
+----
+====
+
 ==== PC-Relative Symbol Addresses
 
 32-bit PC-relative relocations for symbol addresses on sequences of