Range extension thunks
We are considering three use cases here.

1. A true large code model needs to support more than 2 GiB of text;
   data accesses are out of scope for this change but jumps and calls
   across a range of more than 2 GiB are needed. Most users of a large
   model will have more than 2 GiB of data but small text, or text with
   a highly local call pattern, so we want most calls to be able to use
   the auipc+jalr sequence. This would normally call for relaxation, but
   relaxation requires object files to contain the longest possible
   sequence, and there are several candidates for that sequence.
   Instead, keep the sequences the same and allow thunk insertion.

2. There have been requests from developers of linkers other than ld.bfd
   to avoid the use of length-changing relaxation, since it is a linking
   step unique to RISC-V among common large-systems architectures, can
   be computationally expensive, and requires a substantial number of
   additional relocations in object files.

   In an environment which makes limited use of global data, most of the
   code size benefits of relaxation come from call->jal relaxation. If
   the compiler is modified to generate jal instructions instead of call
   instructions, the code size benefits can be achieved without
   relaxation at all, but this requires JAL_THUNK to avoid relocation
   errors at a 1 MiB limit.

3. If a function has many static call sites in a large binary but is
   known to be dynamically cold, due to a function attribute or PGO, the
   call sites can be replaced with jal instructions, sharing a single
   thunk between all call sites within a 2 MiB text region. This saves
   code size at a small runtime cost.
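
To make the 1 MiB limit from use cases 2 and 3 concrete, here is a
minimal Python sketch of the reachability test a linker might apply
before resolving a jal directly. The function name and the example
addresses are hypothetical; the bounds are the even signed 21-bit
offset (-1MiB to +1MiB-2) that R_RISCV_JAL can represent.

```python
# Hypothetical helper, not taken from any existing linker.
JAL_MIN = -(1 << 20)      # -1 MiB
JAL_MAX = (1 << 20) - 2   # +1 MiB - 2 (offsets must be even)


def jal_reaches(site: int, target: int) -> bool:
    """Return True if a jal at address `site` can jump straight to `target`."""
    offset = target - site
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX
```

A call site for which this returns False is the case where a
JAL_THUNK would be needed instead of a relocation error.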

Restricting the register usage of the thunks is an intentional feature
copied from the Go 1.15 toolchain, where every non-leaf function
requires a conditional call to runtime.morestack in the prologue; since
ra cannot be saved before the stack frame is allocated, the call is
performed using t0 as the return register.

Range extension thunks use t1 and t2 as temporary registers, while
traditional practice in PLT entries is to use t1 and t3. This is a
necessary change to support the use of software-guarded branches when
Zicfilp support is added.

Thunk insertion is forbidden inside a single section, both because
relaxation assumes that sections cannot grow and so that intra-function
branches do not impact register allocation. Thunks may be inserted on
any cross-section jump; while linkers should endeavor to minimize the
number of thunks used, this is a complex optimization problem with a
quality/time tradeoff and mandating any algorithm would be
inappropriate.
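
As an illustration only, the rule above can be sketched as a
per-relocation decision in Python. The Jump record, the resolve_jal
name, and the returned labels are hypothetical. The sketch assumes one
simple policy (use a thunk only when the direct jal does not reach);
since no algorithm is mandated, a real linker is free to choose
differently.

```python
from dataclasses import dataclass

# Reach of a jal: even signed 21-bit offset, -1 MiB to +1 MiB - 2.
JAL_MIN, JAL_MAX = -(1 << 20), (1 << 20) - 2


@dataclass
class Jump:
    """Hypothetical record for one R_RISCV_JAL relocation."""
    site_section: str    # input section containing the jal
    target_section: str  # input section containing the target symbol
    site: int            # address of the jal
    target: int          # address of the target symbol


def resolve_jal(j: Jump) -> str:
    offset = j.target - j.site
    if JAL_MIN <= offset <= JAL_MAX:
        return "direct"   # in range: no thunk needed under this policy
    if j.site_section != j.target_section:
        return "thunk"    # cross-section: a thunk is permitted
    return "error"        # same section: thunks are not allowed
```

For example, resolve_jal(Jump(".text.a", ".text.b", 0x1000, 0x1000 +
(1 << 21))) yields "thunk", while the same addresses within one
section yield "error".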

Since this change redefines the existing relocation types, it is a
**backwards incompatible ABI change** if object files expect t1 and t2
to retain their values across jumps that cross section boundaries. In
practice, cross-section JAL relocations are not generated by current
compilers and rarely appear in handwritten assembly, and linkers are not
expected to generate thunks for CALL and CALL_PLT until the output
approaches the 2 GiB limit of auipc+jalr sequences.

Signed-off-by: Stefan O'Rear <[email protected]>
sorear committed Feb 26, 2024
1 parent 5ffe5b5 commit 9ee8805
Showing 1 changed file with 50 additions and 0 deletions.
50 changes: 50 additions & 0 deletions riscv-elf.adoc
@@ -735,6 +735,56 @@ that can represent an even signed 21-bit offset (-1MiB to +1MiB-2).
Branch (SB-Type) instructions have a `R_RISCV_BRANCH` relocation that
can represent an even signed 13-bit offset (-4096 to +4094).

==== Range Extension Thunks

`R_RISCV_JAL`, `R_RISCV_CALL`, and `R_RISCV_CALL_PLT` relocations to targets in
other input sections may be resolved by the linker to point to a range
extension thunk instead of the target symbol. Range extension thunks will
eventually transfer control to the target symbol, and preserve the contents of
memory and all registers except for `t1` and `t2`.

[NOTE]
.Suggested forms of range extension thunks
====
20-bit range:
[,asm]
----
jal zero, <offset to target>
----
32-bit range:
[,asm]
----
auipc t2, <high offset to target>
jalr zero, t2, <low offset to target>
----
64-bit range, position dependent:
[,asm]
----
auipc t2, <high offset to literal>
ld t2, <low offset to literal>(t2)
jalr zero, t2, 0 OR c.jr t2
...
.quad 0
----
64-bit range, position independent:
[,asm]
----
auipc t1, <high offset to literal>
ld t2, <low offset to literal>(t1)
add t2, t2, t1 OR c.add t2, t1
jalr zero, t2, 0 OR c.jr t2
...
.quad <offset to target from auipc result>
----
====
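
The `<high offset to target>` and `<low offset to target>` operands in the
32-bit form come from splitting one PC-relative offset into the part `auipc`
adds and the part `jalr` adds. The following Python sketch shows one way a
linker might compute that split; `split_hi_lo` is a hypothetical name, and the
rounding by `0x800` reflects that the 12-bit low part is sign-extended.

```python
from typing import Tuple


def split_hi_lo(offset: int) -> Tuple[int, int]:
    """Split a PC-relative offset into (hi20, lo12) for auipc + jalr.

    Hypothetical helper. lo12 is a signed 12-bit value, so hi20 is
    rounded up whenever bit 11 of the offset is set.
    """
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    if not -(1 << 19) <= hi20 < (1 << 19):
        raise ValueError("offset out of range for auipc+jalr")
    return hi20, lo12
```

With this split, `(hi20 << 12) + lo12` always equals the original offset, and
an offset that raises `ValueError` is one for which a 64-bit thunk form would
be needed instead.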

==== PC-Relative Symbol Addresses

32-bit PC-relative relocations for symbol addresses on sequences of