Range extension thunks
We are considering three use cases here.

1. A true large code model needs to support more than 2 GiB of text;
   data accesses are out of scope for this change but jumps and calls
   across a range of more than 2 GiB are needed.  Most users of a large
   model will have more than 2 GiB of data but small text, or text with
   a highly local call pattern, so we want most calls to be able to use
   the auipc+jalr sequence.  This would normally call for relaxation,
   but relaxation requires object files to contain the longest possible
   sequence, of which several are possible.  Instead, keep the sequences
   the same and allow thunk insertion.

2. For executables and shared objects in a Unix environment, most of the
   code size benefits of relaxation come from call->jal relaxation, not
   data or TLS relaxation.  If the compiler is modified to generate jal
   instructions instead of call instructions, the code size benefits can
   be achieved without relaxation at all, but this requires JAL_THUNK to
   avoid relocation errors at a 1 MiB limit.

3. If a function has many static call sites in a large binary but is
   known to be dynamically cold, due to a function attribute or PGO, the
   call sites can be replaced with jal instructions, sharing a single
   thunk between all call sites within a 2 MiB text region.  This saves
   code size at small runtime cost.
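
The 1 MiB limit in use case 2 and the 2 MiB sharing region in use case 3 both follow from the `jal` encoding, which holds an even signed 21-bit offset (-1 MiB to +1 MiB - 2). A minimal sketch of that arithmetic, assuming plain integer addresses; the helper names `jal_reaches`, `needs_thunk`, and `sites_served_by` are hypothetical and do not come from the psABI or any linker:

```python
# Illustrative sketch only; helper names are hypothetical, not from the psABI.

JAL_MIN = -(1 << 20)      # lowest offset a J-type immediate can encode (-1 MiB)
JAL_MAX = (1 << 20) - 2   # highest even offset it can encode (+1 MiB - 2)


def jal_reaches(site: int, target: int) -> bool:
    """Return True if a jal located at `site` can jump directly to `target`."""
    offset = target - site
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX


def needs_thunk(site: int, target: int) -> bool:
    """A JAL_THUNK site needs a thunk only when the direct jump is out of range."""
    return not jal_reaches(site, target)


def sites_served_by(thunk: int, sites: list[int]) -> list[int]:
    """Return the call sites that can reach a thunk placed at address `thunk`."""
    return [s for s in sites if jal_reaches(s, thunk)]
```

A thunk placed in the middle of a text region is reachable from call sites up to about 1 MiB on either side, which is where the 2 MiB figure comes from.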

Restricting the register usage of the thunks is an intentional feature
copied from the Go 1.15 toolchain, where every non-leaf function
requires a conditional call to runtime.morestack in the prologue; since
ra cannot be saved before the stack frame is allocated, the call is
performed using t0 as the return register.
sorear committed Feb 20, 2024
1 parent 5ffe5b5 commit 96bdae0
Showing 1 changed file with 82 additions and 27 deletions.
109 changes: 82 additions & 27 deletions riscv-elf.adoc
@@ -503,7 +503,11 @@ Description:: Additional information about the relocation
<| S - P
.2+| 65 .2+| TLSDESC_CALL .2+| Static | .2+| Annotate call to TLS descriptor resolver function, `%tlsdesc_call(address of %tlsdesc_hi)`, for relaxation purposes only
<|
.2+| 66-191 .2+| *Reserved* .2+| - | .2+| Reserved for future standard use
.2+| 66 .2+| JAL_THUNK .2+| Static | _J-Type_ .2+| 20-bit PC-relative jump, allowed to use a range extension thunk
<| S + A - P
.2+| 67 .2+| CALL_THUNK .2+| Static | _U+I-Type_ .2+| 32-bit PC-relative function call, allowed to use a range extension thunk
<| S + A - P
.2+| 68-191 .2+| *Reserved* .2+| - | .2+| Reserved for future standard use
<|
.2+| 192-255 .2+| *Reserved* .2+| - | .2+| Reserved for nonstandard ABI extensions
<|
@@ -688,16 +692,17 @@ and fills in the GOT entry for subsequent calls to the function:

==== Procedure Calls

`R_RISCV_CALL` and `R_RISCV_CALL_PLT` relocations are associated with
pairs of instructions (`AUIPC+JALR`) generated by the `CALL` or `TAIL`
pseudoinstructions. Originally, these relocations had slightly different
behavior, but that has turned out to be unnecessary, and they are now
interchangeable, `R_RISCV_CALL` is deprecated, suggest using `R_RISCV_CALL_PLT`
instead.
`R_RISCV_CALL`, `R_RISCV_CALL_PLT`, and `R_RISCV_CALL_THUNK` relocations are
associated with pairs of instructions (`AUIPC+JALR`) generated by the `CALL` or
`TAIL` pseudoinstructions. Originally, these relocations had slightly
different behavior, but that has turned out to be unnecessary, and they are now
interchangeable, `R_RISCV_CALL` is deprecated, suggest using
`R_RISCV_CALL_PLT` instead.

With linker relaxation enabled, the `AUIPC` instruction in the `AUIPC+JALR` pair has
both a `R_RISCV_CALL` or `R_RISCV_CALL_PLT` relocation and an `R_RISCV_RELAX`
relocation indicating the instruction sequence can be relaxed during linking.
With linker relaxation enabled, the `AUIPC` instruction in the `AUIPC+JALR`
pair has both a `R_RISCV_CALL`, `R_RISCV_CALL_PLT`, or `R_RISCV_CALL_THUNK`
relocation and an `R_RISCV_RELAX` relocation indicating the instruction
sequence can be relaxed during linking.

Procedure call linker relaxation allows the `AUIPC+JALR` pair to be relaxed
to the `JAL` instruction when the procedure or PLT entry is within (-1MiB to
@@ -735,6 +740,55 @@ that can represent an even signed 21-bit offset (-1MiB to +1MiB-2).
Branch (SB-Type) instructions have a `R_RISCV_BRANCH` relocation that
can represent an even signed 13-bit offset (-4096 to +4094).

==== Range Extension Thunks

`R_RISCV_JAL_THUNK` and `R_RISCV_CALL_THUNK` relocations may be resolved by the
linker to point to a range extension thunk instead of the target symbol. Range
extension thunks will eventually transfer control to the target symbol, and
preserve the contents of memory and all registers except for `t1` and `t2`.

[NOTE]
.Suggested forms of range extension thunks
====
20-bit range:
[,asm]
----
jal zero, <offset to target>
----
32-bit range:
[,asm]
----
auipc t2, <high offset to target>
jalr zero, t2, <low offset to target>
----
64-bit range, position dependent:
[,asm]
----
auipc t2, <high offset to literal>
ld t2, <low offset to literal>(t2)
jalr zero, t2, 0 OR c.jr t2
...
.quad 0
----
64-bit range, position independent:
[,asm]
----
auipc t1, <high offset to literal>
ld t2, <low offset to literal>(t1)
add t2, t2, t1 OR c.add t2, t1
jalr zero, t2, 0 OR c.jr t2
...
.quad <offset to target from auipc result>
----
====
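
As a rough illustration of how a linker might choose among the suggested forms above, the sketch below picks the shortest form whose range covers the distance from the thunk to the target. It assumes RV64 and uses the usual hi20/lo12 split for `auipc` plus a 12-bit signed immediate. The function names and the returned labels are hypothetical; the specification does not prescribe a selection algorithm.

```python
# Illustrative sketch; function names and labels are hypothetical. Assumes RV64.

def split_hi_lo(offset: int) -> tuple[int, int]:
    """Split a PC-relative offset into auipc (hi20) and jalr (lo12) immediates."""
    hi = (offset + 0x800) >> 12   # round so the remainder fits a signed 12-bit field
    lo = offset - (hi << 12)      # always in [-2048, 2047]
    return hi, lo


def thunk_form(thunk_addr: int, target: int, pic: bool) -> str:
    """Pick the shortest suggested thunk form that can reach `target`."""
    offset = target - thunk_addr
    # 20-bit range: a single jal with an even signed 21-bit offset.
    if offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20):
        return "jal"
    # 32-bit range: auipc+jalr, valid when hi fits the signed 20-bit U-type field.
    hi, _ = split_hi_lo(offset)
    if -(1 << 19) <= hi < (1 << 19):
        return "auipc+jalr"
    # 64-bit range: load the address (or an offset, if position independent)
    # from a literal placed near the thunk.
    return "literal-pic" if pic else "literal-abs"
```

Because the low part is sign-extended, the `auipc+jalr` form covers offsets from -2^31 - 2^11 to 2^31 - 2^11 - 1 rather than a symmetric 32-bit window, which is why the check is made on `hi` rather than on `offset` directly.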

==== PC-Relative Symbol Addresses

32-bit PC-relative relocations for symbol addresses on sequences of
@@ -1454,17 +1508,17 @@ which made the load instruction reference to an unspecified address.

==== Function Call Relaxation

Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT.
Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK.

Description:: This relaxation type can relax `AUIPC+JALR` into `JAL`.

Condition:: The offset between the location of relocation and target symbol or
the PLT stub of the target symbol is within +-1MiB.

Relaxation::
- Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
can be rewritten to a single JAL instruction with the offset between the
location of relocation and target symbol.
- Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
or `R_RISCV_CALL_THUNK` can be rewritten to a single JAL instruction with
the offset between the location of relocation and target symbol.

Example::
+
@@ -1490,7 +1544,7 @@ symbol.
[[compress-func-call-relax]]
==== Compressed Function Call Relaxation

Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT.
Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK.

Description:: This relaxation type can relax `AUIPC+JALR` into `C.JAL`
instruction sequence.
@@ -1500,9 +1554,9 @@ symbol.
instruction in the instruction sequence is `X1`/`RA` and if it is RV32.

Relaxation::
- Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
can be rewritten to a single `C.JAL` instruction with the offset between the
location of relocation and target symbol.
- Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
or `R_RISCV_CALL_THUNK` can be rewritten to a single `C.JAL` instruction with
the offset between the location of relocation and target symbol.

Example::
+
@@ -1524,7 +1578,7 @@ Relaxation result:
[[compress-tailcall-relax]]
==== Compressed Tail Call Relaxation

Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT.
Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK

Description:: This relaxation type can relax `AUIPC+JALR` into `C.J`
instruction sequence.
@@ -1534,9 +1588,9 @@ Relaxation result:
instruction in the instruction sequence is `X0`.

Relaxation::
- Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
can be rewritten to a single `C.J` instruction with the offset between the
location of relocation and target symbol.
- Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
or `R_RISCV_CALL_THUNK` can be rewritten to a single `C.J` instruction with
the offset between the location of relocation and target symbol.

Example::
+
@@ -1912,7 +1966,8 @@ Relaxation result (short form):

==== Table Jump Relaxation

Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_JAL.
Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK,
R_RISCV_JAL, R_RISCV_JAL_THUNK.

Description:: This relaxation type can relax a function call or jump
instruction into a single table jump instruction with the index of the target
@@ -1933,10 +1988,10 @@ Relaxation result (short form):
is `X0` or `RA`.

Relaxation::
- Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
can be rewritten to a table jump instruction.
- Instruction associated with `R_RISCV_JAL` can be rewritten to a table
jump instruction.
- Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
or `R_RISCV_CALL_THUNK` can be rewritten to a table jump instruction.
- Instruction associated with `R_RISCV_JAL` or `R_RISCV_JAL_THUNK` can be
rewritten to a table jump instruction.
Example::
+
--