From 96bdae06b59f9bb015cba80c03e9b53be69c5518 Mon Sep 17 00:00:00 2001
From: Stefan O'Rear
Date: Tue, 20 Feb 2024 14:35:53 -0500
Subject: [PATCH] Range extension thunks

We are considering three use cases here.

1. A true large code model needs to support more than 2 GiB of text;
   data accesses are out of scope for this change but jumps and calls
   across a range of more than 2 GiB are needed. Most users of a large
   model will have more than 2 GiB of data but small text, or text with
   a highly local call pattern, so we want most calls to be able to use
   the auipc+jalr sequence. This would normally call for relaxation,
   but relaxation requires object files to contain the longest possible
   sequence, of which several are possible. Instead, keep the sequences
   the same and allow thunk insertion.

2. For executables and shared objects in a Unix environment, most of
   the code size benefits of relaxation come from call->jal relaxation,
   not data or TLS relaxation. If the compiler is modified to generate
   jal instructions instead of call instructions, the code size
   benefits can be achieved without relaxation at all, but this
   requires JAL_THUNK to avoid relocation errors at a 1 MiB limit.

3. If a function has many static call sites in a large binary but is
   known to be dynamically cold, due to a function attribute or PGO,
   the call sites can be replaced with jal instructions, sharing a
   single thunk between all call sites within a 2 MiB text region.
   This saves code size at small runtime cost.

Restricting the register usage of the thunks is an intentional feature
copied from the Go 1.15 toolchain, where every non-leaf function
requires a conditional call to runtime.morestack in the prologue; since
ra cannot be saved before the stack frame is allocated, the call is
performed using t0 as the return register.
---
 riscv-elf.adoc | 109 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 82 insertions(+), 27 deletions(-)

diff --git a/riscv-elf.adoc b/riscv-elf.adoc
index d5e560a7..72caa90f 100644
--- a/riscv-elf.adoc
+++ b/riscv-elf.adoc
@@ -503,7 +503,11 @@ Description:: Additional information about the relocation
  <| S - P
 .2+| 65 .2+| TLSDESC_CALL .2+| Static | .2+| Annotate call to TLS descriptor resolver function, `%tlsdesc_call(address of %tlsdesc_hi)`, for relaxation purposes only
  <|
-.2+| 66-191 .2+| *Reserved* .2+| - | .2+| Reserved for future standard use
+.2+| 66 .2+| JAL_THUNK .2+| Static | _J-Type_ .2+| 20-bit PC-relative jump, allowed to use a range extension thunk
+ <| S + A - P
+.2+| 67 .2+| CALL_THUNK .2+| Static | _U+I-Type_ .2+| 32-bit PC-relative function call, allowed to use a range extension thunk
+ <| S + A - P
+.2+| 68-191 .2+| *Reserved* .2+| - | .2+| Reserved for future standard use
  <|
 .2+| 192-255 .2+| *Reserved* .2+| - | .2+| Reserved for nonstandard ABI extensions
  <|
@@ -688,16 +692,17 @@ and fills in the GOT entry for subsequent calls to the function:
 
 ==== Procedure Calls
 
-`R_RISCV_CALL` and `R_RISCV_CALL_PLT` relocations are associated with
-pairs of instructions (`AUIPC+JALR`) generated by the `CALL` or `TAIL`
-pseudoinstructions. Originally, these relocations had slightly different
-behavior, but that has turned out to be unnecessary, and they are now
-interchangeable, `R_RISCV_CALL` is deprecated, suggest using `R_RISCV_CALL_PLT`
-instead.
+`R_RISCV_CALL`, `R_RISCV_CALL_PLT`, and `R_RISCV_CALL_THUNK` relocations are
+associated with pairs of instructions (`AUIPC+JALR`) generated by the `CALL` or
+`TAIL` pseudoinstructions. Originally, these relocations had slightly
+different behavior, but that has turned out to be unnecessary, and they are now
+interchangeable, `R_RISCV_CALL` is deprecated, suggest using
+`R_RISCV_CALL_PLT` instead.
 
-With linker relaxation enabled, the `AUIPC` instruction in the `AUIPC+JALR` pair has
-both a `R_RISCV_CALL` or `R_RISCV_CALL_PLT` relocation and an `R_RISCV_RELAX`
-relocation indicating the instruction sequence can be relaxed during linking.
+With linker relaxation enabled, the `AUIPC` instruction in the `AUIPC+JALR`
+pair has both a `R_RISCV_CALL`, `R_RISCV_CALL_PLT`, or `R_RISCV_CALL_THUNK`
+relocation and an `R_RISCV_RELAX` relocation indicating the instruction
+sequence can be relaxed during linking.
 
 Procedure call linker relaxation allows the `AUIPC+JALR` pair to be relaxed
 to the `JAL` instruction when the procedure or PLT entry is within (-1MiB to
@@ -735,6 +740,55 @@ that can represent an even signed 21-bit offset (-1MiB to +1MiB-2).
 Branch (SB-Type) instructions have a `R_RISCV_BRANCH` relocation that can
 represent an even signed 13-bit offset (-4096 to +4094).
 
+==== Range Extension Thunks
+
+`R_RISCV_JAL_THUNK` and `R_RISCV_CALL_THUNK` relocations may be resolved by the
+linker to point to a range extension thunk instead of the target symbol. Range
+extension thunks will eventually transfer control to the target symbol, and
+preserve the contents of memory and all registers except for `t1` and `t2`.
+
+[NOTE]
+.Suggested forms of range extension thunks
+====
+20-bit range:
+
+[,asm]
+----
+    jal zero,
+----
+
+32-bit range:
+
+[,asm]
+----
+    auipc t2,
+    jalr zero, t2,
+----
+
+64-bit range, position dependent:
+
+[,asm]
+----
+    auipc t2,
+    ld t2, (t2)
+    jalr zero, t2, 0 OR c.jr t2
+    ...
+    .quad 0
+----
+
+64-bit range, position independent:
+
+[,asm]
+----
+    auipc t1,
+    ld t2, (t1)
+    add t2, t2, t1 OR c.add t2, t1
+    jalr zero, t2, 0 OR c.jr t2
+    ...
+    .quad
+----
+====
+
 ==== PC-Relative Symbol Addresses
 
 32-bit PC-relative relocations for symbol addresses on sequences of
@@ -1454,7 +1508,7 @@ which made the load instruction reference to an unspecified address.
 
 ==== Function Call Relaxation
 
-  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT.
+  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK.
 
   Description:: This relaxation type can relax `AUIPC+JALR` into `JAL`.
 
@@ -1462,9 +1516,9 @@ which made the load instruction reference to an unspecified address.
   the PLT stub of the target symbol is within +-1MiB.
 
   Relaxation::
-    - Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
-      can be rewritten to a single JAL instruction with the offset between the
-      location of relocation and target symbol.
+    - Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
+      or `R_RISCV_CALL_THUNK` can be rewritten to a single JAL instruction with
+      the offset between the location of relocation and target symbol.
 
   Example::
 +
@@ -1490,7 +1544,7 @@ symbol.
 [[compress-func-call-relax]]
 ==== Compressed Function Call Relaxation
 
-  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT.
+  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK.
 
   Description:: This relaxation type can relax `AUIPC+JALR` into `C.JAL`
   instruction sequence.
@@ -1500,9 +1554,9 @@ symbol.
   instruction in the instruction sequence is `X1`/`RA` and if it is RV32.
 
   Relaxation::
-    - Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
-      can be rewritten to a single `C.JAL` instruction with the offset between the
-      location of relocation and target symbol.
+    - Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
+      or `R_RISCV_CALL_THUNK` can be rewritten to a single `C.JAL` instruction with
+      the offset between the location of relocation and target symbol.
 
   Example::
 +
@@ -1524,7 +1578,7 @@ Relaxation result:
 [[compress-tailcall-relax]]
 ==== Compressed Tail Call Relaxation
 
-  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT.
+  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK
 
   Description:: This relaxation type can relax `AUIPC+JALR` into `C.J` instruction
   sequence.
@@ -1534,9 +1588,9 @@ Relaxation result:
   instruction in the instruction sequence is `X0`.
 
   Relaxation::
-    - Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
-      can be rewritten to a single `C.J` instruction with the offset between the
-      location of relocation and target symbol.
+    - Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
+      or `R_RISCV_CALL_THUNK` can be rewritten to a single `C.J` instruction with
+      the offset between the location of relocation and target symbol.
 
   Example::
 +
@@ -1912,7 +1966,8 @@ Relaxation result (short form):
 
 ==== Table Jump Relaxation
 
-  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_JAL.
+  Target Relocation::: R_RISCV_CALL, R_RISCV_CALL_PLT, R_RISCV_CALL_THUNK,
+  R_RISCV_JAL, R_RISCV_JAL_THUNK.
 
   Description:: This relaxation type can relax a function call or jump
   instruction into a single table jump instruction with the index of the target
@@ -1933,10 +1988,10 @@ Relaxation result (short form):
   is `X0` or `RA`.
 
   Relaxation::
-    - Instruction sequence associated with `R_RISCV_CALL` or `R_RISCV_CALL_PLT`
-      can be rewritten to a table jump instruction.
-    - Instruction associated with `R_RISCV_JAL` can be rewritten to a table
-      jump instruction.
+    - Instruction sequence associated with `R_RISCV_CALL`, `R_RISCV_CALL_PLT`,
+      or `R_RISCV_CALL_THUNK` can be rewritten to a table jump instruction.
+    - Instruction associated with `R_RISCV_JAL` or `R_RISCV_JAL_THUNK` can be
+      rewritten to a table jump instruction.
 
   Example::
 +
--
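
Reviewer note, not part of the patch: the range limits the change relies on (an even signed 21-bit offset for `jal`, a 32-bit PC-relative offset for `auipc`+`jalr`) can be illustrated with a small Python sketch that picks the shortest of the suggested thunk forms for a given displacement. The names `thunk_form` and `split_hi_lo` and the returned labels are hypothetical. The `auipc`+`jalr` bounds assume RV64 and the usual hi20/lo12 split, and how a real linker places or chooses thunks is not specified here.

```python
# Illustrative only: range checks for the thunk forms suggested in the patch.
# Offsets are byte displacements from the thunk to its target.

# jal: even signed 21-bit offset, -1MiB to +1MiB-2 (as stated in the spec text).
JAL_MIN, JAL_MAX = -(1 << 20), (1 << 20) - 2

# auipc+jalr on RV64 (assumption): a sign-extended 20-bit upper part shifted
# left by 12, plus a signed 12-bit lower part.
AUIPC_JALR_MIN = -(1 << 31) - (1 << 11)
AUIPC_JALR_MAX = (1 << 31) - (1 << 11) - 1


def split_hi_lo(offset):
    """Split an offset into (hi20, lo12) so that (hi20 << 12) + lo12 == offset."""
    hi = (offset + 0x800) >> 12   # round to nearest so lo fits in 12 signed bits
    lo = offset - (hi << 12)      # always in -2048..2047
    return hi, lo


def thunk_form(offset):
    """Return a label for the shortest suggested thunk form that can reach offset."""
    if offset % 2 != 0:
        raise ValueError("instruction addresses are assumed to be even")
    if JAL_MIN <= offset <= JAL_MAX:
        return "jal"
    if AUIPC_JALR_MIN <= offset <= AUIPC_JALR_MAX:
        return "auipc+jalr"
    return "64-bit"               # needs one of the literal-load forms


if __name__ == "__main__":
    for off in (0x1000, 1 << 20, (1 << 31) - 2050, 1 << 31):
        print(hex(off), thunk_form(off))
```

Under these assumptions, a displacement of exactly +1MiB already falls outside the `jal` form, which matches the "+1MiB-2" upper bound quoted in the spec text.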