From 154136304afa5256283094ba00b280dff5f7a4c6 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Thu, 3 Oct 2024 13:56:11 -0300
Subject: [PATCH 01/11] SBPF Static Syscalls

---
 proposals/0176-static-syscalls.md | 87 +++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 proposals/0176-static-syscalls.md

diff --git a/proposals/0176-static-syscalls.md b/proposals/0176-static-syscalls.md
new file mode 100644
index 000000000..f02a262fe
--- /dev/null
+++ b/proposals/0176-static-syscalls.md
@@ -0,0 +1,87 @@
+---
+simd: '0176'
+title: SBPF Static Syscalls
+authors:
+  - Alessandro Decina
+  - Alexander Meißner
+  - Lucas Steuernagel
+category: Standard
+type: Core
+status: Draft
+created: 2024-09-27
+---
+
+## Summary
+
+This SIMD introduces a new instruction syscall in the SBPF instruction set to
+represent syscalls. Such a change aims to remove relocations when resolving
+syscalls and simplify the instruction set, allowing for the straightforward
+differentiation between external and internal calls.
+
+## Motivation
+
+The resolution of syscalls during ELF loading requires relocating addresses,
+which is a performance burden for the validator. Relocations require an entire
+copy of the ELF file in memory to either relocate addresses we fetch from the
+symbol table or offset addresses to after the start of the virtual machine’s
+memory. Moreover, relocations pose security concerns, as they allow the
+arbitrary modification of program headers and programs sections. A new
+separate opcode for syscalls modifies the behavior of the ELF loader, allowing
+us to resolve syscalls without relocations.
+
+## New Terminology
+
+None.
+
+## Detailed Design
+
+The following must go into effect if and only if a program indicates the SBPF
+version XX or higher in its ELF header e_flags field, according to the
+specification of SIMD-0161.
+
+### New syscall instruction
+
+We introduce a new instruction in the SBPF instruction set, which we call
+`syscall`.
It must be associated with all syscalls in the SBPF format. Its
+encoding consists of an opcode `0x95` and an immediate, which must refer to a
+previously registered syscall. For more reference on the SBF ISA format, see
+the
+[spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md).
+
+For simplicity, syscalls must be represented as a natural number greater than
+zero, so that they can be organized in a lookup table. This choice allows for
+quick retrieval of syscall information from integer indexes. An instruction
+`syscall 2` must represent a call to the function registered at position two
+in the lookup table.
+
+Consequently, system calls in the Solana SDK and in any related compiler tools
+must be registered as function pointers, whose address is a natural number
+greater than zero, representing their position in a syscall lookup table. The
+verifier must enforce that the immediate of a syscall instruction points to a
+valid syscall, and throw `VerifierError::InvalidFunction` otherwise.
+
+This new instruction comes together with modifications in the verification
+phase. `call imm` (opcode `0x85`) instructions must only refer to internal
+calls and its immediate field must only be interpreted as a relative address
+to jump from the program counter.
+
+### Change of opcode for the exit instrcution
+
+The opcode `0x9D` must represent the exit instruction, while the old opcode
+`0x95` must now be assigned to the new syscall instruction.
+
+## Alternatives Considered
+
+None.
+
+## Impact
+
+The changes proposed in this SIMD are transparent to dApp developers. The
+compiler toolchain will emit correct code for the specified SBF version.
+Static syscalls obviate relocations for call instructions and move the virtual
+machine closer to eliminating relocations altogether, which can bring
+considerable performance improvements.
+
+## Security Considerations
+
+None.
From 13b36f52bd0541ba0527e84b3260586443ec94c0 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Thu, 3 Oct 2024 13:57:43 -0300
Subject: [PATCH 02/11] Bump SIMD number

---
 proposals/0176-static-syscalls.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0176-static-syscalls.md b/proposals/0176-static-syscalls.md
index f02a262fe..254189522 100644
--- a/proposals/0176-static-syscalls.md
+++ b/proposals/0176-static-syscalls.md
@@ -1,5 +1,5 @@
 ---
-simd: '0176'
+simd: '0178'
 title: SBPF Static Syscalls
 authors:
   - Alessandro Decina
From 8a3932addb77f2a7987b9464a7efdbeabd825324 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Thu, 3 Oct 2024 14:49:01 -0300
Subject: [PATCH 03/11] Fix typo

---
 proposals/0176-static-syscalls.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0176-static-syscalls.md b/proposals/0176-static-syscalls.md
index 254189522..06f3c8347 100644
--- a/proposals/0176-static-syscalls.md
+++ b/proposals/0176-static-syscalls.md
@@ -65,7 +65,7 @@ phase. `call imm` (opcode `0x85`) instructions must only refer to internal
 calls and its immediate field must only be interpreted as a relative address
 to jump from the program counter.
 
-### Change of opcode for the exit instrcution
+### Change of opcode for the exit instruction
 
 The opcode `0x9D` must represent the exit instruction, while the old opcode
 `0x95` must now be assigned to the new syscall instruction.
From 4d78d7da614a4a5718920438c45d8b55a17dcca2 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Tue, 8 Oct 2024 18:01:53 -0300
Subject: [PATCH 04/11] Rename exit instruction to return

---
 proposals/0176-static-syscalls.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/proposals/0176-static-syscalls.md b/proposals/0176-static-syscalls.md
index 06f3c8347..dc4eac751 100644
--- a/proposals/0176-static-syscalls.md
+++ b/proposals/0176-static-syscalls.md
@@ -16,7 +16,8 @@ created: 2024-09-27
 This SIMD introduces a new instruction syscall in the SBPF instruction set to
 represent syscalls. Such a change aims to remove relocations when resolving
 syscalls and simplify the instruction set, allowing for the straightforward
-differentiation between external and internal calls.
+differentiation between external and internal calls. In addition, it proposes
+a new `return` instruction to supersede the `exit` instruction.
 
 ## Motivation
 
@@ -65,10 +66,16 @@ phase. `call imm` (opcode `0x85`) instructions must only refer to internal
 calls and its immediate field must only be interpreted as a relative address
 to jump from the program counter.
 
-### Change of opcode for the exit instruction
+### New return instruction
 
-The opcode `0x9D` must represent the exit instruction, while the old opcode
-`0x95` must now be assigned to the new syscall instruction.
+The opcode `0x9D` must represent the return instruction, which supersedes the
+`exit` instruction. The opcode (opcode `0x95`), previously assigned to the
+`exit` instruction, must now be interpreted as the new syscall instruction.
+
+The verifier must detect an SBPF V1 program containing the `0x9D` opcode and
+throw a `VerifierError::UnknowOpCode`. Likewise, if, by any means, a V1
+program reaches the execution stage containing the `0x9D` opcode, an
+`EbpfError::UnsupportedInstruction` must be raised.
 
 ## Alternatives Considered
 
From 1732281dd7953b0567081a25b19e593d8aadf59e Mon Sep 17 00:00:00 2001
From: Lucas Steuernagel
Date: Wed, 16 Oct 2024 12:41:11 -0300
Subject: [PATCH 05/11] Add syscall numbering table

---
 ...ic-syscalls.md => 0178-static-syscalls.md} | 51 +++++++++++++++++++
 1 file changed, 51 insertions(+)
 rename proposals/{0176-static-syscalls.md => 0178-static-syscalls.md} (58%)

diff --git a/proposals/0176-static-syscalls.md b/proposals/0178-static-syscalls.md
similarity index 58%
rename from proposals/0176-static-syscalls.md
rename to proposals/0178-static-syscalls.md
index dc4eac751..c21226aaa 100644
--- a/proposals/0176-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -77,6 +77,57 @@ throw a `VerifierError::UnknowOpCode`. Likewise, if, by any means, a V1
 program reaches the execution stage containing the `0x9D` opcode, an
 `EbpfError::UnsupportedInstruction` must be raised.
 
+### Syscall numbering convention
+
+Syscalls must be represented by a unique integer to maintain a dense lookup
+table data structure for indexing and dispatch. For a clear correlation
+between the existing syscalls and their respective identification number,
+syscalls must strictly follow the numbering below.
+
+| Syscall name                             | Number   |
+|------------------------------------------|----------|
+| abort                                    | 1        |
+| sol_panic_                               | 2        |
+| sol_memcpy_                              | 3        |
+| sol_memmove_                             | 4        |
+| sol_memset_                              | 5        |
+| sol_memcmp_                              | 6        |
+| sol_log                                  | 7        |
+| sol_log_64                               | 8        |
+| sol_log_pubkey                           | 9        |
+| sol_log_compute_units_                   | 10       |
+| sol_alloc_free_                          | 11       |
+| sol_invoke_signed_c                      | 12       |
+| sol_invoke_signed_rust                   | 13       |
+| sol_set_return_data                      | 14       |
+| sol_get_return_data                      | 15       |
+| sol_log_data                             | 16       |
+| sol_sha256                               | 17       |
+| sol_keccak256                            | 18       |
+| sol_secp256k1_recover                    | 19       |
+| sol_blake3                               | 20       |
+| sol_poseidon                             | 21       |
+| sol_get_processed_sibling_instruction    | 22       |
+| sol_get_stack_height                     | 23       |
+| sol_curve_validate_point                 | 24       |
+| sol_curve_group_op                       | 25       |
+| sol_curve_multiscalar_mul                | 26       |
+| sol_curve_pairing_map                    | 27       |
+| sol_alt_bn128_group_op                   | 28       |
+| sol_alt_bn128_compression                | 29       |
+| sol_big_mod_exp                          | 30       |
+| sol_remaining_compute_units              | 31       |
+| sol_create_program_address               | 32       |
+| sol_try_find_program_address             | 33       |
+| sol_get_sysvar                           | 34       |
+| sol_get_epoch_stake                      | 35       |
+| sol_get_clock_sysvar                     | 36       |
+| sol_get_epoch_schedule_sysvar            | 37       |
+| sol_get_last_restart_slot                | 38       |
+| sol_get_epoch_rewards_slot               | 39       |
+| sol_get_fees_sysvar                      | 40       |
+|------------------------------------------|----------|
+
 ## Alternatives Considered
 
 None.
From df95922a96d97a29fa57a82aa2b9749a3423252b Mon Sep 17 00:00:00 2001
From: Lucas Steuernagel
Date: Wed, 16 Oct 2024 12:42:39 -0300
Subject: [PATCH 06/11] Update status

---
 proposals/0178-static-syscalls.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0178-static-syscalls.md b/proposals/0178-static-syscalls.md
index c21226aaa..2b6d16f2d 100644
--- a/proposals/0178-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -7,7 +7,7 @@ authors:
   - Lucas Steuernagel
 category: Standard
 type: Core
-status: Draft
+status: Review
 created: 2024-09-27
 ---
 
From 13ce52d388d1aca7293efe8a06c5a42c2ff3da65 Mon Sep 17 00:00:00 2001
From: Lucas Steuernagel
Date: Sun, 20 Oct 2024 09:23:24 -0300
Subject: [PATCH 07/11] Update table

---
 proposals/0178-static-syscalls.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/proposals/0178-static-syscalls.md b/proposals/0178-static-syscalls.md
index 2b6d16f2d..963f54acc 100644
--- a/proposals/0178-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -126,6 +126,7 @@ syscalls must strictly follow the numbering below.
 | sol_get_last_restart_slot                | 38       |
 | sol_get_epoch_rewards_slot               | 39       |
 | sol_get_fees_sysvar                      | 40       |
+| sol_get_rent_sysvar                      | 41       |
 |------------------------------------------|----------|
 
 ## Alternatives Considered
From ca19222e2b689312cef6081fb21b5de19b6f0168 Mon Sep 17 00:00:00 2001
From: Lucas Steuernagel
Date: Tue, 22 Oct 2024 18:29:13 -0300
Subject: [PATCH 08/11] Update error message

---
 proposals/0178-static-syscalls.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0178-static-syscalls.md b/proposals/0178-static-syscalls.md
index 963f54acc..c002af4c0 100644
--- a/proposals/0178-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -59,7 +59,7 @@ Consequently, system calls in the Solana SDK and in any related compiler tools
 must be registered as function pointers, whose address is a natural number
 greater than zero, representing their position in a syscall lookup table. The
 verifier must enforce that the immediate of a syscall instruction points to a
-valid syscall, and throw `VerifierError::InvalidFunction` otherwise.
+valid syscall, and throw `VerifierError::InvalidSyscall` otherwise.
 
 This new instruction comes together with modifications in the verification
 phase. `call imm` (opcode `0x85`) instructions must only refer to internal
From 0c1dc235da17ab307151baeaa668d2258c1c8d62 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Tue, 10 Dec 2024 12:40:41 -0300
Subject: [PATCH 09/11] Fix typo in sol_log syscall

---
 proposals/0178-static-syscalls.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/0178-static-syscalls.md b/proposals/0178-static-syscalls.md
index c002af4c0..eb1cf3a5a 100644
--- a/proposals/0178-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -92,8 +92,8 @@ syscalls must strictly follow the numbering below.
 | sol_memmove_                             | 4        |
 | sol_memset_                              | 5        |
 | sol_memcmp_                              | 6        |
-| sol_log                                  | 7        |
-| sol_log_64                               | 8        |
+| sol_log_                                 | 7        |
+| sol_log_64_                              | 8        |
 | sol_log_pubkey                           | 9        |
 | sol_log_compute_units_                   | 10       |
 | sol_alloc_free_                          | 11       |
From 5d2ea35105add9acb056beaee592df92c56d26e2 Mon Sep 17 00:00:00 2001
From: Lucas
Date: Mon, 16 Dec 2024 18:17:31 -0300
Subject: [PATCH 10/11] Fix syscall name and header flag

---
 proposals/0178-static-syscalls.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/0178-static-syscalls.md b/proposals/0178-static-syscalls.md
index eb1cf3a5a..6f443da11 100644
--- a/proposals/0178-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -37,7 +37,7 @@ None.
 ## Detailed Design
 
 The following must go into effect if and only if a program indicates the SBPF
-version XX or higher in its ELF header e_flags field, according to the
+version `0x03` or higher in its ELF header e_flags field, according to the
 specification of SIMD-0161.
 
 ### New syscall instruction
@@ -124,7 +124,7 @@ syscalls must strictly follow the numbering below.
 | sol_get_clock_sysvar                     | 36       |
 | sol_get_epoch_schedule_sysvar            | 37       |
 | sol_get_last_restart_slot                | 38       |
-| sol_get_epoch_rewards_slot               | 39       |
+| sol_get_epoch_rewards_sysvar             | 39       |
 | sol_get_fees_sysvar                      | 40       |
 | sol_get_rent_sysvar                      | 41       |
 |------------------------------------------|----------|
From 16c0bed961a3372bc2d003471d4fc008bbdfce7b Mon Sep 17 00:00:00 2001
From: Lucas
Date: Thu, 23 Jan 2025 18:03:00 -0300
Subject: [PATCH 11/11] Represent syscalls as the murmur32 hash of ther name

---
 proposals/0178-static-syscalls.md | 88 +++++++------------------------
 1 file changed, 20 insertions(+), 68 deletions(-)

diff --git a/proposals/0178-static-syscalls.md b/proposals/0178-static-syscalls.md
index 6f443da11..7ca5fa005 100644
--- a/proposals/0178-static-syscalls.md
+++ b/proposals/0178-static-syscalls.md
@@ -24,7 +24,7 @@ a new `return` instruction to supersede the `exit` instruction.
 The resolution of syscalls during ELF loading requires relocating addresses,
 which is a performance burden for the validator. Relocations require an entire
 copy of the ELF file in memory to either relocate addresses we fetch from the
-symbol table or offset addresses to after the start of the virtual machine’s
+symbol table or offset addresses to after the start of the virtual machine's
 memory. Moreover, relocations pose security concerns, as they allow the
 arbitrary modification of program headers and programs sections. A new
 separate opcode for syscalls modifies the behavior of the ELF loader, allowing
@@ -45,26 +45,30 @@ specification of SIMD-0161.
 We introduce a new instruction in the SBPF instruction set, which we call
 `syscall`. It must be associated with all syscalls in the SBPF format. Its
 encoding consists of an opcode `0x95` and an immediate, which must refer to a
-previously registered syscall. For more reference on the SBF ISA format, see
-the
+previously registered syscall hash code. For more reference on the SBF ISA
+format, see the
 [spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md).
 
-For simplicity, syscalls must be represented as a natural number greater than
-zero, so that they can be organized in a lookup table. This choice allows for
-quick retrieval of syscall information from integer indexes. An instruction
-`syscall 2` must represent a call to the function registered at position two
-in the lookup table.
+We define the hash code for a syscall as the murmur32 hash of its respective
+name. The 32-bit immediate value of the new `syscall` instruction must be the
+integer representation of such a hash. For instance, the code for `abort` is
+given by `murmur32("abort")`, so the instruction assembly should look like
+`syscall 3069975057`.
 
 Consequently, system calls in the Solana SDK and in any related compiler tools
-must be registered as function pointers, whose address is a natural number
-greater than zero, representing their position in a syscall lookup table. The
-verifier must enforce that the immediate of a syscall instruction points to a
-valid syscall, and throw `VerifierError::InvalidSyscall` otherwise.
+must be registered as function pointers, whose address is the murmur32 hash of
+their name. The bytecode verifier must enforce that the immediate value of a
+syscall instruction points to a valid syscall, and throw
+`VerifierError::InvalidSyscall` otherwise.
 
-This new instruction comes together with modifications in the verification
-phase. `call imm` (opcode `0x85`) instructions must only refer to internal
-calls and its immediate field must only be interpreted as a relative address
-to jump from the program counter.
+This new instruction comes together with modifications in the semantics of
+`call imm` (opcode `0x85`) instructions, which must only refer to internal
+calls and their immediate field must only be interpreted as a relative address
+to jump from the program counter.
+
+Syscall names must NOT be present in the symbol table anymore, since the new
+scheme does not require symbol relocations and obviates the need for symbols
+to be referenced in the table.
 
 ### New return instruction
 
@@ -77,58 +81,6 @@ throw a `VerifierError::UnknowOpCode`. Likewise, if, by any means, a V1
 program reaches the execution stage containing the `0x9D` opcode, an
 `EbpfError::UnsupportedInstruction` must be raised.
 
-### Syscall numbering convention
-
-Syscalls must be represented by a unique integer to maintain a dense lookup
-table data structure for indexing and dispatch. For a clear correlation
-between the existing syscalls and their respective identification number,
-syscalls must strictly follow the numbering below.
-
-| Syscall name                             | Number   |
-|------------------------------------------|----------|
-| abort                                    | 1        |
-| sol_panic_                               | 2        |
-| sol_memcpy_                              | 3        |
-| sol_memmove_                             | 4        |
-| sol_memset_                              | 5        |
-| sol_memcmp_                              | 6        |
-| sol_log_                                 | 7        |
-| sol_log_64_                              | 8        |
-| sol_log_pubkey                           | 9        |
-| sol_log_compute_units_                   | 10       |
-| sol_alloc_free_                          | 11       |
-| sol_invoke_signed_c                      | 12       |
-| sol_invoke_signed_rust                   | 13       |
-| sol_set_return_data                      | 14       |
-| sol_get_return_data                      | 15       |
-| sol_log_data                             | 16       |
-| sol_sha256                               | 17       |
-| sol_keccak256                            | 18       |
-| sol_secp256k1_recover                    | 19       |
-| sol_blake3                               | 20       |
-| sol_poseidon                             | 21       |
-| sol_get_processed_sibling_instruction    | 22       |
-| sol_get_stack_height                     | 23       |
-| sol_curve_validate_point                 | 24       |
-| sol_curve_group_op                       | 25       |
-| sol_curve_multiscalar_mul                | 26       |
-| sol_curve_pairing_map                    | 27       |
-| sol_alt_bn128_group_op                   | 28       |
-| sol_alt_bn128_compression                | 29       |
-| sol_big_mod_exp                          | 30       |
-| sol_remaining_compute_units              | 31       |
-| sol_create_program_address               | 32       |
-| sol_try_find_program_address             | 33       |
-| sol_get_sysvar                           | 34       |
-| sol_get_epoch_stake                      | 35       |
-| sol_get_clock_sysvar                     | 36       |
-| sol_get_epoch_schedule_sysvar            | 37       |
-| sol_get_last_restart_slot                | 38       |
-| sol_get_epoch_rewards_sysvar             | 39       |
-| sol_get_fees_sysvar                      | 40       |
-| sol_get_rent_sysvar                      | 41       |
-|------------------------------------------|----------|
-
 ## Alternatives Considered
 
 None.
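
Note on the hash-based immediates from PATCH 11/11: the sketch below shows how a `syscall` immediate could be derived from a syscall name and packed into an instruction. It rests on assumptions the patch does not state. "murmur32" is taken to mean MurmurHash3 (x86, 32-bit) with seed 0, and the instruction is taken to use the usual 8-byte eBPF slot layout (opcode byte, register byte, 16-bit offset, 32-bit little-endian immediate) with the unused fields zeroed. The names `murmur3_32`, `encode_syscall` and `SYSCALL_OPCODE` are hypothetical and exist only for this illustration.

```python
import struct

SYSCALL_OPCODE = 0x95  # opcode this proposal assigns to `syscall`


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit). Seed 0 is an assumption, not stated in the SIMD."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    full = len(data) - len(data) % 4
    # Body: mix each complete little-endian 4-byte block.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: mix the remaining 0 to 3 bytes.
    tail = data[full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def encode_syscall(name: str) -> bytes:
    """Pack `syscall <hash(name)>` into one 8-byte slot (layout assumed)."""
    imm = murmur3_32(name.encode("ascii"))
    # opcode (u8), registers (u8), offset (i16), immediate (u32), little-endian
    return struct.pack("<BBhI", SYSCALL_OPCODE, 0, 0, imm)


print(murmur3_32(b"abort"))            # 3069975057
print(encode_syscall("abort").hex())   # 95000000111afcb6
```

Under these assumptions the hash of `abort` reproduces the `syscall 3069975057` immediate quoted in the patch, which suggests a seed of 0 is consistent with the example. A verifier along these lines would reject any immediate that is not the hash of a registered name.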