-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD-0189: SBPF stricter ELF headers #189
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
--- | ||
simd: '0189' | ||
title: SBPF stricter ELF headers | ||
authors: | ||
- Alexander Meißner | ||
category: Standard | ||
type: Core | ||
status: Idea | ||
created: 2024-10-21 | ||
feature: TBD | ||
extends: SIMD-0178, SIMD-0179 | ||
--- | ||
|
||
## Summary | ||
|
||
Imposes more restrictions on what is expected of ELF headers. | ||
|
||
## Motivation | ||
|
||
After the removal of relocations in SIMD-0178 the ELF layout could be massively | ||
be simplified by constraining it to a strict subset of what ELF otherwise | ||
allows. Doing so not only reduces the complexity of validator implementations | ||
but also reduces the attack surface. | ||
|
||
## Alternatives Considered | ||
|
||
Moving away from ELF as a container format altogether. However this would only | ||
gain a very small file size advantage but otherwise loose all tooling | ||
compatibility. | ||
|
||
## New Terminology | ||
|
||
None. | ||
|
||
## Detailed Design | ||
|
||
### File header | ||
|
||
The file size must not be less than `size_of::<Elf64Ehdr>()` (64 bytes), | ||
otherwise `ElfParserError::OutOfBounds` must be thrown. | ||
|
||
- `e_ident.ei_mag` must be `[0x7F, 0x45, 0x4C, 0x46]` | ||
- `e_ident.ei_class` must be `ELFCLASS64` (`0x02`) | ||
- `e_ident.ei_data` must be `ELFDATA2LSB` (`0x01`) | ||
- `e_ident.ei_version` must be `EV_CURRENT` (`0x01`) | ||
- `e_ident.ei_osabi` must be `ELFOSABI_NONE` (`0x00`) | ||
- `e_ident.ei_abiversion` must be `0x00` | ||
- `e_ident.ei_pad` must be `[0x00; 7]` | ||
- `e_type` must be `ET_DYN` (`0x0003`) | ||
- `e_machine` must be `EM_SBPF` (`0x0263`) | ||
- `e_version` must be `EV_CURRENT` (`0x00000001`) | ||
- `e_entry` is checked later (see dynamic symbol table) | ||
- `e_phoff` must be `size_of::<Elf64Ehdr>()` (64 bytes) | ||
- `e_shoff` is not checked | ||
- `e_flags` see SIMD-0161 | ||
- `e_ehsize` must be `size_of::<Elf64Ehdr>()` (64 bytes) | ||
- `e_phnum` must not be less than `0x0005` | ||
- `e_phoff + e_phnum * size_of::<Elf64Phdr>()` must be less than the file size | ||
- `e_phentsize` must be `size_of::<Elf64Phdr>()` (56 bytes) | ||
- `e_shnum` is not checked | ||
- `e_shentsize` must be `size_of::<Elf64Shdr>()` (64 bytes) | ||
- `e_shstrndx` must be less than `e_shnum` | ||
|
||
If any check fails `ElfParserError::InvalidFileHeader` must be thrown. | ||
|
||
### Program headers | ||
|
||
| purpose | p_type | p_flags | p_vaddr | | ||
| --------- | ------------ | ---------- | ------- | | ||
| bytecode | PT_LOAD | PF_X | 0 << 32 | | ||
| ro data | PT_LOAD | PF_R | 1 << 32 | | ||
| stack | PT_GNU_STACK | PF_R, PF_W | 2 << 32 | | ||
| heap | PT_LOAD | PF_R, PF_W | 3 << 32 | | ||
| symbols | PT_NONE | | 1 << 63 | | ||
|
||
For each of these predefined program headers: | ||
|
||
- `p_type` must match the `p_type` of the entry in the table above | ||
- `p_flags` must match the `p_flags` of the entry in the table above | ||
- `p_offset` must not be less than `e_phoff + e_phnum * size_of::<Elf64Phdr>()` | ||
- `p_offset` must be less than `file.len() as u64` | ||
- `p_offset` must be evenly divisible by 8 bytes, | ||
- `p_vaddr` must match the `p_vaddr` of the entry in the table above | ||
- `p_paddr` must match the `p_vaddr` of the entry in the table above | ||
- `p_filesz` must be: | ||
- `0` if the section is writable (the `PF_W` flag is set) | ||
- `p_memsz` otherwise (the `PF_W` flag is clear) | ||
- `p_filesz` must not be greater than `file.len() as u64 - p_offset` | ||
- `p_memsz` must fit in 32 bits / be less than `1 << 32` | ||
- `p_align` is ignored | ||
|
||
If any check fails `ElfParserError::InvalidProgramHeader` must be thrown. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you mind specifying where in the program lifecycle this will be thrown? (i.e. will these new checks be enforced at deployment time? when loading into the program cache for execution? both?) What is the plan for currently deployed programs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or is this SIMD associated with an SBPF version? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is associated with a SBPF version (see title), though the specific one is not decided yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we specify in the SIMD where the checks will be performed, and what happens if they fail? |
||
|
||
### Dynamic symbol table | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why did you choose the dynamic symbol table entry There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
||
For each entry in the dynamic symbol table: | ||
|
||
- if `st_info` does not contain `STT_FUNC` it is ignored, otherwise: | ||
- the first `st_value` must be the start of the first program header, | ||
otherwise `ElfParserError::OutOfBounds` must be thrown | ||
- every subsequent `st_value` must be the end of the last | ||
(`st_value + st_size`), otherwise `ElfParserError::OutOfBounds` must be thrown | ||
- the `st_size` must be greater zero and evenly divisible by 8 bytes (the | ||
instruction size), otherwise `ElfParserError::InvalidSize` must be thrown | ||
- `st_value + st_size` must not be greater than the end of the first program | ||
header, otherwise `ElfParserError::OutOfBounds` must be thrown | ||
- the last `st_value + st_size` must end at the end of the first program | ||
header, otherwise `ElfParserError::OutOfBounds` must be thrown | ||
- the symbol is registered in the function registry for the subsequent | ||
bytecode verification pass | ||
|
||
In other words the `STT_FUNC` symbols must form an ordered | ||
[partition](https://en.wikipedia.org/wiki/Partition_of_a_set) of the virtual | ||
address space of the first program header. | ||
|
||
The `e_entry` filed of the file header must be a `STT_FUNC` entry in the | ||
dynamic symbol table, otherwise `ElfParserError::InvalidFileHeader` must be | ||
thrown. | ||
|
||
## Impact | ||
|
||
The toolchain linker will use a new linker script to adhere to these | ||
restrictions defined here and thus the change will be transparent to the dApp | ||
developers. | ||
|
||
The section headers are ignored so arbitrary metadata can continue to be | ||
encoded there. | ||
|
||
## Security Considerations | ||
|
||
None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we allow these headers? These regions are configured independently of the ELF. Do we allow these headers just so that we can reduce LLVM modifications? Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is so that the memory regions are correctly reflected in the ELF header. This makes it easier for tooling to detect issues (such as overlap in section allocation) and could simplify the validator implementation in the future.
Also, with the upcoming CPI syscall restrictions (necessary for direct mapping) we will limit pointers provided by the user to come from the stack and heap. Thus their layout will be further solidified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all programs make use of stack/heap/symbols. A noop program, for example, should be just 128 bytes (ELF header + 1 PT_LOAD PF_X + 1 exit insn). Enforcing 5 headers, 4 of which aren't even used, would bloat it to 352 bytes - even larger than the current mainnet-beta minimum of 336 bytes.
On the other hand, it's great to be able to handle for memory layouts that diverge from the default of a specific version without having to bump versions should the need arise. As such, I think it would be better not to strictly enforce these headers, but to instead opt for a solution like:
This solution would make the minimum program use 1 program header, the average program use 3 program headers, and if we ever had memory layout issues, they could be solvable for the small price of 1 or 2 extra program headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think that size savings here justify the added complexity and the security risks of them. In the past the ELF parser and loader have been some of the most error prone areas.
Also, keep in mind that you don't need sections and string tables anymore. In fact our new toolchain now produces somewhat smaller ELFs by default (after stripping debug stuff).