Advanced relocation support for wat2wasm, wasm2wat, wasm-validate, wat-desugar #2649
Wow, it's very impressive that you got all this to work, @feedab1e. My main concern is how much we actually want to commit to being able to express object files in the wat format like this. LLVM (the main producer of object files) does not use wat and instead has its own
I have not yet proposed this anywhere outside my scope of work; however, I intend to use these annotations to output relocation information for my wasm backend in GCC (WIP). Also, based on prior work on dynamic linking that wasm-tools proposed some time ago, I expect that they wouldn't be against this inclusion, but I'll ask
Wow, a GCC backend! That is exciting! Are you sure it wouldn't make sense to use the
Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols, and relocations).
Well, for now my backend already outputs WAT that passes validation (although I can't run it yet because of linking). That is different from LLVM, because LLVM never actually creates any text during compilation and produces a binary directly. I cannot do that with GCC.
Yeah, that's true, but AFAIK those are still valid modules and I assume people would want to manipulate those too.
The problem with
LLVM can produce and consume Wasm assembly in the
This is true.
The main reason it's not documented and is currently only used in LLVM is that no other compiler has needed to use it yet. A GCC backend might be a great time and place to make it more official and documented.
Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to do it first before I could consider adopting
On the issue for stabilizing inline asm support on wasm in Rust, I suggested using wat plus some way to encode relocations instead of stabilizing LLVM's custom assembly format: rust-lang/rust#136382 (comment). If the format that this PR adds gets documented and eventually stabilized, that would make it feasible from a language perspective to use wat for inline asm in Rust. But it would probably still be non-trivial to either add support for it to LLVM or to add a translation pass to rustc. The latter would add complexity to rustc, but I would personally feel a lot more comfortable stabilizing inline asm on wasm that way: if LLVM changed the assembly format, we would only need to change the translation pass, not all user code.
I would point out that using a particular assembly format is a user-visible contract in the case of a compiler like GCC due to inline/module-level assembly (it is also the reason why the
I can perhaps lend some thoughts from a wasm-tools perspective -- this is a feature I've long wanted! I don't necessarily have a killer use case for this, though, and historically it's mostly been in the bucket of "it'd be neat to print the object-related custom sections with annotations" for me. I would have no aspirations to supplant LLVM's assembly syntax, and I'd understand that it'd be a perpetual game of catch-up if LLVM added new features. I've made half-hearted attempts to implement something like this in wasm-tools in the past, but the "known limitations" listed in the PR description here are mostly what stopped me, especially the one about relocations in data sections. I've had problems trying to retrofit s-expressions and the text format with relocations, and I've found that it's not always the most suitable fit. For example Now, that being said, I wouldn't want to stop effort on implementing this! I'm all for having a shared convention amongst tools as much as anyone else, and I think wasm having an official text format is a great place to start from. One possibility is that if the official text format isn't amenable enough to relocations, it might be possible to make official changes to make it more amenable (I don't know what these would be, but I suspect the CG would be receptive to tweaks to the text format). I'd also have bikeshedding opinions about various syntaxes in play here, but I'll reserve those for a different time since they're always easy to tweak.
The "known limitations" part of this PR is more about WABT itself than about the syntax. As for addends, the issue there is that in the binary the relocation section comes after the code, so creating debug labels in an already-formed IR would be a struggle. And for multiple symbols, it would be fairly easy to just support multiple
That's pretty much what I did for relocations. When the relocation is in the code section, its format is inferred from the instruction's operand being relocated, and when the relocation is in the data section, the user has to specify the relocation's
This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.
If somebody moved the Wat Numerical Values proposal forward from phase 2, then that would perhaps provide a nicer way to annotate data segments. :)
Does GCC not have a kind of generic
No, and I don't even know what that would look like, given the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.
I see, that makes sense then. It sounds like with GCC there is much less motivation to make your assembly format resemble traditional formats, since there is less code sharing between backends. However, I do think it might be worthwhile to at least consider sharing an assembly format with LLVM, since having a different assembly format for each compiler seems like it might be a bad outcome?
As someone who has done a fair amount of work on the de facto assembly format in LLVM, I'd be happy to help document the existing format and maybe even update it to remove any rough edges.
Yeah, I think we can infer relocation types from those declarations too. However, I see that the proposal lacks a way to output a LEB, which is needed for debug info, for example, but I think that can simply be added to the spec. One other thing is that I don't really know what roundtrips would look like with this proposal; I suspect that everything apart from relocations would have to just be text
How complicated would it be to move LLVM to use WAT? My main concern here is that WAT consumers aren't rushing to adopt
@alexcrichton, would wasm-tools be willing to implement a parser for
That's a good question. At the bare minimum it would require writing a completely new assembly parser and writer, but I don't know if there would be even farther-reaching effects. LLVM backends share a lot of common code in this area (the MC layer), but I don't have enough knowledge to know how much work it would be or what the long-term maintenance costs of diverging would be.
Oh! Never mind me then. In that case this is definitely something I'd like to implement in wasm-tools eventually as well. If you're up for it I'd like to have a chance to bikeshed some syntax and such, so would you be up for sending a PR to
True! That's not always applicable, though, because for example This actually also reminds me, @feedab1e, that one other thing I ran into was that relocations would apply to sub-parts of instruction immediates, and I wasn't sure how to represent that in the text syntax. For example with
Personally, I wouldn't be too interested in implementing it myself, at least, but I want to qualify this with a few more words. wasm-tools is centered around the wasm binary and text formats as defined in various specifications and is intended to provide ways to manipulate/inspect/debug these formats. The I also realize my opinion isn't necessarily being solicited here, but if you'll indulge me, I'll go ahead and give it anyway. On one hand, there's no denying the current reality of the
Sure, I'll do that.
The linking spec mandates the use of overlong lebs anyway in places where a relocation occurs, and that's also what
I don't think that this would be a huge issue for me, since in this implementation the actual relocation types are looked up by their
So the current situation is that there is no GCC tooling at all for the
Your opinion is totally welcome here. So, the thing is that at this point, at least for me, it would be easier to continue developing my backend with WAT and not Of the features that probably exist in
@alexcrichton here's the PR: WebAssembly/tool-conventions#258
Overview
This PR adds support for specifying symbol attributes on wasm entities (functions, globals, events, tables), and adds support for relocation attributes on instructions that accept relocatable operands (i{32,64}.const, i{32,64}.{load,store}) for better conformance to https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md.
This also addresses the old issue of #1199 (comment)
Symbol attributes
Symbol attributes are written using the @sym annotation, whose contents are attributes of the symbol that correspond to flags in the symbol table:
- weak: WASM_SYM_BINDING_WEAK
- static: WASM_SYM_BINDING_LOCAL
- hidden: WASM_SYM_VISIBILITY_HIDDEN
- retain: WASM_SYM_NO_STRIP
In addition, the following parametric attributes are supported:
- name="<name>": sets WASM_SYM_EXPLICIT_NAME and binds a name to the symbol
- priority=<int>: adds a function to the init_func list with the specified priority
WASM entity symbols
Symbols corresponding to WASM entities are specified inline with their definitions, and the annotation is placed after the entity name.
For example, a function whose @sym annotation contains name="initme" and priority=100 will be created with the name "initme" and put into an init vector with priority 100.
Data symbols
Data symbols are specified inline with the string that defines the data. Unlike entity symbols, data symbols don't take their name from a corresponding entity, so a WASM var is assigned to each of them.
For defined data symbols, the attribute size=<int> must be specified; it gives the size of the symbol in bytes.
For example, a data symbol with the attributes name="x" and size=1 defines a symbol named "x" that is 1 byte in size.
Data imports
Some data symbols are not defined, so there isn't a place for them inside the data section declarations.
For that purpose, a @sym.import.data annotation can be used wherever a module field would naturally occur; the syntax inside it is exactly the same as inside a @sym annotation.
For example, a @sym.import.data annotation containing name="bar" declares an imported data symbol with the name "bar".
Relocations
Relocations are the other crucial part of this PR, as they are what actually allows people to write binaries that, for example, take the address of a function.
Syntax
Relocations use the format (@reloc <shape> <method> <symbol> <attr-opt>), whose parts are described below.
shape encodes the data type of the relocation (i.e. how many bytes will be rewritten and in which format). It is one of i32, i64, leb, sleb, leb64, or sleb64.
method encodes the type of relocation, i.e. what kind of symbol we are relocating against and how to interpret that symbol:
- tag: R_WASM_EVENT_INDEX_*
- table: R_WASM_TABLE_NUMBER_*
- global: R_WASM_GLOBAL_INDEX_*
- func: R_WASM_FUNCTION_INDEX_*
- functable: R_WASM_TABLE_INDEX_*
- text: R_WASM_FUNCTION_OFFSET_*
- section: R_WASM_SECTION_OFFSET_*
- data: R_WASM_MEMORY_ADDR_*
attr-opt encodes the additional attributes that a relocation might have:
- pic: R_WASM_*_LOCREL_*, R_WASM_*_REL_* (env.__memory_base or env.__table_base, used for dynamic linking)
- tls: R_WASM_*_TLS* (env.__tls_base, used for thread-local storage)
Not every combination of relocation method and relocation shape exists; an error is raised for invalid ones.
Instruction relocations
For relocations targeting instruction operands, the shape is implied by the operand being relocated, so they have the form (@reloc <method> <symbol>). The only instructions that currently need explicit relocations are const instructions and load/store instructions.
For const instructions, the relocation targets the constant operand and is written after that operand.
For load/store instructions, the relocation targets the offset and is therefore written after the offset directive.
Data relocations
Like data symbols, data relocations are written inline with the data segment text; but unlike symbols, and like instruction relocations, they are written at the end of the byte sequence they rewrite.
For example:
will produce a data section relocation starting at offset 0.
Approach
Since the current binary reader is single-pass, during the parsing phase references to all instructions that can potentially be relocated are stored in their respective relocation queue (one per section). At the respective reloc section, the target section's queue is looked up and all relocations are stored into that queue. At the end of the module, each of the queues is processed, and the relocations seen are used to link instructions/data segments to their respective symbols.
Known limitations
This PR, while improving relocation support significantly, still lacks full compatibility with the linking spec.
Multiple symbols per WebAssembly entity
In particular, it is not currently possible to create several symbols referencing the same WASM entity, because symbols in this PR are directly tied to their respective entity. This avoids having to explicitly annotate every non-memory mention of a WebAssembly entity to resolve which of the symbols is being referred to. Apart from that, the semantics implied by attaching several potentially conflicting symbols to a single entity are not really clear.
Offset relocations and their addends
Section/function offset relocations are crucial for emitting accurate debug info, because embedded DWARF uses relocations that reference functions/sections and uses the addend field to specify an offset into their contents. To represent that accurately in WAT, we would like to have something like debug labels that are inserted inline into the instruction stream or into a custom section's data buffer. Unfortunately, because BinaryReaderIR does not interface directly with the binary but instead has to go through BinaryReaderDelegate, it cannot accurately determine where an instruction starts. It therefore cannot determine where the addend points, so the debug label cannot be reliably reconstructed.