Advanced relocation support for wat2wasm, wasm2wat, wasm-validate, wat-desugar #2649

feedab1e · 2025-10-13T20:32:38Z

Overview

This PR adds support for specifying symbol attributes on wasm entities (functions, globals, events, tables), and adds support for relocation attributes on instructions that accept relocatable operands (i{32,64}.const, i{32,64}.{load,store}) for better conformance to https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md.
This also addresses the old issue of #1199 (comment)

Symbol attributes

Symbol attributes are written with the @sym annotation. Its contents are attributes for the symbol, each corresponding to a flag in the symbol table:

| Symbol attribute | Corresponding flag |
| --- | --- |
| `weak` | `WASM_SYM_BINDING_WEAK` |
| `static` | `WASM_SYM_BINDING_LOCAL` |
| `hidden` | `WASM_SYM_VISIBILITY_HIDDEN` |
| `retain` | `WASM_SYM_NO_STRIP` |

In addition, the following parametric attributes are supported:

name="<name>" — sets WASM_SYM_EXPLICIT_NAME and binds a name to the symbol
priority=<int> — adds a function to the init_func list with the specified priority
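
As a rough illustration of how these attributes could fold into a symbol's flags field, here is a minimal Python sketch. The numeric flag values are the ones listed in Linking.md; the `symbol_flags` helper and its signature are hypothetical and are not part of this PR's code.

```python
# Flag values as listed in the symbol table section of Linking.md.
WASM_SYM_BINDING_WEAK = 0x01
WASM_SYM_BINDING_LOCAL = 0x02
WASM_SYM_VISIBILITY_HIDDEN = 0x04
WASM_SYM_EXPLICIT_NAME = 0x40
WASM_SYM_NO_STRIP = 0x80

# Keyword attributes accepted inside (@sym ...), per the table above.
ATTRIBUTE_FLAGS = {
    "weak": WASM_SYM_BINDING_WEAK,
    "static": WASM_SYM_BINDING_LOCAL,
    "hidden": WASM_SYM_VISIBILITY_HIDDEN,
    "retain": WASM_SYM_NO_STRIP,
}


def symbol_flags(attributes, explicit_name=None):
    """Hypothetical helper: combine @sym attributes into one flags value."""
    flags = 0
    for attribute in attributes:
        if attribute not in ATTRIBUTE_FLAGS:
            raise ValueError(f"unknown symbol attribute: {attribute!r}")
        flags |= ATTRIBUTE_FLAGS[attribute]
    # name="..." additionally sets WASM_SYM_EXPLICIT_NAME.
    if explicit_name is not None:
        flags |= WASM_SYM_EXPLICIT_NAME
    return flags


print(hex(symbol_flags(["weak", "hidden"])))
print(hex(symbol_flags([], explicit_name="initme")))
```

The priority=<int> attribute is left out on purpose, since it affects the init function list rather than the flags.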

WASM entity symbols

Symbols corresponding to WASM entities are specified inline with their definitions, and the annotation is placed after the entity name.

Example:

(func $foobar (@sym name="initme" priority=100)
  (param i32) (result i32) (local i32) 
  (local.get 0))

will create a function with the name "initme" and put it into an init vector with priority 100.

Data symbols

Data symbols are specified inline with the string that defines the data. Unlike entity symbols, data symbols don't inherit a name from a corresponding entity, so each one is assigned its own WASM var (such as $foo).
For defined data symbols, the attribute size=<int> must be specified; it gives the size of the symbol in bytes.

Example

(data (i32.const 0) (@sym $foo size=1 name="x") "\00")

defines a symbol named "x" sized 1 byte.

Data imports

Some data symbols are not defined, so there isn't a place for them inside the data section declarations.
For that purpose, there is an option to use a @sym.import.data annotation where a module field would naturally occur, with syntax inside being exactly the same as inside of a @sym annotation.

Example

(module
  (@sym.import.data $foo name="bar"))

will declare an imported data symbol with the name "bar"

Relocations

Relocations are the other crucial part of this PR, as they are what allows people to write binaries that, for example, take the address of a function.

Syntax

Relocations use the format (@reloc <shape> <method> <symbol> <attr-opt>), whose components are described below.

shape encodes the data type of the relocation (i.e. how many bytes will be rewritten and in which format).
It is one of i32, i64, leb, sleb, leb64, or sleb64.
method encodes the type of relocation: what kind of symbol we are relocating against and how to interpret that symbol.

| `<method>` | symbol kind | corresponding relocation constants | interpretation |
| --- | --- | --- | --- |
| `tag` | event | `R_WASM_EVENT_INDEX_*` | Final WebAssembly event index |
| `table` | table | `R_WASM_TABLE_NUMBER_*` | Final WebAssembly table index (index of a table, not into one) |
| `global` | global | `R_WASM_GLOBAL_INDEX_*` | Final WebAssembly global index |
| `func` | function | `R_WASM_FUNCTION_INDEX_*` | Final WebAssembly function index |
| `functable` | function | `R_WASM_TABLE_INDEX_*` | Index into the dynamic function table, used for taking the address of functions |
| `text` | function | `R_WASM_FUNCTION_OFFSET_*` | Offset into the function body, used for debugging |
| `section` | section | `R_WASM_SECTION_OFFSET_*` | Offset into a custom section, used for debugging |
| `data` | data | `R_WASM_MEMORY_ADDR_*` | WebAssembly linear memory address |

attr-opt encodes additional attributes that a relocation might have.

| `<attr-opt>` | corresponding relocation constants | interpretation |
| --- | --- | --- |
| nothing | nothing | Normal relocation |
| `pic` | `R_WASM_*_LOCREL_*`, `R_WASM_*_REL_*` | Address relative to `env.__memory_base` or `env.__table_base`, used for dynamic linking |
| `tls` | `R_WASM_*_TLS_*` | Address relative to `env.__tls_base`, used for thread-local storage |

Not every combination of relocation method and relocation shape corresponds to an existing relocation type, so an error is raised for invalid combinations.
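
To make the lookup concrete, here is a minimal Python sketch that resolves a (shape, method, attr) triple to a relocation constant name. The constant names are taken from Linking.md, but the table is deliberately partial, and the pairing of each shape keyword with a constant suffix is an assumption about how this PR maps them, not a copy of its implementation.

```python
# (shape, method, attr) -> relocation constant name from Linking.md.
# Partial table; which shapes the PR accepts per method is assumed here.
RELOCATION_NAMES = {
    ("leb", "func", None): "R_WASM_FUNCTION_INDEX_LEB",
    ("i32", "func", None): "R_WASM_FUNCTION_INDEX_I32",
    ("sleb", "functable", None): "R_WASM_TABLE_INDEX_SLEB",
    ("i32", "functable", None): "R_WASM_TABLE_INDEX_I32",
    ("leb", "data", None): "R_WASM_MEMORY_ADDR_LEB",
    ("sleb", "data", None): "R_WASM_MEMORY_ADDR_SLEB",
    ("i32", "data", None): "R_WASM_MEMORY_ADDR_I32",
    ("sleb", "data", "pic"): "R_WASM_MEMORY_ADDR_REL_SLEB",
    ("sleb", "data", "tls"): "R_WASM_MEMORY_ADDR_TLS_SLEB",
    ("leb", "global", None): "R_WASM_GLOBAL_INDEX_LEB",
    ("leb", "table", None): "R_WASM_TABLE_NUMBER_LEB",
}


def relocation_name(shape, method, attr=None):
    """Hypothetical helper: return the constant name or raise on a bad combination."""
    try:
        return RELOCATION_NAMES[(shape, method, attr)]
    except KeyError:
        raise ValueError(
            f"no relocation for shape={shape!r}, method={method!r}, attr={attr!r}"
        ) from None


print(relocation_name("i32", "functable"))
print(relocation_name("sleb", "data", "tls"))
```

A combination that is missing from the table, such as an i64 shape with the tag method, raises ValueError, which mirrors the error described above.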

Instruction relocations

For relocations targeting instruction operands, the shape is inferred from the operand, so they take the shorter form (@reloc <method> <symbol>).

The only instructions that currently need explicit relocations are const expressions and load/store expressions.
For const expressions, relocations target the constant operand and are written after that operand:

(module
  (func $foo (result i32)
    i32.const 0 (@reloc functable $foo)))

For load/store expressions, their relocations target their offsets, and are therefore written after the offset directive:

(module
  (@sym.import.data $foo name="x")
  (func $foo (result i32)
    i32.const 0
    i32.load offset=0 (@reloc data $foo)))

Data relocations

Like data symbols, data relocations are written inline with the data segment text. Unlike symbols, and like instruction relocations, they are written at the end of the byte sequence they rewrite.
For example:

(module
  (import (func $foo (@sym name="foo")))
  (data (i32.const 0) (@sym (; 0 ;) name="foo_pfn" size=4) "\00\00\00\00" (@reloc i32 functable $foo)))

will produce a data section relocation starting at offset 0.
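
The offset follows from the placement rule: the annotation sits right after the bytes it rewrites, so the relocation starts that many bytes before the annotation. A minimal Python sketch of this arithmetic, and of what a linker would later do with an i32 relocation, is below. Both helper names are hypothetical, and the resolved value 7 is made up for illustration.

```python
import struct


def reloc_offset(bytes_before_annotation, shape_size=4):
    """Offset of the rewritten field, given the bytes emitted before (@reloc ...)."""
    if bytes_before_annotation < shape_size:
        raise ValueError("not enough bytes before the relocation annotation")
    return bytes_before_annotation - shape_size


def apply_i32(segment, offset, value):
    """Patch a 4-byte little-endian field in place, as an i32-shaped relocation would."""
    struct.pack_into("<I", segment, offset, value)


segment = bytearray(b"\x00\x00\x00\x00")  # the "\00\00\00\00" string from the example
offset = reloc_offset(len(segment))       # annotation follows 4 bytes, so offset is 0
apply_i32(segment, offset, 7)             # 7 stands in for a resolved table index
print(offset, segment.hex())
```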

Approach

Since the current binary reader is single-pass, the parsing phase stores references to all instructions that can potentially be relocated in a relocation queue, one per section. When a reloc section is read, the target section's queue is looked up and all relocations are stored into it. At the end of the module, each queue is walked, and the relocations seen are used to link instructions and data segments to their respective symbols.
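
A minimal Python sketch of this queueing scheme follows. It is an illustration only: the class, its method names, and the use of byte offsets as the matching key are assumptions, and the actual implementation is C++ inside WABT's binary reader.

```python
from collections import defaultdict


class RelocationQueues:
    """One queue of relocatable sites per section, filled during a single pass."""

    def __init__(self):
        # section index -> {byte offset: description of the relocatable site}
        self.sites = defaultdict(dict)
        # section index -> list of (byte offset, symbol index)
        self.relocs = defaultdict(list)

    def add_site(self, section, offset, site):
        # Called while parsing code or data: remember anything relocatable.
        self.sites[section][offset] = site

    def add_reloc(self, target_section, offset, symbol_index):
        # Called while parsing a reloc section, which comes after its target.
        self.relocs[target_section].append((offset, symbol_index))

    def link(self, symbols):
        # Called at the end of the module: pair each site with its symbol.
        linked = []
        for section, entries in self.relocs.items():
            for offset, symbol_index in entries:
                site = self.sites[section].get(offset)
                if site is None:
                    raise ValueError(f"no relocatable site at {section}:{offset}")
                linked.append((site, symbols[symbol_index]))
        return linked


queues = RelocationQueues()
queues.add_site(section=10, offset=4, site="i32.const operand")
queues.add_reloc(target_section=10, offset=4, symbol_index=0)
print(queues.link(["foo"]))
```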

Known limitations

This PR, while improving relocation support significantly, still lacks full compatibility with the linking spec.

Multiple symbols per WebAssembly entity

In particular, it is not possible at the moment to create several symbols referencing the same WASM entity, because symbols in this PR are tied directly to their respective entity. This is done to avoid having to explicitly annotate every non-memory mention of a WebAssembly entity to resolve which of the symbols is being referred to. Apart from that, the semantics implied by attaching several potentially conflicting symbols to a single entity are not really clear.

Offset relocations and their addends

Section and function offset relocations are crucial for emitting accurate debug info, because embedded DWARF uses relocations that reference functions or sections and uses the addend field to specify an offset into the referenced function or section. To represent that accurately in WAT, we would like something like debug labels inserted inline into the instruction stream or a custom section data buffer. Unfortunately, BinaryReaderIR does not interface directly with the binary but goes through BinaryReaderDelegate, so it cannot accurately predict where an instruction starts. It therefore cannot determine the place the addend would usually point to, and cannot reliably reconstruct the debug label.


feedab1e · 2025-10-16T14:28:29Z

@sbc100 @binji you might want to take a look at this

sbc100 · 2025-10-16T22:47:14Z

Wow, this is very impressive that you got all this to work @feedab1e.

My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?

@dschuff @tlively @kripken WDYT?

feedab1e · 2025-10-17T00:57:27Z

> Wow, this is very impressive that you got all this to work @feedab1e.
>
> My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?
>
> @dschuff @tlively @kripken WDYT?

I have not yet proposed this anywhere outside of my own scope of work. However, I intend to use these annotations to output relocation information for my wasm backend in GCC (WIP). Also, based on prior work with dynamic linking that wasm-tools proposed some time ago, I expect that they wouldn't be against this inclusion, but I'll ask.

sbc100 · 2025-10-17T01:57:28Z

> > Wow, this is very impressive that you got all this to work @feedab1e.
> > My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?
> > @dschuff @tlively @kripken WDYT?
>
> I have not yet proposed this anywhere outside of my own scope of work. However, I intend to use these annotations to output relocation information for my wasm backend in GCC (WIP).

Wow, a gcc backend! That is exciting! Are you sure it wouldn't make sense to use the .s format there, like the one that llvm uses? I assume that the gcc assembly and disassembly won't be outputting wat? Or if they did, it would be very different from the llvm approach.

> Also, based on prior work with dynamic linking that wasm-tools proposed some time ago, I expect that they wouldn't be against this inclusion, but I'll ask.

Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols and relocations).

feedab1e · 2025-10-17T02:10:18Z

> Wow, a gcc backend! That is exciting! Are you sure it wouldn't make sense to use the .s format there, like the one that llvm uses? I assume that the gcc assembly and disassembly won't be outputting wat? Or if they did, it would be very different from the llvm approach.

Well, for now my backend already outputs WAT which passes validation (although I still can't run it yet because of linking). That is different from LLVM, because LLVM never actually creates any text during compilation, and produces a binary directly. I cannot do that with GCC.

> Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols and relocations).

Yeah, that's true, but AFAIK those are still valid modules and I assume people would want to manipulate those too.

feedab1e · 2025-10-17T02:23:04Z

The problem with .s for me is that I would be relying on what is essentially an implementation detail of LLVM, a format which is undocumented, and implementation of which exists (and will likely continue to exist) only in LLVM. And a hard dependency on LLVM is not a good look for GCC.

sbc100 · 2025-10-17T02:23:23Z

> > Wow, a gcc backend! That is exciting! Are you sure it wouldn't make sense to use the .s format there, like the one that llvm uses? I assume that the gcc assembly and disassembly won't be outputting wat? Or if they did, it would be very different from the llvm approach.
>
> Well, for now my backend already outputs WAT which passes validation (although I still can't run it yet because of linking). That is different from LLVM, because LLVM never actually creates any text during compilation, and produces a binary directly. I cannot do that with GCC.

LLVM can produce and consume Wasm assembly in the .s format. You can see this if you pass -fno-integrated-as (maybe along with -save-temps?). The format used is very similar to existing architectures' assembly formats.

> > Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols and relocations).
>
> Yeah, that's true, but AFAIK those are still valid modules and I assume people would want to manipulate those too.

This is true.

sbc100 · 2025-10-17T02:25:54Z

> The problem with .s for me is that I would be relying on what is essentially an implementation detail of LLVM, a format which is undocumented, and implementation of which exists (and will likely continue to exist) only in LLVM. And a hard dependency on LLVM is not a good look for GCC.

The main reason it's not documented and is currently only used in LLVM is that no other compiler has needed to use it yet. A GCC backend might be a great time and place to make it more official and documented.

feedab1e · 2025-10-17T02:34:04Z

> > The problem with .s for me is that I would be relying on what is essentially an implementation detail of LLVM, a format which is undocumented, and implementation of which exists (and will likely continue to exist) only in LLVM. And a hard dependency on LLVM is not a good look for GCC.
>
> The main reason it's not documented and is currently only used in LLVM is that no other compiler has needed to use it yet. A GCC backend might be a great time and place to make it more official and documented.

Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to write it first before I can consider adopting .s. Then I assume someone would also have to develop a parser for this format either here, or in binaryen, or in wasm-tools, so that I wouldn't need to rely on LLVM for compilation using GCC. In contrast, I found this to be fairly easy to implement here, and I assume that would also be the case for other WAT consumers (since implementation-wise this is fairly similar to code metadata).

bjorn3 · 2025-10-17T14:26:44Z

> My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?

On the issue for stabilizing inline asm support on wasm in rust I suggested using wat + some way to encode relocations instead of stabilizing LLVM's custom assembly format: rust-lang/rust#136382 (comment). If we get the format that this PR adds documented and eventually stabilized, that would make it feasible from a language perspective to use wat for inline asm in rust. But it would probably still be non-trivial to either add support for it to LLVM or to add a translation pass to rustc. The latter would add complexity to rustc, but I would personally feel a lot more comfortable stabilizing inline asm on wasm that way. If LLVM changes the assembly format, we would only need to change the translation pass, not all user code.

SingleAccretion · 2025-10-17T15:21:33Z

I would point out that using a particular assembly format is a user-visible contract in the case of a compiler like GCC due to inline/module-level assembly (it is also the reason why the .S format is not really 'LLVM internal' - it is part of the public interface). So if you want to compile a repository like emscripten which uses .S files, you need to support it. Introducing a separate format that is less powerful is a downside for compiler portability.

alexcrichton · 2025-10-17T15:32:37Z

I can perhaps lend some thoughts from a wasm-tools perspective -- this is a feature I've long wanted! I don't necessarily have a killer use case for this, though, and mostly historically for me it's been in the bucket of "it'd be neat to print the object-related custom sections with annotations". I would have no aspirations to supplant LLVM's assembly syntax and I'd understand that it'd be a perpetual game of catch-up if LLVM added new features.

I've made half-hearted attempts to implement something like this in wasm-tools historically but the "known limitations" listed in the PR description here are mostly what stopped me, especially the one about relocations in data sections. I've had problems historically trying to retrofit s-expressions and the text format with relocations and I've found that it's not always the most suitable. For example i32.const 0 (@reloc functable $foo) needs to sort of magically encode a 5-byte leb for the i32.const value. Relocations in data sections would need something like (data (i32.const ...) "..." (@reloc ...) "..." (@reloc ...) ...) or something like that. I more-or-less kept coming to the conclusion that LLVM's assembly format is quite suitable for these relocations and such but the standard wasm text format is much less amenable.

Now that being said I wouldn't want to stop effort on implementing this! I'm all for having a shared convention amongst tools as much as anyone else, and wasm having an official text format I think is a great place to start from. One possibility is that if the official text format isn't amenable enough for relocations it might be possible to make official changes to make it more amenable (iunno what these would be but I suspect the CG would be receptive to tweaks to the text format). I'd also have bikeshedding opinions about various syntaxes in play here, but I'll reserve those for a different time since it's always easy to tweak.

feedab1e · 2025-10-17T16:18:40Z

> I've made half-hearted attempts to implement something like this in wasm-tools historically but the "known limitations" listed in the PR description here are mostly what stopped me, especially the one about relocations in data sections.

The "known limitations" part of this PR is more about WABT itself rather than the syntax. As for addends, the issue there is that in the binary the relocation section comes after the code, so creating debug labels in an already formed IR would be a struggle. And for multiple symbols, it would be fairly easy to just support multiple @sym annotations on a symbol, it's just that the semantics of that is not really clear.
That said, data symbols and relocations work perfectly fine, I think, both in this syntax and in the implementation

> I've had problems historically trying to retrofit s-expressions and the text format with relocations and I've found that it's not always the most suitable. For example i32.const 0 (@reloc functable $foo) needs to sort of magically encode a 5-byte leb for the i32.const value. Relocations in data sections would need something like (data (i32.const ...) "..." (@reloc ...) "..." (@reloc ...) ...) or something like that.

That's pretty much what I did for relocations. When the relocation is in the code section, its format is inferred from the instruction's operand being relocated, and when the relocation is in the data section, the user has to specify the relocation's shape so that the assembler can recognize the relocation type. And yes, relocations are spliced into the data, but I think that's just the only reasonable way to do it, LLVM syntax or otherwise

rossberg · 2025-10-17T17:03:02Z

> For example i32.const 0 (@reloc functable $foo) needs to sort of magically encode a 5-byte leb for the i32.const value.

This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.
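
The 5-byte claim can be checked with a short Python sketch of signed LEB128, which is the encoding used for i32.const immediates. The helper names are made up for this illustration; 0x8000_0000 is first reinterpreted as a signed 32-bit value.

```python
def as_i32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sleb128(value):
    """Minimal-length signed LEB128 encoding of a Python int."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: negative values stay negative
        sign_bit_set = (byte & 0x40) != 0
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


print(len(sleb128(as_i32(0x80000000))))  # shortest encoding of the proposed placeholder
print(len(sleb128(0)))                   # shortest encoding of the usual placeholder 0
```

The shortest encoding of 0x8000_0000 is already five bytes, whereas 0 normally fits in one byte and has to be padded when a relocation targets it.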

> And yes, relocations are spliced into the data, but I think that's just the only reasonable way to do it, LLVM syntax or otherwise

If somebody moved the Wat Numerical Values proposal forward from phase 2, then that would perhaps provide a nicer way to annotate data segments. :)

sbc100 · 2025-10-17T17:05:07Z

> Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to write it first before I can consider adopting .s. Then I assume someone would also have to develop a parser for this format either here, or in binaryen, or in wasm-tools, so that I wouldn't need to rely on LLVM for compilation using GCC.

Does GCC not have a kind of generic .s file reader/writer like LLVM does? At least in LLVM, one of the main reasons for using the .s format was to be able to re-use all the existing machinery for dealing with assembly files. There was no need to write any new parser or writer at all.

feedab1e · 2025-10-17T17:11:28Z

> > Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to write it first before I can consider adopting .s. Then I assume someone would also have to develop a parser for this format either here, or in binaryen, or in wasm-tools, so that I wouldn't need to rely on LLVM for compilation using GCC.
>
> Does GCC not have a kind of generic .s file reader/writer like LLVM does? At least in LLVM, one of the main reasons for using the .s format was to be able to re-use all the existing machinery for dealing with assembly files. There was no need to write any new parser or writer at all.

No, and I don't even know what that would look like, given the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.

sbc100 · 2025-10-17T17:15:23Z

> No, and I don't even know what that would look like, given the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.

I see, that makes sense then. Sounds like with gcc there is way less motivation to make your assembly format resemble traditional formats since there is less code sharing between backends. However, I do think it might be worthwhile at least considering sharing an assembly format with LLVM, since having a different assembly format for each compiler seems like maybe a bad outcome?

sbc100 · 2025-10-17T17:16:50Z

> > No, and I don't even know what that would look like, given the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.
>
> I see, that makes sense then. Sounds like with gcc there is way less motivation to make your assembly format resemble traditional formats since there is less code sharing between backends. However, I do think it might be worthwhile at least considering sharing an assembly format with LLVM, since having a different assembly format for each compiler seems like maybe a bad outcome?

As someone who has done a fair amount of work on the de facto assembly format in LLVM, I'd be happy to help document the existing format and maybe even update it to remove any rough edges.

feedab1e · 2025-10-17T17:19:28Z

> > And yes, relocations are spliced into the data, but I think that's just the only reasonable way to do it, LLVM syntax or otherwise
>
> If somebody moved the Wat Numerical Values proposal forward from phase 2, then that would perhaps provide a nicer way to annotate data segments. :)

Yeah, I think we can infer relocation types from those declarations too. However, I see that the proposal lacks a way to output a leb, which is needed for debug info, for example, but I think that can simply be added to the spec. One other thing is that I don't really know what roundtrips would look like with this proposal; I suspect that everything apart from relocations would have to just be text.

feedab1e · 2025-10-17T17:32:45Z

> > No, and I don't even know what that would look like, given the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.
>
> I see, that makes sense then. Sounds like with gcc there is way less motivation to make your assembly format resemble traditional formats since there is less code sharing between backends. However, I do think it might be worthwhile at least considering sharing an assembly format with LLVM, since having a different assembly format for each compiler seems like maybe a bad outcome?

How complicated would it be to move LLVM to use WAT? My main concern here is that WAT consumers aren't rushing to adopt .s, so if I am to adopt it, it would still mean that two projects out of the entire WASM ecosystem use their own oddball format, which would still be a bad outcome.

> As someone who has done a fair amount of work on the de facto assembly format in LLVM, I'd be happy to help document the existing format and maybe even update it to remove any rough edges.

@alexcrichton, would wasm-tools be willing to implement a parser for .s if a spec for it lands?

sbc100 · 2025-10-17T17:44:00Z

> How complicated would it be to move LLVM to use WAT?

That's a good question. At the bare minimum it would require writing a completely new assembly parser and writer. But I don't know if there would be even farther reaching effects. LLVM backends share a lot of common code in this area (the MC layer), but I don't have enough knowledge to know how much work it would be or what the long term maintenance costs of diverging would be.

alexcrichton · 2025-10-17T18:50:58Z

> The "known limitations" part of this PR is more about WABT itself rather than the syntax.

Oh! Never mind me then. In that case this is definitely something I'd like to implement in wasm-tools eventually as well. If you're up for it I'd like to have a chance to bikeshed some syntax and such, so would you be up for sending a PR to Linking.md with a section describing the text format? That'd be a good place to explain how all the various constructs translate to text and would also provide a good place IMO to do some minor bikeshedding. If that's more than you wanted to bite off and chew, though, I understand.

> This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.

True! That's not always applicable, though, because for example call 0 also needs to be encoded as a 5-byte leb and call 0x8000_0000 would produce an invalid wasm.

This actually also reminds me @feedab1e that one other thing I ran into was that relocations would be on sub-parts of immediates of instructions and I wasn't sure how to represent that in the text syntax. For example with call_indirect there's one relocation for the table index and one relocation for the type index. This can work with enough "assume this reloc attached here means this" style logic but it was scenarios like this where I started to lose steam personally for implementing this historically.

> @alexcrichton, would wasm-tools be willing to implement a parser for .s if a spec for it lands?

Personally I wouldn't be too interested in implementing it myself at least, but I want to qualify this with some more words as well. For wasm-tools it's centered around the wasm binary and text format as defined in various specifications and is intended to provide ways to manipulate/inspect/debug these formats. The .s format would be more compiler-oriented in terms of LLVM or GCC and effectively wouldn't mesh well with what's already in wasm-tools. I suspect it'd be an entirely new crate/implementation/subcommands/etc, and at that point I'm not really sure what the benefit would be over using LLVM's or GCC's tooling that already exists for the .s format.

I also realize my opinion isn't necessarily being solicited here but if you'll indulge me I'll go ahead and give it anyway. On one hand there's no denying the current reality of the .s format naturally meshing well with LLVM and supporting all the various features necessary that LLVM needs. On the other hand though there's also no denying that wasm has a different, officially specified, text format that shares instruction names but not much else. Balancing these two is, in my opinion, not possible without elbow grease going somewhere. I wouldn't be surprised if updating LLVM was a major effort, but I also wouldn't be surprised if new users continued to be surprised that the .s format is different than the official text format. I suspect the most expedient way forward for GCC would be to implement LLVM's .s format, but I also don't know much about GCC backends so I don't know if that would be easier or harder than implementing the wasm standard text format.

feedab1e · 2025-10-17T23:18:08Z

> > The "known limitations" part of this PR is more about WABT itself rather than the syntax.
>
> Oh! Never mind me then. In that case this is definitely something I'd like to implement in wasm-tools eventually as well. If you're up for it I'd like to have a chance to bikeshed some syntax and such, so would you be up for sending a PR to Linking.md with a section describing the text format? That'd be a good place to explain how all the various constructs translate to text and would also provide a good place IMO to do some minor bikeshedding. If that's more than you wanted to bite off and chew, though, I understand.

Sure, I'll do that.

> > This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.
>
> True! That's not always applicable, though, because for example call 0 also needs to be encoded as a 5-byte leb and call 0x8000_0000 would produce an invalid wasm.

The linking spec mandates the use of overlong lebs anyway in places where a relocation occurs, and that's also what wat2wasm already does with the -r flag, so I see no issue here.
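
For readers unfamiliar with overlong LEBs, here is a minimal Python sketch of a fixed-width unsigned LEB128 encoder, the kind of padding that lets a linker patch a new index in place without shifting bytes. The function names are hypothetical; this is not wat2wasm's code.

```python
def padded_uleb128(value, width=5):
    """Encode value as unsigned LEB128 padded to exactly `width` bytes."""
    if not 0 <= value < (1 << 32):
        raise ValueError("value does not fit in a varuint32")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(data):
    """Decode unsigned LEB128; padded encodings decode to the same value."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            break
    return result


encoded = padded_uleb128(0)
print(encoded.hex(), decode_uleb128(encoded))
```

With this padding, call 0 occupies the same five operand bytes as any other function index, so no special placeholder value is needed.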

> This actually also reminds me @feedab1e that one other thing I ran into was that relocations would be on sub-parts of immediates of instructions and I wasn't sure how to represent that in the text syntax. For example with call_indirect there's one relocation for the table index and one relocation for the type index. This can work with enough "assume this reloc attached here means this" style logic but it was scenarios like this where I started to lose steam personally for implementing this historically.

I don't think that this would be a huge issue for me, since in this implementation the actual relocation types are looked up by their shape and method, so it wouldn't be hard to just hardcode the method too for the cases where multiple symbols exist for the same wasm entity.

> > @alexcrichton, would wasm-tools be willing to implement a parser for .s if a spec for it lands?
>
> Personally I wouldn't be too interested in implementing it myself at least, but I want to qualify this with some more words as well. For wasm-tools it's centered around the wasm binary and text format as defined in various specifications and is intended to provide ways to manipulate/inspect/debug these formats. The .s format would be more compiler-oriented in terms of LLVM or GCC and effectively wouldn't mesh well with what's already in wasm-tools. I suspect it'd be an entirely new crate/implementation/subcommands/etc, and at that point I'm not really sure what the benefit would be over using LLVM's or GCC's tooling that already exists for the .s format.

So the current situation is that there is no GCC tooling at all for the .s format; it only exists in LLVM. Since I am the one writing the implementation of WASM in GCC, one of the main priorities for me is to be compatible with most of the ecosystem, so I'd rather choose whatever is most supported, which in this case would be WAT, especially because the wider ecosystem isn't very receptive to the .s format.

> I also realize my opinion isn't necessarily being solicited here but if you'll indulge me I'll go ahead and give it anyway. On one hand there's no denying the current reality of the .s format naturally meshing well with LLVM and supporting all the various features necessary that LLVM needs. On the other hand though there's also no denying that wasm has a different, officially specified, text format that shares instruction names but not much else. Balancing these two is, in my opinion, not possible without elbow grease going somewhere. I wouldn't be surprised if updating LLVM was a major effort, but I also wouldn't be surprised if new users continued to be surprised that the .s format is different than the official text format. I suspect the most expedient way forward for GCC would be to implement LLVM's .s format, but I also don't know much about GCC backends so I don't know if that would be easier or harder than implementing the wasm standard text format.

Your opinion is totally welcome here. The thing is that, at least for me, it would be easier to continue developing my backend with WAT rather than .s: I did my development against WAT initially, and rewriting that to use the other syntax (and reimplementing the entirety of LLVM's assembler) would be more effort than continuing development against WAT and its standard assemblers. I think it will be possible to implement either syntax, or even both, given time and effort (and a spec for the format). But then again, if no one except LLVM (and possibly, some time later, binutils) were able to parse that format, it would be suboptimal for both GCC and the wider ecosystem.

The features that probably exist in .s and are lacking in WAT are Linking and DWARF Debugging, but I think we can achieve format parity if both are implemented (and there will be more work in the case of .s, since there the entire assembler would have to be implemented from scratch, including linking and debug info).

feedab1e · 2025-10-18T09:53:16Z

@alexcrichton here's the PR: WebAssembly/tool-conventions#258

feedab1e added 30 commits October 5, 2025 05:20

Move SymbolTable out of binary writer

6fb2a4a

Add representation for relocations in instructions

30e3733

Add support for relocations in the binary reader

4aaed1e

Add support for relocations in the text writer

3c9615d

Add a check for token contents in ParseCodeMetadataAnnotation

6d48401

Small adjustments to reloc printing

aa75317

Make fields of SymbolCommon public

f15f4ce

Add support for relocations in wast-parser

0475fb1

Add size output for data symbols

f1b58e9

Remove 'export' in symbols as it doesn't make sense for data symbols

f3c53e9

Clean up the implementation for symbol parser

88015ed

Fix binding generation with invalid indices

391e056

Add export symbol flags

ff37487

Add name resolution for relocations

5e31c0d

Add name resolution for relocations in data segments

4bcc959

Fix parser setting invalid flags and types

40046a0

Add fixed 64 bit leb writers

beb3e96

Adjust symbol table implementation to account for the fact that symbol info is now stored in IR

44d0f98

Adjust binary writer to output more relocations

4aea26c

Prevent writing symbol metadata when no attributes need to be specified

8588622

Add handling for invalid symbol definitions and relocations

667ff47

Handle expressions appearing outside relocatable sections

c09b03a

Revert to older error message to pass more tests

0f3b61e

Fix inverted assetion condition

8b1afe0

Always print at least an empty string in data segments

2f9a0d6

Add validation for invalid init functions

52f74d3

Fix global symbol handler using an inappropriate index

219584c

Fix exports not being looked up correctly

a699ca7

Imply no_strip when exporting

eb99d50

Set WASM_EXPLICIT_NAME when exporting [old behavior compat]

3cbbf34

Exit early on invalid symtab

87259c4

feedab1e force-pushed the main branch from 787cc6b to 87259c4 on October 14, 2025 02:33

feedab1e mentioned this pull request Oct 18, 2025

Add text format specification for Linking.md WebAssembly/tool-conventions#258

Open

feedab1e added 3 commits October 19, 2025 03:18

Fix invalid treatment of data imports

5f7f384

Assume all leb relocs of primary shapes are valid in the code section

8c1da43

Skip raw output of linking and reloc sections when outputting relocations

b61fa7a



