@feedab1e

Overview

This PR adds support for specifying symbol attributes on wasm entities (functions, globals, events, tables), and adds support for relocation attributes on instructions that accept relocatable operands (i{32,64}.const, i{32,64}.{load,store}) for better conformance to https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md.
This also addresses the old issue of #1199 (comment)

Symbol attributes

Symbol attributes are written using the @sym annotation, whose contents are attributes for the symbol that correspond to flags in the symbol table:

| Symbol attribute | Corresponding flag |
| --- | --- |
| weak | WASM_SYM_BINDING_WEAK |
| static | WASM_SYM_BINDING_LOCAL |
| hidden | WASM_SYM_VISIBILITY_HIDDEN |
| retain | WASM_SYM_NO_STRIP |

In addition, the following parametric attributes are supported:

  • name="<name>" — sets WASM_SYM_EXPLICIT_NAME and binds a name to the symbol
  • priority=<int> — adds a function to the init_func list with the specified priority
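
As a rough illustration of how the attributes above could fold into a symbol's flags word, here is a minimal Python sketch. The numeric flag values are the ones listed in the tool-conventions Linking.md; the helper `symbol_flags` and the `ATTRIBUTE_FLAGS` table are hypothetical names used only for this example and are not taken from the PR's implementation.

```python
# Flag values as listed in the tool-conventions Linking.md symbol table section.
WASM_SYM_BINDING_WEAK = 0x01
WASM_SYM_BINDING_LOCAL = 0x02
WASM_SYM_VISIBILITY_HIDDEN = 0x04
WASM_SYM_EXPLICIT_NAME = 0x40
WASM_SYM_NO_STRIP = 0x80

# Hypothetical table: @sym attribute keyword -> symbol flag.
ATTRIBUTE_FLAGS = {
    "weak": WASM_SYM_BINDING_WEAK,
    "static": WASM_SYM_BINDING_LOCAL,
    "hidden": WASM_SYM_VISIBILITY_HIDDEN,
    "retain": WASM_SYM_NO_STRIP,
}


def symbol_flags(attributes, name=None):
    """Hypothetical helper: fold @sym attributes into a flags word."""
    flags = 0
    for attr in attributes:
        if attr not in ATTRIBUTE_FLAGS:
            raise ValueError(f"unknown symbol attribute: {attr}")
        flags |= ATTRIBUTE_FLAGS[attr]
    # name="..." additionally sets WASM_SYM_EXPLICIT_NAME.
    if name is not None:
        flags |= WASM_SYM_EXPLICIT_NAME
    return flags
```

For instance, `symbol_flags(["weak", "hidden"])` combines the two flag bits, and passing `name="initme"` sets the explicit-name bit on top of whatever else was requested. The priority attribute is not modelled here because it affects the init function list rather than the flags.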

WASM entity symbols

Symbols corresponding to WASM entities are specified inline with their definitions, and the annotation is placed after the entity name.

Example:

(func $foobar (@sym name="initme" priority=100)
  (param i32) (result i32) (local i32) 
  (local.get 0))

will create a function with the name "initme" and put it into an init vector with priority 100.

Data symbols

Data symbols are specified inline with the string that defines the data. Unlike entity symbols, data symbols do not take the name of a corresponding entity, so a WASM var is assigned to each one.
For defined data symbols, the attribute size=<int> must be specified; it gives the size of that symbol.

Example

(data (i32.const 0) (@sym $foo size=1 name="x") "\00")

defines a symbol named "x" sized 1 byte.

Data imports

Some data symbols are not defined, so there is no place for them inside the data section declarations.
For that purpose, a @sym.import.data annotation can be used wherever a module field would naturally occur; the syntax inside it is exactly the same as inside a @sym annotation.

Example

(module
  (@sym.import.data $foo name="bar"))

will declare an imported data symbol with the name "bar".

Relocations

Relocations are the other crucial part of this PR, as they are what allows people to write binaries that, for example, take the address of a function.

Syntax

Relocations use the format (@reloc <shape> <method> <symbol> <attr-opt>); the meaning of each part is described below.

  • shape encodes the data type of the relocation (i.e. how many bytes will be rewritten and in which format).
    It is one of i32, i64, leb, sleb, leb64, or sleb64.

  • method encodes the type of relocation: what kind of symbol we are relocating against and how to interpret that symbol.

| <method> | symbol kind | corresponding relocation constants | interpretation |
| --- | --- | --- | --- |
| tag | event | R_WASM_EVENT_INDEX_* | Final WebAssembly event index |
| table | global | R_WASM_TABLE_NUMBER_* | Final WebAssembly table index (index of a table, not into one) |
| global | global | R_WASM_GLOBAL_INDEX_* | Final WebAssembly global index |
| func | function | R_WASM_FUNCTION_INDEX_* | Final WebAssembly function index |
| functable | function | R_WASM_TABLE_INDEX_* | Index into the dynamic function table, used for taking address of functions |
| text | function | R_WASM_FUNCTION_OFFSET | Offset into the function body, used for debugging |
| section | function | R_WASM_SECTION_OFFSET | Offset into a custom section, used for debugging |
| data | data | R_WASM_MEMORY_ADDR_* | WebAssembly linear memory address |

  • attr-opt encodes any additional attributes that a relocation might have.

| <attr-opt> | corresponding relocation constants | interpretation |
| --- | --- | --- |
| nothing | nothing | Normal relocation |
| pic | R_WASM_*_LOCREL_*, R_WASM_*_REL_* | Address relative to env.__memory_base or env.__table_base, used for dynamic linking |
| tls | R_WASM_*_TLS* | Address relative to env.__tls_base, used for thread-local storage |

Not every combination of relocation method and relocation shape exists, so an error is raised for invalid ones.
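
One way to picture the shape/method/attribute lookup and its error case is a small table keyed on the three parts of the annotation. The sketch below is only an illustration in Python: the relocation names are taken from the tool-conventions Linking.md, but the table is a deliberately partial subset, and `RELOC_TYPES` and `lookup_reloc` are hypothetical names rather than anything from this PR.

```python
# Partial, illustrative subset of (shape, method, attr) -> relocation constant.
# A real table would cover every combination defined in Linking.md.
RELOC_TYPES = {
    ("leb", "func", None): "R_WASM_FUNCTION_INDEX_LEB",
    ("sleb", "functable", None): "R_WASM_TABLE_INDEX_SLEB",
    ("i32", "functable", None): "R_WASM_TABLE_INDEX_I32",
    ("leb", "data", None): "R_WASM_MEMORY_ADDR_LEB",
    ("sleb", "data", None): "R_WASM_MEMORY_ADDR_SLEB",
    ("i32", "data", None): "R_WASM_MEMORY_ADDR_I32",
    ("sleb", "data", "pic"): "R_WASM_MEMORY_ADDR_REL_SLEB",
    ("sleb", "data", "tls"): "R_WASM_MEMORY_ADDR_TLS_SLEB",
    ("leb", "global", None): "R_WASM_GLOBAL_INDEX_LEB",
    ("i32", "global", None): "R_WASM_GLOBAL_INDEX_I32",
}


def lookup_reloc(shape, method, attr=None):
    """Hypothetical helper: resolve an annotation to a relocation constant."""
    try:
        return RELOC_TYPES[(shape, method, attr)]
    except KeyError:
        raise ValueError(
            f"no relocation for shape={shape!r} method={method!r} attr={attr!r}"
        ) from None
```

With this table, `(@reloc i32 functable $foo)` would resolve to R_WASM_TABLE_INDEX_I32, while a combination that has no entry raises an error instead of silently picking a type.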

Instruction relocations

For relocations targeting instruction operands, the need for relocation shapes is obviated, therefore they have the form (@reloc <method> <symbol>).

The only instructions that currently need explicit relocations are const expressions, and load/store expressions.
For const expressions their relocations target the constant operand and are written after that operand:

(module
  (func $foo (result i32)
    i32.const 0 (@reloc functable $foo)))

For load/store expressions, their relocations target their offsets, and are therefore written after the offset directive:

(module
  (@sym.import.data $foo name="x")
  (func $foo (result i32)
    i32.const 0
    i32.load offset=0 (@reloc data $foo)))

Data relocations

Like data symbols, data relocations are written inline with the data segment text, but unlike symbols, and like instruction relocations, they are written after the byte sequence that they rewrite.
For example:

(module
  (import (func $foo (@sym name="foo")))
  (data (i32.const 0) (@sym (; 0 ;) name="foo_pfn" size=4) "\00\00\00\00" (@reloc i32 functable $foo)))

will produce a data section relocation starting at offset 0.

Approach

Since the current binary reader is single-pass, during the parsing phase it stores references to all instructions that can potentially be relocated in their respective relocation queues (one per section). When a reloc section is read, it looks up the target section's queue and stores all of its relocations there. At the end of the module, it goes through each queue and uses the relocations seen to link instructions/data segments to their respective symbols.
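
The queueing scheme can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the PR's C++ code: the class `RelocQueues`, its method names, and the choice to key relocatable sites by byte offset within a section are all hypothetical and only meant to show the three phases (collect sites, collect relocations, resolve at end of module).

```python
from collections import defaultdict


class RelocQueues:
    """Hypothetical model of the per-section relocation queues."""

    def __init__(self):
        # section name -> {offset within section: relocatable site}
        self.sites = defaultdict(dict)
        # section name -> [(offset, relocation type, symbol index)]
        self.relocs = defaultdict(list)

    def note_site(self, section, offset, site):
        # Phase 1: while parsing, remember everything that could be relocated.
        self.sites[section][offset] = site

    def note_reloc(self, section, offset, reloc_type, symbol_index):
        # Phase 2: a reloc section appends to its target section's queue.
        self.relocs[section].append((offset, reloc_type, symbol_index))

    def resolve(self, symbols):
        # Phase 3: at end of module, link each relocated site to its symbol.
        links = []
        for section, queue in self.relocs.items():
            for offset, reloc_type, symbol_index in queue:
                site = self.sites[section].get(offset)
                if site is None:
                    raise ValueError(f"no relocatable site at {section}+{offset}")
                links.append((site, reloc_type, symbols[symbol_index]))
        return links
```

Sites that never receive a relocation are simply ignored at resolve time, which matches the idea of recording every potentially relocatable instruction up front and deciding later.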

Known limitations

This PR, while improving relocation support significantly, still lacks full compatibility with the linking spec.

Multiple symbols per WebAssembly entity

In particular, it is not possible at the moment to create several symbols referencing the same WASM entity, because symbols in this PR are tied directly to their respective entities. This is done to avoid having to explicitly annotate every non-memory mention of a WebAssembly entity to resolve which of the symbols is being referred to. Apart from that, the actual semantics implied by attaching several potentially conflicting symbols to a single entity are not really clear.

Offset relocations and their addends

Section/function offset relocations are crucial for emitting accurate debug info: embedded DWARF uses relocations that reference functions/sections, and uses the addend field to specify an offset into their text. To represent that accurately in WAT we would like to have something like debug labels that are inserted inline into the instruction stream or a custom section data buffer. Unfortunately, because BinaryReaderIR does not interface directly with the binary, but instead has to go through BinaryReaderDelegate, it is not possible to accurately predict where an instruction starts. That means the place the addend would usually point to cannot be determined, so the debug label cannot be reliably reconstructed.

@feedab1e
Author

@sbc100 @binji you might want to take a look at this

@sbc100
Member

sbc100 commented Oct 16, 2025

Wow, this is very impressive that you got all this to work @feedab1e.

My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?

@dschuff @tlively @kripken WDYT?

@feedab1e
Author

> Wow, this is very impressive that you got all this to work @feedab1e.

> My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?

> @dschuff @tlively @kripken WDYT?

I did not yet propose this anywhere else outside of my scope of work, however, I intend to use these annotations to output relocation information for my wasm backend in GCC (WIP). Also based on prior work with dynamic linking that wasm-tools proposed some time ago, I expect that they wouldn't be against this inclusion, but I'll ask

@sbc100
Member

sbc100 commented Oct 17, 2025

>> Wow, this is very impressive that you got all this to work @feedab1e.
>> My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?
>> @dschuff @tlively @kripken WDYT?

> I did not yet propose this anywhere else outside of my scope of work, however, I intend to use these annotations to output relocation information for my wasm backend in GCC (WIP).

Wow, a gcc backend! That is exciting! Are you sure it wouldn't make sense to use the .s format there, like the one that llvm uses? I assume that the gcc assembly and disassembly won't be outputting wat? Or if they did, it would be very different from the llvm approach.

> Also based on prior work with dynamic linking that wasm-tools proposed some time ago, I expect that they wouldn't be against this inclusion, but I'll ask

Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols and relocations).

@feedab1e
Author

> Wow, a gcc backend! That is exciting! Are you sure it wouldn't make sense to use the .s format there, like the one that llvm uses? I assume that the gcc assembly and disassembly won't be outputting wat? Or if they did, it would be very different from the llvm approach.

Well, for now my backend already outputs WAT which passes validation (although I still can't run it yet because of linking). That is different from LLVM, because LLVM never actually creates any text during compilation, and produces a binary directly. I cannot do that with GCC.

> Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols and relocations).

Yeah, that's true, but AFAIK those are still valid modules and I assume people would want to manipulate those too.

@feedab1e
Author

The problem with .s for me is that I would be relying on what is essentially an implementation detail of LLVM, a format which is undocumented, and implementation of which exists (and will likely continue to exist) only in LLVM. And a hard dependency on LLVM is not a good look for GCC.

@sbc100
Member

sbc100 commented Oct 17, 2025

>> Wow, a gcc backend! That is exciting! Are you sure it wouldn't make sense to use the .s format there, like the one that llvm uses? I assume that the gcc assembly and disassembly won't be outputting wat? Or if they did, it would be very different from the llvm approach.

> Well, for now my backend already outputs WAT which passes validation (although I still can't run it yet because of linking). That is different from LLVM, because LLVM never actually creates any text during compilation, and produces a binary directly. I cannot do that with GCC.

LLVM can produce and consume Wasm assembly in the .s format. You can see this if you pass -fno-integrated-as (maybe along with -save-temps?). The format used is very similar to existing architectures' assembly formats.

>> Just to be clear, the parts being proposed for addition here are specifically about object files and static linking. They only exist in the object file format, not in the executable or DSO format (specifically the linking section, symbols and relocations).

> Yeah, that's true, but AFAIK those are still valid modules and I assume people would want to manipulate those too.

This is true.

@sbc100
Member

sbc100 commented Oct 17, 2025

> The problem with .s for me is that I would be relying on what is essentially an implementation detail of LLVM, a format which is undocumented, and implementation of which exists (and will likely continue to exist) only in LLVM. And a hard dependency on LLVM is not a good look for GCC.

The main reason it's not documented and is currently only used in LLVM is that no other compiler has needed to use it yet. A GCC backend might be a great time and place to make it more official and documented.

@feedab1e
Author

>> The problem with .s for me is that I would be relying on what is essentially an implementation detail of LLVM, a format which is undocumented, and implementation of which exists (and will likely continue to exist) only in LLVM. And a hard dependency on LLVM is not a good look for GCC.

> The main reason it's not documented and is currently only used in LLVM is that no other compiler has needed to use it yet. A GCC backend might be a great time and place to make it more official and documented.

Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to do it first before I can consider adopting .s, and then I assume someone would also have to develop a parser for this format either here, or in binaryen, or in wasm-tools, so that I wouldn't need to rely on LLVM for compilation using GCC. In contrast, I found this to be fairly easily implementable here, and I assume that would also be the case for other WAT consumers (since implementation-wise this is fairly similar to code metadata).

@bjorn3

bjorn3 commented Oct 17, 2025

> My main concern is how much we want to actually commit to being able to express object files in the wat format like this. llvm (the main producer of object files) does not use wat and instead has its own .s format for representing all of this. Have you proposed this elsewhere (e.g. wasm-tools) and found folks generally supportive of this kind of usage of wat?

On the issue for stabilizing inline asm support on wasm in rust I suggested using wat + some way to encode relocations instead of stabilizing LLVM's custom assembly format: rust-lang/rust#136382 (comment). If we get the format that this PR adds documented and eventually stabilized, that would make it feasible from a language perspective to use wat for inline asm in rust. But it would probably still be non-trivial to either add support for it to LLVM or to add a translation pass to rustc. The latter would add complexity to rustc, but I would personally feel a lot more comfortable stabilizing inline asm on wasm that way. If LLVM changes the assembly format, we would only need to change the translation pass, not all user code.

@SingleAccretion
Contributor

I would point out that using a particular assembly format is a user-visible contract in the case of a compiler like GCC due to inline/module-level assembly (it is also the reason why the .S format is not really 'LLVM internal' - it is part of the public interface). So if you want to compile a repository like emscripten which uses .S files, you need to support it. Introducing a separate format that is less powerful is a downside for compiler portability.

@alexcrichton
Contributor

I can perhaps lend some thoughts from a wasm-tools perspective -- this is a feature I've long wanted! I don't necessarily have a killer use case for this, though, and mostly historically for me it's been in the bucket of "it'd be neat to print the object-related custom sections with annotations". I would have no aspirations to supplant LLVM's assembly syntax and I'd understand that it'd be a perpetual game of catch-up if LLVM added new features.

I've made half-hearted attempts to implement something like this in wasm-tools historically but the "known limitations" listed in the PR description here are mostly what stopped me, especially the one about relocations in data sections. I've had problems historically trying to retrofit s-expressions and the text format with relocations and I've found that it's not always the most suitable. For example i32.const 0 (@reloc functable $foo) needs to sort of magically encode a 5-byte leb for the i32.const value. Relocations in data sections would need something like (data (i32.const ...) "..." (@reloc ...) "..." (@reloc ...) ...) or something like that. I more-or-less kept coming to the conclusion that LLVM's assembly format is quite suitable for these relocations and such but the standard wasm text format is much less amenable.

Now that being said I wouldn't want to stop effort on implementing this! I'm all for having a shared convention amongst tools as much as anyone else, and wasm having an official text format I think is a great place to start from. One possibility is that if the official text format isn't amenable enough for relocations it might be possible to make official changes to make it more amenable (iunno what these would be but I suspect the CG would be receptive to tweaks to the text format). I'd also have bikeshedding opinions about various syntaxes in play here, but I'll reserve those for a different time since it's always easy to tweak.

@feedab1e
Author

feedab1e commented Oct 17, 2025

> I've made half-hearted attempts to implement something like this in wasm-tools historically but the "known limitations" listed in the PR description here are mostly what stopped me, especially the one about relocations in data sections.

The "known limitations" part of this PR is more about WABT itself rather than the syntax. As for addends, the issue there is that in the binary the relocation section comes after the code, so creating debug labels in an already formed IR would be a struggle. And for multiple symbols, it would be fairly easy to just support multiple @sym annotations on a symbol, it's just that the semantics of that is not really clear.
That said, data symbols and relocations work perfectly fine, I think, both in this syntax and in the implementation

> I've had problems historically trying to retrofit s-expressions and the text format with relocations and I've found that it's not always the most suitable. For example i32.const 0 (@reloc functable $foo) needs to sort of magically encode a 5-byte leb for the i32.const value. Relocations in data sections would need something like (data (i32.const ...) "..." (@reloc ...) "..." (@reloc ...) ...) or something like that.

That's pretty much what I did for relocations. When the relocation is in the code section, its format is inferred from the instruction's operand being relocated, and when the relocation is in the data section, the user has to specify the relocation's shape so that the assembler can recognize the relocation type. And yes, relocations are spliced into the data, but I think that's just the only reasonable way to do it, LLVM syntax or otherwise

@rossberg
Member

> For example i32.const 0 (@reloc functable $foo) needs to sort of magically encode a 5-byte leb for the i32.const value.

This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.

> And yes, relocations are spliced into the data, but I think that's just the only reasonable way to do it, LLVM syntax or otherwise

If somebody moved the Wat Numerical Values proposal forward from phase 2, then that would perhaps provide a nicer way to annotate data segments. :)

@sbc100
Member

sbc100 commented Oct 17, 2025

> Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to do it first before I can consider adopting .s, and then I assume someone would also have to develop a parser for this format either here, or in binaryen, or in wasm-tools, so that I wouldn't need to rely on LLVM for compilation using GCC.

Does GCC not have a kind of generic .s file reader/writer like LLVM does? At least in LLVM, one of the main reasons for using the .s format was to be able to re-use all the existing machinery for dealing with assembly files. There was no need to write any new parser or writer at all.

@feedab1e
Author

>> Well, that documentation would be nice in any case, but I suspect that someone from LLVM would have to do it first before I can consider adopting .s, and then I assume someone would also have to develop a parser for this format either here, or in binaryen, or in wasm-tools, so that I wouldn't need to rely on LLVM for compilation using GCC.

> Does GCC not have a kind of generic .s file reader/writer like LLVM does? At least in LLVM, one of the main reasons for using the .s format was to be able to re-use all the existing machinery for dealing with assembly files. There was no need to write any new parser or writer at all.

No, and I don't even know what that would look like, with the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.

@sbc100
Member

sbc100 commented Oct 17, 2025

> No, and I don't even know what that would look like, with the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.

I see, that makes sense then. Sounds like with gcc there is way less motivation to make your assembly format resemble traditional formats since there is less code sharing between backends. However, I do think it might be worthwhile at least considering sharing an assembly format with LLVM since having a different assembly format for each compiler seems like maybe a bad outcome?

@sbc100
Member

sbc100 commented Oct 17, 2025

>> No, and I don't even know what that would look like, with the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.

> I see, that makes sense then. Sounds like with gcc there is way less motivation to make your assembly format resemble traditional formats since there is less code sharing between backends. However, I do think it might be worthwhile at least considering sharing an assembly format with LLVM since having a different assembly format for each compiler seems like maybe a bad outcome?

As someone who has done a fair amount of work on the de facto assembly format in LLVM I'd be happy to help document the existing format and maybe even update it to remove any rough edges.

@feedab1e
Author

>> And yes, relocations are spliced into the data, but I think that's just the only reasonable way to do it, LLVM syntax or otherwise

> If somebody moved the Wat Numerical Values proposal forward from phase 2, then that would perhaps provide a nicer way to annotate data segments. :)

Yeah, I think we can infer relocation types from those declarations too. However, I see that that proposal lacks a way to output a leb, which is needed for debug info, for example, but I think that can simply be added to the spec. One other thing is that I don't really know what roundtrips would look like with this proposal; I suspect that everything apart from relocations would have to just be text.

@feedab1e
Author

>> No, and I don't even know what that would look like, with the current variety of real-world assembly syntaxes. AFAIK there isn't much baked in, and the output is mostly fully controlled by the backend.

> I see, that makes sense then. Sounds like with gcc there is way less motivation to make your assembly format resemble traditional formats since there is less code sharing between backends. However, I do think it might be worthwhile at least considering sharing an assembly format with LLVM since having a different assembly format for each compiler seems like maybe a bad outcome?

How complicated would it be to move LLVM to use WAT? My main concern here is that WAT consumers aren't rushing to adopt .s, so if I am to adopt it, it would still mean that two projects out of the entire WASM ecosystem use their own oddball format, which would still be a bad outcome.

> As someone who has done a fair amount of work on the de facto assembly format in LLVM I'd be happy to help document the existing format and maybe even update it to remove any rough edges.

@alexcrichton, would wasm-tools be willing to implement a parser for .s if a spec for it lands?

@sbc100
Member

sbc100 commented Oct 17, 2025

> How complicated would it be to move LLVM to use WAT?

That's a good question. At the bare minimum it would require writing a completely new assembly parser and writer. But I don't know if there would be even farther-reaching effects. LLVM backends share a lot of common code in this area (the MC layer), but I don't have enough knowledge to know how much work it would be or what the long-term maintenance costs of diverging would be.

@alexcrichton
Contributor

> The "known limitations" part of this PR is more about WABT itself rather than the syntax.

Oh! Never mind me then. In that case this is definitely something I'd like to implement in wasm-tools eventually as well. If you're up for it I'd like to have a chance to bikeshed some syntax and such, so would you be up for sending a PR to Linking.md with a section describing the text format? That'd be a good place to explain how all the various constructs translate to text and would also provide a good place IMO to do some minor bikeshedding. If that's more than you wanted to bite off and chew though I understand.

> This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.

True! That's not always applicable, though, because for example call 0 also needs to be encoded as a 5-byte leb and call 0x8000_0000 would produce an invalid wasm.

This actually also reminds me @feedab1e that one other thing I ran into was that relocations would be on sub-parts of immediates of instructions and I wasn't sure how to represent that in the text syntax. For example with call_indirect there's one relocation for the table index and one relocation for the type index. This can work with enough "assume this reloc attached here means this" style logic but it was scenarios like this where I started to lose steam personally for implementing this historically.

> @alexcrichton, would wasm-tools be willing to implement a parser for .s if a spec for it lands?

Personally I wouldn't be too interested in implementing it myself at least, but I want to qualify this with some more words as well. For wasm-tools it's centered around the wasm binary and text format as defined in various specifications and is intended to provide ways to manipulate/inspect/debug these formats. The .s format would be more compiler-oriented in terms of LLVM or GCC and effectively wouldn't mesh well with what's already in wasm-tools. I suspect it'd be an entirely new crate/implementation/subcommands/etc, and at that point I'm not really sure what the benefit would be over using LLVM's or GCC's tooling that already exists for the .s format.

I also realize my opinion isn't necessarily being solicited here but if you'll indulge me I'll go ahead and give it anyway. On one hand there's no denying the current reality of the .s format naturally meshing well with LLVM and supporting all the various features necessary that LLVM needs. On the other hand though there's also no denying that wasm has a different, officially specified, text format that shares instruction names but not much else. Balancing these two is, in my opinion, not possible without elbow grease going somewhere. I wouldn't be surprised if updating LLVM was a major effort, but I also wouldn't be surprised if new users continued to be surprised that the .s format is different than the official text format. I suspect the most expedient way forward for GCC would be to implement LLVM's .s format, but I also don't know much about GCC backends so I don't know if that would be easier or harder than implementing the wasm standard text format.

@feedab1e
Author

>> The "known limitations" part of this PR is more about WABT itself rather than the syntax.

> Oh! Never mind me then. In that case this is definitely something I'd like to implement in wasm-tools eventually as well. If you're up for it I'd like to have a chance to bikeshed some syntax and such, so would you be up for sending a PR to Linking.md with a section describing the text format? That'd be a good place to explain how all the various constructs translate to text and would also provide a good place IMO to do some minor bikeshedding. If that's more than you wanted to bite off and chew though I understand.

Sure, I'll do that.

>> This problem could be avoided by making it a requirement that the annotated constant instruction has the value 0x8000_0000, which will always take 5 bytes.

> True! That's not always applicable, though, because for example call 0 also needs to be encoded as a 5-byte leb and call 0x8000_0000 would produce an invalid wasm.

The linking spec mandates the use of overlong lebs anyway in places where a relocation occurs, and that's also what wat2wasm already does with the -r flag, so I see no issue here.

> This actually also reminds me @feedab1e that one other thing I ran into was that relocations would be on sub-parts of immediates of instructions and I wasn't sure how to represent that in the text syntax. For example with call_indirect there's one relocation for the table index and one relocation for the type index. This can work with enough "assume this reloc attached here means this" style logic but it was scenarios like this where I started to lose steam personally for implementing this historically.

I don't think that this would be a huge issue for me, since in this implementation the actual relocation types are looked up by their shape and method, so it wouldn't be hard to just hardcode the method too for the cases where multiple symbols exist for the same wasm entity.

>> @alexcrichton, would wasm-tools be willing to implement a parser for .s if a spec for it lands?

> Personally I wouldn't be too interested in implementing it myself at least, but I want to qualify this with some more words as well. For wasm-tools it's centered around the wasm binary and text format as defined in various specifications and is intended to provide ways to manipulate/inspect/debug these formats. The .s format would be more compiler-oriented in terms of LLVM or GCC and effectively wouldn't mesh well with what's already in wasm-tools. I suspect it'd be an entirely new crate/implementation/subcommands/etc, and at that point I'm not really sure what the benefit would be over using LLVM's or GCC's tooling that already exists for the .s format.

So the current situation is that there is no GCC tooling at all for the .s format; it only exists in LLVM. Since I am the one writing the implementation of WASM in GCC, one of the main priorities for me is to be compatible with most of the ecosystem, so I'd rather choose whatever is most supported, which in this case would be WAT, especially because the wider ecosystem isn't very receptive to the .s format.

> I also realize my opinion isn't necessarily being solicited here but if you'll indulge me I'll go ahead and give it anyway. On one hand there's no denying the current reality of the .s format naturally meshing well with LLVM and supporting all the various features necessary that LLVM needs. On the other hand though there's also no denying that wasm has a different, officially specified, text format that shares instruction names but not much else. Balancing these two is, in my opinion, not possible without elbow grease going somewhere. I wouldn't be surprised if updating LLVM was a major effort, but I also wouldn't be surprised if new users continued to be surprised that the .s format is different than the official text format. I suspect the most expedient way forward for GCC would be to implement LLVM's .s format, but I also don't know much about GCC backends so I don't know if that would be easier or harder than implementing the wasm standard text format.

Your opinion is totally welcome here. So, the thing is that at this point I think at least for me it would be easier to continue the development of my backend with WAT and not .s, since I did my development against WAT initially, and at this point rewriting that to use the other syntax (and reimplementing the entirety of LLVM's assembler) would be more effort than continuing development against WAT and its standard assemblers. I think it will be possible to implement either syntax or even both, given time and effort (and a spec for the format). But then again, if no one except for LLVM (and possibly some time later binutils) would be able to parse that format, it would be suboptimal for both GCC and the wider ecosystem.

The features that probably exist in .s and are lacking in WAT are linking and DWARF debugging, but I think we can achieve format parity if both are to be implemented (and there will be more work in the case of .s, since there the entire assembler will have to be implemented from scratch, including linking and debug info).

@feedab1e
Author

@alexcrichton here's the PR: WebAssembly/tool-conventions#258
