Add text format specification for Linking.md #258

feedab1e · 2025-10-18T09:52:11Z

This PR proposes a vendor-neutral syntax for describing relocation information in WAT.

Unlike the syntax described in WebAssembly/wabt#2649, this proposal is intended to express everything that the binary format can, including whatever the current proposed implementation in WABT does not support. Of note: @sym annotations can appear multiple times per declaration, and COMDATs, segment infos, labels, and addends are all included.

sbc100

Seems like a good idea to me. Good to see the annotation proposal being used in innovative ways like this!

Linking.md

sbc100 · 2025-10-18T16:55:11Z

Linking.md

+| `weak`           | sets `WASM_SYM_BINDING_WEAK` symbol flag                                                       |
+| `static`         | sets `WASM_SYM_BINDING_LOCAL` symbol flag                                                      |
+| `hidden`         | sets `WASM_SYM_VISIBILITY_HIDDEN` symbol flag                                                  |
+| `retain`         | sets `WASM_SYM_NO_STRIP` symbol flag                                                           |


If we think retain is a better terminology then we should probably propose renaming WASM_SYM_NO_STRIP rather than diverging.

Yeah, that's sound. I just thought that those names are from LLVM's source, so that rename should be done in sync with LLVM

Linking.md

sbc100 · 2025-10-18T22:26:30Z

For comparison, I wonder if you could post a simple hello_world.o object file in both .s assembly format and the .wat format. We should probably also mention that LLVM uses its own .s format somewhere here too, but that can be a separate PR if you like.

feedab1e · 2025-10-18T22:49:02Z

For comparison, I wonder if you could post a simple hello_world.o object file in both .s assembly format and the .wat format. We should probably also mention that LLVM uses its own .s format somewhere here too, but that can be a separate PR if you like.

I can do that for WAT, but since I don't really know the structure of .s, and there seems to be no documentation for it, I don't think that I can do it with .s.

feedab1e · 2025-10-18T23:09:39Z

On second thought, I can just use clang -S for .s and clang -c followed by wasm2wat for .wat, so I can do that, yeah.

feedab1e · 2025-10-19T00:54:52Z

@sbc100 My "hello world" examples turned out to be more than a full screen of code when using .s. I can send the files here, but I don't think it would be good to put that output into the document.

sbc100 · 2025-10-19T01:05:49Z

@sbc100 My "hello world" examples turned out to be more than a full screen of code when using .s. I can send the files here, but I don't think it would be good to put that output into the document.

Can you just include it here as quoted text? Along with the equivalent wat for comparison?

sbc100 · 2025-10-19T01:07:32Z

@sbc100 My "hello world" examples turned out to be more than a full screen of code when using .s. I can send the files here, but I don't think it would be good to put that output into the document.

Agreed, I'm mostly curious to see the two side by side here in the comments for comparison/discussion.

feedab1e · 2025-10-19T01:19:23Z

Oh, sure. The source file test.cpp, when compiled with clang -S -O3, yields test.s; when compiled with clang -c -O3 and then run through wasm2wat -r --enable-annotations (using wabt from my fork), it yields test.wat.

(Unfortunately, github forbids me from posting .s and .wat, so .txt it is)

sbc100 · 2025-10-19T17:09:27Z

Oh, sure. The source file test.cpp, when compiled with clang -S -O3, yields test.s; when compiled with clang -c -O3 and then run through wasm2wat -r --enable-annotations (using wabt from my fork), it yields test.wat.

(Unfortunately, github forbids me from posting .s and .wat, so .txt it is)

Oh nice, the wat format looks much more readable than I expected it to be.

feedab1e · 2025-10-19T17:41:45Z

One thing I wanted to mention here is that there is a limitation in the text format, that code section relocations that do not make sense for instruction operands may not be expressed in the text format. So for example i32.load instruction with a R_WASM_TYPE_INDEX_LEB could occur in a perfectly valid object file, from binary format's PoV, yet would be inexpressible in the text format. Do we want to disallow such relocations in the binary format too?
(AFAIK this would also be a problem for .s)

feedab1e · 2025-10-19T17:45:56Z

Also the same issue exists for labels, which can only point between instructions in the code segment, or into the data area of a data segment in the data section

sbc100 · 2025-10-19T18:05:02Z

One thing I wanted to mention here is that there is a limitation in the text format, that code section relocations that do not make sense for instruction operands may not be expressed in the text format. So for example i32.load instruction with a R_WASM_TYPE_INDEX_LEB could occur in a perfectly valid object file, from binary format's PoV, yet would be inexpressible in the text format. Do we want to disallow such relocations in the binary format too? (AFAIK this would also be a problem for .s)

Yes it seems perfectly reasonable for an object file validator to declare such relocations as invalid based on the instructions they are part of.

However, the linker itself (wasm-ld) will blindly accept such things, and will likely produce invalid output as a result. The linker explicitly does not parse the code section but instead blindly applies relocations. The same goes for relocations that don't point to a correct spot in the instruction stream, e.g. a relocation could in theory point at the i32.load opcode itself, rather than its operand. This would technically be an invalid object file, but the linker is blind to that.
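To make the validator idea above concrete, here is a minimal sketch in Python. It is not taken from wasm-ld or WABT. It assumes a hypothetical decoder has already recorded the byte offset and kind of every relocatable operand in the code section; the `Operand` and `Reloc` types, the operand kind names, and the `EXPECTED_KIND` table are illustrative assumptions, and only a few relocation types are listed.

```python
from dataclasses import dataclass

# Assumed mapping from relocation type to the operand kind it may patch.
# Only a few relocation types are listed; the kind names are made up here.
EXPECTED_KIND = {
    "R_WASM_FUNCTION_INDEX_LEB": "funcidx",
    "R_WASM_TYPE_INDEX_LEB": "typeidx",
    "R_WASM_GLOBAL_INDEX_LEB": "globalidx",
    "R_WASM_MEMORY_ADDR_LEB": "memoffset",
}

@dataclass(frozen=True)
class Operand:
    offset: int  # byte offset of the operand, as recorded by the assumed decoder
    kind: str    # operand kind, e.g. "funcidx" for the immediate of a call

@dataclass(frozen=True)
class Reloc:
    type: str
    offset: int

def validate_code_relocs(operands, relocs):
    """Return a list of error strings; an empty list means nothing was rejected."""
    by_offset = {op.offset: op for op in operands}
    errors = []
    for r in relocs:
        op = by_offset.get(r.offset)
        if op is None:
            # Covers a relocation aimed at an opcode or at the middle of an operand.
            errors.append(f"{r.type} at {r.offset:#x}: not at the start of an operand")
        elif EXPECTED_KIND.get(r.type) != op.kind:
            # Covers e.g. R_WASM_TYPE_INDEX_LEB on the operand of an i32.load.
            errors.append(f"{r.type} at {r.offset:#x}: operand is {op.kind}")
    return errors
```

A linker that applies relocations without decoding the code section has no `operands` list to consult, which is why it cannot reject either case.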

feedab1e · 2025-10-19T18:32:49Z

Would you be fine with me adding into the doc that such relocations are invalid for the purposes of validation, then?
And same for R_WASM_SECTION_OFFSET_I32 addends.
Also, do we want to specify that section entries in relocation tables may only point to the data section, the code section, or custom sections?

sbc100 · 2025-10-19T19:25:15Z

Would you be fine with me adding into the doc that such relocations are invalid for the purposes of validation, then?

Sure, that makes sense to me. Relocations that don't point to a valid spot in the instruction stream are certainly invalid. It might also be worth noting that wasm-ld does not validate the code section, though, so bad inputs can result in bad outputs.

And same for R_WASM_SECTION_OFFSET_I32 addends. Also, do we want to specify that section entries in relocation tables may only point to the data section, the code section, or custom sections?

Sounds reasonable, yes. We could always expand the list, but for now those are the only sections that are copied by the linker from input files into the output file, so they are the only sections for which relocations make sense. I think for GC types we may one day want to include the type section somehow, but we are a long way from that.
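As a rough illustration of that rule, the following Python sketch rejects reloc.* sections whose target is not a code, data, or custom section. The section ids are those of the WebAssembly binary format; the function name and its inputs are hypothetical and not part of any existing tool.

```python
# Section ids from the WebAssembly binary format.
SECTION_CUSTOM = 0
SECTION_CODE = 10
SECTION_DATA = 11

# Sections the linker copies from input to output, per the comment above.
RELOCATABLE_SECTIONS = {SECTION_CUSTOM, SECTION_CODE, SECTION_DATA}

def check_reloc_targets(section_ids, reloc_target_indices):
    """section_ids: the id of each section of the module, in file order.
    reloc_target_indices: the target section index named by each reloc.* section.
    Returns the target indices that are out of range or name another kind of section."""
    bad = []
    for index in reloc_target_indices:
        in_range = 0 <= index < len(section_ids)
        if not in_range or section_ids[index] not in RELOCATABLE_SECTIONS:
            bad.append(index)
    return bad
```

Expanding the list later, for example to admit the type section, would only mean adding an id to `RELOCATABLE_SECTIONS`.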

feedab1e · 2025-10-19T22:00:36Z

Added the validation rules, please take a look

Linking.md

sbc100 · 2025-10-20T15:28:34Z

Linking.md

 | ------------ | -------------- | ------------------------------------------- |
 | section      | `varuint32`    | the index of the target section             |

+Section symbols may only reference the CODE section, the DATA section, or custom sections.


Hm.. I'm not sure about this actually.

When you asked about documenting a limitation on sections I thought you were referring to the fact that relocations can only apply to certain section types.

"Which sections can have relocations within them" is a different concept to "which sections can be referred to by WASM_SYMBOL_TYPE_SECTION symbols".

I believe that WASM_SYMBOL_TYPE_SECTION symbols are only used by debug info, but my memory is a little foggy here.

Looking at the code, it actually looks like these symbols might only be valid for custom sections: https://github.com/llvm/llvm-project/blob/38372df53fd7f6c8bd8c46bf720b676e12f481d9/lld/wasm/InputFiles.cpp#L697-L705.

Which would make sense if these are only used in debug info, since all debug info is stored in custom sections.

I don't believe it can work like that, though, since R_WASM_SECTION_OFFSET_I32 relocations reference a section symbol, and for that to work as DWARF code addresses, the symbol that relocation references would have to reference the CODE section, while the relocation itself would have to target a place in the debug section.

No, actually, that can absolutely work if WASM_*_OFFSET_* relocations actually resolve to offsets from the file start, like DWARF actually expects. I do think the current spec is not very clear on this and someone from LLVM should take a look at what actually happens there and adjust https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#processing-relocations accordingly.

Linking.md

sbc100 · 2025-10-20T22:13:39Z

Thanks for this! I'd second the idea of including examples directly in this document if you're up for it. Would it be possible to show the C/C++ source you're working with plus the raw text output? Maybe not the entire file in all cases but it'd be convenient to see example usage of all of these directives within the context of actual *.wat.

I think a simple hello world in C should be enough since that will include a function relocation for printf and a data relocation for the hello world string. No need to do anything more fancy in the basic example.

alexcrichton · 2025-10-20T22:50:06Z

One aspect I'd like to see as well is relocations within a data section too, but compiling hello world with -g might suffice for that since the dwarf sections will have relocations.

feedab1e · 2025-10-21T00:23:53Z

One aspect I'd like to see as well is relocations within a data section too, but compiling hello world with -g might suffice for that since the dwarf sections will have relocations.

Relocations in data sections already have an example in https://github.com/feedab1e/tool-conventions/blob/main/Linking.md#data-relocations, perhaps you meant something else?

feedab1e · 2025-10-21T00:36:55Z

As for disassembling a "hello world" compiled with -g, the problem here is that I can't disassemble those without information loss using WABT, just like I can't properly validate that such a binary is a valid object file. This is a limitation in the current design of WABT's binary reader, where I don't have up-to-date information on the file offsets of each operand and/or instruction, as well as binary reader being single-pass in combination with relocation section occurring later than the section being relocated.

feedab1e · 2025-10-21T05:45:08Z

Thanks for this! I'd second the idea of including examples directly in this document if you're up for it. Would it be possible to show the C/C++ source you're working with plus the raw text output? Maybe not the entire file in all cases but it'd be convenient to see example usage of all of these directives within the context of actual *.wat.

I think a simple hello world in C should be enough since that will include a function relocation for printf and a data relocation for the hello world string. No need to do anything more fancy in the basic example.

As per current primary symbol rules, the relocation annotation for printf would actually be entirely elided, so for a simple example I'd actually suggest the more complex code I posted above. IMO it is a good showcase for symbols, relocations in code section, data symbols, data imports, and relocations in data sections.

sbc100 · 2025-10-21T16:30:30Z

As per current primary symbol rules, the relocation annotation for printf would actually be entirely elided

I see, so there would still be a reloc entry in the binary but not in the text format.

When relocation entries are implicit like this how does the binary writer know to produce them? Would there be special flag to the wat2wasm program that says something like --generate-object-file / --add-implicit-relocs ? Or would it be somehow automatic? i.e. what would stop the wat2wasm program generating relocs in normal non-object-file cases?

feedab1e · 2025-10-21T16:45:46Z

When relocation entries are implicit like this how does the binary writer know to produce them? Would there be special flag to the wat2wasm program that says something like --generate-object-file / --add-implicit-relocs ? Or would it be somehow automatic? i.e. what would stop the wat2wasm program generating relocs in normal non-object-file cases?

Yes, that flag already exists in wat2wasm trunk, it's -r, or --relocatable

feedab1e · 2025-10-21T16:57:29Z

The caveat here is that currently wat2wasm can only produce implicit relocs. The plan for this proposal is to mostly preserve old -r behaviour while additionally allowing explicit relocs where they are actually required.

sbc100 · 2025-10-21T17:20:40Z

When relocation entries are implicit like this how does the binary writer know to produce them? Would there be special flag to the wat2wasm program that says something like --generate-object-file / --add-implicit-relocs ? Or would it be somehow automatic? i.e. what would stop the wat2wasm program generating relocs in normal non-object-file cases?

Yes, that flag already exists in wat2wasm trunk, it's -r, or --relocatable

Do you think this is a good design (it's something I just threw together back in the day)?

I wonder if "explicit relocs everywhere" might not be better? Otherwise, won't all the tools that do text-to-binary conversion need some kind of extra flag to see if implicit relocs are enabled or not?

Or can we magically enable implicit relocs whenever we see any kind of linking annotation in the wat file? Could there exist a wat file with zero explicit annotations? How would the tooling know to create a linking section or not in that case?

feedab1e · 2025-10-21T18:32:18Z

When relocation entries are implicit like this how does the binary writer know to produce them? Would there be special flag to the wat2wasm program that says something like --generate-object-file / --add-implicit-relocs ? Or would it be somehow automatic? i.e. what would stop the wat2wasm program generating relocs in normal non-object-file cases?

Yes, that flag already exists in wat2wasm trunk, it's -r, or --relocatable

Do you think this is a good design (it's something I just threw together back in the day)?

Well, it does make the code prettier, and it does allow linking against all existing WAT modules with no source changes. I see that as a benefit.

I wonder if "explicit relocs everywhere" might not be better? Otherwise, won't all the tools that do text-to-binary conversion need some kind of extra flag to see if implicit relocs are enabled or not?

Or can we magically enable implicit relocs whenever we see any kind of linking annotation in the wat file? Could there exist a wat file with zero explicit annotations? How would the tooling know to create a linking section or not in that case?

Well, if we really do want that, I suppose we could dispatch based on whether the features section is present. But that wouldn't be reliable in any case, since tooling that isn't relocation-aware would skip unknown annotations and silently yield a simple module instead of an object file.

alexcrichton · 2025-10-21T20:14:53Z

I wonder if "explicit relocs everywhere" might not be better? Otherwise, won't all the tools that do text-to-binary conversion need some kind of extra flag to see if implicit relocs are enabled or not?

I may not fully be following the context here, but my hope is that wasm-tools print wouldn't need any sort of special flags to print @reloc annotations from a *.o file. Similarly I wouldn't want to need to pass extra flags to wasm-tools parse to generate reloc.* and linking sections when generating a *.o file.

feedab1e · 2025-10-21T21:04:20Z

I wonder if "explicit relocs everywhere" might not be better? Otherwise, won't all the tools that do text-to-binary conversion need some kind of extra flag to see if implicit relocs are enabled or not?

I may not fully be following the context here, but my hope is that wasm-tools print wouldn't need any sort of special flags to print @reloc annotations from a *.o file. Similarly I wouldn't want to need to pass extra flags to wasm-tools parse to generate reloc.* and linking sections when generating a *.o file.

So, @sbc100 is basically asking if it would be possible at all to assemble a file from WAT without emitting relocation metadata. For disassembly it'd be easy, since it's always obvious if a file is an object file based on the presence of the relevant custom sections. But for assembly with the current syntax, any valid text module can also be an object file now.

sbc100 · 2025-10-21T21:08:37Z

I wonder if "explicit relocs everywhere" might not be better? Otherwise, won't all the tools that do text-to-binary conversion need some kind of extra flag to see if implicit relocs are enabled or not?

I may not fully be following the context here, but my hope is that wasm-tools print wouldn't need any sort of special flags to print @reloc annotations from a *.o file. Similarly I wouldn't want to need to pass extra flags to wasm-tools parse to generate reloc.* and linking sections when generating a *.o file.

So, @sbc100 is basically asking if it would be possible at all to assemble a file from WAT without emitting relocation metadata. For disassembly it'd be easy, since it's always obvious if a file is an object file based on the presence of the relevant custom sections. But for assembly with the current syntax, any valid text module can also be an object file now.

I suppose we could make a rule such as "If at least one explicit linking annotation exists in the wat file then the whole file is assumed to be relocatable, and implicit relocations will be injected/generated in all possible locations"

Should we go with "at least one explicit annotation" or should we maybe have some kind of top-level annotation that expresses "this is a relocatable object file"?

sbc100 · 2025-10-21T21:09:17Z

Something like (@linkable) at the top level? Or (@relocatable)?

feedab1e · 2025-10-21T21:13:07Z

Looking at prior art, if I am to run wat2wasm --enable-code-metadata --enable-annotations on a file with no code metadata annotations, do the sections still appear?

sbc100 · 2025-10-21T21:16:00Z

Looking at prior art, if I am to run wat2wasm --enable-code-metadata --enable-annotations on a file with no code metadata annotations, do the sections still appear?

I think they would appear in the binary if and only if they appear in the source (wat).

The idea of creating implicit annotations (i.e. annotations that don't exist in the wat file at all) is the tricky thing here. I'm not sure it's a good idea to go that route.

feedab1e · 2025-10-21T21:17:36Z

Something like (@linkable) at the top level? Or (@relocatable)?

Well, I can make a description for target-features, require that to be present for the binary to be relocatable

alexcrichton · 2025-10-21T21:34:47Z

Personally I would advocate that, by default, linking and reloc.* sections are only generated if the input has @sym or @reloc. I'd prefer not to add more flags to wasm-tools parse-the-CLI (the text-to-binary conversion), as that would also proliferate to all API users of the programmatic functionality. To me it also feels less magic to have annotations-per-item as opposed to generating them automatically.

feedab1e · 2025-10-21T21:58:40Z

Personally I would advocate that, by default, linking and reloc.* sections are only generated if the input has @sym or @reloc. I'd prefer not to add more flags to wasm-tools parse-the-CLI (the text-to-binary conversion), as that would also proliferate to all API users of the programmatic functionality. To me it also feels less magic to have annotations-per-item as opposed to generating them automatically.

I would assume, then, that a reasonable implementation of that would always generate the sections internally, but then strip them during output unless a flag, @reloc, or @sym is specified, right? If so, that seems fine by me.

alexcrichton · 2025-10-21T22:33:22Z

My assumption is that the presence of @reloc and @sym causes stuff to be generated internally, but otherwise nothing happens. For example call $foo wouldn't generate any relocations by default (I realize this is against the -r flag y'all seem to be describing with wabt). So I was naively thinking that nothing would need stripping, only things would need to be preserved if they were present.
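The two policies discussed in this thread can be contrasted with a small Python sketch. The policy names and both function names are made up for illustration; neither wat2wasm nor wasm-tools is claimed to work this way.

```python
LINKING_ANNOTATIONS = {"@sym", "@reloc"}

def emits_linking_sections(file_annotations):
    """Under both policies, linking and reloc.* sections are produced only
    when the file contains at least one linking annotation."""
    return any(a in LINKING_ANNOTATIONS for a in file_annotations)

def call_gets_reloc(call_has_reloc, file_annotations, policy):
    """Decide whether a single `call $foo` gets a relocation entry.

    "explicit-only": only a (@reloc) written on the instruction counts.
    "implicit-when-annotated": any linking annotation anywhere in the file
    turns on implicit relocations for every instruction.
    """
    if policy not in ("explicit-only", "implicit-when-annotated"):
        raise ValueError(f"unknown policy: {policy}")
    if call_has_reloc:
        return True
    if policy == "implicit-when-annotated":
        return emits_linking_sections(file_annotations)
    return False
```

Under "explicit-only" nothing ever has to be stripped, since a relocation exists only where the source spelled one out.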

feedab1e · 2025-10-21T22:44:08Z

By "default" do you mean "if no linking annotations are present in the file" (i.e. not creating an object file) or do you mean just silently not generating a relocation there?

feedab1e · 2025-10-22T16:32:31Z

I guess my actual question is how would you diagnose the following:

(module
  (func (param) (result))
  (func (@sym) (param) (result)))

It is either:

1. "missing (@sym)" on the first declaration (all WebAssembly entities are required to have at least one symbol), or
2. "(@sym) is not allowed" on the second declaration (upon seeing the first declaration, we decided that we aren't actually processing an object file).

Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.
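The "remember all of the declaration locations" variant of option 1 could look roughly like the Python sketch below. The `(line, has_sym)` input shape and the function name are assumptions made for illustration; a real parser would carry richer source locations.

```python
def diagnose_sym_consistency(decls):
    """decls: iterable of (line, has_sym) pairs in source order.

    Remembers every declaration without (@sym). The errors can only be
    produced once the whole module has been seen, because a later (@sym)
    is what turns the earlier declarations into errors.
    """
    missing = []      # source lines of declarations lacking (@sym)
    seen_sym = False
    for line, has_sym in decls:
        if has_sym:
            seen_sym = True
        else:
            missing.append(line)
    if not seen_sym:
        return []     # plain module, nothing to report
    return [f"line {line}: missing (@sym)" for line in missing]
```

For the module above, with the two functions on lines 2 and 3, this reports a single error for line 2. The cost is the `missing` list, which grows with the number of unannotated declarations.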

sbc100 · 2025-10-22T16:38:59Z

I guess my actual question is how would you diagnose the following:
(module
  (func (param) (result))
  (func (@sym) (param) (result)))
It is either

"missing (@sym)" on the first declaration (all WebAssembly entities are required to have at least one symbol)

"(@sym) is not allowed" on the second declaration (upon seeing the first declaration, we decided that we aren't actually processing an object file)

Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.

How about we require @relocatable on the module itself?
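A minimal Python sketch of why a module-level marker helps: once the parser knows up front whether the module is relocatable, each declaration can be diagnosed as soon as it is seen. The sketch assumes, as in the example quoted above, that every declaration of an object file needs a (@sym); the function name and input shape are hypothetical.

```python
def diagnose_streaming(module_is_relocatable, decls):
    """decls: iterable of (line, has_sym) pairs in source order.

    Yields each error as soon as its declaration is seen; no earlier
    locations are remembered and the parse is never restarted.
    """
    for line, has_sym in decls:
        if module_is_relocatable and not has_sym:
            yield f"line {line}: missing (@sym)"
        elif not module_is_relocatable and has_sym:
            yield f"line {line}: (@sym) is not allowed"
```

The same two-function module then produces exactly one of the two candidate errors, and which one depends only on the marker.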

alexcrichton · 2025-10-22T16:49:57Z

Could handling func-with-@sym plus func-without-@sym be part of the validation of the linking section? Where validation requires that if any entity has @sym then they all have @sym? And then dealing with the text format of invalid modules is expected to return an error of some kind?

feedab1e · 2025-10-22T16:52:51Z

Could handling func-with-@sym plus func-without-@sym be part of the validation of the linking section? Where validation requires that if any entity has @sym then they all have @sym? And then dealing with the text format of invalid modules is expected to return an error of some kind?

Sure, but how would you implement those diagnostics? The parser state is gone by the time you do object file validation, as well as source locations for the declarations.

feedab1e · 2025-10-22T16:57:42Z

I guess my actual question is how would you diagnose the following:
(module
  (func (param) (result))
  (func (@sym) (param) (result)))
It is either

"missing (@sym)" on the first declaration (all WebAssembly entities are required to have at least one symbol)

"(@sym) is not allowed" on the second declaration (upon seeing the first declaration, we decided that we aren't actually processing an object file)

Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.
How about we require @relocatable on the module itself?

That would work, too. Actually, then, I think it would be better to have an annotation describing all WAT features that are enabled for the module.
Something like:

(module (@wat-features linking code-metadata dwarf))

For object file linking specifically, another option is to say something like

(module (@target-features +overlong-leb -multimemory))

to trigger object file generation, since those make sense only for linking, AFAIK

feedab1e added 5 commits October 18, 2025 09:53

Add text format description

392a958

Fix GitHub SKILL ISSUE, take 1

51db201

Fix GitHub SKILL ISSUE, take 2

e505cb0

Fix typos

e314caf

Fix invalid addend condition, rename relocation methods

c741bc4

feedab1e mentioned this pull request Oct 18, 2025

Advanced relocation support for wat2wasm, wasm2wat, wasm-validate, wat-desugar WebAssembly/wabt#2649

Draft

sbc100 reviewed Oct 18, 2025


feedab1e added 2 commits October 19, 2025 00:45

Fix formatting

ea16ade

Introduce binding and visibility qualifiers

a5f894e

feedab1e added 2 commits October 19, 2025 01:56

Make the example with data relocations more complex

d7ecb30

Add an overview for the text format description

79c827e

feedab1e force-pushed the main branch from 6f0a5f1 to 79c827e October 18, 2025 22:56

feedab1e added 2 commits October 20, 2025 00:52

Add additional validation rules for object files

9895551

Turn the overlong leb note into a validation rule

c5270f7

sbc100 reviewed Oct 20, 2025


feedab1e mentioned this pull request Oct 20, 2025

Add validation rules for Linking.md #259

Merged

feedab1e added 3 commits October 21, 2025 08:30

Clarify rules about symbol identifiers

c4be34b

Replace ambiguous "Wasm object" by "WebAssembly entity"

f01ca3a

Replace "thread_local" by "tls"

41ddb22

Make qualifiers use parens instead of =

6fab7ca
