Add memory64 and table64 support to the Canonical ABI by adambratschikaye · Pull Request #624 · WebAssembly/component-model

adambratschikaye · 2026-03-17T14:47:08Z

Parameterize the Canonical ABI to handle 32-bit or 64-bit memory addresses and table indices. This is done by adding two new fields to LiftOptions to indicate if the memory64/table64 feature is being used in a core module.

Parameterize the Canonical ABI to handle 32-bit or 64-bit memory addresses and table indices. This is done by adding two new fields to `LiftOptions` to indicate if the `memory64`/`table64` feature is being used in a core module.

alexcrichton

Thanks so much for helping to pick this up! We chatted a bit on Zulip but for posterity I'll mention here about table-related index types how most of them want to remain 32-bits. I'll take a closer look once that's been passed over here, and in the meantime I skimmed over things and noted a few things here and there. I'm sure though that @lukewagner will have thoughts on this too!

alexcrichton · 2026-03-17T16:06:12Z

design/mvp/Explainer.md

-      (param $originalSize i32)
+(func (param $originalPtr $addr)
+      (param $originalSize $addr)
      (param $alignment i32)


For this I might recommend making alignment have type $addr as well for consistency, and I think that matches the signature in Rust/C/etc as well.

alexcrichton · 2026-03-17T16:07:46Z

design/mvp/Explainer.md

 | -------------------------- | ------------------------ |
 | Approximate WIT signature  | `func<T>(t: T) -> T.rep` |
-| Canonical ABI signature    | `[t:i32] -> [i32]`       |
+| Canonical ABI signature    | `[t:$idx] -> [i32]`      |


This'll be a bit subtle, but this'll actually want to be a mapping of i32 as an argument (I talked with you a bit about this already, but the input here is a host-side index so always 32-bit), but the output here should be something variable. Resource "rep"s are generally pointers in linear memory so 64-bit components are going to want 64-bit storage values. In the component model resources are defined with (rep i32) and validation currently requires that i32 is the only one listed there, but this PR will relax that meaning that the resource type itself stores the lowered type which'll get plumbed here.

Thanks, I've now changed it so that the types which are expected to be pointers to memory are allowed to be either i32 or i64 without forcing them to match the memory type (e.g. the return value here, or in context.{get,set}, or thread.new-indirect). This means we don't need to require the memory canonopt anywhere where it didn't previously exist.

alexcrichton

Looks good to me! I think this'll need minor adjustments over time but overall looks good 👍

alexcrichton · 2026-03-20T06:31:12Z

design/mvp/canonical-abi/definitions.py

+  if opts.memory is None:
+    return 'i32'


Could this have an assert? This in theory shouldn't ever be called dynamically if memory is None

alexcrichton · 2026-03-20T06:34:21Z

design/mvp/CanonicalABI.md

+* `memory` - this is a subtype of `(memory 1)`. In the rest of the explainer,
+  `PTR` will refer to either `i32` or `i64` core Wasm types as determined by the
+  type of this `memory`.


This'll technically want to say it's a subtype of memory 1 or memory i64 1 since those are separate types

alexcrichton · 2026-03-20T06:34:55Z

design/mvp/CanonicalABI.md

  feature (the "stackful" ABI), this restriction is lifted.
-* 🔀 `callback` - the function has type `(func (param i32 i32 i32) (result i32))`
-  and cannot be present without `async` and is only allowed with
+* 🔀 `callback` - the function has type `(func (param i32 i32 PTR) (result i32))`


I think this one may remain a three i32s since they're all async-related codes/events/etc.

lukewagner

This looks great overall; thanks so much for all the work and adding tests too! There's only one relatively minor, but non-nit, suggestion below that I'm happy to discuss and then re-review based on what we decide.

design/mvp/canonical-abi/definitions.py

design/mvp/canonical-abi/run_tests.py

design/mvp/Concurrency.md

design/mvp/CanonicalABI.md

design/mvp/canonical-abi/definitions.py

Co-authored-by: Luke Wagner <mail@lukewagner.name>

adambratschikaye · 2026-03-26T15:40:44Z

Thanks for taking a look @lukewagner! I think I've addressed everything so far.

lukewagner

Thanks for addressing all the feedback! This is looking great; just a few more suggestions based on the last round of changes:

design/mvp/Explainer.md

design/mvp/canonical-abi/definitions.py

design/mvp/Explainer.md

lukewagner · 2026-03-26T18:07:25Z

design/mvp/canonical-abi/definitions.py


 def canon_context_set(t, i, thread, v):
-  assert(t == 'i32')
+  assert(t == 'i32' or t == 'i64')


Perhaps add an assert(v <= MASK_32BIT or t == 'i64')?

(reminder to add this)

design/mvp/Concurrency.md

design/mvp/CanonicalABI.md

lukewagner · 2026-03-26T19:23:47Z

design/mvp/CanonicalABI.md

+Strings are loaded from two pointer-sized values: a pointer (offset in linear
+memory) and a number of [code units]. There are three supported string encodings
+in [`canonopt`]: [UTF-8], [UTF-16] and `latin1+utf16`. This last option allows a
+*dynamic* choice between [Latin-1] and UTF-16, indicated by the 32nd bit of the


If we're passing i64s for string/list lengths, then it seems a bit weird to use the 32nd bit. It seems like either we should use i32 lengths or, if using i64 lengths, use the most-significant (64th) bit. I like i64s for regularity, while I like i32s for better capturing the length constraint. Perhaps a concrete tie-breaker is that, if these values get copied into heap structures, the i32 could perhaps be packed with other fields, reducing memory use? If you agree, could we switch the lengths to always be i32s. Or if not, then probably we should go back to your original parameterized utf16_tag(ptr_size) definition.

I'm not sure myself which one seems better.

One argument for i64 is that (similar to @alexcrichton's previous comment) it's closer to the signatures source languages expect, right?

I implemented using i32 anyway, but I realized two things:

Now it's a little bit inconsistent with lists - list lengths also could fit in i32, but use i64. We should probably change lists to match strings.

The packing won't work without further changes. The current logic for elem_size rounds up to alignment and we rely on this for lists where we assume no padding is needed between elements. So a string would need to have elem_size 16 on 64-bit even if the length is i32. Of course it wouldn't be that big of a change to allow the packing and it would also help the packing of other record types.

Yes, agreed that we should be consistent between string and list, whichever way we go. And you're right about packing in linear memory; I was more thinking about packing in some engine internal structure that holds arguments, but probably the same argument applies since things will get boxed generically, so probably there's no packing benefit.

I'm kindof on the fence: i64 looks like what you'd expect; otoh, by being not what you expect, i32 makes it clear that the runtime semantics is that you can't expect to pass any arbitrarily-large string/list that fits in your 64-bit memory. It is worth asking: "what if we do end up wanting to allow passing really-big lists/strings (without using stream)?" -- does i32 paint us into a corner? But for this scenario, it seems like probably the WIT interface should reflect "you need the ability to represent really-big strings/lists", in a way that encourages you to either use a 64-bit memory (or otherwise, do something fancy with a 32-bit lazy ABI), so then a new WIT big-string/big-list (names TBD) would be appropriate, and they'd have i64 lengths.

Personally I'd say we should go the way of i64, even if anything >32 bits can't actually be used. I feel that's more along the lines of what folks expect and it's easier to change semantics in the future than ABI (IMO at least). I'd agree though that the tag of utf16-or-latin1 would be in the high bit at all times

Yeah, I suppose it's a good idea to preserve optionality in this situation, so i64 sgtm w/ high bit set sgtm. Sorry for the churn @adambratschikaye!

No problem, it's back to the variable utf16_tag now.

lukewagner · 2026-03-26T20:30:16Z

design/mvp/CanonicalABI.md

  return load_list_from_valid_range(cx, ptr, length, elem_type)

 def load_list_from_valid_range(cx, ptr, length, elem_type):
+  trap_if(length * worst_case_elem_size(elem_type, cx.opts.memory.ptr_type()) > MAX_LIST_BYTE_LENGTH)


worst_case_elem_size has a nice intuitive definition, but I wonder if we can simplify implementers' lives for a relatively corner case by fixing a suitably-conservative constant so that it's simply trap_if(length * elem_size(elem_type, cx.opts.memory.ptr_type()) > MAX_LIST_BYTE_LENGTH). Following strings, to minimize the number of distinct ad hoc constants, maybe MAX_LIST_BYTE_LENGTH = 2 ** 28 - 1 as well?

Independently, I think this trap_if() should be moved into load_list_from_range() since load_list_from_valid_range() is called for stream buffers where we don't have the same "needs to be representable as an i32 constraints since streams can copy as much or little as the reader/writer agree on.

Co-authored-by: Luke Wagner <mail@lukewagner.name>

This reverts commit cca0beb.

lukewagner

Looks great, thanks! One small suggested tweak to string lengths.

One other request: I think we need a new production for resourcetype (in Explainer.md and Binary.md) to allow (rep i64) (right now we hard-code i32 in the grammar).

We also probably need to add a new emoji-gate for this new feature (in the list at the top of Explainer.md). These gates would be added to the new (rep i64) productions I just mentioned. Additionally, it would also be good to add it to the validation rule descriptions in CanonicalABI.md (e.g., see how 🔀 gates a validation rule for canon lower) for all the canonical definitions whose types vary based on memory.addrtype. I'll leave the choice of emoji up to you, but if you'd like inspiration: 🐘 is big and related to "memory"...

Other than that, this looks great to implement. One question is whether we think we'll get implementation feedback soon enough to want to keep the PR open until so we can simply iterate on it here, or whether we think it'd be more valuable to merge it now, and iterate in separate PRs if/when tweaks are found. Happy to hear what you and @alexcrichton think we should do.

design/mvp/canonical-abi/definitions.py

lukewagner · 2026-04-01T18:18:06Z

design/mvp/canonical-abi/definitions.py

  except UnicodeError:
    trap()

+  trap_if((tagged_code_units & ~tag) > MAX_STRING_CODE_UNITS)


In the utf8 and utf16 cases, if the high bit is set, I think it should trap, but I believe this definition would let it pass. I said "code units" in my last comment b/c I was reading the worst_case_size lowering logic, but looking more carefully at this surrounding lifting logic and trying to simplify things: since we're already computing byte_length and guarding on it anyways, what if we just trap_if(byte_length >= 2**28), just like with lists (putting this trap_if before the other 2 above, symmetric with load_list_from_range). Coincidentally, just like with lists (but for a completely different reason), the worst case is a doubling of byte length, which is conservatively covered by this definition.

lukewagner · 2026-04-01T18:24:34Z

design/mvp/canonical-abi/definitions.py


 def canon_context_set(t, i, thread, v):
-  assert(t == 'i32')
+  assert(t == 'i32' or t == 'i64')


(reminder to add this)

lukewagner · 2026-04-01T18:27:23Z

design/mvp/CanonicalABI.md


+The `MemInst` class represents a core WebAssembly [`memory` instance], with
+`bytes` corresponding to the memory's bytes and `addrtype` coming from the
+[`memory type`].


Suggested change

[`memory type`].

[`memtype`].

(and also several places below that use/define memory type)

lukewagner · 2026-04-01T19:00:56Z

design/mvp/Explainer.md

+of lowered functions may also depend on the [`core:memory-type`] of this memory,
+specifically its [`core:address-type`] (indicated by `memory.addrtype`), if pointers


Suggested change

of lowered functions may also depend on the [`core:memory-type`] of this memory,

specifically its [`core:address-type`] (indicated by `memory.addrtype`), if pointers

of lowered functions may also depend on the [`core:memtype`] of this memory,

specifically its [`core:addrtype`] (indicated by `memory.addrtype`), if pointers

(here and below)

lukewagner · 2026-04-01T19:08:06Z

design/mvp/Explainer.md

-`call_indirect`). Lastly, the indexed function is called in the new thread
-with `c` as its first and only parameter.
+`call_indirect`). Here the `table.addrtype` is either `i32` or `i64` as
+determined by the [`core:table-type`] of the table. Lastly, the indexed function


Suggested change

determined by the [`core:table-type`] of the table. Lastly, the indexed function

determined by the [`core:tabletype`] of the table. Lastly, the indexed function

(here and below)

Co-authored-by: Luke Wagner <mail@lukewagner.name>

Add memory64 and table64 support to the Canonical ABI

746f251

Parameterize the Canonical ABI to handle 32-bit or 64-bit memory addresses and table indices. This is done by adding two new fields to `LiftOptions` to indicate if the `memory64`/`table64` feature is being used in a core module.

alexcrichton reviewed Mar 17, 2026

View reviewed changes

adambratschikaye added 4 commits March 18, 2026 15:14

Correct types that are in host tables.

2d35ea9

Use existing canonical options and static parameters

a18568e

typo

99274e8

small fixes

2955e03

alexcrichton reviewed Mar 20, 2026

View reviewed changes

review comments

0fd86a2

lukewagner requested changes Mar 23, 2026

View reviewed changes

adambratschikaye and others added 15 commits March 24, 2026 10:47

some test cleanup

4eec09d

fix spacing

1a24955

Co-authored-by: Luke Wagner <mail@lukewagner.name>

fix spacing

dfa163f

Co-authored-by: Luke Wagner <mail@lukewagner.name>

fix spacing

8036409

Co-authored-by: Luke Wagner <mail@lukewagner.name>

inline table.addrtype

6824cc0

Co-authored-by: Luke Wagner <mail@lukewagner.name>

inline table.addrtype

b3d4804

Co-authored-by: Luke Wagner <mail@lukewagner.name>

fixup inline table.addrtype

24a917c

add MemInst class

940e989

inline memory defaults in tests

e06bda6

revert max string length to a const

0270ee3

replace PTR with memory.addrtype

97d6301

note on buffer size

322c1ba

clarify address type

f834bbb

mixed context types behavior

8ab246a

fix 32 bit mask

4fd7d8b

adambratschikaye force-pushed the abk/memory64 branch from 0b5e62d to 4fd7d8b Compare March 25, 2026 13:27

adambratschikaye added 4 commits March 25, 2026 14:05

transparent indexing on MemInst

50d68b6

undo old change

c0ad662

revert some small changes

70e876e

check list and string lengths when loading

520cf25

adambratschikaye marked this pull request as ready for review March 26, 2026 15:39

adambratschikaye requested a review from lukewagner March 26, 2026 15:39

lukewagner reviewed Mar 26, 2026

View reviewed changes

adambratschikaye and others added 5 commits March 30, 2026 13:33

Update design/mvp/Explainer.md

dada36d

Co-authored-by: Luke Wagner <mail@lukewagner.name>

Update design/mvp/Explainer.md

46c4efe

Co-authored-by: Luke Wagner <mail@lukewagner.name>

Update design/mvp/Explainer.md

cf35a6a

Co-authored-by: Luke Wagner <mail@lukewagner.name>

remove equal for MemInst

030f8df

simplify string load limit

1a7d3b6

adambratschikaye force-pushed the abk/memory64 branch from 57d46b2 to 1a7d3b6 Compare March 30, 2026 12:23

adambratschikaye added 4 commits March 31, 2026 07:15

simplify list length limits

70b298e

string length i32

cca0beb

Revert "string length i32"

1474305

This reverts commit cca0beb.

utf16 tag in high bit

f8deeff

adambratschikaye requested a review from lukewagner April 1, 2026 11:08

lukewagner reviewed Apr 1, 2026

View reviewed changes

remove python comment.

aba2786

Co-authored-by: Luke Wagner <mail@lukewagner.name>

		of lowered functions may also depend on the [`core:memory-type`] of this memory,
		specifically its [`core:address-type`] (indicated by `memory.addrtype`), if pointers

	determined by the [`core:table-type`] of the table. Lastly, the indexed function
	determined by the [`core:tabletype`] of the table. Lastly, the indexed function

Conversation

adambratschikaye commented Mar 17, 2026

Uh oh!

alexcrichton left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexcrichton left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukewagner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

adambratschikaye commented Mar 26, 2026

Uh oh!

lukewagner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukewagner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment