
Relax requirements on assigning integers to resource handles #395

Open
alexcrichton opened this issue Sep 16, 2024 · 5 comments

@alexcrichton
Collaborator

Currently in the canonical ABI, resources (`own<T>` and `borrow<T>`) are required to strictly follow the specification in terms of how integers are assigned to handles as they are created. Specifically, engines are required to implement per-resource-type tables with a slab allocation scheme, which is a LIFO allocator for resource indices. In this discussion on Zulip, though, it was found that while this is a compact indexing scheme, it has a drawback: it creates situations that can be difficult to debug. For example, creating an initial resource A gives handle index 1, and creating an initial resource B also gives handle index 1. If one of these handle indices is accidentally passed to a function that wants resource C, then all the runtime can say is "unknown handle index"; it doesn't know whether the "1" is of type A or of type B.
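
For reference, here's a minimal sketch of the per-type slab allocator the spec requires today, simplified from the canonical ABI's Python `Table` definition (the real one also bounds the table size and traps on invalid indices):

```python
class Table:
    """Simplified per-resource-type handle table with a LIFO free list."""
    def __init__(self):
        self.array = [None]  # index 0 is reserved, so live indices start at 1
        self.free = []       # freed indices, reused in LIFO order

    def add(self, element):
        if self.free:
            i = self.free.pop()        # reuse the most recently freed index
            self.array[i] = element
        else:
            i = len(self.array)        # otherwise grow the array
            self.array.append(element)
        return i

    def remove(self, i):
        element = self.array[i]
        self.array[i] = None
        self.free.append(i)
        return element

# With one Table per resource type, the first handle of *every* type is 1:
table_a, table_b = Table(), Table()
assert table_a.add("first A") == 1
assert table_b.add("first B") == 1  # same integer, different type
```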

The specific confusing case that came up on Zulip was that a wasi:io/poll/[email protected] was created with index 1 and then passed to a function that wanted a wasi:io/poll/[email protected]. This (rightfully) did not work and the runtime raised a trap, but it was difficult to determine that this mismatch was what was happening.

In discussion with some folks, we had the idea of relaxing the requirements on exactly how indices are allocated. I believe our rough conclusion was that the spec algorithm would change to allocating a random, but unique, index per resource type. This is a change from today's slab allocation to random (but still unique) indices.
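
As a sketch of what that relaxed allocation could look like (hypothetical: the class name and the index bound here are illustrative, not spec text):

```python
import random

class RandomIndexTable:
    """Hands out indices that are unique within the table but otherwise
    unspecified; any collision-free scheme would satisfy the relaxed spec."""
    def __init__(self):
        self.elements = {}  # index -> element (sparse by design)

    def add(self, element):
        while True:
            i = random.randrange(1, 2**30)  # illustrative bound, nonzero
            if i not in self.elements:
                self.elements[i] = element
                return i

    def remove(self, i):
        return self.elements.pop(i)
```

Because only uniqueness is observable, "random" here can legitimately be implemented as a sequential counter, which is what the bullets below rely on.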

This is a small and subtle change, but the intention is to preserve the guarantees that both guests and hosts have today without actually breaking anything in practice. Notably:

  • This enables true RNG generation of handle indices for "fuzzing" a component if desired, which can help weed out accidental mistakes in guests, for example.
  • This enables the host to keep a single table of all types of resources. Because the spec would say handle indices are random, there's no reason that "random" can't mean "sequential", for example. This would help solve the above issue because, with a single table of all resources, the host could provide a better error message.
  • All existing hosts that implement the component model already match this new specification; their "random" behavior just happens to look like a slab.

We'd probably want to document that randomness is not guaranteed, so the handle index shouldn't be used as a seed for a CSPRNG, for example. Other than that, my hope is that while this would complicate the canonical ABI's Python bits, it would in the end grant hosts the flexibility to choose the indexing scheme that best matches their needs (or perhaps to provide a configuration knob for selecting a particular scheme).

@sunfishcode
Member

A possible alternative here would be to leave the spec as-is and just observe that it's common for debugging features to deviate from specs: implementations could have debugging modes where they randomize indices, ensure uniqueness across types, and so on. This would support the fuzzing scenario.

It wouldn't support hosts using a single table for all types though. On the other hand, it would theoretically make it less likely that guest code could come to depend on hosts that use a single table for all types.

I don't have a strong opinion which way is best here; I just wanted to mention this approach as an option.

@lukewagner
Member

I definitely see the value of making it easier to develop and catch bugs by having a unique index space shared by all resource types; it mostly just seems like a question of what the best technical approach is to achieving this.

One risk is that, whether or not the spec specifies deterministic indices, if popular runtimes only exhibit one behavior in practice during normal execution (e.g., unless the developer sets a flag), then code will end up accidentally depending on that one behavior and break when a runtime tries to take full advantage of the nondeterminism allowed (or not allowed) by the spec. This could inadvertently make the debugging use case worse because, when I flip the "catch bugs" flag, I might end up triggering some separate bug unrelated to the real bug I'm trying to track down.

One way to catch these accidental dependencies early is to have normal/default execution mode actively take advantage of the nondeterminism. For some types of nondeterminism (e.g., preemptive threads), this happens naturally. But I expect in the case of resource handles, runtimes mostly won't want to do this by default and will mostly just copy each other's behavior.

As an alternative to consider: what if we kept determinism but switched to a single resource table? While it's nice that separate resource tables eliminate the runtime type check, I expect that in practice the check could be compiled down to a cheap branch whose cost is amortized by the overall call.
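
To make the "cheap branch" concrete, here's a rough sketch (my reading, not spec text) of what lifting an `own` handle could look like with a single table whose elements are tagged with their resource type. It reuses the `Table` sketch from the first comment above, and plain strings stand in for resource-type identity:

```python
class Trap(Exception):
    pass

class HandleElem:
    def __init__(self, rt, rep):
        self.rt = rt    # the resource type this handle was created with
        self.rep = rep  # the host-side representation

def lift_own(table, i, expected_rt):
    elem = table.remove(i)
    if elem.rt != expected_rt:  # the single extra branch per lift
        raise Trap(f"handle {i} has type {elem.rt}, expected {expected_rt}")
    return elem.rep

# With per-type tables this branch is unnecessary (the table itself encodes
# the type), but a single table also enables the precise error message above:
t = Table()
i = t.add(HandleElem("wasi:io/poll.pollable", 42))
assert lift_own(t, i, "wasi:io/poll.pollable") == 42
```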

@alexcrichton
Collaborator Author

I personally agree that a single table is probably better than what we have today, for the debuggability and "probably the same perf" reasons you mention. I'd also personally still prefer to at least try to spec the possibility of random indices, but I don't disagree that it seems unlikely to stick in practice.

@lukewagner
Member

Thinking about this a bit more, one thing that seems potentially useful for bindings/runtime glue code is knowing that indices are mostly dense. If you can assume that, then when you want to associate state with a C-M table element (which I think will end up being common for async subtasks, streams, and futures), you can simply maintain a mirror dense array in linear memory, whereas if the indices are sparse you'd need a map of some sort, which will be somewhat more expensive. That doesn't necessarily force determinism, but it does suggest against allowing random indices that range over [1, 2^32).
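
A sketch of that difference for glue code (hypothetical; `state` and `set_state` are illustrative names, with Python lists standing in for linear-memory data structures):

```python
# Dense indices: a flat mirror array, O(1) access with minimal overhead.
state = []  # state[i] holds our per-handle data for handle i

def set_state(i, data):
    if i >= len(state):
        state.extend([None] * (i + 1 - len(state)))  # cheap while i stays dense
    state[i] = data

# Sparse indices (e.g. random over [1, 2^32)): the extend() above could
# allocate gigabytes for a single handle, forcing a hash map instead:
sparse_state = {}  # correct for any index distribution, but costlier per access
```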

@lukewagner
Member

Thinking about this in the background and also chatting about it with various folks, I increasingly think our best option is to maintain determinism, but to assist in debugging wrong-resource-type errors and to simplify implementations (particularly runtimes targeting #378 directly) by putting all resources into a single, per-component-instance resources table (using the same deterministic Table.get/Table.remove logic, but now for just 1 table instead of N). This is also symmetric to how p3 recently defined a single waitables table populated by the futures, streams, and subtasks of various types.
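
Structurally, that proposal amounts to something like the following (my sketch, not normative text; it reuses the `Table` sketch from the first comment):

```python
class ComponentInstance:
    def __init__(self):
        # Today: one handle table per resource type, roughly
        #   self.resource_tables[rt] = Table()  # one Table for each of N types
        # Proposed: a single table for all of this instance's resources, with
        # each element tagged by its resource type (checked on lift, as above):
        self.resources = Table()
        # p3 already does the analogous thing for futures, streams, and
        # subtasks, which all share one per-instance waitables table:
        self.waitables = Table()
```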

Given that this is a subtle change that probably won't break anyone now, but might as more code is generated over time, I'd propose making the change in place now, before more code gets generated that might break and require going through the whole motion of adding a canonopt, etc. How does that sound?

dicej added a commit to dicej/wasmtime that referenced this issue Nov 21, 2024
This addresses a couple of issues:

- Previously, we were passing task/stream/future/error-context reps directly to
  instances while keeping track of which instance had access to which rep.  That
  worked fine in that there was no way to forge access to inaccessible reps, but
  it leaked information about what other instances were doing.  Now we maintain
  per-instance waitable and error-context tables which map the reps to and from
  the handles which the instance sees.

- The `no_std` build was broken due to use of `HashMap` in
  `runtime::vm::component`, which is now fixed.

Note that we use one single table per instance for all tasks, streams, and
futures.  This is partly necessary because, when async events are delivered to
the guest, it wouldn't have enough context to know which stream or future we're
talking about if each unique stream and future type had its own table.  So at
minimum, we need to use the same table for all streams (regardless of payload
type), and likewise for futures.  Also, per
WebAssembly/component-model#395 (comment),
the plan is to move towards a shared table for all resource types as well, so
this moves us in that direction.

Signed-off-by: Joel Dice <[email protected]>
dicej added a commit to dicej/wasmtime that referenced this issue Nov 21, 2024
This adds support for loading, compiling, linking, and running components which
use the [Async
ABI](https://github.com/WebAssembly/component-model/blob/main/design/mvp/Async.md)
along with the [`stream`, `future`, and
`error-context`](WebAssembly/component-model#405) types.
It also adds support for generating host bindings such that multiple host
functions can be run concurrently with guest tasks -- without monopolizing the
`Store`.

See the [implementation RFC](bytecodealliance/rfcs#38)
for details, as well as [this
repo](https://github.com/dicej/component-async-demo) containing end-to-end smoke
tests.

This is very much a work in progress, with a number of tasks remaining:

- [ ] Avoid exposing global task IDs to guests and use per-instance IDs instead
- [ ] Track `task.return` type during compilation and assert the actual and expected types match at runtime
- [ ] Ensure all guest pointers are bounds-checked when lifting, lowering, or copying values
- [ ] Reduce code duplication in `wasmtime_cranelift::compiler::component`
- [ ] Reduce code duplication between `StoreContextMut::on_fiber` and `concurrent::on_fiber`
- [ ] Minimize and/or document the use of unsafe code
- [ ] Add support for `(Typed)Func::call_concurrent` per the RFC
- [ ] Add support for multiplexing stream/future reads/writes and concurrent calls to guest exports per the RFC
- [ ] Refactor, clean up, and unify handling of backpressure, yields, and event polling
- [ ] Guard against reentrance where required (e.g. in certain fused adapter calls)
- [ ] Add integration test cases covering new functionality to tests/all/component_model (starting by porting over the tests in https://github.com/dicej/component-async-demo)
- [ ] Add binding generation test cases to crates/component-macro/tests
- [ ] Add WAST tests to tests/misc_testsuite/component-model
- [ ] Add support and test coverage for callback-less async functions (e.g. goroutines)
- [ ] Switch back to upstream `wasm-tools` once bytecodealliance/wasm-tools#1895 has been merged and released

Signed-off-by: Joel Dice <[email protected]>

fix clippy warnings and bench/fuzzing errors

Signed-off-by: Joel Dice <[email protected]>

revert atomic.wit whitespace change

Signed-off-by: Joel Dice <[email protected]>

fix build when component-model disabled

Signed-off-by: Joel Dice <[email protected]>

bless component-macro expected output

Signed-off-by: Joel Dice <[email protected]>

fix no-std build error

Signed-off-by: Joel Dice <[email protected]>

fix build with --no-default-features --features runtime,component-model

Signed-off-by: Joel Dice <[email protected]>

partly fix no-std build

It's still broken due to the use of `std::collections::HashMap` in
crates/wasmtime/src/runtime/vm/component.rs.  I'll address that as part of the
work to avoid exposing global task/future/stream/error-context handles to
guests.

Signed-off-by: Joel Dice <[email protected]>

maintain per-instance tables for futures, streams, and error-contexts

Signed-off-by: Joel Dice <[email protected]>

refactor task/stream/future handle lifting/lowering

Signed-off-by: Joel Dice <[email protected]>

fix wave breakage due to new stream/future/error-context types

Signed-off-by: Joel Dice <[email protected]>