
Conversation


@rib rib commented May 9, 2025

Based on some discussion in #3442 (comment); this experiments with enabling support for async NativeFunctions that are only async from the pov of the host, and appear as synchronous from within JavaScript.

Instead of running the async functions as a Promise via enqueue_job, this works by allowing Operations to be executed over multiple VM cycles, so an Operation may start some async work in one step and then further steps can poll for completion of that work and finish the Operation.

In particular this works by allowing Call Operations to return an OpStatus::Pending value that indicates that the same Call operation needs to be executed repeatedly, until it returns an OpStatus::Finished status.

In the case of a Pending status, the program counter is reset and anything that was taken off the stack is pushed back so the same Operation can be re-executed.

There is a new NativeFunction::from_async_as_sync_with_captures() that lets the host provide a (sync) closure that itself returns / spawns a boxed Future. This is tracked internally as an Inner::AsyncFn.

Whenever the function is __call__ed (assuming the operation isn't already in a pending / running state), a new Future is spawned via the application's closure and the Operation enters a "pending" state.

When a NativeFunction is pending then each __call__ will poll() the spawned Future to see if the async function has a result.

This effectively stalls the VM at the same Opcode while still accounting for any cycle budget and periodically yielding to the application's async runtime while waiting for an async Call Operation to finish.

Limitations / Issues

Requires PC rewind and reverting stack state

Ideally operations that may complete over multiple steps would maintain a state machine via private registers, whereby it would not be necessary to repeatedly rewind the program counter and re-push values to the stack so that the operation can be decoded and executed repeatedly from the beginning.

Only adapts Call Operation

Currently only the Call Operation handles async NativeFunctions but there are other Call[XYZ] Operations that could be adapted too.

Not compatible with composite Operations that make function calls internally

The ability to track pending async functions is implemented in terms of repeatedly executing an Opcode in the VM until it signals that it's not Pending.

This currently relies on being able to reset and re-execute the Operation (such as reverting program counter and stack changes).

There are various Operations that make use of JsObject::call() internally and they would currently trigger a panic if they called an async NativeFunction because they would not be able to "resolve()" the "AsyncPending" status that would be returned by the call().

Ideally all Operations that use __call__ or __construct__ should support CallValue::AsyncPending and be fully resumable in the same way that the Call Operation is now.

This would presumably be easier to achieve with Rust Coroutines if they were stable because it would otherwise be necessary to adapt composite Operations into a state machine, similar to what the compiler does for an async Future, so they can yield for async function calls and be resumed by the VM.

Addresses: #3442

@rib rib force-pushed the rib/async-native-functions branch 3 times, most recently from 38f7f58 to 9f8760e on May 14, 2025
Comment on lines 255 to 200
```rust
// FIXME: figure out how to work with an async Context / Waker from the
// application (e.g. from Tokio)
let waker = std::task::Waker::noop();
let mut context = std::task::Context::from_waker(waker);
match f.poll(&mut context) {
    std::task::Poll::Pending => AsyncCallResult::Pending,
    std::task::Poll::Ready(result) => {
        *state = AsyncCallState::None;
        AsyncCallResult::Ready(result)
    }
}
```
Member
Sorry for the delay! I've been repeatedly looking at this PR again and I wondered, Couldn't we store a map of the "active" wakers in the Context itself, then try to poll all of them whenever the Context is trying to resume execution?

Author
No worries - it does seem fiddly to figure out.

I agree that this current polling approach isn't good enough as-is.

I think some notable issues are that:

  1. It doesn't feel great to be materializing a special waker + context when these are things that you would normally expect your async runtime (i.e. Tokio) to own.
  2. There's probably a risk of futures polled this way not integrating with your async runtime properly. Tokio's sleep did seem to work for me when testing, but futures that depend on mio polling might not get handled properly - I'm not 100% sure.
  3. This approach effectively leads to a non-blocking busy loop on the CPU that keeps polling for async native functions to complete.

In terms of:

Couldn't we store a map of the "active" wakers in the Context itself, then try to poll all of them whenever the Context is trying to resume execution?

I think my instinct is that, ideally this wouldn't involve any custom wakers at all.

The thing we have to poll is a boxed future that's currently saved in NativeAsyncFunction::state, based on an assumption that any particular function can only have one active call at a time, but maybe it would make sense to store this operation state in the Context instead. It's conceptually like internal register state for the operation.

If we could see the active/pending future from an async context, outside of the sync parts of the VM implementation (such as in run_async_with_budget) then we could do an .await there and avoid the need for any ad-hoc waker/context for manually polling.

That sounds like it could work, and hopefully address all the issues above.

... assuming it's possible to improve how the async native function futures are polled, I'd be curious to know what you think about the general idea of Operations having a Pending status that allows Operations to take multiple VM cycles to execute?

I think I was concerned about the possibility that other Operations may use call as an implementation detail.

I'm not entirely sure how practical it would be to audit the Operations that use call to ensure that they can all, also support returning a Pending status and be resumed after the async function finishes.

Or otherwise I'm not sure if it's feasible to assume that those operations will never call a native async function.

Without something like stable coroutines in Rust it seems like it might be a pain to have to manually support more-complex resumable operations that involve calls internally.

Maybe if there are only a small handful of operations that could internally call an async native function then it would be manageable to create a match based state machine manually to allow them to be resumable.
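Such a hand-rolled, match-based state machine might look roughly like this (the types here are purely illustrative, not Boa's):

```rust
// A resumable composite operation that suspends around an inner "call":
// each `step` either yields (inner call still pending) or completes.

enum Step<T> {
    Yield,
    Done(T),
}

enum AddThenCall {
    Start { a: i64, b: i64 },
    WaitingOnCall { sum: i64 },
    Finished,
}

impl AddThenCall {
    /// Drive the operation one step. Returns `Yield` while the inner
    /// call is still pending, mirroring how a composite Operation would
    /// need to suspend around an async native function call.
    fn step(&mut self, call_result: Option<i64>) -> Step<i64> {
        match std::mem::replace(self, AddThenCall::Finished) {
            AddThenCall::Start { a, b } => {
                // Do the sync work, then "issue" the inner call.
                *self = AddThenCall::WaitingOnCall { sum: a + b };
                Step::Yield
            }
            AddThenCall::WaitingOnCall { sum } => match call_result {
                None => {
                    // Inner call not finished; stay in the same state.
                    *self = AddThenCall::WaitingOnCall { sum };
                    Step::Yield
                }
                Some(v) => Step::Done(sum + v),
            },
            AddThenCall::Finished => panic!("stepped a finished operation"),
        }
    }
}

fn main() {
    let mut op = AddThenCall::Start { a: 1, b: 2 };
    assert!(matches!(op.step(None), Step::Yield));
    assert!(matches!(op.step(None), Step::Yield)); // call still pending
    match op.step(Some(10)) {
        Step::Done(v) => assert_eq!(v, 13),
        _ => unreachable!(),
    }
    println!("resumable op produced 13");
}
```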

Author

@rib rib Jun 26, 2025
I've taken a pass at reworking this to avoid the need for manually polling futures. The boxed futures are now stashed in the Context and .await is used from Context::run_async_with_budget to wait for any native async function to complete.

This should avoid async runtime compatibility concerns (from polling with a task::Context and Waker that wasn't managed by the current runtime) and should also avoid polling for native function completion in a busy loop.

Ref: b97ab83
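The stash-and-await idea can be sketched like this (`EngineContext`, `async_call`, and the method names below are illustrative stand-ins, not Boa's real API; `block_on` is a toy executor so the sketch runs without Tokio):

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context as TaskContext, Poll, Waker};

/// Hypothetical stand-in for the engine Context: the sync VM step parks
/// the native function's boxed future here instead of polling it inline.
struct EngineContext {
    async_call: Option<Pin<Box<dyn Future<Output = i64>>>>,
}

impl EngineContext {
    /// The sync part of the VM: "calls" the async native function by
    /// spawning its future and stashing it for the async driver.
    fn call_native_async(&mut self) {
        self.async_call = Some(Box::pin(async { 40 + 2 }));
    }

    /// The async driver: between budgeted runs of the VM, await any
    /// stashed native call so the runtime owns the waker, not us.
    async fn run_async_with_budget(&mut self) -> i64 {
        self.call_native_async(); // a VM step hit an async native call
        match self.async_call.take() {
            Some(fut) => fut.await,
            None => unreachable!(),
        }
    }
}

/// Minimal single-future executor, fine here only because the stashed
/// future completes without external wakeups.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = TaskContext::from_waker(Waker::noop());
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let mut ctx = EngineContext { async_call: None };
    let result = block_on(ctx.run_async_with_budget());
    assert_eq!(result, 42);
    println!("native async call completed with {result}");
}
```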

rib added 2 commits June 25, 2025 22:04
This experiments with enabling support for async NativeFunctions that
are only async from the pov of the host, and appear as synchronous from
within JavaScript.

Instead of running the async functions as a Promise via enqueue_job,
this works by allowing Operations to be executed over multiple VM
cycles, so an Operation may start some async work in one step and then
further steps can poll for completion of that work and finish the Operation.

In particular this works by allowing Call Operations to return an
`OpStatus::Pending` value that indicates that the same Call
operation needs to be executed repeatedly, until it returns an
`OpStatus::Finished` status.

In the case of a `Pending` status, the program counter is reset
and anything that was taken off the stack is pushed back so the same
Operation can be re-executed.

There is a new `NativeFunction::from_async_as_sync_with_captures()`
that lets the host provide a (sync) closure that itself returns / spawns
a boxed Future. This is tracked internally as an `Inner::AsyncFn`.

Whenever the function is `__call__`ed then (assuming the operation isn't
already in a pending / running state) a new Future is spawned via the
application's closure and the Operation enters a "pending" state.

When a NativeFunction is pending then each `__call__` will `poll()` the
spawned `Future` to see if the `async` function has a result.

This effectively stalls the VM at the same Opcode while still accounting
for any cycle budget and periodically yielding to the application's
async runtime while waiting for an async Call Operation to finish.

Limitations / Issues
====================

== Busy Loop Polling ==

Even though the implementation does yield back to the application's
async runtime when waiting for a NativeFunction to complete, the
implementation isn't ideal because it uses a noop task Context
+ Waker when polling NativeFunction Futures. This effectively relies on
the VM polling the future in a busy loop, wasting CPU time.

A better solution could be to implement a shim Waker that would flag
some state on the Boa engine Context, and then adapt the Future that's
used to yield the VM to the executor so that it only becomes Ready once
the async NativeFunction has signalled the waker. I.e. the Waker would
act like a bridge/proxy between a spawned async NativeFunction and the
Future/Task associated with the VM's async `run_async_with_budget`.

This way I think the VM could remain async runtime agnostic but would
be able to actually sleep while waiting for async functions instead
of entering a busy yield loop.

== Requires PC rewind and reverting stack state ==

Ideally operations that may complete over multiple steps would maintain
a state machine via private registers, whereby it would not be necessary
to repeatedly rewind the program counter and re-push values to the stack
so that the operation can be decoded and executed repeatedly from the
beginning.

== Only adapts Call Operation ==

Currently only the Call Operation handles async NativeFunctions but
there are other Call[XYZ] Operations that could be adapted too.

== Not compatible with composite Operations that `call()` ==

The ability to track pending async functions is implemented in terms of
repeatedly executing an Opcode in the VM until it signals that it's not
Pending.

This currently relies on being able to reset and re-execute the
Operation (such as reverting program counter and stack changes).

There are lots of Operations that make use of JsObject::call()
internally and they would currently trigger a panic if they called an
async NativeFunction because they would not be able to "resolve()"
the "Pending" status that would be returned by the `call()`.

Ideally all Operations that use `__call__` or `__construct__` should
be fully resumable in the same way that the Call Operation is now.

This would presumably be easier to achieve with Rust Coroutines if they
were stable because it would otherwise be necessary to adapt composite
Operations into a state machine, similar to what the compiler does for
an async Future, so they can yield for async function calls and be
resumed by the VM.

This avoids manually polling boxed futures within the `__call__`
implementation for NativeAsyncFunctions.

This instead stashes the boxed future under `Context::async_call` so
that we can `.await` its completion within
`Context::run_async_with_budget`.

This approach should integrate much better with async runtimes like
Tokio, since it doesn't involve manually polling with a task::Context +
Waker that aren't managed by the current async runtime.

This also means the thread can block waiting for async IO without
polling for native function completion in a busy loop.

This still needs further iteration, but hopefully serves as a usable
draft / proof of concept.
@rib rib force-pushed the rib/async-native-functions branch from 9f8760e to b97ab83 on June 26, 2025