First WAMR function call takes significantly longer time due to stack boundary calculation #3966

Open
sjamesr opened this issue Dec 19, 2024 · 5 comments

Contributor

sjamesr commented Dec 19, 2024

Engine: WAMR Fast Interpreter

On the first call to wasm_runtime_call_wasm with HW bounds checking enabled, WAMR ultimately ends up calling call_wasm_with_hw_bound_check, which in turn reaches this code:

uint8 *
os_thread_get_stack_boundary()
{
  /* ... */
#ifdef __linux__
    if (pthread_getattr_np(self, &attr) == 0) {

Unfortunately, this call to pthread_getattr_np is very slow on the main thread, an issue that has been noted in other projects (e.g. golang/go#68587).

In one of our environments, the first call takes 9 ms, while the second and subsequent calls take 0.5 ms.
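To look at the cost of the lookup in isolation, something like the following sketch could be used. It is not WAMR code: `demo_get_stack_boundary` and `demo_time_lookup_ns` are hypothetical names, and the sketch assumes Linux with glibc, where `pthread_getattr_np` is available as a GNU extension.

```c
#define _GNU_SOURCE /* for pthread_getattr_np; must precede all includes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns the lowest address of the calling
 * thread's stack, or NULL if the lookup fails. */
uint8_t *
demo_get_stack_boundary(void)
{
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return NULL;
    if (pthread_attr_getstack(&attr, &addr, &size) != 0)
        addr = NULL;
    pthread_attr_destroy(&attr);
    return (uint8_t *)addr;
}

/* Hypothetical helper: performs one lookup, stores the result in
 * *boundary_out, and returns the elapsed time in nanoseconds. */
int64_t
demo_time_lookup_ns(uint8_t **boundary_out)
{
    struct timespec t0, t1;

    assert(boundary_out != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *boundary_out = demo_get_stack_boundary();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling `demo_time_lookup_ns` once from `main` and once from a thread created with `pthread_create` should show whether the main-thread lookup is the expensive one in a given environment.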

Contributor Author

sjamesr commented Dec 19, 2024

I think the problem is that this check in glibc pthreads returns false, so /proc/self/maps has to be read and parsed.
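To give a feel for what "read and parsed" involves, here is a rough sketch that scans /proc/self/maps for the mapping containing a given address. This is not glibc's implementation, only an illustration of the kind of work the slow path has to do; `demo_find_mapping` is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans /proc/self/maps for the mapping that
 * contains addr. On success stores its bounds in *lo and *hi and
 * returns 0; returns -1 if the file is unreadable or nothing matches. */
int
demo_find_mapping(const void *addr, uintptr_t *lo, uintptr_t *hi)
{
    char chunk[512];
    int at_line_start = 1;
    int found = -1;
    FILE *fp;

    assert(lo != NULL && hi != NULL);
    fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;

    while (found != 0 && fgets(chunk, sizeof(chunk), fp) != NULL) {
        uintptr_t start, end;

        /* Only the beginning of a line holds the "start-end" range;
         * skip the tail of lines longer than the buffer. */
        if (at_line_start
            && sscanf(chunk, "%" SCNxPTR "-%" SCNxPTR, &start, &end) == 2
            && (uintptr_t)addr >= start && (uintptr_t)addr < end) {
            *lo = start;
            *hi = end;
            found = 0;
        }
        at_line_start = (strchr(chunk, '\n') != NULL);
    }
    fclose(fp);
    return found;
}
```

Passing the address of a local variable finds the mapping that holds the current stack. The file I/O and line-by-line parsing are the likely source of the extra latency compared with threads whose stack bounds are already recorded.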

Contributor Author

sjamesr commented Dec 19, 2024

Would the WAMR project be amenable to moving the cost of stack boundary computation earlier in the process, e.g. in exec env creation? This is done in #3967.
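The idea can be sketched roughly as follows. This is not the code in #3967: `demo_exec_env` and the `demo_*` functions are hypothetical stand-ins, and the sketch assumes Linux with glibc and that the environment is used on the thread that created it.

```c
#define _GNU_SOURCE /* for pthread_getattr_np; must precede all includes */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an execution environment. */
typedef struct demo_exec_env {
    uint8_t *stack_boundary; /* NULL until computed */
} demo_exec_env;

/* Hypothetical helper: the potentially slow lookup. */
static uint8_t *
demo_lookup_stack_boundary(void)
{
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return NULL;
    if (pthread_attr_getstack(&attr, &addr, &size) != 0)
        addr = NULL;
    pthread_attr_destroy(&attr);
    return (uint8_t *)addr;
}

/* Pay the lookup cost once, at creation time. */
demo_exec_env *
demo_exec_env_create(void)
{
    demo_exec_env *env = calloc(1, sizeof(*env));

    if (env != NULL)
        env->stack_boundary = demo_lookup_stack_boundary();
    return env;
}

/* Call path: use the cached value, and fall back to a lazy lookup
 * for environments whose boundary was never filled in. */
uint8_t *
demo_exec_env_stack_boundary(demo_exec_env *env)
{
    if (env->stack_boundary == NULL)
        env->stack_boundary = demo_lookup_stack_boundary();
    return env->stack_boundary;
}

void
demo_exec_env_destroy(demo_exec_env *env)
{
    free(env);
}
```

The lazy fallback in `demo_exec_env_stack_boundary` keeps the call path correct for environments that were set up without the eager lookup, at the price of the original first-call latency in that case.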

sjamesr added a commit to sjamesr/wasm-micro-runtime that referenced this issue Dec 19, 2024
For boundary checking, WAMR calls `pthread_getattr_np`, which is
unfortunately quite slow on Linux when called on the main thread
(see bytecodealliance#3966
for discussion).

This change moves the cost of the stack boundary computation earlier,
into the wasm_exec_env creation process. The idea is that it's perhaps
better to pay the price when creating the execution environment rather
than in the first function call.

The original code is left in place inside
`call_wasm_with_hw_bound_check` in case the `wasm_exec_env` is created
via `wasm_runtime_spawn_exec_env`.
sjamesr changed the title from "First WAMR function call takes significantly longer time due to the OS hardware bound check" to "First WAMR function call takes significantly longer time due to stack boundary calculation" Dec 19, 2024

Contributor Author

sjamesr commented Dec 19, 2024

Just changed the title, because this happens with hardware bounds checking disabled as well.

Collaborator

yamt commented Dec 20, 2024

have you (or someone) reported the issue to glibc?

Contributor Author

sjamesr commented Dec 20, 2024

There is a discussion of a proposed fix here: https://sourceware.org/pipermail/libc-alpha/2022-September/141932.html, in response to https://internals.rust-lang.org/t/who-is-doing-read-proc-self-maps-1024-at-startup/17348/9. So the issue is known, but I don't think a solution has been agreed and there hasn't been any discussion for a while.

sjamesr added a commit to sjamesr/wasm-micro-runtime that referenced this issue Dec 20, 2024