Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What breaks ABI? #3961

Open
g0djan opened this issue Dec 16, 2024 · 2 comments
Open

What breaks ABI? #3961

g0djan opened this issue Dec 16, 2024 · 2 comments

Comments

@g0djan
Copy link
Contributor

g0djan commented Dec 16, 2024

Hi, I have a POC of a feature that let's pause WASM VM with a signal interruption, write function indexes of stackframes to a file and resume.

To make it async-signal-safe(signal is delivered asynchronously with pthread_kill from another native thread) I need to have an atomic copy of this variable exec_env->wasm_stack->top https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/common/wasm_exec_env.h#L85 Aomic basically would mean that updating valued of the variable couldn't be interrupted by signal and the pointer holds a valid value.

I consider 3 options on how to store it but @loganek and I were discussing that it might break ABI:

  1. Make exec_env->wasm_stack->top atomic, I made a proof of concept of that . The idea is to check that sizeof uint8 and sizeof _Atomic(uint8) are the same and then set a compilation flag(needs to be added) to true otherwise don't enable the feature.
  2. Write the value to the end of the stack, it's usually not used by anyone. A bit harder to access from wamrc in my opinion but probably shouldn't break ABI for anything
  3. Just add a field to exec_env, but I believe it will break ABI, right?

I would appreciate an advice on that @wenyongh @yamt

Context: we need this feature to report stacktraces when WASM/AOT stuck somewhere (e.g. deadlock, infinite loop, a long operation). Sometimes the app unfreezes so we would like avoid crashing or terminating VM to get those stacktraces

@yamt
Copy link
Collaborator

yamt commented Dec 17, 2024

  • have you considered to use the existing check_suspend_flags mechanism used by wasm_cluster_suspend_thread, interpreter debug-engine, wasm_runtime_terminate, etc?
  • _Atomic is for threading and quite expensive. for signals it's overkill in general.

@g0djan
Copy link
Contributor Author

g0djan commented Dec 17, 2024

Hi, thanks for the quick reply.

have you considered to use the existing check_suspend_flags mechanism used by wasm_cluster_suspend_thread, interpreter debug-engine, wasm_runtime_terminate, etc?

We did consider wasm_runtime_terminate and I had an internal POC with it as well. But we chose to proceed with signal interruption approach because:

  1. It allows us avoid unnecessary app terminations
  2. I ran into problem with AOT, it would clear the frames in aot compiled code before I could access any to report(I managed to workaround it though).

We didn't really consider wasm_cluster_suspend_thread, I need to read the code. One complication might be calling it from another native thread. And another - we probably need to get a response that we can start read frames and I'm not sure it can be done if the app stuck.

_Atomic is for threading and quite expensive
Yes, we realise that. However today I just compared parallel-wasm-gzip compiled to AOT with my changes and without and compared benchmarks on compressing 1GB file with benchamrk measurement tool https://github.com/sharkdp/hyperfine

There were no clear signs of any degradation. (upd: I think I got some on low end device, I'll share later)

I ran compressing 1GB with each version for 10 iterations with hyperfine. I tried 4 times in total: 2 times version without my changes was better, 2 times version with my changes was better.

Here are 2 examples of output

godjan@ud6d5de19674f5f:~/github/wasm-parallel-gzip$ hyperfine --warmup 2 'make benchmark.wamr.nopatch' 'make benchmark.wamr'Benchmark 1: make benchmark.wamr.nopatch
 Time (mean ± σ): 19.877 s ± 2.828 s [User: 34.389 s, System: 1.771 s]
 Range (min … max): 17.379 s … 23.536 s 10 runs

Benchmark 2: make benchmark.wamr
 Time (mean ± σ): 20.840 s ± 2.867 s [User: 36.400 s, System: 1.808 s]
 Range (min … max): 17.319 s … 24.151 s 10 runs

Summary
 'make benchmark.wamr.nopatch' ran
 1.05 ± 0.21 times faster than 'make benchmark.wamr'

godjan@ud6d5de19674f5f:~/github/wasm-parallel-gzip$ hyperfine --warmup 2 'make benchmark.wamr' 'make benchmark.wamr.nopatch'
Benchmark 1: make benchmark.wamr
 Time (mean ± σ): 20.768 s ± 3.547 s [User: 36.232 s, System: 1.751 s]
 Range (min … max): 17.748 s … 27.061 s 10 runs

Benchmark 2: make benchmark.wamr.nopatch
 Time (mean ± σ): 23.088 s ± 3.942 s [User: 39.985 s, System: 1.963 s]
 Range (min … max): 17.667 s … 29.005 s 10 runs

Summary
 'make benchmark.wamr' ran
 1.11 ± 0.27 times faster than 'make benchmark.wamr.nopatch'

Also I compared it with our app on a slow device(ARM Cortex A9 dual core 1 GHz from 2014) and there were also no clear winner, sometimes version with my changes was faster sometimes without, I was profiling iteration of the loop that covers everything.

I checked that the AOT binaries that I ran were indeed different and it's Intermidiate representation code was different in exactly atomic store/load operations.

for signals it's overkill in general.

From https://en.cppreference.com/w/cpp/utility/program/signal I found it the only non UB way to access the data. In practice reading regular pointer in signal handler never caused me any problems in my testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants