Signals #17

rennergade · 2024-09-13T03:24:05Z

Adding this issue as a tracker for implementing signals, which we'll pursue in phase 3.

rennergade · 2024-09-13T03:24:20Z

This thread seems to be very helpful in describing the current state: WebAssembly/WASI#166

rennergade · 2024-11-08T05:03:21Z

Adding a few links I found while looking ahead a bit on this:

Compiling for signals - https://withbighair.com/webassembly/2024/08/26/WebAssembly-and-signals.html

StoreExt - https://docs.rs/wasmtime/latest/wasmtime/unix/trait.StoreExt.html#tymethod.set_signal_handler

Possible PoC - https://cocalc.com/github/bytecodealliance/wasmtime/blob/main/tests/custom_signal_handler.rs

rennergade · 2024-11-18T17:43:55Z

Apparently exception handling is something being worked on which could be useful: https://github.com/WebAssembly/exception-handling

Also may be helpful for the problems with longjmp that @qianxichen233 reported.

rennergade · 2024-11-27T04:02:00Z

ByteCodeAlliance recently merged this PR to wasmtime: bytecodealliance/wasmtime#9614

Seems to have a potential to answer some of our problems here. @qianxichen233 can you try to get a handle on whats happening here and we'll discuss next week.

rennergade · 2024-12-14T21:57:12Z

There seems to be a working version of signals using wasmer/wasix: https://github.com/wasix-org/wasix.org/blob/69a419132c797fcd6e02768355395e9215bdaf32/pages/docs/language-guide/c/tutorials/signal.mdx#L5

This seems like its probably our best roadmap further if we can dig into whats happening here behind the scenes.

rennergade · 2024-12-19T01:07:06Z

Per @qianxichen233 and my chat with Jonathan it seems like we should figure out exactly what weird cases postgres actually introduces with signals. I'm going to go back and run through the native code to make sure I'm correct that in my assumption that postgres both calls fork() and longjmp from handlers.

qianxichen233 · 2024-12-19T20:24:08Z

If longjmp() is guaranteed to be called after fork(), then using the detached thread to run the signal handler might work, since longjmp does not require to suspend the main process as the state is already captured via setjmp previously.

In Linux as I tested, fork in signal handler would cause both signal handler and the main process being forked, but if longjmp is guaranteed to be called after fork(), that would mean the fork of main process is not going to happen, which would make things much more easier as we do not need to suspend the main process and capture its state and fork it. The logic would therefore becomes something like: fork is called in signal handler -> only the signal handler is forked -> the forked signal handler calls longjmp before returns to main process -> we restore the forked signal handler stack back to the saved longjmp state and let the parent process return normally

However, I still see an issue on using detached thread to run the signal handler. Since signal handler in linux runs synchronously (i.e. main process runs -> got signal, suspend main process, switch to signal handler -> finish signal handler and switch back to main process), which means the code running inside signal handler may not be thread-safe. If we are using a detached thread to run the signal handler, that might cause some race condition.

I have a potential idea on how to resolve the issue, though I am not very sure if this is viable. I am thinking about to use kernel signal to suspend the wasm execution: we register an uncommon used kernel signal (probably SIGUSR1) to the thread running the wasm module to a function in wasmtime, whenever an (emulated) signal is triggered targeting the thread, we raise SIGUSR1 to the thread, which should cause the wasm thread jump to the function in wasmtime which gets registered. And since we transferred the control from user code to wasmtime, we can suspend the thread until we are notified that the signal handler is returned. This way should allow us to suspend a thread I guess?

rennergade · 2024-12-20T22:49:21Z

@qianxichen233 and I discovered this function which seems like its exactly what we need.

We're still not sure how it will interact with asyncify which is a big issue. Qianxi is looking into that.

qianxichen233 · 2025-01-20T09:31:22Z

already mentioned in this issue, but also want to mention it here with more context about signal implementation specifically.

I did a minimal prototype for signal implementation, and I am able to run a process that stuck in a dead loop, but are able to jump to a custom user function that are able to do things like printf when host intended to trigger it.

However, fork is still not able to run within the custom user function (or signal handler). When I am trying to do fork within custom user function, I am getting unreachable instruction executed error, which is something I am expecting because I believe fork within signal handler shouldn't work correctly with current approach.

The reason why I believe simply combing Asyncify and epoch_interruption will not work:
Asyncify works by injecting some wasm code to support stack unwind/rewind. For example, one kind of code that Asyncify would inject is something like this:
Before Asyncify:

loop
	...some operations...
	call funcA()

After Asyncify:

loop
	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return

So stack unwind in Asyncify works by inserting a check right after each function return, if state becomes unwind, it will save current function context and return immediately.
But if we have epoch feature also added to the code, the logic would become something like this:
After Asyncify and epoch injection:

loop
	if epoch is triggered
		call epochCallback()

	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return

In this case, if fork is invoked within epochCallback(), it needs to perform a stack unwind. But after the asyncify_state switched to unwind and the wasm process returned from epochCallback(), the nescessary operations to unwind the stack is missing. Instead of saving the function context and returning immediately, it will continue to execute within the function. I am not sure what exactly is going to happen here, but I assume this will just break the design of Asyncify. And the error message from my test is pretty much an expected error message from this perspective: the design of Asyncify is broken and it went to unreachable blocks.
And we cannot easily switch the order to apply Asyncify and epoch insertion because Asyncify insertion happens at wasm level while epoch insertion happens at cranelift IR level. So epoch insertion must happen after Asyncify insertion.

One workaround for this is to implement our own epoch interruption at wasm level. The whole logic of epoch insertion is pretty simple: it just inserts a check at the beginning of every function and loop. To implement this, we might be able to utilize the existing infrastructure from Binaryen, which already has tons of passes that does wasm code transformation (Asyncify is one of them), and we can possibly implement our own Binaryen pass to do this. But this would incur several issues:

The implementation would not be trival. We might need someone with certain degree of knowledge of compiler design to do the task. It might also take some time for them to understand the Bineryen IR and codebase of Binaryen.
I believe the performance overhead would be much larger by doing the epoch insertion at wasm level rather than cranelift IR level.

JustinCappos · 2025-01-20T16:13:49Z

already mentioned in this issue, but also want to mention it here with more context about signal implementation specifically.

I did a minimal prototype for signal implementation, and I am able to run a process that stuck in a dead loop, but are able to jump to a custom user function that are able to do things like printf when host intended to trigger it.

However, fork is still not able to run within the custom user function (or signal handler). When I am trying to do fork within custom user function, I am getting unreachable instruction executed error, which is something I am expecting because I believe fork within signal handler shouldn't work correctly with current approach.

The reason why I believe simply combing Asyncify and epoch_interruption will not work: Asyncify works by injecting some wasm code to support stack unwind/rewind. For example, one kind of code that Asyncify would inject is something like this: Before Asyncify:
loop
	...some operations...
	call funcA()
After Asyncify:
loop
	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return
So stack unwind in Asyncify works by inserting a check right after each function return, if state becomes unwind, it will save current function context and return immediately. But if we have epoch feature also added to the code, the logic would become something like this: After Asyncify and epoch injection:
loop
	if epoch is triggered
		call epochCallback()

	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return
In this case, if fork is invoked within epochCallback(), it needs to perform a stack unwind. But after the asyncify_state switched to unwind and the wasm process returned from epochCallback(), the nescessary operations to unwind the stack is missing. Instead of saving the function context and returning immediately, it will continue to execute within the function. I am not sure what exactly is going to happen here, but I assume this will just break the design of Asyncify. And the error message from my test is pretty much an expected error message from this perspective: the design of Asyncify is broken and it went to unreachable blocks. And we cannot easily switch the order to apply Asyncify and epoch insertion because Asyncify insertion happens at wasm level while epoch insertion happens at cranelift IR level. So epoch insertion must happen after Asyncify insertion.

One workaround for this is to implement our own epoch interruption at wasm level. The whole logic of epoch insertion is pretty simple: it just inserts a check at the beginning of every function and loop. To implement this, we might be able to utilize the existing infrastructure from Binaryen, which already has tons of passes that does wasm code transformation (Asyncify is one of them), and we can possibly implement our own Binaryen pass to do this. But this would incur several issues:

The implementation would not be trival. We might need someone with certain degree of knowledge of compiler design to do the task. It might also take some time for them to understand the Bineryen IR and codebase of Binaryen.

I believe the performance overhead would be much larger by doing the epoch insertion at wasm level rather than cranelift IR level.

This is absolutely the sort of thing that Ben and Arjun would be able to give us solid guidance on. Let's ask them for their thoughts before we proceed.

rennergade · 2025-01-20T18:08:43Z

From their paper write up on signals they talk a bit about adding custom checks. It also seems like WALI has some compiler customization.

Definitely agree it would be useful to talk to them.

rennergade assigned rennergade, yzhang71 and qianxichen233 Nov 18, 2024

qianxichen233 mentioned this issue Jan 20, 2025

fixed epoch_interruption on new wasm instance creation #89

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Signals #17

Signals #17

rennergade commented Sep 13, 2024

rennergade commented Sep 13, 2024

rennergade commented Nov 8, 2024

rennergade commented Nov 18, 2024

rennergade commented Nov 27, 2024

rennergade commented Dec 14, 2024

rennergade commented Dec 19, 2024

qianxichen233 commented Dec 19, 2024

rennergade commented Dec 20, 2024

qianxichen233 commented Jan 20, 2025

JustinCappos commented Jan 20, 2025

rennergade commented Jan 20, 2025

Signals #17

Signals #17

Comments

rennergade commented Sep 13, 2024

rennergade commented Sep 13, 2024

rennergade commented Nov 8, 2024

rennergade commented Nov 18, 2024

rennergade commented Nov 27, 2024

rennergade commented Dec 14, 2024

rennergade commented Dec 19, 2024

qianxichen233 commented Dec 19, 2024

rennergade commented Dec 20, 2024

qianxichen233 commented Jan 20, 2025

JustinCappos commented Jan 20, 2025

rennergade commented Jan 20, 2025