Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signals #17

Open
rennergade opened this issue Sep 13, 2024 · 11 comments
Open

Signals #17

rennergade opened this issue Sep 13, 2024 · 11 comments
Assignees

Comments

@rennergade
Copy link
Contributor

Adding this issue as a tracker for implementing signals, which we'll pursue in phase 3.

@rennergade
Copy link
Contributor Author

This thread seems to be very helpful in describing the current state: WebAssembly/WASI#166

@rennergade
Copy link
Contributor Author

@rennergade
Copy link
Contributor Author

Apparently exception handling is something being worked on which could be useful: https://github.com/WebAssembly/exception-handling

Also may be helpful for the problems with longjmp that @qianxichen233 reported.

@rennergade
Copy link
Contributor Author

ByteCodeAlliance recently merged this PR to wasmtime: bytecodealliance/wasmtime#9614

Seems to have a potential to answer some of our problems here. @qianxichen233 can you try to get a handle on whats happening here and we'll discuss next week.

@rennergade
Copy link
Contributor Author

There seems to be a working version of signals using wasmer/wasix: https://github.com/wasix-org/wasix.org/blob/69a419132c797fcd6e02768355395e9215bdaf32/pages/docs/language-guide/c/tutorials/signal.mdx#L5

This seems like its probably our best roadmap further if we can dig into whats happening here behind the scenes.

@rennergade
Copy link
Contributor Author

Per @qianxichen233 and my chat with Jonathan it seems like we should figure out exactly what weird cases postgres actually introduces with signals. I'm going to go back and run through the native code to make sure I'm correct that in my assumption that postgres both calls fork() and longjmp from handlers.

@qianxichen233
Copy link
Contributor

If longjmp() is guaranteed to be called after fork(), then using the detached thread to run the signal handler might work, since longjmp does not require to suspend the main process as the state is already captured via setjmp previously.

In Linux as I tested, fork in signal handler would cause both signal handler and the main process being forked, but if longjmp is guaranteed to be called after fork(), that would mean the fork of main process is not going to happen, which would make things much more easier as we do not need to suspend the main process and capture its state and fork it. The logic would therefore becomes something like: fork is called in signal handler -> only the signal handler is forked -> the forked signal handler calls longjmp before returns to main process -> we restore the forked signal handler stack back to the saved longjmp state and let the parent process return normally

However, I still see an issue on using detached thread to run the signal handler. Since signal handler in linux runs synchronously (i.e. main process runs -> got signal, suspend main process, switch to signal handler -> finish signal handler and switch back to main process), which means the code running inside signal handler may not be thread-safe. If we are using a detached thread to run the signal handler, that might cause some race condition.

I have a potential idea on how to resolve the issue, though I am not very sure if this is viable. I am thinking about to use kernel signal to suspend the wasm execution: we register an uncommon used kernel signal (probably SIGUSR1) to the thread running the wasm module to a function in wasmtime, whenever an (emulated) signal is triggered targeting the thread, we raise SIGUSR1 to the thread, which should cause the wasm thread jump to the function in wasmtime which gets registered. And since we transferred the control from user code to wasmtime, we can suspend the thread until we are notified that the signal handler is returned. This way should allow us to suspend a thread I guess?

@rennergade
Copy link
Contributor Author

@qianxichen233 and I discovered this function which seems like its exactly what we need.

We're still not sure how it will interact with asyncify which is a big issue. Qianxi is looking into that.

@qianxichen233
Copy link
Contributor

already mentioned in this issue, but also want to mention it here with more context about signal implementation specifically.

I did a minimal prototype for signal implementation, and I am able to run a process that stuck in a dead loop, but are able to jump to a custom user function that are able to do things like printf when host intended to trigger it.

However, fork is still not able to run within the custom user function (or signal handler). When I am trying to do fork within custom user function, I am getting unreachable instruction executed error, which is something I am expecting because I believe fork within signal handler shouldn't work correctly with current approach.

The reason why I believe simply combing Asyncify and epoch_interruption will not work:
Asyncify works by injecting some wasm code to support stack unwind/rewind. For example, one kind of code that Asyncify would inject is something like this:
Before Asyncify:

loop
	...some operations...
	call funcA()

After Asyncify:

loop
	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return

So stack unwind in Asyncify works by inserting a check right after each function return, if state becomes unwind, it will save current function context and return immediately.
But if we have epoch feature also added to the code, the logic would become something like this:
After Asyncify and epoch injection:

loop
	if epoch is triggered
		call epochCallback()

	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return

In this case, if fork is invoked within epochCallback(), it needs to perform a stack unwind. But after the asyncify_state switched to unwind and the wasm process returned from epochCallback(), the nescessary operations to unwind the stack is missing. Instead of saving the function context and returning immediately, it will continue to execute within the function. I am not sure what exactly is going to happen here, but I assume this will just break the design of Asyncify. And the error message from my test is pretty much an expected error message from this perspective: the design of Asyncify is broken and it went to unreachable blocks.
And we cannot easily switch the order to apply Asyncify and epoch insertion because Asyncify insertion happens at wasm level while epoch insertion happens at cranelift IR level. So epoch insertion must happen after Asyncify insertion.

One workaround for this is to implement our own epoch interruption at wasm level. The whole logic of epoch insertion is pretty simple: it just inserts a check at the beginning of every function and loop. To implement this, we might be able to utilize the existing infrastructure from Binaryen, which already has tons of passes that does wasm code transformation (Asyncify is one of them), and we can possibly implement our own Binaryen pass to do this. But this would incur several issues:

  1. The implementation would not be trival. We might need someone with certain degree of knowledge of compiler design to do the task. It might also take some time for them to understand the Bineryen IR and codebase of Binaryen.
  2. I believe the performance overhead would be much larger by doing the epoch insertion at wasm level rather than cranelift IR level.

@JustinCappos
Copy link
Member

already mentioned in this issue, but also want to mention it here with more context about signal implementation specifically.

I did a minimal prototype for signal implementation, and I am able to run a process that stuck in a dead loop, but are able to jump to a custom user function that are able to do things like printf when host intended to trigger it.

However, fork is still not able to run within the custom user function (or signal handler). When I am trying to do fork within custom user function, I am getting unreachable instruction executed error, which is something I am expecting because I believe fork within signal handler shouldn't work correctly with current approach.

The reason why I believe simply combing Asyncify and epoch_interruption will not work: Asyncify works by injecting some wasm code to support stack unwind/rewind. For example, one kind of code that Asyncify would inject is something like this: Before Asyncify:

loop
	...some operations...
	call funcA()

After Asyncify:

loop
	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return

So stack unwind in Asyncify works by inserting a check right after each function return, if state becomes unwind, it will save current function context and return immediately. But if we have epoch feature also added to the code, the logic would become something like this: After Asyncify and epoch injection:

loop
	if epoch is triggered
		call epochCallback()

	if asyncify_state == normal
		...some operations...
		call funcA()
		if asyncify_state == unwind
			save_function_context()
			return

In this case, if fork is invoked within epochCallback(), it needs to perform a stack unwind. But after the asyncify_state switched to unwind and the wasm process returned from epochCallback(), the nescessary operations to unwind the stack is missing. Instead of saving the function context and returning immediately, it will continue to execute within the function. I am not sure what exactly is going to happen here, but I assume this will just break the design of Asyncify. And the error message from my test is pretty much an expected error message from this perspective: the design of Asyncify is broken and it went to unreachable blocks. And we cannot easily switch the order to apply Asyncify and epoch insertion because Asyncify insertion happens at wasm level while epoch insertion happens at cranelift IR level. So epoch insertion must happen after Asyncify insertion.

One workaround for this is to implement our own epoch interruption at wasm level. The whole logic of epoch insertion is pretty simple: it just inserts a check at the beginning of every function and loop. To implement this, we might be able to utilize the existing infrastructure from Binaryen, which already has tons of passes that does wasm code transformation (Asyncify is one of them), and we can possibly implement our own Binaryen pass to do this. But this would incur several issues:

  1. The implementation would not be trival. We might need someone with certain degree of knowledge of compiler design to do the task. It might also take some time for them to understand the Bineryen IR and codebase of Binaryen.
  2. I believe the performance overhead would be much larger by doing the epoch insertion at wasm level rather than cranelift IR level.

This is absolutely the sort of thing that Ben and Arjun would be able to give us solid guidance on. Let's ask them for their thoughts before we proceed.

@rennergade
Copy link
Contributor Author

From their paper write up on signals they talk a bit about adding custom checks. It also seems like WALI has some compiler customization.

Definitely agree it would be useful to talk to them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants