-
Notifications
You must be signed in to change notification settings - Fork 151
Fixes a race condition in killing Sandboxes #959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
96a6fce to
d200c7c
Compare
| /// retrying until either: | ||
| /// - The signal is successfully delivered (VCPU transitions from running to not running) | ||
| /// - The VCPU stops running for another reason (e.g., call completes normally) | ||
| /// - No call is active (call_active=false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it block? I thought it would just return false?
| // The virtualization stack can use this function to return the control | ||
| // of a virtual processor back to the virtualization stack in case it | ||
| // needs to change the state of a VM or to inject an event into the processor | ||
| debug!("Internal cancellation detected, returning Retry error"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4e211f2 to
4de305a
Compare
| /// This is acceptable because the generation tracking provides an additional | ||
| /// safety layer. Even if a stale kill somehow stamped cancel_requested, the | ||
| /// generation mismatch would cause it to be ignored. | ||
| call_active: AtomicBool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this flag is redundant. We will already not send any signals when calling kill if the vcpu is not running
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
…sted by us Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
cf14cfd to
f1aef3e
Compare
Signed-off-by: Simon Davies <[email protected]>
Signed-off-by: Simon Davies <[email protected]>
| /// | ||
| /// This is only called on Linux (KVM/MSHV). Windows uses a different interrupt mechanism. | ||
| #[cfg(any(kvm, mshv))] | ||
| fn increment_call_generation(&self) -> u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these all method are public right now. I recommend making a new private trait InterruptHandleImpl like so pub(crate) trait InterruptHandleImpl: InterruptHandle and store a dyn of this trait, but return it as a dyn InterruptHandle to the user
| } | ||
|
|
||
| // Default implementations for common methods | ||
| impl dyn InterruptHandle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can put default method directly in the trait definition, no need to impl dyn InterruptHandle
| ) -> Result<ReturnValue> { | ||
| // Mark that a guest function call is now active | ||
| // (This also increments the generation counter internally) | ||
| self.vm.interrupt_handle().set_call_active(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could consider making this a function which takes closure like
start_call(||{
...rest of existing code
})
where the function calls set_call_active and clear_call_active for you (and calls your closure in between).
Similar to existing maybe_time_and_emit_host_call
Fixes a race condition where a sandbox kill arrives after a sandbox has successfully exited causing the subsequent run to fail.
There is a breaking change in this PR, previously if kill was called on an
InterruptHandlebefore or while a guest call was not in progress the next guest call made on theSandboxwould be cancelled , now this scenario is a no-op. kill only takes effect if there is a guest call running.