-
-
Couldn't load subscription status.
- Fork 736
feat(machined): add panic/force mode reboot #12049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8e9d12f to
7e280cf
Compare
pkg/machinery/client/events.go
Outdated
| if msg == nil { | ||
| log.Printf("event not supported: %s", typeURL) | ||
| // We haven't implemented the handling of this event yet. | ||
| return nil, ErrEventNotSupported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to log here, or can we just wrap the returned error with the typeURL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was going to ask about this – I'm not sure about in general what we tend to do in talosctl, and (from what I saw) logging seems kind of inconsistent – I had to add this to even figure out what was going wrong because I was just getting very generic error output.
I'm happy to change the error returned, (after checking whether we're using ErrEventNotSupported isn't being used as a sentinel anywhere/compared to directly).
(As a sidenote, I wish we had a talosctl --debug ... and debug level logging.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird to wrap ErrEventNotSupported with the typeURL since the typeURL is a part of why that error happened, so I pushed a commit changing ErrEventNotSupported into a
type ErrEventNotSupported struct {
TypeURL string
}
...However, in practice we can't/shouldn't do this outside of a major version, since it changes a public API (hence me leaving it separate). In the meantime, we can either keep it as-is, or wrap it like suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't realize having an extra commit meant CI would complain/tests won't run 😭 I'll remove it then and just skip it for now.
internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_controller.go
Outdated
Show resolved
Hide resolved
internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_controller_test.go
Outdated
Show resolved
Hide resolved
7e280cf to
b666578
Compare
22ced7a to
a3ef956
Compare
30094cd to
42ef9d9
Compare
42ef9d9 to
ffe1c34
Compare
internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_controller.go
Outdated
Show resolved
Hide resolved
ffe1c34 to
c7c3b07
Compare
|
Looks like "regular" reboot is broken now (I was looking at |
f13d584 to
7a423e6
Compare
It might restart kube-apiserver (due to cert change) at a random moment. Signed-off-by: Andrey Smirnov <[email protected]>
In certain situations, Talos's shutdown/reboot sequence hangs while waiting for services/mounts to be gracefully stopped (see: siderolabs#11775). This patch adds a forceful mode to the reboot sequence (`talosctl reboot --mode force`) that bypasses graceful userspace teardown and hard reboots the machine. Signed-off-by: Laura Brehm <[email protected]>
`client.ErrEventNotSupported` was a simple sentinel with no information. Replaced it with `client.EventNotSupportedError`, a struct implementing error with the offending TypeURL included. Signed-off-by: Laura Brehm <[email protected]>
7a423e6 to
81260dd
Compare
Pull Request
closes #11782
What? (description)
Add a forceful mode to the reboot sequence (
talosctl reboot --mode force) that bypasses graceful userspace teardown.Why? (reasoning)
In certain situations, Talos's shutdown/reboot sequence hangs while waiting for services/mounts to be gracefully stopped (see: #11775).
I left a few questions as TODOs, PTAL at them if you review! Mainly I'm wondering about
*machine.RebootRequestvs.runtime.XxxOptions, and whether to error out if we don't get a*machine.RebootRequestor just silently continue (which is fine for this, since any hypothetical older client wouldn't support force-reboot anyway, and the default is "graceful reboot".Acceptance
Please use the following checklist:
make conformance)make fmt)make lint)make docs)make unit-tests)