
Restore signals, add pre-stop sleep #88

Open · wants to merge 2 commits into base: main
Conversation

@xakep666 commented Nov 8, 2022

Changes in this pull request:

  • Stop notification on the signal channel after a signal is received. This restores the default signal behaviour (if no other notifiers are registered) and lets the user close the application faster via Ctrl-C if a shutdown routine hangs.
  • Add a new hook option to sleep before the hook actually executes. This is needed for correct termination in Kubernetes; see https://blog.palark.com/graceful-shutdown-in-kubernetes-is-not-always-trivial/ for a detailed description.

This restores the default signal behaviour. It is useful for faster termination when, for example, the user presses Ctrl-C, a shutdown routine hangs, and the user presses Ctrl-C again so the application terminates immediately without waiting.
@skovtunenko (Owner)

@xakep666 , I will review the given article in some time, thank you

@@ -140,6 +141,10 @@ func (t *Terminator) waitShutdown(appCtx context.Context) {
go func(f Hook) {
defer runWg.Done()

if f.preStop > 0 {
@BMalaichik Dec 19, 2022

time.Sleep will be called for every registered hook, despite the initial intention to wait once before executing all hooks

@xakep666 (Author)

Is it a good idea to introduce a global pre-stop sleep instead of a per-hook one to resolve this?

@BMalaichik Dec 19, 2022

IMO yes. What would be the point of having it for each hook? What are the use cases?

@@ -277,6 +279,39 @@ func TestTerminator_Wait(t *testing.T) {
}
}

func TestTerminator_restores_signals(t *testing.T) {


this test is more like an integration one, introducing a dedicated env variable to simulate the behavior.
To test the actual library end-to-end, a separate integration test suite can be added that spawns a simple app with graterm configured and a webserver waiting for requests.

Timing-specific tests that rely on the scheduler are tricky to implement right; consider mocking time and adding assertions that the Sleep method was called.

@xakep666 (Author)

This test actually isn't timing-specific, because it tests restoration of the default signal handler. I'll try to test without sleeps between the signals.


it's scheduler-timing-specific because you rely on time.Sleep, which is not good practice; such tests are likely to be flaky, especially on machines with fewer CPU resources.

@xakep666 (Author)

I removed the sleeps before sending the SIGINTs locally, and the test doesn't fail. So it shouldn't be flaky at all, because we're waiting for process termination and checking for specific flags.


@xakep666 now it looks better, thank you. What about having a separate integration test suite in a separate file? That way we could leverage integration testing for other cases as well and not mix it with the unit tests.

@xakep666 (Author) Dec 20, 2022

It's a good idea, but Suite from testify doesn't allow parallel runs, so maybe just move this test to a separate file, probably with a specific build tag.

return
}

cmd := exec.Command(os.Args[0], "-test.run=TestTerminator_restores_signal")


introducing recursion in tests is not a good approach; please see the comment above

@xakep666 (Author)

Subprocess-based testing is a widely used technique; you can find a lot of articles mentioning it, e.g. https://about.sourcegraph.com/blog/go/advanced-testing-in-go


@BMalaichik thank you for the review! I implemented "preStop" because the k8s one relies on "sleep" being available inside the container, but sometimes the container doesn't have it, e.g. a container built from scratch with only the app.

Hm, sleep is provided as part of the coreutils distribution package, which is under 3 MB. I believe that if the docker container is packed as minimalistically as possible for some reason, graterm is not likely to be used there as a dependency either :)

@xakep666 (Author)

"Distroless" images from Google don't contain coreutils.

$ docker run --rm -it gcr.io/distroless/base sleep 1
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "sleep": executable file not found in $PATH: unknown.


Thinking generally about this approach, I believe it makes sense when you do integration testing of your CLI application; it's an easy and interesting approach to bind the test case to the CLI program execution, as Hashicorp does in the given article.

For this particular case, testing a package that provides a utility structure, it's overkill IMO.

@@ -55,6 +55,7 @@ func withSignals(ctx context.Context, chSignals chan os.Signal, sig ...os.Signal

// function invoke cancel once a signal arrived OR parent context is done:
go func() {
defer signal.Stop(chSignals)


👍

@BMalaichik

Generally, handling the SIGKILL + SIGTERM combination would be sufficient to handle the shutdown gracefully.
If you need to wait for some period to ensure the Pod is removed from the Endpoints, you can register a Pod preStop hook with a sleep 5; command that waits before SIGTERM is sent, as per the official documentation.

Anyway, if there is a specific need to wait for some time before starting the shutdown, this can be achieved by registering a 1st-order hook that waits for some time.
To make the API k8s-style, a PreStop hook could be defined; however, IMO this minimalistic lib doesn't need it.

@xakep666 (Author) commented Dec 19, 2022

@BMalaichik thank you for the review! I implemented "preStop" because the k8s one relies on "sleep" being available inside the container, but sometimes the container doesn't have it, e.g. a container built from scratch with only the app.

@BMalaichik

@xakep666 Have you considered, as an alternative, simply registering a first 0-order hook with time.Sleep(X seconds)?
I believe it would do the job you need

@xakep666 (Author)

@BMalaichik this should work as expected. But IMHO it looks weird when the user has to specify the same value twice (inside the hook and as the hook timeout) and also add it to the global termination timeout.

@BMalaichik commented Dec 19, 2022

> @BMalaichik this should work as expected. But IMHO it looks weird when user should specify same value twice (inside hook and as hook timeout) and add it to global termination timeout.

I propose to use it internally as a high-priority hook with a time limit greater than the specified duration, to avoid scheduling uncertainties and random rejections, something like

terminator.WithOrder(-1).Register(timePeriod*3/2, func(_ context.Context) { time.Sleep(timePeriod) })

This way you re-use already-tested code without introducing additional complexity.

Also I think it would be good to set it as a constructor argument instead of a Hook API method, @xakep666 WDYT?
I think we can add this if @skovtunenko has no objections

@xakep666 (Author) commented Dec 19, 2022

@BMalaichik "twice" because

t.Register(time.Second, func(context.Context) { time.Sleep(time.Second) })

My initial idea was to decouple the pre-stop pause from the global termination timeout. But maybe you're right and it's better not to do this. So for simplicity I can add a method that always registers the sleep hook as the first one.

UPD: sent this before your last comment. Actually we arrived at the same approach :)

@BMalaichik

> @BMalaichik "twice" because
>
> t.Register(time.Second, func(context.Context) { time.Sleep(time.Second) })
>
> My initial idea was about decoupling pre-stop pause from global termination timeout. But maybe you're right and it's better not to do this. So for simplicity I can add method that registers hook with sleep always as first one.

yes, the Constructor API doesn't accept extra params, so it would be easier to add a method like BeforeAll or RegisterBeforeAll (the naming needs more thought) that would bind internally to a list of before-all hooks with the same priority, e.g. -1.
PreStop is not a good name, because the whole idea of the terminator is stop/termination, so classic hook naming might be better.

@xakep666 (Author) commented Dec 20, 2022

@BMalaichik how about adding just one function:

func Wait(ctx context.Context) { <-ctx.Done() }

and using it like

terminator.WithOrder(0).Register(3*time.Second, Wait)

This approach requires no changes in the terminator at all, and the user doesn't have to specify the sleep period twice.

Maybe it would also be good to add an alias like

terminator.WithOrder(0).RegisterWait(3*time.Second)

And one more question: is it OK to force-push changes after the discussion?

@BMalaichik

> @BMalaichik how about adding just one function:
>
> func Wait(ctx context.Context) { <-ctx.Done() }
>
> and use it like
>
> terminator.WithOrder(0).Register(3*time.Second, Wait)
>
> This approach requires actually no changes in terminator and user don't have to specify sleep period twice.
>
> Also maybe it will be good to add alias like
>
> terminator.WithOrder(0).RegisterWait(3*time.Second)
>
> And one more question: is it ok to force-push changes after discussion?

it's up to @skovtunenko; IMO whatever works for you, you can force-push or open a fresh PR.

After sleeping on our discussion, I think adding new API methods to the lib is redundant: you can handle this in a number of ways using the lib's primitives, e.g. the last example you provided, or even a hook with time.Sleep seems not bad.

What might be worth improving is support for general beforeAll hook(s), the standard way of intercepting a component lifecycle.
Let's wait for @skovtunenko's reply.

@xakep666 (Author)

@skovtunenko @BMalaichik gentle bump
