Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore signals, add pre-stop sleep #88

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions hook.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ type Hook struct {

order Order // order is Hook order.
name string // name is an optional component name for pretty-printing in logs.
preStop time.Duration // time to wait _before_ triggering shutdown hook.
timeout time.Duration // timeout is max hookFunc execution timeout.
hookFunc func(ctx context.Context) // hookFunc is a user-defined termination hook function.
}
Expand All @@ -40,6 +41,15 @@ func (h *Hook) WithName(name string) *Hook {
return h
}

// WithPreStopSleep sets (optional) period between signal receive and shutdown hook call. It does not depend on hook timeout.
//
// This needed for correct graceful termination of network services in Kubernetes.
// See https://blog.palark.com/graceful-shutdown-in-kubernetes-is-not-always-trivial/ for detailed information.
func (h *Hook) WithPreStopSleep(t time.Duration) *Hook {
h.preStop = t
return h
}

// Register registers termination [Hook] that should finish execution in less than given timeout.
//
// Timeout duration must be greater than zero; if not, timeout of 1 min will be used.
Expand Down
5 changes: 5 additions & 0 deletions terminator.go
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ func withSignals(ctx context.Context, chSignals chan os.Signal, sig ...os.Signal

// function invoke cancel once a signal arrived OR parent context is done:
go func() {
defer signal.Stop(chSignals)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

defer cancel()

select {
Expand Down Expand Up @@ -140,6 +141,10 @@ func (t *Terminator) waitShutdown(appCtx context.Context) {
go func(f Hook) {
defer runWg.Done()

if f.preStop > 0 {
Copy link

@BMalaichik BMalaichik Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Sleep will be called for every registered hook order despite the initial intention to wait once before executing all hooks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good idea to introduce global pre-stop sleep instead of per-hook one to resolve this?

Copy link

@BMalaichik BMalaichik Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO yes, what might be the point of having that for each hook? What are the use-cases ?

time.Sleep(f.preStop)
}

ctx, cancel := context.WithTimeout(context.Background(), f.timeout)
defer cancel()

Expand Down
35 changes: 35 additions & 0 deletions terminator_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,11 @@ package graterm
import (
"context"
"errors"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"log"
"os"
"os/exec"
"os/signal"
"runtime"
"sync"
Expand Down Expand Up @@ -277,6 +279,39 @@ func TestTerminator_Wait(t *testing.T) {
}
}

func TestTerminator_restores_signals(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is more like a integration one, introducing a dedicated env variable to simulate the behavior.
To test the actual library end-to-end a separate integration test suite can be added, that can spawn a simple app with graterm configured and webserver waiting for requests

This timing-specific test relying on the scheduler are tricky to implement right, consider mocking time adding assertions that Sleep method was called

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test actually is not timing-specific because it tests signal default handler restoration. I'll try to test without sleeps between signals.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's scheduler timing-specific because you rely on time.Sleep which is not a good practice, tests are likely to be flaky especially on lower CPU resources machines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed sleeps before sending SIGINT's locally. Test doesn't fail. So it shouldn't be flaky at all because we're waiting for process termination and checking for specific flags.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xakep666 now it looks better, thank you. What about having a separate integration test suite in a separate file ? So we could leverage integration testing with this approach for other cases as well and not mix it with unit tests

Copy link
Author

@xakep666 xakep666 Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good idea but Suite from testify doesn't allow parallel runs so maybe just move this test to separate file. Probably with specific build tag.

if os.Getenv("HANG_ON_STOP") == "1" {
terminator, ctx := NewWithSignals(context.Background(), syscall.SIGINT, syscall.SIGTERM)
require.NotNil(t, ctx)

terminator.WithOrder(0).Register(5*time.Second, func(ctx context.Context) {
select {}
})

_ = terminator.Wait(ctx, time.Minute)
return
}

cmd := exec.Command(os.Args[0], "-test.run=TestTerminator_restores_signal")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a good approach to introduce a recursion in tests, pls see the comment above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subprocess-based testing is a widely used technique
You can find a lot of articles mentions it. I.e. https://about.sourcegraph.com/blog/go/advanced-testing-in-go

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BMalaichik thank you for review! I implemented "preStop" because k8s' one relies on "sleep" availability inside container. But sometimes container doesn't contain it. I. e. container built from scratch with app only.

Hm, sleep is provided as a part of coreutils distribution package, having a size <3 MBs. I believe if the docker container is packed as minimalistic as possible for some reason graterm not likely to be used there as a dependency either :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Distroless" images from Google doesn't contain coreutils.

$ docker run --rm -it gcr.io/distroless/base sleep 1
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "sleep": executable file not found in $PATH: unknown.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking generally about this approach I believe it makes sense when you do integration testing of your CLI application, so it's easy an interesting approach to bind the test case to the CLI program execution, as in the given article they do in Hashicorp

For this particular case testing a package providing a utility structure it's an overkill IMO

cmd.Env = append(cmd.Env, "HANG_ON_STOP=1")
require.NoError(t, cmd.Start())

time.Sleep(100 * time.Millisecond)
require.NoError(t, cmd.Process.Signal(syscall.SIGINT))

time.Sleep(100 * time.Millisecond)
require.NoError(t, cmd.Process.Signal(syscall.SIGINT))

var exitErr *exec.ExitError
require.ErrorAs(t, cmd.Wait(), &exitErr)

ws, ok := exitErr.Sys().(syscall.WaitStatus)
require.Truef(t, ok, "exit error is not syscall.WaitStatus, but %T", exitErr.Sys())

assert.True(t, ws.Signaled(), "process stopped not by signal")
assert.Equal(t, syscall.SIGINT, ws.Signal(), "process stopped not by SIGINT")
}

func Test_withSignals(t *testing.T) {
t.Run("termination_on_SIGHUP", func(t *testing.T) {
rootCtx, cancel := context.WithCancel(context.Background())
Expand Down