Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: shutdown logger after container process exits #2337

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mharwani
Copy link

Addresses #2313

Container task is not deleted after it exits, leaving the runtime shim and logger open throughout the lifecycle of a container. This can be a problem when there is unprocessed stdout data after a container task finishes and the logger is blocked forever waiting for a newline or EOF.

This PR fixes the issue by piping container output instead of reading directly from stdout and stderr file descriptors. Closing the io pipes upon task exit allows the logger to shutdown gracefully and process leftover data without a newline .

@mharwani mharwani marked this pull request as ready for review June 29, 2023 19:07
@fahedouch fahedouch self-requested a review July 1, 2023 22:35
@fahedouch fahedouch added this to the v1.5.0 milestone Jul 1, 2023
@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 97ef2b7 to 9d8a14f Compare July 10, 2023 17:31
@fahedouch
Copy link
Member

fahedouch commented Jul 10, 2023

Hi @mharwani, as mentioned here closing task io io.Close() after task exiting should close FIFOs => send EOF to stdio => logger will receive EOF

@mharwani
Copy link
Author

Hi @mharwani, as mentioned here closing task io io.Close() after task exiting should close FIFOs => send EOF to stdio => logger will receive EOF

Hi @fahedouch, I tried closing the task IO upon container exit with task.IO().Close(), but it did not fix the issue. The scanner did not receive EOF. I posted my thoughts here:

From what I understand, when a task is running in detached mode it uses logURI as the task IO, which doesn't have any FIFOs open and does nothing on close.

base := testutil.NewBase(t)
containerName := testutil.Identifier(t)
defer base.Cmd("rm", containerName).Run()
expected := []string{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected := []string{}
expected := make([]string, 10)

}

base.Cmd("run", "-d", "--name", containerName, testutil.CommonImage,
"sh", "-euc", "for i in `seq 1 10`; do echo $i; sleep 1; done").AssertOK()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"sh", "-euc", "for i in `seq 1 10`; do echo $i; sleep 1; done").AssertOK()
"sh", "-euc", "for i in `seq 1 10`; do echo $i; done").AssertOK()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added sleep to have both the container task and the logs command running concurrently

base.Cmd("run", "-d", "--name", containerName, testutil.CommonImage,
"sh", "-euc", "for i in `seq 1 10`; do echo $i; sleep 1; done").AssertOK()
base.Cmd("logs", "-f", containerName).AssertOutContainsAll(expected...)
base.Cmd("rm", "-f", containerName).AssertOK()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless regarding defer

t.Parallel()
base := testutil.NewBase(t)
containerName := testutil.Identifier(t)
defer base.Cmd("rm", containerName).Run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer base.Cmd("rm", containerName).Run()
defer base.Cmd("rm", "-f", containerName).Run()

@@ -143,3 +143,49 @@ func TestLogsWithFailingContainer(t *testing.T) {
base.Cmd("logs", "-f", containerName).AssertNoOut("baz")
base.Cmd("rm", "-f", containerName).AssertOK()
}

func TestLogsWithRunningContainer(t *testing.T) {
Copy link
Member

@fahedouch fahedouch Jul 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what this test covers comparing to this one ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other test runs the log command after the container has stopped running, but this one tests logs during the task execution and as logs are being sent to the logger with the --follow option.

cmd/nerdctl/container_logs_test.go Outdated Show resolved Hide resolved
@@ -90,6 +91,8 @@ func Logs(ctx context.Context, client *containerd.Client, container string, opti
// Setup goroutine to send stop event if container task finishes:
go func() {
<-waitCh
// wait 100ms to let logViewer process data sent after container exit, if any
time.Sleep(100 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you experienced data not processed by the LogViewer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the logger may process additional data after container exits and there is no newline or EOF. For example, the following will not print any logs without time.Sleep:

sudo nerdctl run -d --name test alpine sh -c 'sleep 10; echo -n hello'
sudo nerdctl logs -f test

Sleeping for 100ms is undeterministic. There should be a better way to do this. Maybe somehow wait for the logger to finish instead of container wait?

Copy link
Member

@fahedouch fahedouch Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, waiting for the logger to complete instead of waiting for the container to exit gives a more accurate result

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't get a chance to work on it. I will send a new revision later today.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fahedouch To wait for the logger to complete, it has to send a signal to the LogViewer. I'm using file locks to accomplish this IPC. PTAL here and here. Thanks!

if err := driver.PreProcess(dataStore, config); err != nil {
return err
}
exit, err := getContainerWait(ctx, hostAddress, config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exit, err := getContainerWait(ctx, hostAddress, config)
exitCh, err := getContainerWait(ctx, hostAddress, config)

stderrR, stderrW := io.Pipe()
copyStream := func(reader io.Reader, writer *io.PipeWriter) {
// copy using a buffer of size 16K
buf := make([]byte, 16*1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
buf := make([]byte, 16*1024)
buf := make([]byte, 16<<10)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose a 16k buffer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to 32k as that is what containerd uses to copy stdio:
https://github.com/containerd/containerd/blob/main/cio/io.go#L34

pkg/logging/logging.go Show resolved Hide resolved
@@ -639,8 +639,9 @@ func writeCIDFile(path, id string) error {
}

// generateLogConfig creates a LogConfig for the current container store
func generateLogConfig(dataStore string, id string, logDriver string, logOpt []string, ns string) (logConfig logging.LogConfig, err error) {
func generateLogConfig(dataStore string, id string, logDriver string, logOpt []string, gOpt *types.GlobalCommandOptions) (logConfig logging.LogConfig, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why passing a pointer here gOpt *types.GlobalCommandOptions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 9d8a14f to 8763da2 Compare July 19, 2023 14:58
@mharwani
Copy link
Author

It looks like the environment setup is failing for some of the checks because curl cannot reach the endpoint https://raw.githubusercontent.com/AkihiroSuda/containerized-systemd/v0.1.1/docker-entrypoint.sh:
SSL: no alternative certificate subject name matches target host name 'raw.githubusercontent.com'

Any ideas?

@mharwani mharwani requested a review from fahedouch July 19, 2023 15:41
@AkihiroSuda
Copy link
Member

CI is now green for Linux.
Please try rebasing to make Windows CI green

Copy link
Member

@fahedouch fahedouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 8763da2 to 23b24a6 Compare July 31, 2023 05:08
@AkihiroSuda
Copy link
Member

Windows tests are failing

=== NAME  TestLogs
    container_logs_test.go:42: assertion failed: 
        Command:  C:\Windows\system32\config\systemprofile\go\bin\nerdctl.exe --namespace=nerdctl-test logs --since 10s nerdctl-testlogs
        ExitCode: 0
        Stdout:   
        Stderr:   
        Failures:
        Expected stdout to contain "foo\nbar"
=== CONT  TestLogsWithoutNewlineOrEOF
--- FAIL: TestLogs (4.60s)
=== NAME  TestLogsAfterRestartingContainer

https://cirrus-ci.com/task/4610614887186432?logs=build#L843

@@ -234,11 +246,24 @@ func loggerFunc(dataStore string) (logging.LoggerFunc, error) {
if err != nil {
return err
}
if err := ready(); err != nil {

lockFile := getLockPath(dataStore, config.Namespace, config.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be loggerLock is more explicit here

@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 3ce8c17 to 1df7878 Compare August 1, 2023 21:18
@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 1df7878 to 39fe0b3 Compare August 1, 2023 21:20
@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 39fe0b3 to 0050ac5 Compare August 1, 2023 21:30
LogURI string `json:"-"`
Driver string `json:"driver"`
Opts map[string]string `json:"opts,omitempty"`
HostAddress string `json:"host"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HostAddress string `json:"host"`
ContainerdAddress string `json:"containerdAddress"`

Ideally log-config.json should not contain the address, but it can be refactored later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a default value for HostAddress ?

@fahedouch fahedouch self-requested a review August 8, 2023 20:37
cmd/nerdctl/container_logs_test.go Show resolved Hide resolved
LogURI string `json:"-"`
Driver string `json:"driver"`
Opts map[string]string `json:"opts,omitempty"`
HostAddress string `json:"host"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a default value for HostAddress ?

for {
select {
case <-ctx.Done():
return nil, fmt.Errorf("timed out waiting for container task to start")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("timed out waiting for container task to start")
return nil, errors.New("timed out waiting for container task to start")


// If task was not found, it's possible that the container runtime is still being created.
// Retry every 100ms.
ticker := time.NewTicker(100 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ticker := time.NewTicker(100 * time.Millisecond)
ticker := time.NewTicker(100 * time.Millisecond)
defer ticker.Stop()

@fahedouch
Copy link
Member

@mharwani PTAL

@AkihiroSuda AkihiroSuda modified the milestones: v1.5.1, v1.5.2 Sep 11, 2023
@AkihiroSuda
Copy link
Member

ping @mharwani

@apostasie
Copy link
Contributor

Hey @mharwani

Would you still be interested in carrying this PR through?
If so, could you rebase (a lot has changed) and see about @fahedouch comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants