Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Cleaning up orphaned directories and files when containers creat… #3457

Merged
merged 1 commit into from
Sep 28, 2024

Conversation

haytok
Copy link
Contributor

@haytok haytok commented Sep 23, 2024

…ion fails

When trying to create new containers, the following directories are created based on the new container ID:

  • /var/lib/nerdctl/1935db59/containers/<namespace>/<container id>
  • /var/lib/nerdctl/1935db59/etchosts/<namespace>/<container id>

If containers with existing names are attempted to be created, the processes fail.
However, in the events of failures, these directories are not cleaned up.

This issue is reported in the following:

This commit resolves the issue by cleaning up the mentioned directories when containers creation fails.

@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Sep 23, 2024
Copy link
Member

@djdongjin djdongjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/dnsutil/hostsstore/hostsstore.go Outdated Show resolved Hide resolved
}
}

func generateRemoveOrphanedDirsFunc(ctx context.Context, id, dataStore string, internalLabels internalLabels) func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are we deleting multiple dirs, or only 1. If only 1, we should name it generateRemoveOrphanedDirFunc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it deletes one or more of the following directories, the function name is Dirs.

  • /var/lib/nerdctl/1935db59/containers/<namespace>/<container id>
  • /var/lib/nerdctl/1935db59/etchosts/<namespace>/<container id>

@haytok
Copy link
Contributor Author

haytok commented Sep 24, 2024

Hi, @djdongjin

Thanks for comments !!!

Is this testable? E.g., can we add a test in https://github.com/containerd/nerdctl/blob/772feb40ee9d0d51db6da8ac3a8f737bbb48475f/cmd/nerdctl/container/container_create_linux_test.go?

I think it is difficult to implement tests based on the number of files in the directory under /var/lib/nerdctl/1935db59/ in unit tests.

Is there any E2E test that can perform tests based on the number of local files ... ????

@apostasie
Copy link
Contributor

apostasie commented Sep 24, 2024

Is there any E2E test that can perform tests based on the number of local files ... ????

Here is an e2e test for you that does it (the tricky part is to access the right location):

func TestIssue2993(t *testing.T) {
	nerdtest.Setup()

	testCase := &test.Case{
		Description: "Issue #2993 - nerdctl is no longer leaking etchosts dirs & files on failure",
		// Make sure we are in a private env so that other tests do not pollute this
		Require: nerdtest.Private,
		// Run a first container successfully
		Setup: func(data test.Data, helpers test.Helpers) {
			helpers.Ensure("run", "--name", data.Identifier(), testutil.CommonImage)
		},
		Cleanup: func(data test.Data, helpers test.Helpers) {
			helpers.Anyhow("rm", "-f", data.Identifier())
		},
		// Try to run a second container with the same name
		Command: func(data test.Data, helpers test.Helpers) test.Command {
			return helpers.Command("run", "--name", data.Identifier(), testutil.CommonImage)
		},
		Expected: func(data test.Data, helpers test.Helpers) *test.Expected {
			return &test.Expected{
				// We expect the command to fail
				ExitCode: 1,
				// As the name is already used
				Errors: []error{errors.New("is already used by")},
				Output: func(stdout string, info string, t *testing.T) {
					// Get the test DataRoot, the containerd address, and get the dataStore location
					base := testutil.NewBase(t)
					dataStore, err := clientutil.DataStore(string(data.ReadConfig(nerdtest.DataRoot)), base.ContainerdAddress())
					assert.NilError(t, err, "data store failed")
					// Now, we are interested in "etchosts", and in the Namespace (data.Identifier() with `nerdtest.Private`)
					etchostsdir := filepath.Join(dataStore, "etchosts", data.Identifier())
					// Read the dir
					files, err := os.ReadDir(etchostsdir)
					assert.NilError(t, err, fmt.Sprintf("failed reading %s", etchostsdir))
					// We are expecting only one directory there, as the other container should have deleted its directory on failure
					assert.Equal(t, len(files), 1, "there should be only 1 subdirectory in the etchosts store")
				},
			}
		},
	}

	testCase.Run(t)
}

Obviously, the important part is the assert.Equal(t, len(files), 1 which verifies that the failed container did not leave a second directory behind.

@haytok haytok force-pushed the fix-to-clean-up-orphaned-files branch from 30416e3 to 9baf881 Compare September 24, 2024 05:46
@haytok
Copy link
Contributor Author

haytok commented Sep 24, 2024

Thanks !!!!!!!
I'll add test for this bug.

@apostasie
Copy link
Contributor

Pending addition of a test validating the fix, LGTM.

CI failures are unrelated (windows regression #3437).

@haytok
Copy link
Contributor Author

haytok commented Sep 25, 2024

Added tests work well in my local env, but they fail in GitHub Actions.

Details

=== RUN TestIssue2993

[444](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:445)=== RUN TestIssue2993/Issue_#2993_-_nerdctl_no_longer_leaks_containers_and_etchosts_directories_and_files_when_container_creation_fails_or_a_container_is_removed.

[445](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:446) container_create_linux_test.go:216: assertion failed: error is not nil: open /tmp/1935db59/containers/nerdctl-test: no such file or directory

[446](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:447)=== RUN TestIssue2993/Issue_#2993_-_nerdctl_no_longer_leaks_containers_and_etchosts_directories_and_files_when_container_creation_fails_or_a_container_is_removed.#01

[447](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:448) container_create_linux_test.go:235: assertion failed: error is not nil: open /tmp/1935db59/containers/nerdctl-test: no such file or directory

[448](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:449)--- FAIL: TestIssue2993 (0.10s)

[449](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:450) --- FAIL: TestIssue2993/Issue_#2993_-_nerdctl_no_longer_leaks_containers_and_etchosts_directories_and_files_when_container_creation_fails_or_a_container_is_removed. (0.06s)

[450](https://github.com/containerd/nerdctl/actions/runs/11031877260/job/30639560230?pr=3457#step:7:451) --- FAIL: TestIssue2993/Issue_#2993_-_nerdctl_no_longer_leaks_containers_and_etchosts_directories_and_files_when_container_creation_fails_or_a_container_is_removed.#01 (0.00s)

There seems to be a problem with the creation of the data-root (/tmp/1935db59/) in the test, so I'll investigate.

If there are any changes that need to be made about the test, I would like your advise


Updated

I forgot to run go test -p 1 ../../../cmd/nerdctl/... -run "TestIssue2993" -args -test.target=docker in my local env.

When running the command, I can check the failure of tests occured in GitHub Actions.

Details

 /Users/haytok/workspace/nerdctl/cmd/nerdctl/container [fix-to-clean-up-orphaned-files]
> go test -p 1 ../../../cmd/nerdctl/... -run "TestIssue2993" -args -test.target=docker
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl	0.005s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/apparmor	0.004s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/builder	0.004s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/completion	0.004s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/compose	0.005s [no tests to run]
test target: "docker"
[rm --data-root /tmp -f nerdctl-testissue2993]
[run --name nerdctl-testissue2993 --data-root /tmp -d ghcr.io/stargz-containers/alpine:3.13-org sleep infinity]
--- FAIL: TestIssue2993 (0.04s)
    --- FAIL: TestIssue2993/Issue_#2993_-_nerdctl_no_longer_leaks_containers_and_etchosts_directories_and_files_when_container_creation_fails_or_a_container_is_removed. (0.02s)
        container_create_linux_test.go:216: assertion failed: error is not nil: open /tmp/1935db59/containers/nerdctl-test: no such file or directory
    --- FAIL: TestIssue2993/Issue_#2993_-_nerdctl_no_longer_leaks_containers_and_etchosts_directories_and_files_when_container_creation_fails_or_a_container_is_removed.#01 (0.00s)
        container_create_linux_test.go:235: assertion failed: error is not nil: open /tmp/1935db59/containers/nerdctl-test: no such file or directory
FAIL
FAIL	github.com/containerd/nerdctl/v2/cmd/nerdctl/container	0.042s
?   	github.com/containerd/nerdctl/v2/cmd/nerdctl/helpers	[no test files]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/image	0.005s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/inspect	0.004s [no tests to run]
?   	github.com/containerd/nerdctl/v2/cmd/nerdctl/internal	[no test files]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/ipfs	0.004s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/login	0.004s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/namespace	0.004s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/network	0.005s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/system	0.007s [no tests to run]
ok  	github.com/containerd/nerdctl/v2/cmd/nerdctl/volume	0.005s [no tests to run]
FAIL

Therefore, I'll investigate how to retrieve dataStore in test code.

@@ -174,3 +180,72 @@ func TestCreateWithTty(t *testing.T) {
withTtyContainer := base.InspectContainer(withTtyContainerName)
assert.Equal(base.T, 0, withTtyContainer.State.ExitCode)
}

// TestIssue2993 tests https://github.com/containerd/nerdctl/issues/2993
func TestIssue2993(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to mark this test as DockerIncompatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !
Added testutil.DockerIncompatible(t) to tests.


// TestIssue2993 tests https://github.com/containerd/nerdctl/issues/2993
func TestIssue2993(t *testing.T) {
base := testutil.NewBase(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting you use a dedicated namespace for that test, so that we can parallelize the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed to specify independent namespaces for each test case.

@haytok
Copy link
Contributor Author

haytok commented Sep 26, 2024

Hi, @apostasie

Thanks for the review.

I have added tests for this issue, so could you re-review when you have time ?

CI failure is related to this comment (#3457 (comment))

Copy link
Contributor

@apostasie apostasie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haytok !

Left a few comments on tests.

If you want, there is more info about tests principles here: https://github.com/containerd/nerdctl/blob/main/docs/testing/README.md#principles

And details about a new tooling framework here:
https://github.com/containerd/nerdctl/blob/main/docs/testing/tools.md

containerName: "failure-of-container-creation",
namespace: "failure-of-container-creation",
command: func(t *testing.T, containerName, namespace, containersPath, etchostsPath string) {
base := testutil.NewBaseWithNamespace(t, namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use testutil.Identifier(t) for the namespace name.
Same thing for container name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !!! Fixed.

assert.NilError(t, err)
etchostsDirCounts := len(etchostsDirs)

base.Cmd("rm", "-f", containerName).Run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also want to make sure you run this ^ before anything else, so that leftovers from previous tests do not make you fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed based on the testing principles.

command: func(t *testing.T, containerName, namespace, containersPath, etchostsPath string) {
base := testutil.NewBaseWithNamespace(t, namespace)

defer base.Cmd("rm", "-f", containerName).Run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use t.Cleanup instead of defer.
Also make sure to call your cleanup routine ALSO before you start doing anything to take care of leftovers from previous tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !!! fixed based on the testing principles.

@haytok
Copy link
Contributor Author

haytok commented Sep 27, 2024

The local branch was out of date and I was unaware that there was documentation for the test.
Thanks for sharing!
I will check the documentation and fix it.

@haytok haytok force-pushed the fix-to-clean-up-orphaned-files branch from 5e6bfc3 to 833cb72 Compare September 27, 2024 14:55
Copy link
Member

@djdongjin djdongjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

cmd/nerdctl/container/container_create_linux_test.go Outdated Show resolved Hide resolved
@apostasie
Copy link
Contributor

…ion fails

When trying to create new containers, the following directories are
created based on the new container ID:

  - `/var/lib/nerdctl/1935db59/containers/<namespace>/<container id>`
  - `/var/lib/nerdctl/1935db59/etchosts/<namespace>/<container id>`

When containers with existing names are attempted to be created, the
processes fail.
However, in the events of failures, these directories are not cleaned up.

This issue is reported in the following:

  - containerd#2993

This commit resolves the issue by cleaning up the mentioned directories
when containers creation fails.

Also, It has also been modified so that the following directory is also
cleaned up when the container is removed.

  - `/var/lib/nerdctl/1935db59/etchosts/<namespace>/<container id>`

Note that tests of logic to fix bug in Issue containerd#2993 are added based on the
following testing principles.

  - https://github.com/containerd/nerdctl/blob/main/docs/testing/tools.md

Signed-off-by: Hayato Kiwata <[email protected]>
@haytok haytok force-pushed the fix-to-clean-up-orphaned-files branch from 4cec857 to 2c2745e Compare September 28, 2024 05:20
@haytok
Copy link
Contributor Author

haytok commented Sep 28, 2024

Hi, @apostasie

I have squashed all commits.

@apostasie
Copy link
Contributor

@haytok appreciate all your efforts here!
Looks good.
Now to maintainers.
@djdongjin @AkihiroSuda

@djdongjin djdongjin merged commit cdd9a79 into containerd:main Sep 28, 2024
22 checks passed
@haytok haytok deleted the fix-to-clean-up-orphaned-files branch September 29, 2024 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants