
Remove port-forwarding from Kubernetes E2E tests #9575

Closed
wants to merge 68 commits

Conversation

@jbohanon (Contributor) commented Jun 4, 2024

Description

Port forwarding is not a pattern used in user deployments; it is a quick hack to get networking into a cluster. Since the intent of our Kubernetes E2E tests is to imitate user patterns, we opt to eliminate port forwards from the Kubernetes E2E tests altogether.

Note: the specific use case of accessing the Envoy admin endpoints is not particularly user-facing behavior anyway, so for those checks we have chosen to run a curl from within the proxy pod.

The port-forwarding utility did not, by default, wait for the port-forward to be available before returning. This PR adds the dial logic to the utility so that it only returns once the forwarded port is actually reachable; a rough sketch of the idea follows.
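A hedged sketch of the dial-based readiness check, assuming a hypothetical waitForPortForward helper (the package, helper name, and timing values are illustrative, not this PR's actual code):

```go
package portforwardutil

import (
	"context"
	"fmt"
	"net"
	"time"
)

// waitForPortForward dials the locally forwarded address until it accepts a
// TCP connection or the context expires, so callers only proceed once the
// port-forward is actually usable. (Illustrative helper, not the PR's code.)
func waitForPortForward(ctx context.Context, address string) error {
	for {
		conn, err := net.DialTimeout("tcp", address, 250*time.Millisecond)
		if err == nil {
			return conn.Close()
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("port-forward to %s never became ready: %w", address, err)
		case <-time.After(250 * time.Millisecond):
		}
	}
}
```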

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@solo-changelog-bot

Issues linked to changelog:
solo-io#9353

@github-actions bot added the 'keep pr updated' label (signals bulldozer to keep pr up to date with base branch) on Jun 4, 2024
@jbohanon (Contributor, Author) commented Jun 4, 2024

/kick


github-actions bot commented Jun 4, 2024

Visit the preview URL for this PR (updated for commit a392448):

https://gloo-edge--pr9575-jbohanon-port-forwar-vd8cg4zt.web.app

(expires Fri, 21 Jun 2024 17:00:42 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 77c2b86e287749579b7ff9cadb81e099042ef677

@jbohanon jbohanon requested review from sam-heilbron and jenshu June 4, 2024 16:55
sam-heilbron previously approved these changes Jun 4, 2024
jenshu previously approved these changes Jun 4, 2024
@jbohanon jbohanon dismissed stale reviews from jenshu and sam-heilbron via 0dbb5d4 June 4, 2024 18:14
sam-heilbron previously approved these changes Jun 4, 2024
@jbohanon added the 'work in progress' label (signals bulldozer to keep pr open, don't auto-merge) on Jun 4, 2024
@jbohanon (Contributor, Author) commented Jun 4, 2024

work in progress label added to investigate a CI-only failure in the Kubernetes End-to-End cluster-two job

Removal criteria:

  • Revert changes in test selection
  • Remove CI skips in changelog
  • Actually passing tests

Stdout: params.Stdout,
Stderr: params.Stderr,
Env: params.Env,
Args: []string{"describe", "pod", "-l", label},
jbohanon (Contributor, Author) commented:

add namespace

}

func KubeLogs(ctx context.Context, label string, params Params) string {
params.Args = []string{"logs", "-l", label}
jbohanon (Contributor, Author) commented:

add namespace

@sam-heilbron (Contributor) left a comment:

Nice! I haven't looked too in depth at the implementation, but I shared some high-level thoughts. Could we make sure the following is completed:

  • This sentiment around port-forwarding should be captured in our e2e readme
  • The PR body should clarify why we believe this will reduce flakes and what approvers can do to validate that
  • There is a lot of new code; how will we prevent developers from calling the wrong code at the wrong time?

@@ -0,0 +1,193 @@
package kubectl
Contributor commented:

I'm not a huge fan of placing a new "hack" directory into our testutils code. It feels like we're creating the potential for code that will never leave. Why not add the utilities directly to the places we want them, and clarify in the code that we would like to progressively migrate some of these components to a shared library when they are stable?

jbohanon (Contributor, Author) commented:

I feel that having a hack directory in the code is a huge smell that creates an underlying tension indicating this shouldn't be here. I would think that this code would be more likely to assimilate itself into the codebase and never end up being extracted back out if we interspersed it with permanent code. I would rather deal with the pain of inter-repo PRs than risk a bespoke Kubectl function staying in this repo for no reason other than it hasn't been looked at in months.

In either case I think an issue should be opened to track migrating the code back to the shared util.

"strings"

"github.com/solo-io/gloo/pkg/utils/cmdutils"
"github.com/solo-io/gloo/pkg/utils/envoyutils/admincli"
Contributor commented:

How is this different than the admincli in the envoyutils package? If this is a better place for that code, can we just move it all to live in the same place?

jbohanon (Contributor, Author) commented:

This could probably use some rebranding/relocation to indicate that it is a utility for accessing the admin client from inside the cluster. I could possibly refactor the admincli to take a Cmder to avoid so much code duplication, but that seemed like an even deeper bite than I had already committed to.

)

type EnvoyAdminAssertion func(ctx context.Context, adminClient *envoyadmin.Client, clientRefresh func(ctx context.Context, adminClient *envoyadmin.Client, expectedReplicas int) *envoyadmin.Client)
Contributor commented:

I like the idea of dependency injection, though do you think the clientRefresh will have multiple implementations? If anything, it feels like an implementation detail that only exists because of a subtle behavior. It feels like we'd be better suited, in fact, to "hide" this, i.e. not expose it to users, and require that they use the single implementation. wdyt?

jbohanon (Contributor, Author) commented:

The biggest thing that needs to be injected on a per-case basis is the required number of replicas. Since the status reporting for pods that have had a termination event raised against them by the kubelet is garbage (no way to tell without querying events if a pod is "terminating"; it still shows as ready), the way I figured out how to select a truly ready pod was to wait for the number of replicas for the deployment to stabilize on a known value. I agree that it's pretty smelly though; maybe I could look at passing the expected replicas into the assertion.

Though as I am typing this up, I remember that the main reason it was injected was so that the assertion could call the refresh within its Eventually block, keeping the client stabilization as close as possible to the call that needs a stable client.

It is truly awful that a pod in Terminating state does not have a status condition stating such
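To make the injection concrete, here is a rough sketch of how an assertion could call the injected clientRefresh inside its own Eventually block. The import path, the GetConfigDump method name, and the timing values are assumptions for illustration, not this PR's actual assertion code:

```go
package assertions

import (
	"context"
	"time"

	"github.com/onsi/gomega"

	// Assumed import path for the in-cluster envoy admin client added in this PR.
	envoyadmin "github.com/solo-io/gloo/test/kubernetes/testutils/cmd/envoyadmin"
)

// listenerAssertion refreshes the admin client on every attempt so each poll
// talks to a pod from a stabilized (fully ready) set of replicas.
func listenerAssertion(expectedReplicas int) EnvoyAdminAssertion {
	return func(ctx context.Context, adminClient *envoyadmin.Client, clientRefresh func(ctx context.Context, adminClient *envoyadmin.Client, expectedReplicas int) *envoyadmin.Client) {
		gomega.Eventually(func(g gomega.Gomega) {
			// Re-resolve the target pod before each attempt; a pod selected on a
			// previous attempt may have started terminating in the meantime.
			client := clientRefresh(ctx, adminClient, expectedReplicas)

			// GetConfigDump is an illustrative method name, not necessarily the
			// client's real API.
			_, err := client.GetConfigDump(ctx)
			g.Expect(err).NotTo(gomega.HaveOccurred())
		}, 30*time.Second, time.Second).Should(gomega.Succeed())
	}
}
```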

jbohanon (Contributor, Author) commented:

According to kubernetes/kubernetes#102537 this is expected behavior
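For what it's worth, the one reliable signal is metadata.deletionTimestamp, which is set as soon as deletion starts. A minimal sketch of filtering on it with client-go (the function name and selector handling are illustrative):

```go
package kubeutils

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readyNonTerminatingPods filters out pods the kubelet is tearing down. A
// terminating pod can still report Ready conditions, but it always carries a
// deletionTimestamp, so that is the field to key off of.
func readyNonTerminatingPods(ctx context.Context, client kubernetes.Interface, namespace, labelSelector string) ([]corev1.Pod, error) {
	podList, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: labelSelector})
	if err != nil {
		return nil, err
	}
	var ready []corev1.Pod
	for _, pod := range podList.Items {
		if pod.DeletionTimestamp != nil {
			continue // terminating, even though it may still show as Ready
		}
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = append(ready, pod)
				break
			}
		}
	}
	return ready, nil
}
```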

cmd.Stdout = io.MultiWriter(cmd.Stdout, &combinedOutput)
cmd.Stderr = io.MultiWriter(cmd.Stderr, &combinedOutput)

_, err := kube.ExecFromEphemeralPod(context.Background(), cmd.EphemeralPodParams)
Contributor commented:

I'm not sure I follow this entirely. What is the value in making a kube specific command, versus just invoking the kube specific method on the client?

The original intention of the LocalCmd was to make it so you could have a shared implementation of cmd.Exec that clients could use under the hood. It feels like we've inverted that relationship: now calling code must know a command is being used, and isn't aware that kube.ExecFromEphemeralPod is being called.

jbohanon (Contributor, Author) commented:

I can't argue original intent since you wrote the code, but the shape of the code implies to me that cmdutils.Cmd was abstracting away the implementation of exec.Cmd as LocalCmd. The very existence of LocalCmd to me implies a RemoteCmd implementation was expected eventually.

I think the benefit of this abstraction is that devs can use context-specific Cmds in ways that have similar ergonomics. If you're running in-memory e2e or unit tests, a LocalCmd is what you need, while if you're running in Kube, a RemoteCmd is what you need. The latter requires slightly more setup on the Cmder, but other than that, they're wielded in the same way with regard to setting up the input/output streams and calling Run.

jbohanon (Contributor, Author) commented:

What is the value in making a kube specific command, versus just invoking the kube specific method on the client?

A RemoteCmd, just like a LocalCmd, will, I expect, be used primarily by clients. This code primarily enables the remote envoy admin client in test/kubernetes/testutils/cmd/envoyadmin/envoyadmin.go and the helper wrapper for a remote curl in test/kubernetes/testutils/cmd/curl.go, which to me is analogous to the LocalCmd being used by the admincli and curl helper clients.
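For anyone skimming, a simplified, hypothetical sketch of the relationship being described (this is not the actual cmdutils API, just the shape of it): both implementations satisfy one small interface, so callers wire up streams and call Run the same way regardless of where the command executes.

```go
package cmdsketch

import (
	"context"
	"io"
	"os/exec"
)

// Cmd captures the shared ergonomics: configure output streams, then Run.
// (Hypothetical interface for illustration only.)
type Cmd interface {
	WithStdout(io.Writer) Cmd
	WithStderr(io.Writer) Cmd
	Run(ctx context.Context) error
}

// localCmd executes on the machine running the tests (the LocalCmd case).
type localCmd struct {
	name           string
	args           []string
	stdout, stderr io.Writer
}

func (c *localCmd) WithStdout(w io.Writer) Cmd { c.stdout = w; return c }
func (c *localCmd) WithStderr(w io.Writer) Cmd { c.stderr = w; return c }
func (c *localCmd) Run(ctx context.Context) error {
	cmd := exec.CommandContext(ctx, c.name, c.args...)
	cmd.Stdout, cmd.Stderr = c.stdout, c.stderr
	return cmd.Run()
}

// remoteCmd executes inside the cluster; execInPod stands in for something
// like kube.ExecFromEphemeralPod (the RemoteCmd case).
type remoteCmd struct {
	execInPod      func(ctx context.Context, stdout, stderr io.Writer, args []string) error
	args           []string
	stdout, stderr io.Writer
}

func (c *remoteCmd) WithStdout(w io.Writer) Cmd { c.stdout = w; return c }
func (c *remoteCmd) WithStderr(w io.Writer) Cmd { c.stderr = w; return c }
func (c *remoteCmd) Run(ctx context.Context) error {
	return c.execInPod(ctx, c.stdout, c.stderr, c.args)
}
```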

}
)

func CurlWithEphemeralPod(ctx context.Context, logger io.Writer, kubeContext, fromns, frompod string, args ...string) string {
Contributor commented:

Building from my comment about the introduction of the hack directory, I think if we want to introduce this into the codebase, we can/should just place this on the kube.Cli that we created and perhaps clarify it was copy/pasted from a go-utils@commit-sha

@bewebi (Contributor) left a comment:

Overall makes sense and looks good, just a few nits/questions

pkg/utils/kubeutils/kubectl/cli.go: outdated review thread (resolved)
@@ -55,8 +55,19 @@ type apiPortForwarder struct {
}

func (f *apiPortForwarder) Start(ctx context.Context, options ...retry.Option) error {
return retry.Do(func() error {
if err := retry.Do(func() error {
Contributor commented:

Can we add comments in this function about what each step is?
I think I follow, but I'm not sure, and it will help for others
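For reviewers, here is roughly the shape the two steps take, with comments. The helper names, package, and retry import path are assumptions for illustration; the real code lives on the forwarder types in this PR:

```go
package portforwardutil

import (
	"context"
	"net"

	"github.com/avast/retry-go"
)

// startAndWait sketches the two phases: (1) retry establishing the port-forward
// itself, then (2) retry dialing the local address until it actually accepts a
// TCP connection, and only then report success. start and address stand in for
// the forwarder's own methods.
func startAndWait(ctx context.Context, start func(context.Context) error, address string, options ...retry.Option) error {
	// Phase 1: establish the forward, retrying transient failures (e.g. the
	// selected pod was terminating and the forward died immediately).
	if err := retry.Do(func() error {
		return start(ctx)
	}, options...); err != nil {
		return err
	}

	// Phase 2: the forward can exist before it is usable, so dial until a
	// connection succeeds before returning to the caller.
	return retry.Do(func() error {
		conn, err := net.Dial("tcp", address)
		if err != nil {
			return err
		}
		return conn.Close()
	}, options...)
}
```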

@@ -38,8 +38,19 @@ type cliPortForwarder struct {
}

func (c *cliPortForwarder) Start(ctx context.Context, options ...retry.Option) error {
return retry.Do(func() error {
if err := retry.Do(func() error {
Contributor commented:

Can we also add comments here?

test/kubernetes/testutils/assertions/envoy.go: outdated review thread (resolved)
go func() {
	for {
		select {
		case <-time.After(time.Second):
Contributor commented:

What's the reason for using one second as the polling interval?
It seems high to me, but might make sense depending on the use case

Comment on lines 127 to 134
finished := func(output string) bool {
	return strings.Contains(output, "Running") || strings.Contains(output, "ContainerCreating")
}
for _, label := range labels {
	if err := WaitPodStatus(ctx, interval, namespace, label, "Running or ContainerCreating", finished, params); err != nil {
		return err
	}
}
Contributor commented:

Why do we first wait for Running or ContainerCreating and then wait for just Running?
Logically this seems equivalent to simply waiting for all to be Running?

jbohanon (Contributor, Author) commented:

I can't say for certain, and I'm not even sure if the reason this was originally done is still relevant, but this was a deliberate change made >5 years ago by Marco in https://github.com/solo-io/go-utils/pull/65/files#diff-194e17bf80495c27c08438d8dda9585f21023dfeb92a8910adb4294d58d01b39L103. I'd tag him to ask why, but it was over 5 years ago, so I think it's unlikely he remembers. If you think it's worth pursuing, though, I can.

})
}

func Curl(ctx context.Context, logger io.Writer, kubeContext, ns, fromDeployment, fromContainer, url string) string {
Contributor commented:

Do we want to add the ability to pass additional curl args?

jbohanon (Contributor, Author) commented:

Everything in test/testutils/hack was copied† from other shared repos. The intent is for this code to eventually migrate back to the shared repos, since it is actually used by some other projects in the org. That said, unless we wanted to move the entire curl util we have written over to go-utils or k8s-utils, I don't think we can support that here.

†: There were modifications made to the code, the roots of which can be seen in these two PRs: solo-io/go-utils#527 solo-io/k8s-utils#51

@jbohanon added the 'work in progress' label (signals bulldozer to keep pr open, don't auto-merge) on Jun 14, 2024
@jbohanon jbohanon marked this pull request as draft June 14, 2024 17:55
@jbohanon (Contributor, Author) commented:

This PR is being placed on hold in favor of a less intrusive fix to the linked issue. The fundamental issue with the test that was flaking is that it is possible to select a pod in Terminating state for port-forward. Adding retries to the port-forward and making sure that we aren't selecting a pod in Terminating status should be sufficient to solve the stated problem.
