*: Bump k8s.io and controller-runtime dependencies #10069
Signed-off-by: timflannagan <[email protected]> Co-authored-by: Sam Heilbron <[email protected]> Co-authored-by: Tyler Schade <[email protected]>
Issues linked to changelog: |
Note: these clients were manually generated using a solo-kit that points to my local fork, which implements kgateway-dev#564. The gateway & gloo clients were updated to adopt the generics support recently introduced in the 1.31 client-go release; namely, listers and clients adopt this new approach. The nested extauth and graphql APIs have updated hack/update-codegen.sh bash scripts checked in with this commit, but I think we need to update the solo-kit.json configuration for those directories since we weren't previously committing their k8s clients. Similarly, the "gloosnapshot" API doesn't need k8s clients generated either. Signed-off-by: timflannagan <[email protected]>
node_version='v1.29.2@sha256:51a1434a5397193442f0be2a297b488b6c919ce8a3931be0ce822606ea5ca245'
kubectl_version='v1.29.2'
kind_version='v0.20.0'
node_version='v1.31.0@sha256:53df588e04085fd41ae12de0c3fe4c72f7013bba32a20e7325357a1ac94ba865'
Quick note: go.mod specifies the 1.31.1 patch version, but I didn't see a 1.31.1 sha image in the kind releases, so I left this here. I don't think it matters too much w.r.t. patch version skew between k8s server and client versions.
@@ -1,6 +1,6 @@
node_version='v1.25.16@sha256:5da57dfc290ac3599e775e63b8b6c49c0c85d3fec771cd7d55b45fae14b38d3b'
kubectl_version='v1.25.16'
node_version='v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72'
We're jumping from 1.29 to 1.31 in Gloo, so updating the min supported k8s version to 1.27 to maintain the N-3 matrix.
@@ -0,0 +1,12 @@
changelog:
TODO: Fix this changelog.
@@ -35,3 +35,7 @@ func (s *switchAdapter) On(name string) {
func (s *switchAdapter) Off(name string) {
	s.gauge.WithLabelValues(name).Set(0.0)
}

func (s *switchAdapter) SlowpathExercised(name string) {
Needed for controller-runtime 0.18.x due to client-go leaderelection changes.
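As a rough illustration of why a new method is required here: when an interface gains a method, every adapter assigned to it must implement that method or the build fails. The sketch below uses a local `leaderMetric` interface as a stand-in for the client-go leader-election metric interface (its exact name and package are not shown in this thread, so treat them as assumptions) and an in-memory map instead of a Prometheus gauge. The `slowpath` counter is hypothetical; the real adapter may well be a no-op.

```go
package main

import "fmt"

// leaderMetric is a local stand-in (an assumption, not the real client-go
// type) for an interface that gained a third method in a newer release.
type leaderMetric interface {
	On(name string)
	Off(name string)
	SlowpathExercised(name string)
}

// switchAdapter records leader state per name in memory; the adapter in this
// PR writes to a Prometheus gauge instead.
type switchAdapter struct {
	gauge    map[string]float64
	slowpath map[string]int // hypothetical counter, for illustration only
}

func newSwitchAdapter() *switchAdapter {
	return &switchAdapter{gauge: map[string]float64{}, slowpath: map[string]int{}}
}

func (s *switchAdapter) On(name string)  { s.gauge[name] = 1.0 }
func (s *switchAdapter) Off(name string) { s.gauge[name] = 0.0 }

// Without this method the compile-time check below would fail.
func (s *switchAdapter) SlowpathExercised(name string) { s.slowpath[name]++ }

// Compile-time check that switchAdapter satisfies the widened interface.
var _ leaderMetric = (*switchAdapter)(nil)

func main() {
	a := newSwitchAdapter()
	a.On("gloo")
	a.SlowpathExercised("gloo")
	a.Off("gloo")
	fmt.Println(a.gauge["gloo"], a.slowpath["gloo"])
}
```

The `var _ leaderMetric = ...` line is a common way to surface this kind of breakage at build time rather than at the call site.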
@@ -10,10 +10,9 @@ ROOT_PKG=github.com/solo-io/gloo/projects/gateway/pkg/api/v1
CLIENT_PKG=${ROOT_PKG}/kube/client
APIS_PKG=${ROOT_PKG}/kube/apis

# Below code is copied from https://github.com/weaveworks/flagger/blob/master/hack/update-codegen.sh
This file is generated by solo-kit. We updated the Go template file in solo-io/solo-kit#560. Note, this file is technically bugged and I have an open PR for fixing this in solo-io/solo-kit#564.
@@ -65,7 +66,7 @@ var _ = Describe("RetryOnUnavailableClientConstructor", func() {
// sanity check
resp, err := client.Validate(rootCtx, &validation.GlooValidationServiceRequest{})
Expect(err).NotTo(HaveOccurred())
Expect(resp).To(Equal(res))
Expect(resp).To(matchers.MatchProto(res))
Needed due to the protobuf jump that was clumped with these dependency bumps.
Controller: config.Controller{
	// see https://github.com/kubernetes-sigs/controller-runtime/issues/2937
	// in short, our tests reuse the same name (reasonably so) and the controller-runtime
	// package does not reset the stack of controller names between tests, so we disable
	// the name validation here.
	SkipNameValidation: ptr.To(true),
},
The comment calls out why this is needed, but this is due to a recent c-r change that enforces stricter validation for controller names.
"gen_kube_types": true, | ||
"gen_kube_types": false, |
Moved this to solo-io#10079. For context, solo-kit was generating hack/*-codegen.sh bash scripts for these nested directories that were relevant, so toggling this off / removing this option helps us manage maintenance.
Just a quick note on these new uniquehash.go files. This is needed as we had a bug in GME's AccessPolicy caching that required us to introduce a new primitive in this library. See https://github.com/solo-io/gloo-mesh-enterprise/pull/17392 for more information. We aren't using this new method, but still wanted to provide context on these generated files as we're bumping skv2.
The diff in these client-gen generated files is a bit confusing. Basically, client-go had a series of improvements in 1.31 to help adopt generics and cut down on the amount of generated code for consumers of this library. The gentype package defines the common Get/List/etc. interfaces now.
Similar to the above comment about client-go generated code, client-go refactored the listers implementation to adopt a generics-based approach. See kubernetes/kubernetes#121574 for more information.
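To give a feel for why a generics-based approach shrinks generated code, here is a small sketch of one lister implementation shared across resource types. The `lister`, `object`, `gateway`, and `upstream` names are hypothetical; client-go's actual generic types and signatures differ.

```go
package main

import (
	"fmt"
	"sort"
)

// object is the minimal constraint the generic lister needs.
type object interface {
	GetName() string
}

// lister is a hypothetical generic lister; client-go's real types differ.
type lister[T object] struct {
	items map[string]T
}

func newLister[T object](objs ...T) *lister[T] {
	l := &lister[T]{items: make(map[string]T, len(objs))}
	for _, o := range objs {
		l.items[o.GetName()] = o
	}
	return l
}

// Get returns the object with the given name, if present.
func (l *lister[T]) Get(name string) (T, bool) {
	o, ok := l.items[name]
	return o, ok
}

// List returns all objects sorted by name.
func (l *lister[T]) List() []T {
	out := make([]T, 0, len(l.items))
	for _, o := range l.items {
		out = append(out, o)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].GetName() < out[j].GetName() })
	return out
}

type gateway struct{ Name string }

func (g gateway) GetName() string { return g.Name }

type upstream struct{ Name string }

func (u upstream) GetName() string { return u.Name }

func main() {
	// One implementation serves both resource types; no per-type generated code.
	gws := newLister(gateway{"b"}, gateway{"a"})
	ups := newLister(upstream{"u1"})
	fmt.Println(len(gws.List()), len(ups.List()))
}
```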
I wanted to highlight this change in the sea of generated code changes. The primary change here is the removal of typecasting to corev1 listers which was causing the regression suite to fail. IMO, doing this is a violation of our own lister abstraction (that manages corev1 listers under-the-hood) and any net new issues with performance regressions could be handled as a follow-up in solo-kit.
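The hazard with this kind of typecasting can be sketched as follows: a caller that asserts an interface value down to a concrete type breaks as soon as the implementation behind the interface changes, while a caller that sticks to the interface keeps working. All type names here are hypothetical and only loosely model the lister abstraction discussed above.

```go
package main

import "fmt"

// podLister is the abstraction callers are supposed to depend on.
type podLister interface {
	List() []string
}

// concreteLister stands in for a specific generated lister type.
type concreteLister struct{ pods []string }

func (c *concreteLister) List() []string { return c.pods }

// genericLister stands in for a refactored implementation behind the same
// interface.
type genericLister[T any] struct{ items []T }

func (g *genericLister[T]) List() []T { return g.items }

// viaAssertion depends on the concrete type and breaks when it changes.
func viaAssertion(l podLister) ([]string, bool) {
	c, ok := l.(*concreteLister)
	if !ok {
		return nil, false
	}
	return c.pods, true
}

// viaInterface depends only on the abstraction.
func viaInterface(l podLister) []string { return l.List() }

func main() {
	var l podLister = &genericLister[string]{items: []string{"p1"}}
	_, ok := viaAssertion(l)
	fmt.Println(ok, viaInterface(l))
}
```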
EnableGatewayController: &wrappers.BoolValue{
	Value: true,
},
Need to confirm with Tyler or Sam why this change is necessary.
Oh yes, let's chat about this, I have some context and questions
We followed up in Slack. This appears to be due to some as-yet-unidentified proto changes that cause boolean values to be overridden in our tests. It only impacts this test because we require EnableGatewayController (edge gw) to be true in Settings; since we define some other values in the same struct, the default true value is not respected, and the overriding empty value is used instead, so it ends up false.
Our plan is two-fold:
- Keep this temporary solution so we can merge this large change. This way the PR doesn't go out of date.
- Immediately after, investigate which proto changes could lead to this and provide an explanation and a fix.
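The root cause was still unknown at this point in the thread, so the following is only a sketch of the symptom described, under the assumption that the test's settings replace the nested struct wholesale. The `settings` and `gatewaySettings` types, the `OtherOption` field, and `shallowMerge` are all hypothetical; a nil pointer plays the role of an unset wrapper value.

```go
package main

import "fmt"

// gatewaySettings is a simplified, hypothetical stand-in for the nested
// settings struct; a nil pointer plays the role of an unset wrapper value.
type gatewaySettings struct {
	EnableGatewayController *bool
	OtherOption             string // hypothetical sibling field
}

type settings struct {
	Gateway *gatewaySettings
}

// enabled treats an unset value as false, as a getter on a nil wrapper would.
func enabled(s settings) bool {
	if s.Gateway == nil || s.Gateway.EnableGatewayController == nil {
		return false
	}
	return *s.Gateway.EnableGatewayController
}

// shallowMerge replaces the whole nested struct when the override sets it,
// so defaults for sibling fields are dropped.
func shallowMerge(defaults, override settings) settings {
	if override.Gateway != nil {
		return settings{Gateway: override.Gateway}
	}
	return defaults
}

func boolPtr(b bool) *bool { return &b }

func main() {
	defaults := settings{Gateway: &gatewaySettings{EnableGatewayController: boolPtr(true)}}

	// Setting only a sibling field loses the default of true.
	implicit := settings{Gateway: &gatewaySettings{OtherOption: "x"}}
	// The temporary workaround: state the value explicitly.
	explicit := settings{Gateway: &gatewaySettings{OtherOption: "x", EnableGatewayController: boolPtr(true)}}

	fmt.Println(enabled(shallowMerge(defaults, implicit)))
	fmt.Println(enabled(shallowMerge(defaults, explicit)))
}
```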
Added this as a TODO on the parent issue as well.
Adding the |
LGTM, just a small nit and a quick q. Thanks for working on this Tim!
@@ -67,12 +67,12 @@ jobs:

# September 16, 2024: 21 minutes
- cluster-name: 'cluster-three'
  go-test-args: '-v -timeout=25m'
  go-test-args: '-v -timeout=30m'
just curious why the timeout bump is necessary?
@sam-heilbron Any idea on why this was necessary?
Created this branch |
Just saw this now. I opened solo-io#10069 earlier today to work through all the potential issues. |
Description
This PR bumps several critical Go direct dependencies including k8s.io, go-control-plane, and controller-runtime.
Code changes
Reverts solo-io#7920 in the kube-based upstream plugin. Previously, we were typecasting our own lister abstraction that sits on top of client-go into concrete corev1 listers. That approach was breaking our regression suite -- likely because upstream refactored lister generation to be generic in 1.31.x -- and required us to revert that PR in order to get the regression suite back online. From my perspective, this type of typecasting violates our own abstraction layer and any performance-related issues should've been tackled closer to the source-of-truth (e.g. solo-kit).
Additionally, updates to pkg/bootstrap/leaderelector/kube/metrics.go were required after bumping controller-runtime to 0.18.x / 0.19.x to get builds back online.
CI changes
To(Equal(...) type operations. This is needed due to bumping the protobuf dependency, as the default value of the underlying sizeCache was changed. Further investigation into why this is needed will be tackled as a follow-up.
Context
This is blocking several streams of downstream work including RFE and GW API integration initiatives.
Notes for reviewers
Checklist:
BOT NOTES:
Related to solo-io#9683