Commit 46fc0d3

tallaxes and matthchr authored
feat: v1 migration and adaptation (#528)
* chore: update devcontainer go version
* chore: refresh toolchain
* chore: additional processing on verify (and migration to kube-system)
* chore: bump dependencies
* chore: refresh Helm charts
* chore: update golangci config
* chore: remove feature gate for drift
* chore: update pre-commit tooling
* chore: update the shape of main
* chore: update the alt operator
* chore: update the API (move kubelet config to AKSNodeClass)
* chore: migrate cloud provider to v1 API
* chore: migrate operator to v1 API
* chore: migrate controllers to v1 API
* chore: add nodeclass status controller
* chore: migrate providers to v1 API
* chore: migrate test pkg to v1 API
* chore: update utils
* chore: update and migrate E2E tests to v1 API
* feat: refresh and relink CRDs
* fix: move code generation into subfolders to fix golangci-lint (typecheck detecting multiple main.go)
* fix: enable most of govet in golangci
* fix(linting): exclude alt operator logger
* fix: add nodeclass termination controller
* fix(lint): restore linting on verify
* feat: add nodeclass hash controller
* fix: register additional nodeclass and status controllers
* fix(e2e): better selection of karpenter pod for logs
* fix(e2e): fix utilization suite
* chore(e2e): add events to dump-logs (and simplify)
* chore: rename v1 to corev1
* fix: remove extra $
* fix(e2e): add cilium label and taint
* fix(e2e): fix labels and disruption for daemonset test
* feat: update kubelet configuration
* fix: conflicting nodeclaim.garbagecollection controller name
* chore: restore webhooks in alt operator
* Clean up commented out webhook code
* fix(test): fix test for credential provider URL in custom data
* Make webhooks work in AKS CCP context (#537)

  This requires quite a bit of hacking, mostly overriding certain things in the ctx. The major items are:
  * Copy and modify knative/pkg/webhook/resourcesemantics/conversion to support CRD clientConfig.url in addition to clientConfig.service.
  * Copy and modify karpenter/pkg/webhooks/webhooks.go to support overriding the informer factory, so that we can point it at the CCP APIServer rather than overlay.
  * Override Start and supporting methods on the provider specific operator in pkg/operator/operator.go to allow invoking our modified version of karpenter/pkg/webhooks/webhooks.go.
* chore: remove failSwapOn from kubelet settings in AKSNodeClass
* fix: populate nodeClaim.Status.ImageID
* fix: record NodeClass hash and add drift on static fields
* chore: rename variables
* fix: remove outdated comment
* fix: typo
* chore: update CRDs

---------

Co-authored-by: Matthew Christopher <[email protected]>
1 parent 2054da0 commit 46fc0d3

121 files changed: +8735 -1832 lines changed

.devcontainer/devcontainer.json

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
     "build": {
         "dockerfile": "Dockerfile",
         "args": {
-            "VARIANT": "1.22-bullseye"
+            "VARIANT": "1.23-bullseye"
         }
     },
     "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined" ],

.github/actions/e2e/dump-logs/action.yaml

Lines changed: 10 additions & 13 deletions
@@ -31,26 +31,23 @@ runs:
         client-id: ${{ inputs.client-id }}
         tenant-id: ${{ inputs.tenant-id }}
         subscription-id: ${{ inputs.subscription-id }}
-    - name: az set sub
+    - name: update cluster context
       shell: bash
-      run: az account set --subscription ${{ inputs.subscription-id }}
+      run: |
+        az aks get-credentials --name ${{ inputs.cluster_name }} --resource-group ${{ inputs.resource_group }}
     - name: controller-logs
       shell: bash
       run: |
-        echo "step: controller-logs"
-        AZURE_CLUSTER_NAME=${{ inputs.cluster_name }} AZURE_RESOURCE_GROUP=${{ inputs.resource_group }} make az-creds
-        POD_NAME=$(kubectl get pods -n karpenter --no-headers -o custom-columns=":metadata.name" | tail -n 1)
-        echo "logs from pod ${POD_NAME}"
-        kubectl logs "${POD_NAME}" -n karpenter -c controller
+        kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --all-containers --ignore-errors
     - name: describe-karpenter-pods
       shell: bash
       run: |
-        echo "step: describe-karpenter-pods"
-        AZURE_CLUSTER_NAME=${{ inputs.cluster_name }} AZURE_RESOURCE_GROUP=${{ inputs.resource_group }} make az-creds
-        kubectl describe pods -n karpenter
+        kubectl describe pods -n kube-system -l app.kubernetes.io/name=karpenter
     - name: describe-nodes
       shell: bash
       run: |
-        echo "step: describe-nodes"
-        AZURE_CLUSTER_NAME=${{ inputs.cluster_name }} AZURE_RESOURCE_GROUP=${{ inputs.resource_group }} make az-creds
-        kubectl describe nodes
+        kubectl describe nodes
+    - name: get-karpenter-events
+      shell: bash
+      run: |
+        kubectl get events -A --field-selector source=karpenter
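
For orientation, a workflow job might call this composite action roughly as follows. This is a minimal sketch, not taken from the commit: the input names come from the `${{ inputs.* }}` references visible in the hunk above, while the job name, secret names, cluster name, and resource group are hypothetical placeholders, and the action may declare further inputs outside the lines shown.

```yaml
# Hypothetical caller workflow fragment; values marked "placeholder" are assumptions.
jobs:
  e2e:                                  # placeholder job name
    runs-on: ubuntu-latest
    permissions:
      id-token: write                   # commonly needed for OIDC-based Azure login
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: dump logs
        if: always()                    # collect logs even when earlier steps fail
        uses: ./.github/actions/e2e/dump-logs
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}              # placeholder secret name
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}              # placeholder secret name
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}  # placeholder secret name
          cluster_name: example-cluster                          # placeholder
          resource_group: example-rg                             # placeholder
```

Because the action now fetches cluster credentials once in the "update cluster context" step, the later steps can run plain `kubectl` commands against `kube-system` without repeating `make az-creds`.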

.golangci.yaml

Lines changed: 8 additions & 3 deletions
@@ -10,7 +10,7 @@ linters:
     - bidichk
     - errorlint
     - errcheck
-    - exportloopref
+    - copyloopvar
     - gosec
     - revive
     - stylecheck
@@ -33,8 +33,9 @@ linters-settings:
   gocyclo:
     min-complexity: 11
   govet:
-    enable:
-      - shadow
+    enable-all: true
+    disable:
+      - fieldalignment
   revive:
     rules:
       - name: dot-imports
@@ -79,3 +80,7 @@ issues:
     - hack
     - charts
     - designs
+    - pkg/alt/knative # copy
+    - pkg/alt/karpenter-core/pkg/webhooks # copy
+  exclude-files:
+    - pkg/alt/karpenter-core/pkg/operator/logger.go # copy

.pre-commit-config.yaml

Lines changed: 5 additions & 4 deletions
@@ -1,22 +1,23 @@
 repos:
 - repo: https://github.com/gitleaks/gitleaks
-  rev: v8.18.1
+  rev: v8.20.1
   hooks:
   - id: gitleaks
 - repo: https://github.com/golangci/golangci-lint
-  rev: v1.55.2
+  rev: v1.61.0
   hooks:
   - id: golangci-lint
 - repo: https://github.com/jumanjihouse/pre-commit-hooks
   rev: 3.0.0
   hooks:
   - id: shellcheck
 - repo: https://github.com/crate-ci/typos
-  rev: v1.17.2
+  rev: v1.26.0
   hooks:
   - id: typos
+    args: [--write-changes, --force-exclude, --exclude, go.mod]
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.5.0
+  rev: v5.0.0
   hooks:
   - id: end-of-file-fixer
   - id: trailing-whitespace

Makefile

Lines changed: 4 additions & 1 deletion
@@ -7,7 +7,7 @@ GOFLAGS ?= $(LDFLAGS)
 WITH_GOFLAGS = GOFLAGS="$(GOFLAGS)"
 
 # # CR for local builds of Karpenter
-KARPENTER_NAMESPACE ?= karpenter
+KARPENTER_NAMESPACE ?= kube-system
 
 # Common Directories
 # TODO: revisit testing tools (temporarily excluded here, for make verify)
@@ -80,9 +80,12 @@ verify: toolchain tidy download ## Verify code. Includes dependencies, linting,
 	cp $(KARPENTER_CORE_DIR)/pkg/apis/crds/* pkg/apis/crds
 	yq -i '(.spec.versions[0].additionalPrinterColumns[] | select (.name=="Zone")) .jsonPath=".metadata.labels.karpenter\.azure\.com/zone"' \
 		pkg/apis/crds/karpenter.sh_nodeclaims.yaml
+	hack/validation/kubelet.sh
 	hack/validation/labels.sh
 	hack/validation/requirements.sh
 	hack/validation/common.sh
+	cp pkg/apis/crds/* charts/karpenter-crd/templates
+	hack/mutation/conversion_webhooks_injection.sh
 	hack/github/dependabot.sh
 	$(foreach dir,$(MOD_DIRS),cd $(dir) && golangci-lint run $(newline))
 	@git diff --quiet ||\

charts/karpenter-crd/templates/karpenter.azure.com_aksnodeclasses.yaml

Lines changed: 0 additions & 1 deletion
This file was deleted.
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
+---
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  annotations:
+    controller-gen.kubebuilder.io/version: v0.16.4
+  name: aksnodeclasses.karpenter.azure.com
+spec:
+  group: karpenter.azure.com
+  names:
+    categories:
+    - karpenter
+    kind: AKSNodeClass
+    listKind: AKSNodeClassList
+    plural: aksnodeclasses
+    shortNames:
+    - aksnc
+    - aksncs
+    singular: aksnodeclass
+  scope: Cluster
+  versions:
+  - name: v1alpha2
+    schema:
+      openAPIV3Schema:
+        description: AKSNodeClass is the Schema for the AKSNodeClass API
+        properties:
+          apiVersion:
+            description: |-
+              APIVersion defines the versioned schema of this representation of an object.
+              Servers should convert recognized schemas to the latest internal value, and
+              may reject unrecognized values.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
+            type: string
+          kind:
+            description: |-
+              Kind is a string value representing the REST resource this object represents.
+              Servers may infer this from the endpoint the client submits requests to.
+              Cannot be updated.
+              In CamelCase.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
+            type: string
+          metadata:
+            type: object
+          spec:
+            description: |-
+              AKSNodeClassSpec is the top level specification for the AKS Karpenter Provider.
+              This will contain configuration necessary to launch instances in AKS.
+            properties:
+              imageFamily:
+                default: Ubuntu2204
+                description: ImageFamily is the image family that instances use.
+                enum:
+                - Ubuntu2204
+                - AzureLinux
+                type: string
+              kubelet:
+                description: |-
+                  Kubelet defines args to be used when configuring kubelet on provisioned nodes.
+                  They are a subset of the upstream types, recognizing not all options may be supported.
+                  Wherever possible, the types and names should reflect the upstream kubelet types.
+                properties:
+                  allowedUnsafeSysctls:
+                    description: |-
+                      A comma separated whitelist of unsafe sysctls or sysctl patterns (ending in `*`).
+                      Unsafe sysctl groups are `kernel.shm*`, `kernel.msg*`, `kernel.sem`, `fs.mqueue.*`,
+                      and `net.*`. For example: "`kernel.msg*,net.ipv4.route.min_pmtu`"
+                      Default: []
+                    items:
+                      type: string
+                    type: array
+                  containerLogMaxFiles:
+                    default: 5
+                    description: |-
+                      containerLogMaxFiles specifies the maximum number of container log files that can be present for a container.
+                      Default: 5
+                    format: int32
+                    minimum: 2
+                    type: integer
+                  containerLogMaxSize:
+                    default: 50Mi
+                    description: |-
+                      containerLogMaxSize is a quantity defining the maximum size of the container log
+                      file before it is rotated. For example: "5Mi" or "256Ki".
+                      Default: "10Mi"
+                      AKS CustomKubeletConfig has containerLogMaxSizeMB (with units), defaults to 50
+                    pattern: ^\d+(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)$
+                    type: string
+                  cpuCFSQuota:
+                    default: true
+                    description: |-
+                      CPUCFSQuota enables CPU CFS quota enforcement for containers that specify CPU limits.
+                      Note: AKS CustomKubeletConfig uses cpuCfsQuota (camelCase)
+                    type: boolean
+                  cpuCFSQuotaPeriod:
+                    default: 100ms
+                    description: |-
+                      cpuCfsQuotaPeriod sets the CPU CFS quota period value, `cpu.cfs_period_us`.
+                      The value must be between 1 ms and 1 second, inclusive.
+                      Default: "100ms"
+                    type: string
+                  cpuManagerPolicy:
+                    default: none
+                    description: cpuManagerPolicy is the name of the policy to use.
+                    enum:
+                    - none
+                    - static
+                    type: string
+                  imageGCHighThresholdPercent:
+                    description: |-
+                      ImageGCHighThresholdPercent is the percent of disk usage after which image
+                      garbage collection is always run. The percent is calculated by dividing this
+                      field value by 100, so this field must be between 0 and 100, inclusive.
+                      When specified, the value must be greater than ImageGCLowThresholdPercent.
+                      Note: AKS CustomKubeletConfig does not have "Percent" in the field name
+                    format: int32
+                    maximum: 100
+                    minimum: 0
+                    type: integer
+                  imageGCLowThresholdPercent:
+                    description: |-
+                      ImageGCLowThresholdPercent is the percent of disk usage before which image
+                      garbage collection is never run. Lowest disk usage to garbage collect to.
+                      The percent is calculated by dividing this field value by 100,
+                      so the field value must be between 0 and 100, inclusive.
+                      When specified, the value must be less than imageGCHighThresholdPercent
+                      Note: AKS CustomKubeletConfig does not have "Percent" in the field name
+                    format: int32
+                    maximum: 100
+                    minimum: 0
+                    type: integer
+                  podPidsLimit:
+                    description: |-
+                      podPidsLimit is the maximum number of PIDs in any pod.
+                      AKS CustomKubeletConfig uses PodMaxPids, int32 (!)
+                      Default: -1
+                    format: int64
+                    type: integer
+                  topologyManagerPolicy:
+                    default: none
+                    description: |-
+                      topologyManagerPolicy is the name of the topology manager policy to use.
+                      Valid values include:
+
+                      - `restricted`: kubelet only allows pods with optimal NUMA node alignment for requested resources;
+                      - `best-effort`: kubelet will favor pods with NUMA alignment of CPU and device resources;
+                      - `none`: kubelet has no knowledge of NUMA alignment of a pod's CPU and device resources.
+                      - `single-numa-node`: kubelet only allows pods with a single NUMA alignment
+                        of CPU and device resources.
+                    enum:
+                    - restricted
+                    - best-effort
+                    - none
+                    - single-numa-node
+                    type: string
+                type: object
+                x-kubernetes-validations:
+                - message: imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent
+                  rule: 'has(self.imageGCHighThresholdPercent) && has(self.imageGCLowThresholdPercent)
+                    ? self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent :
+                    true'
+              maxPods:
+                description: MaxPods is an override for the maximum number of pods
+                  that can run on a worker node instance.
+                format: int32
+                minimum: 0
+                type: integer
+              osDiskSizeGB:
+                default: 128
+                description: osDiskSizeGB is the size of the OS disk in GB.
+                format: int32
+                minimum: 100
+                type: integer
+              tags:
+                additionalProperties:
+                  type: string
+                description: Tags to be applied on Azure resources like instances.
+                type: object
+              vnetSubnetID:
+                description: |-
+                  VNETSubnetID is the subnet used by nics provisioned with this nodeclass.
+                  If not specified, we will use the default --vnet-subnet-id specified in karpenter's options config
+                pattern: (?i)^\/subscriptions\/[^\/]+\/resourceGroups\/[a-zA-Z0-9_\-().]{0,89}[a-zA-Z0-9_\-()]\/providers\/Microsoft\.Network\/virtualNetworks\/[^\/]+\/subnets\/[^\/]+$
+                type: string
+            type: object
+          status:
+            description: AKSNodeClassStatus contains the resolved state of the AKSNodeClass
+            properties:
+              conditions:
+                description: Conditions contains signals for health and readiness
+                items:
+                  description: Condition aliases the upstream type and adds additional
+                    helper methods
+                  properties:
+                    lastTransitionTime:
+                      description: |-
+                        lastTransitionTime is the last time the condition transitioned from one status to another.
+                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
+                      format: date-time
+                      type: string
+                    message:
+                      description: |-
+                        message is a human readable message indicating details about the transition.
+                        This may be an empty string.
+                      maxLength: 32768
+                      type: string
+                    observedGeneration:
+                      description: |-
+                        observedGeneration represents the .metadata.generation that the condition was set based upon.
+                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
+                        with respect to the current state of the instance.
+                      format: int64
+                      minimum: 0
+                      type: integer
+                    reason:
+                      description: |-
+                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
+                        Producers of specific condition types may define expected values and meanings for this field,
+                        and whether the values are considered a guaranteed API.
+                        The value should be a CamelCase string.
+                        This field may not be empty.
+                      maxLength: 1024
+                      minLength: 1
+                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
+                      type: string
+                    status:
+                      description: status of the condition, one of True, False, Unknown.
+                      enum:
+                      - "True"
+                      - "False"
+                      - Unknown
+                      type: string
+                    type:
+                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
+                      maxLength: 316
+                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
+                      type: string
+                  required:
+                  - lastTransitionTime
+                  - message
+                  - reason
+                  - status
+                  - type
+                  type: object
+                type: array
+            type: object
+        type: object
+    served: true
+    storage: true
+    subresources:
+      status: {}
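
To make the schema above concrete, here is a minimal sketch of an AKSNodeClass manifest that should satisfy the constraints shown in this CRD (enum values, minimums, the `containerLogMaxSize` pattern, and the rule that `imageGCHighThresholdPercent` exceeds `imageGCLowThresholdPercent`). The resource name, tag values, and the specific numbers chosen are illustrative assumptions, not values from the commit.

```yaml
# Illustrative manifest only; the name, tags, and chosen values are hypothetical.
apiVersion: karpenter.azure.com/v1alpha2
kind: AKSNodeClass
metadata:
  name: example-nodeclass          # hypothetical name
spec:
  imageFamily: AzureLinux          # enum: Ubuntu2204 (default) or AzureLinux
  osDiskSizeGB: 128                # schema minimum is 100
  maxPods: 110                     # illustrative; schema minimum is 0
  kubelet:
    cpuManagerPolicy: static       # enum: none (default) or static
    cpuCFSQuota: true
    cpuCFSQuotaPeriod: 100ms
    topologyManagerPolicy: best-effort
    containerLogMaxFiles: 5        # schema minimum is 2
    containerLogMaxSize: 50Mi      # must match ^\d+(E|P|T|G|M|K|Ei|Pi|Ti|Gi|Mi|Ki)$
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80 # must be lower than the high threshold
  tags:
    team: platform                 # hypothetical tag
```

Swapping the two image GC thresholds would be expected to trip the `x-kubernetes-validations` rule at admission time with the message "imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent".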

charts/karpenter-crd/templates/karpenter.sh_nodeclaims.yaml

Lines changed: 0 additions & 1 deletion
This file was deleted.
