Enable auto-scaling #182

Open: wants to merge 52 commits into main

Commits (52):
090a782 Enable Prometheus + Grafana using terraform (gregoriomartin, Jun 13, 2023)
878769c Remove kube_dashboard block, update role_based_access_control block a… (gregoriomartin, Jun 14, 2023)
400000b corrections (gregoriomartin, Jun 15, 2023)
68390aa change backend state key (gregoriomartin, Jun 15, 2023)
3ec816a Set correct AMW id (gregoriomartin, Jun 15, 2023)
e8f7733 Added 1 suffix (gregoriomartin, Jun 19, 2023)
0417bee Add Prometheus Instrumentator to Stac and Tiler APIs (gregoriomartin, Jun 19, 2023)
120d7d9 Added Prometheus metrics to template (gregoriomartin, Jun 19, 2023)
8b531b5 helm prometheus (m-cappi, Jun 19, 2023)
3c5b31d escape variables (m-cappi, Jun 19, 2023)
50d7c07 preinstall crd (m-cappi, Jun 20, 2023)
1081ed3 Merge branch 'k8s-auto-scaling' into prometheus-helm (m-cappi, Jun 20, 2023)
0b64421 converge prometheus version (m-cappi, Jun 20, 2023)
a6bb84c fix namespace management (m-cappi, Jun 20, 2023)
512cc51 Merge pull request #1 from gregoriomartin/prometheus-helm (m-cappi, Jun 20, 2023)
b8a87f3 service monitor (m-cappi, Jun 20, 2023)
7957495 rotate prometheus version (m-cappi, Jun 20, 2023)
a8c7590 Set Auto Scaling to terraform (gregoriomartin, Jun 20, 2023)
35c5e35 checkpoint (m-cappi, Jun 20, 2023)
231c570 update pcstac scaling (m-cappi, Jun 21, 2023)
c663dfd working: metrics scraped (m-cappi, Jun 21, 2023)
2c968a9 enable hpa (m-cappi, Jun 21, 2023)
48c282c fix semantic name (m-cappi, Jun 22, 2023)
3d58eab Move Instrumentator().instrument(app).expose(app) to the Startup event (gregoriomartin, Jun 22, 2023)
fb19872 Remove Managed Grafana and Prometheus (gregoriomartin, Jun 22, 2023)
408f04c Enable Auto Scaling via terraform (gregoriomartin, Jun 22, 2023)
1588452 adjust http query (m-cappi, Jun 22, 2023)
e4fd792 enable aks virtual nodes (m-cappi, Jun 22, 2023)
e0400ad fix network (m-cappi, Jun 22, 2023)
794cb6c terraform scaling adaptation (m-cappi, Jun 22, 2023)
1e29192 Merge pull request #2 from gregoriomartin/v-nodes (m-cappi, Jun 22, 2023)
458c070 move prometheus adapter (m-cappi, Jun 23, 2023)
545b228 STAC virtual nodes affinities and tolerations (m-cappi, Jun 23, 2023)
2d84c1b disable legacy prometheus (m-cappi, Jun 23, 2023)
bfa66e6 Update prometheusAdapter-customMetrics.yaml (gregoriomartin, Jun 26, 2023)
dea331c overprovisioning (m-cappi, Jun 27, 2023)
8bcd414 virtual nodes manifest (m-cappi, Jun 27, 2023)
953a365 adjust overprovisioning (m-cappi, Jun 28, 2023)
ad5ad97 overprovisioning (m-cappi, Jun 29, 2023)
4d50a8a Merge branch 'k8s-auto-scaling' into fix/prometheus (m-cappi, Jun 29, 2023)
6843d1d Merge pull request #3 from gregoriomartin/fix/prometheus (m-cappi, Jun 29, 2023)
69edc75 prometheus with kubectl (m-cappi, Jun 29, 2023)
e0b2afd blocked metric endpoint (andres64372, Jun 29, 2023)
d8dc240 remove virtual nodes (m-cappi, Jun 29, 2023)
85ed66e Merge branch 'k8s-auto-scaling' of https://github.com/gregoriomartin/… (andresouthworks, Jun 29, 2023)
4d2523a overprovisioning with helm (m-cappi, Jun 29, 2023)
d3402c7 parametrize hpa (m-cappi, Jun 29, 2023)
8ce2ed6 tiler hpa (m-cappi, Jun 29, 2023)
070e622 revert aks network plugin (m-cappi, Jun 30, 2023)
30c12e6 overprovisioning config map (m-cappi, Jun 30, 2023)
4b787ce Merge branch 'main' into k8s-auto-scaling (gregoriomartin, Jun 30, 2023)
455ca44 Remove TODOs (gregoriomartin, Jun 30, 2023)
28 changes: 27 additions & 1 deletion deployment/bin/deploy
100755 → 100644
@@ -136,14 +136,22 @@ if [ "${BASH_SOURCE[0]}" = "${0}" ]; then

    setup_helm

    # Install namespaces

    echo "Installing namespaces..."

    helm upgrade --install \
        pc-namespaces helm/pc-namespaces \
        -n pc-namespaces \
        --create-namespace

    # Install cert-manager

    echo "Installing cert-manager..."

    helm upgrade --install \
        cert-manager \
        --namespace pc \
        --create-namespace \
        --version v1.6.0 \
        --set installCRDs=true jetstack/cert-manager

@@ -193,6 +201,24 @@ if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
        --wait \
        --timeout 2m0s

    echo "====================="
    echo "==== Prometheus ====="
    echo "====================="

    echo "Deploying prometheus crd..."
    kubectl apply -f helm/prometheus-crd --server-side

    echo "Deploying prometheus component..."
    kubectl apply -f helm/pc-apis-prometheus

    echo "==========================="
    echo "==== Overprovisioning ====="
    echo "==========================="

    helm upgrade --install overprovisioning helm/overprovisioning \
        --kube-context "${KUBE_CONTEXT}" \
        --wait

    #########################
    # Deploy Azure Function #
    #########################
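The deploy step applies the Prometheus CRDs and then immediately applies resources that use those CRDs; if the API server has not finished registering them, the second `kubectl apply` can fail with a "no matches for kind" error. The upstream kube-prometheus quickstart puts a `kubectl wait --for condition=Established` between the two applies. Below is a minimal sketch of that ordering, assuming a hypothetical `deploy_prometheus` helper (not part of `deployment/bin/deploy`) and a `KUBECTL` variable that exists only so the steps can be dry-run against a stub.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in deployment/bin/deploy): apply the CRDs, wait until
# the API server reports them Established, then apply the Prometheus resources
# that reference them.
# KUBECTL defaults to the real binary; it can point at a stub for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

deploy_prometheus() {
    local crd_dir="$1" manifests_dir="$2"

    echo "Deploying prometheus crd..."
    "$KUBECTL" apply -f "$crd_dir" --server-side

    echo "Waiting for CRDs to be established..."
    "$KUBECTL" wait --for condition=Established --all CustomResourceDefinition --timeout=60s

    echo "Deploying prometheus component..."
    "$KUBECTL" apply -f "$manifests_dir"
}

# Usage with the paths from the deploy script:
# deploy_prometheus helm/prometheus-crd helm/pc-apis-prometheus
```

Waiting on `--all` CRDs mirrors the kube-prometheus quickstart; a narrower wait on just the `monitoring.coreos.com` CRDs would also work but needs their names spelled out.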
2 changes: 2 additions & 0 deletions deployment/helm/deploy-values.template.yaml
@@ -102,10 +102,12 @@ pcingress:
      path: "/stac(/|$)(.*)"
      name: "planetary-computer-stac"
      port: "80"
      blockMetrics: true
    tiler:
      path: "/data(/|$)(.*)"
      name: "planetary-computer-tiler"
      port: "80"
      blockMetrics: true

  cert:
    secretName: "pqe-tls-secret"
6 changes: 6 additions & 0 deletions deployment/helm/overprovisioning/Chart.yaml
@@ -0,0 +1,6 @@
apiVersion: v2
name: overprovisioning
description: A Helm chart for the overprovisioning auxiliary deployment used with virtual nodes
type: application
version: 0.1.1
appVersion: 0.1.0
118 changes: 118 additions & 0 deletions deployment/helm/overprovisioning/templates/overprovisioning.yaml
@@ -0,0 +1,118 @@
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.namespace }}
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Priority class used by overprovisioning."
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: cluster-proportional-autoscaler-overprovision
  namespace: {{ .Values.namespace }}
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-proportional-autoscaler-overprovision
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["replicationcontrollers/scale"]
  verbs: ["get", "update"]
- apiGroups: ["extensions","apps"]
  resources: ["deployments/scale", "replicasets/scale"]
  verbs: ["get", "update"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-proportional-autoscaler-overprovision
subjects:
- kind: ServiceAccount
  name: cluster-proportional-autoscaler-overprovision
  namespace: {{ .Values.namespace }}
roleRef:
  kind: ClusterRole
  name: cluster-proportional-autoscaler-overprovision
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: {{ .Values.namespace }}
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      terminationGracePeriodSeconds: 0
      containers:
      - name: reserve-resources
        image: registry.k8s.io/pause:3.9
        resources:
          {{- toYaml .Values.overprovision.deployment.resources | nindent 10 }}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: overprovisioning-autoscaler
  namespace: {{ .Values.namespace }}
data:
  linear: |-
    {
      "coresPerReplica": {{ .Values.overprovision.hpa.coresPerReplica }},
      "nodesPerReplica": {{ .Values.overprovision.hpa.nodesPerReplica }},
      "min": {{ .Values.overprovision.hpa.minPods }},
      "max": {{ .Values.overprovision.hpa.maxPods }},
      "preventSinglePointFailure": {{ .Values.overprovision.hpa.preventSinglePointFailure }},
      "includeUnschedulableNodes": {{ .Values.overprovision.hpa.includeUnschedulableNodes }}
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-autoscaler
  namespace: {{ .Values.namespace }}
  labels:
    app: overprovisioning-autoscaler
spec:
  selector:
    matchLabels:
      app: overprovisioning-autoscaler
  replicas: 1
  template:
    metadata:
      labels:
        app: overprovisioning-autoscaler
    spec:
      containers:
      - image: registry.k8s.io/cluster-proportional-autoscaler-amd64:1.8.1
        name: autoscaler
        command:
        - /cluster-proportional-autoscaler
        - --namespace={{ .Values.namespace }}
        - --configmap=overprovisioning-autoscaler
        - --default-params={"linear":{"coresPerReplica":8,"nodesPerReplica":4,"preventSinglePointFailure":false,"includeUnschedulableNodes":true}}
        - --target=deployment/overprovisioning
        - --logtostderr=true
        - --v=2
      serviceAccountName: cluster-proportional-autoscaler-overprovision
18 changes: 18 additions & 0 deletions deployment/helm/overprovisioning/values.yaml
@@ -0,0 +1,18 @@
environment: "staging"
namespace: "overprovisioning"

overprovision:

  deployment:
    resources:
      requests:
        memory: "512Mi"
        cpu: "400m"

  hpa:
    coresPerReplica: 8
    nodesPerReplica: 4
    minPods: 1
    maxPods: 3
    preventSinglePointFailure: false
    includeUnschedulableNodes: true
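The `hpa` values above feed the `linear` ConfigMap read by cluster-proportional-autoscaler, which resizes the `overprovisioning` Deployment from cluster size (cores and nodes) rather than from pod metrics. As described in the cluster-proportional-autoscaler README, linear mode takes the larger of ceil(cores / coresPerReplica) and ceil(nodes / nodesPerReplica), clamps the result to [min, max], and, with preventSinglePointFailure enabled, keeps at least 2 replicas when there is more than one node. The sketch below reproduces that arithmetic with this chart's defaults to estimate how many placeholder pods a given cluster gets. `linear_replicas` and `ceil_div` are hypothetical helpers; the sketch ignores includeUnschedulableNodes (which only changes which nodes are counted), and edge-case ordering may differ from the controller source.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers approximating cluster-proportional-autoscaler "linear" mode.
# Defaults mirror deployment/helm/overprovisioning/values.yaml.
# Arguments: total cores, node count, then optional overrides for
# coresPerReplica, nodesPerReplica, minPods, maxPods, preventSinglePointFailure.

ceil_div() {
    # Integer ceiling of $1 / $2 for non-negative $1 and positive $2.
    echo $(( ($1 + $2 - 1) / $2 ))
}

linear_replicas() {
    local cores="$1" nodes="$2"
    local cores_per_replica="${3:-8}" nodes_per_replica="${4:-4}"
    local min="${5:-1}" max="${6:-3}" prevent_spf="${7:-false}"
    local by_cores by_nodes replicas

    by_cores=$(ceil_div "$cores" "$cores_per_replica")
    by_nodes=$(ceil_div "$nodes" "$nodes_per_replica")

    # Take whichever resource asks for more replicas.
    replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))

    # preventSinglePointFailure: at least 2 replicas when there is more than one node.
    if [ "$prevent_spf" = "true" ] && [ "$nodes" -gt 1 ] && [ "$replicas" -lt 2 ]; then
        replicas=2
    fi

    # Clamp to [minPods, maxPods].
    if [ "$replicas" -lt "$min" ]; then replicas="$min"; fi
    if [ "$replicas" -gt "$max" ]; then replicas="$max"; fi

    echo "$replicas"
}

# 3 nodes x 4 cores: ceil(12/8) = 2 beats ceil(3/4) = 1, so this prints 2.
linear_replicas 12 3
```

With coresPerReplica 8 and maxPods 3, the placeholder count grows by one pod per 8 cores and stops at 3 pods (1.2 CPU and 1.5 GiB of headroom at the default requests) however large the cluster gets.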
6 changes: 6 additions & 0 deletions deployment/helm/pc-apis-ingress/templates/ingress.yaml
@@ -13,6 +13,12 @@ metadata:
  {{- with .Values.pcingress.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
    {{- if $.Values.pcingress.services.stac.blockMetrics }}
    nginx.ingress.kubernetes.io/server-snippet: |
      if ($request_uri = {{ appRootPath }}/metrics) {
        return 403;
      }
    {{- end }}
  {{- end }}
spec:
  tls:
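With `blockMetrics: true`, the ingress is meant to answer `403` for the APIs' `/metrics` endpoint so that it is not publicly reachable. A small post-deploy check can confirm this from outside the cluster. The sketch assumes a hypothetical `check_metrics_blocked` helper, a base URL you supply, and `/stac` and `/data` as the public prefixes (taken from the `pcingress` paths in `deploy-values.template.yaml`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical post-deploy check: each public API prefix should answer 403
# for its /metrics path once blockMetrics is enabled.
# Usage: check_metrics_blocked <base-url> <prefix>...
check_metrics_blocked() {
    local base_url="$1"; shift
    local prefix code failed=0

    for prefix in "$@"; do
        # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
        # "|| true" keeps set -e from aborting on connection errors (curl then prints 000).
        code=$(curl -s -o /dev/null -w '%{http_code}' "${base_url}${prefix}/metrics" || true)
        if [ "$code" = "403" ]; then
            echo "OK   ${prefix}/metrics -> ${code}"
        else
            echo "FAIL ${prefix}/metrics -> ${code}"
            failed=1
        fi
    done
    return "$failed"
}

# Example (the host name is a placeholder):
# check_metrics_blocked "https://pc.example.com" /stac /data
```

Checking both prefixes is deliberate: the template condition reads only `services.stac.blockMetrics`, so a per-prefix check shows whether the tiler path is covered as well.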
36 changes: 36 additions & 0 deletions deployment/helm/pc-apis-prometheus/alertmanager-alertmanager.yaml
@@ -0,0 +1,36 @@
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: main
  namespace: monitoring
spec:
  image: quay.io/prometheus/alertmanager:v0.25.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 0.25.0
  replicas: 3
  resources:
    limits:
      cpu: 100m
      memory: 100Mi
    requests:
      cpu: 4m
      memory: 100Mi
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: alertmanager-main
  version: 0.25.0
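The `Alertmanager` resource above asks the Prometheus Operator for 3 replicas of `main`. The operator backs it with a StatefulSet named `alertmanager-main` (its `alertmanager-<name>` convention), so a deploy step could wait until that StatefulSet reports enough ready replicas before moving on. A minimal polling sketch, assuming a hypothetical `wait_for_alertmanager` helper and a `KUBECTL` variable that can point at a stub for dry runs; `kubectl -n monitoring rollout status statefulset/alertmanager-main` is a simpler alternative when rollout status is enough.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll the StatefulSet the Prometheus Operator creates for
# the Alertmanager named "main" until enough replicas are ready.
# Arguments: wanted replicas (default 3), attempts (default 30),
# seconds between attempts (default 5).
KUBECTL="${KUBECTL:-kubectl}"

wait_for_alertmanager() {
    local want="${1:-3}" attempts="${2:-30}" interval="${3:-5}"
    local ready="" i

    for (( i = 0; i < attempts; i++ )); do
        # readyReplicas is omitted from the status while it is zero, hence the :-0 default.
        # "|| true" tolerates the StatefulSet not existing yet.
        ready=$("$KUBECTL" -n monitoring get statefulset alertmanager-main \
            -o jsonpath='{.status.readyReplicas}' || true)
        if [ "${ready:-0}" -ge "$want" ]; then
            echo "alertmanager-main: ${ready}/${want} replicas ready"
            return 0
        fi
        sleep "$interval"
    done

    echo "alertmanager-main: ${ready:-0}/${want} replicas ready after ${attempts} attempts" >&2
    return 1
}

# Usage: wait_for_alertmanager 3 24 5   # up to ~2 minutes
```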
42 changes: 42 additions & 0 deletions deployment/helm/pc-apis-prometheus/alertmanager-networkPolicy.yaml
@@ -0,0 +1,42 @@
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: alertmanager-main
  namespace: monitoring
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    ports:
    - port: 9093
      protocol: TCP
    - port: 8080
      protocol: TCP
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: alertmanager
    ports:
    - port: 9094
      protocol: TCP
    - port: 9094
      protocol: UDP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
@@ -0,0 +1,19 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: alertmanager-main
  namespace: monitoring
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: alert-router
      app.kubernetes.io/instance: main
      app.kubernetes.io/name: alertmanager
      app.kubernetes.io/part-of: kube-prometheus