Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(healthcheck): Update Hubble Readiness & Liveliness probes #1048

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -73,10 +73,18 @@ spec:
memory: {{ .Values.resources.limits.memory | quote }}
cpu: {{ .Values.resources.limits.cpu | quote }}
readinessProbe:
httpGet:
path: /metrics
port: {{ .Values.agent.container.retina.ports.containerPort }}
initialDelaySeconds: 10
exec:
command:
- hubble
- status
initialDelaySeconds: 15
periodSeconds: 10
livenessProbe:
exec:
command:
- hubble
- status
initialDelaySeconds: 30
periodSeconds: 30
env:
- name: POD_NAME
Expand Down Expand Up @@ -189,11 +197,19 @@ spec:
- -command
- .\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig
readinessProbe:
httpGet:
path: /metrics
port: {{ .Values.agent.container.retina.ports.containerPort }}
exec:
command:
- hubble
mereta marked this conversation as resolved.
Show resolved Hide resolved
- status
initialDelaySeconds: 15
periodSeconds: 10
livenessProbe:
exec:
command:
- hubble
- status
initialDelaySeconds: 30
periodSeconds: 30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does the hubble status cmd take to respond at scale? If the command is generally quick and fits within the 30s window, perfect, but if it's frequently greater than 30s for a healthy response then restarting the pod may do more harm than good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to run some scale tests. TBC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref: https://github.com/cilium/cilium/blob/c7347199af4fe426c87fc0f5572f493559259ea6/pkg/hubble/observer/local_observer.go#L210

The status should be pretty static w.r.t. scale. Should have the ring buffer size (which is fixed) as upper limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also some adjustments needed to run scale tests for hubble atm

env:
- name: POD_NAME
valueFrom:
Expand Down
Loading