
[datadog-operator] User "system:serviceaccount:system-datadog:datadog-cluster-agent" cannot list resource "datadogmetrics" in API group "datadoghq.com" at the cluster scope #1561

Open
adlord opened this issue Oct 16, 2024 · 1 comment



adlord commented Oct 16, 2024

Describe what happened:
We observed these logs in the datadog-cluster-agent pod:

2024-10-16 07:59:59 UTC | CLUSTER | WARN | (client-go@…/tools/cache/reflector.go:535 in list) | pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:229: failed to list datadoghq.com/v1alpha1, Resource=datadogmetrics: datadogmetrics.datadoghq.com is forbidden: User "system:serviceaccount:system-datadog:datadog-cluster-agent" cannot list resource "datadogmetrics" in API group "datadoghq.com" at the cluster scope
2024-10-16 07:59:59 UTC | CLUSTER | ERROR | (apimachinery@…/pkg/util/runtime/runtime.go:115 in logError) | pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:229: Failed to watch datadoghq.com/v1alpha1, Resource=datadogmetrics: failed to list datadoghq.com/v1alpha1, Resource=datadogmetrics: datadogmetrics.datadoghq.com is forbidden: User "system:serviceaccount:system-datadog:datadog-cluster-agent" cannot list resource "datadogmetrics" in API group "datadoghq.com" at the cluster scope

Indeed, the datadog-cluster-agent ClusterRole (bound to the system:serviceaccount:system-datadog:datadog-cluster-agent service account) has no rule for datadogmetrics:

❯ kubectl describe ClusterRole datadog-cluster-agent
Name:         datadog-cluster-agent
Labels:       app.kubernetes.io/instance=datadog
              app.kubernetes.io/managed-by=datadog-operator
              app.kubernetes.io/name=datadog-agent-deployment
              app.kubernetes.io/part-of=system--datadog-datadog
              app.kubernetes.io/version=
              operator.datadoghq.com/managed-by-store=true
Annotations:  <none>
PolicyRule:
  Resources                                                   Non-Resource URLs  Resource Names     Verbs
  ---------                                                   -----------------  --------------     -----
  mutatingwebhookconfigurations.admissionregistration.k8s.io  []                 []                 [create]
  mutatingwebhookconfigurations.admissionregistration.k8s.io  []                 [datadog-webhook]  [get list watch update]
  componentstatuses                                           []                 []                 [get list watch]
  configmaps                                                  []                 []                 [get list watch]
  endpoints                                                   []                 []                 [get list watch]
  events                                                      []                 []                 [get list watch]
  namespaces                                                  []                 []                 [get list watch]
  nodes                                                       []                 []                 [get list watch]
  pods                                                        []                 []                 [get list watch]
  services                                                    []                 []                 [get list watch]
  clusterresourcequotas.quota.openshift.io                    []                 []                 [get list]
                                                              [/healthz]         []                 [get]
                                                              [/version]         []                 [get]
  namespaces                                                  []                 [kube-system]      [get]
  daemonsets.apps                                             []                 []                 [get]
  deployments.apps                                            []                 []                 [get]
  replicasets.apps                                            []                 []                 [get]
  statefulsets.apps                                           []                 []                 [get]
  extendeddaemonsetreplicasets.datadoghq.com                  []                 []                 [get]
  cronjobs.batch                                              []                 []                 [list watch get]
  jobs.batch                                                  []                 []                 [list watch get]
  horizontalpodautoscalers.autoscaling                        []                 []                 [list watch]
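To confirm the missing permission independently of the logs, the API server can be asked directly with a SubjectAccessReview. A minimal sketch, assuming the service account name and namespace taken from the error message above; the file name `sar.yaml` used below is only an example:

```yaml
# SubjectAccessReview: asks the API server whether the given user may perform
# the action. No namespace is set in resourceAttributes, so the check is
# cluster-scoped, matching "at the cluster scope" in the error.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # Service account identity copied from the error message.
  user: system:serviceaccount:system-datadog:datadog-cluster-agent
  resourceAttributes:
    group: datadoghq.com
    version: v1alpha1
    resource: datadogmetrics
    verb: list
```

Submitting it with `kubectl create -f sar.yaml -o yaml` returns the review with a `status.allowed` field, which, given the errors above, would be expected to be `false`. `kubectl auth can-i list datadogmetrics.datadoghq.com --as=system:serviceaccount:system-datadog:datadog-cluster-agent` performs the same check in one line.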

Describe what you expected:
No error logs

Steps to reproduce the issue:
Follow the steps in https://arapulido.github.io/blog/2024/08/19/keda-cluster-agent/ using the following DatadogAgent configuration:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    kubelet:
      tlsVerify: false # This is only needed for self-signed certificates
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    externalMetricsServer:
      enabled: true
      useDatadogMetrics: true
      registerAPIService: false
  override:
    clusterAgent:
      env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}]

Workaround:

Use the datadog-operator service account instead, since its ClusterRole does grant permissions on datadogmetrics:

❯ kubectl describe ClusterRole datadog-operator | grep -i datadogmetric
  datadogmetrics.datadoghq.com                                  []                 []              [create delete list watch]
  datadogmetrics.datadoghq.com/status                           []                 []              [update]

DatadogAgent configuration used for the workaround:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    kubelet:
      tlsVerify: false # This is only needed for self-signed certificates
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    externalMetricsServer:
      enabled: true
      useDatadogMetrics: true
      registerAPIService: false
  override:
    clusterAgent:
      createRbac: false
      serviceAccountName: datadog-operator
      env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}]
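A narrower alternative to reusing the datadog-operator service account could be to keep the cluster agent's own service account and bind it to an extra ClusterRole that covers only datadogmetrics. This is a sketch under assumptions, not an official fix: the object name `datadog-cluster-agent-datadogmetrics` is hypothetical, the verbs are copied from the datadog-operator ClusterRole shown above, and the subject comes from the error message. Because these objects are separate from the operator-managed `datadog-cluster-agent` ClusterRole, the operator presumably leaves them untouched.

```yaml
# Hypothetical extra ClusterRole; the name is made up for this sketch.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-cluster-agent-datadogmetrics
rules:
  # Verbs mirror the datadog-operator ClusterRole output shown above.
  - apiGroups: ["datadoghq.com"]
    resources: ["datadogmetrics"]
    verbs: ["create", "delete", "list", "watch"]
  - apiGroups: ["datadoghq.com"]
    resources: ["datadogmetrics/status"]
    verbs: ["update"]
---
# Binds the hypothetical ClusterRole to the cluster agent's own service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-cluster-agent-datadogmetrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-cluster-agent-datadogmetrics
subjects:
  # Name and namespace taken from the error message; adjust if the
  # cluster agent runs in a different namespace.
  - kind: ServiceAccount
    name: datadog-cluster-agent
    namespace: system-datadog
```

If this works, the `createRbac: false` and `serviceAccountName` overrides in the workaround configuration could presumably be dropped, so the cluster agent keeps its own, smaller set of permissions.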

Additional environment details (Operating System, Cloud provider, etc):
Datadog-operator chart version: datadog-operator-1.8.6
Cloud provider: GCP
Kubernetes cluster version: v1.30.5-gke.1014001


adlord commented Oct 17, 2024

One more thing: the datadog Helm chart (not the operator chart) does grant permissions on datadogmetrics, under certain conditions, in the cluster-agent RBAC; see https://github.com/DataDog/helm-charts/blob/main/charts/datadog/templates/cluster-agent-rbac.yaml#L227

Is something missing from the datadog-operator chart?
