[ArgoCD] Openmetrics integration in Datadog times out after 10 seconds #17599

ricardojdsilva87 · 2024-05-20T20:29:43Z

Hello,

We get the following error (a timeout after 10 seconds) when the Datadog Agent tries to scrape the OpenMetrics endpoint of the ArgoCD application controller.

Additional environment details (Operating System, Cloud provider, etc.):

Steps to reproduce the issue:
Configuration on the pod, using the official ArgoCD Helm chart:

```yaml
podAnnotations:
  ad.datadoghq.com/application-controller.checks: |
    {
      "argocd": {
        "instances": [
          {
            "app_controller_endpoint": "http://%%host%%:8082/metrics"
          }
        ]
      }
    }
```

This is the same configuration described in the Datadog documentation.

Describe the results you expected:
When using ArgoCD versions above 2.9.6 and below 2.11.0, we get the error shown above.

I've tried adding the prometheus_timeout setting to the OpenMetrics configuration, as described in the documentation:
https://docs.datadoghq.com/integrations/guide/prometheus-host-collection/

With the same configuration on older ArgoCD versions, all the needed metrics are sent to Datadog and are visible in the default ArgoCD dashboard. After changing the ArgoCD version to anything between v2.9.7 and 2.11.0, the error starts to appear and no metrics reach Datadog.

The Datadog Agent version is v7.53.0.
Even after setting prometheus_timeout to 30, the same error appears, still reporting a timeout after 10s, so the setting seems to have no effect.
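For reference, a sketch of how that attempt looks in the same pod annotation. Placing prometheus_timeout at the instance level is an assumption about how it was added. As far as I understand, prometheus_timeout belongs to the legacy (V1) OpenMetrics base check, which might explain why it is ignored if the ArgoCD check is built on OpenMetrics V2; that is unconfirmed.

```yaml
podAnnotations:
  # prometheus_timeout added per instance (placement assumed); the error
  # still reports a 10s timeout with this setting in place
  ad.datadoghq.com/application-controller.checks: |
    {
      "argocd": {
        "instances": [
          {
            "app_controller_endpoint": "http://%%host%%:8082/metrics",
            "prometheus_timeout": 30
          }
        ]
      }
    }
```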
Is there something I'm missing? Changing the ArgoCD version alone shouldn't stop metrics from being sent.
I'll run some more tests to check whether any intermediate version works correctly.

FYI, I found this issue while investigating another one in ArgoCD itself; more information can be found here.

Thanks!

UPDATE
Hello,
To add some more information: the issue seems to happen when the parameter controller.sharding.algorithm: "round-robin" (documented here) is added to the ArgoCD configuration.
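A sketch of how that parameter can be set through the ArgoCD Helm chart, assuming the chart's configs.params section is what writes keys into the argocd-cmd-params-cm ConfigMap read by the application controller:

```yaml
configs:
  params:
    # Assumed to end up in argocd-cmd-params-cm; "legacy" is the default
    # algorithm, and switching to round-robin is what triggers the timeout here
    controller.sharding.algorithm: round-robin
```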
I suspect this mode generates a lot more metrics than the legacy configuration, which might be causing the timeout after 10 seconds.
If there is a setting to increase this timeout, I can try it out in the configuration.
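One candidate worth trying, assuming the ArgoCD check uses the OpenMetrics V2 base check: there the HTTP request timeout is, as far as I know, the generic timeout instance option (default 10 seconds) rather than prometheus_timeout. A sketch of the same annotation with it raised to 30 seconds; whether the ArgoCD check honours it in this setup is untested.

```yaml
podAnnotations:
  # "timeout" is assumed to be the OpenMetrics V2 HTTP timeout option
  # (default 10s); prometheus_timeout is left out on purpose
  ad.datadoghq.com/application-controller.checks: |
    {
      "argocd": {
        "instances": [
          {
            "app_controller_endpoint": "http://%%host%%:8082/metrics",
            "timeout": 30
          }
        ]
      }
    }
```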
Thanks
