No alerts are triggered when scraping job times out #126

przemeklal · 2023-12-05T10:27:16Z

In a real-life deployment, scraping hardware-exporter may take more than 10s, e.g.

time curl localhost:10000/metrics
...
real	0m25.534s
user	0m0.008s
sys	0m0.000s

By default, grafana-agent's scrape_timeout is 10s. If the scrape takes more than 10s, it fails silently, there are no alerts about missing metrics and any firing alerts get resolved because there are no metrics in Prometheus since grafana-agent doesn't push them.

As a reliability engineer, I didn't know this was happening until I ran queries in Prometheus to check whether the metrics were there.

The workaround is to bump global_scrape_timeout in grafana-agent juju config to something like 60s to make sure the scrape jobs are actually executed.

The text was updated successfully, but these errors were encountered:

jneo8 · 2023-12-08T10:26:49Z

We can try to add timeout mechanism on exporter to have this timeout metrics.

aieri · 2024-09-09T17:36:09Z

This may be fixed by canonical/grafana-agent-operator#147

Pjack added enhancement New feature or request bug Something isn't working and removed enhancement New feature or request labels Dec 5, 2023

Pjack added enhancement New feature or request and removed bug Something isn't working labels Dec 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No alerts are triggered when scraping job times out #126

No alerts are triggered when scraping job times out #126

przemeklal commented Dec 5, 2023

jneo8 commented Dec 8, 2023

aieri commented Sep 9, 2024

No alerts are triggered when scraping job times out #126

No alerts are triggered when scraping job times out #126

Comments

przemeklal commented Dec 5, 2023

jneo8 commented Dec 8, 2023

aieri commented Sep 9, 2024