
mergify[bot]
Contributor

@mergify mergify bot commented Nov 20, 2024

What does this PR do?

Use the failure_threshold setting introduced in elastic/beats#41570 in the self-monitoring configuration. This stops elastic-agent from reporting DEGRADED when a metrics fetch fails because a component is starting or stopping.
The failure threshold defaults to 2 and can be overridden in the agent configuration file or the Fleet policy.
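The following minimal Python sketch illustrates what a failure threshold does to the reported status. It rests on two assumptions made for illustration only: a component becomes DEGRADED once its consecutive fetch failures reach failure_threshold, and any successful fetch resets the count. The class and method names are hypothetical. This is not the Beats or Elastic Agent implementation, which is written in Go.

```python
from dataclasses import dataclass

HEALTHY = "HEALTHY"
DEGRADED = "DEGRADED"


@dataclass
class FetchStatusTracker:
    """Hypothetical tracker that reports DEGRADED only after
    `failure_threshold` consecutive failed metric fetches."""

    failure_threshold: int = 2  # mirrors the default of 2 described above
    consecutive_failures: int = 0
    status: str = HEALTHY

    def __post_init__(self) -> None:
        if self.failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")

    def record(self, ok: bool) -> str:
        """Record one fetch result and return the resulting status."""
        if ok:
            # A successful fetch clears the failure streak (assumption).
            self.consecutive_failures = 0
            self.status = HEALTHY
        else:
            self.consecutive_failures += 1
            # Only a streak that reaches the threshold changes the status.
            if self.consecutive_failures >= self.failure_threshold:
                self.status = DEGRADED
        return self.status


if __name__ == "__main__":
    tracker = FetchStatusTracker()
    # A single failure (e.g. while a component restarts) is tolerated;
    # two failures in a row are not.
    for ok in [True, False, True, False, False]:
        print(tracker.record(ok))
```

With the default threshold of 2, the single failed fetch in the middle of the sequence leaves the status HEALTHY. Only the final two back-to-back failures produce DEGRADED. A threshold of 1 would reproduce the earlier behavior, where any single error degrades the status.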

Why is it important?

This prevents a single failed metrics fetch from causing the agent status to be misreported.
See #5332

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

This is an automatic backport of pull request #5999 done by [Mergify](https://mergify.com).

* Add failureThreshold to elastic-agent self-monitoring config

(cherry picked from commit 2a46509)
@mergify mergify bot added the backport label Nov 20, 2024
@mergify mergify bot requested a review from a team as a code owner November 20, 2024 08:27
@mergify mergify bot requested review from andrzej-stencel and pchila and removed request for a team November 20, 2024 08:27
@mergify mergify bot assigned pchila Nov 20, 2024
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Nov 20, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)


Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@pchila pchila merged commit c5c78e3 into 8.x Nov 20, 2024
15 of 16 checks passed
@pchila pchila deleted the mergify/bp/8.x/pr-5999 branch November 20, 2024 17:17

4 participants