[alerter] bugfix: preserve group alert recovery events #4168

Open
hutiefang76 wants to merge 1 commit into apache:master from hutiefang76:codex/fix-alarm-group-resolve

Conversation

@hutiefang76

Contributor

What's changed

Closes #4160.

This fixes two group alert lifecycle cases:

  • do not let the firing repeat interval suppress pending resolved alerts in a mixed firing group
  • recompute the persisted group status from all known member alerts, so one recovered alert does not mark the whole group resolved while another member is still firing

Regression tests cover both the in-memory group reducer path and the database store path.
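
For readers unfamiliar with the reducer, the first case can be illustrated with a minimal sketch. It is not the code in `AlarmGroupReduce`; the class name `RepeatThrottleSketch`, the method `shouldDispatch`, and the plain-string statuses are all hypothetical and chosen only to show the intended decision: a group payload that carries any resolved member is dispatched regardless of the repeat interval, while a firing-only payload is still throttled.

```java
import java.util.List;

/** Illustrative sketch only; names and status strings are assumptions, not HertzBeat code. */
public class RepeatThrottleSketch {

    // Assumed status value for a recovered alert.
    static final String RESOLVED = "resolved";

    /**
     * Decides whether a group notification should be sent now.
     * A payload with at least one resolved member bypasses the repeat interval;
     * otherwise the usual "last sent + interval" throttle applies.
     */
    public static boolean shouldDispatch(List<String> memberStatuses,
                                         long lastSentAtMs,
                                         long nowMs,
                                         long repeatIntervalMs) {
        boolean hasResolved = memberStatuses.stream().anyMatch(RESOLVED::equals);
        if (hasResolved) {
            // Recovery events must not be held back by the firing throttle.
            return true;
        }
        return nowMs - lastSentAtMs >= repeatIntervalMs;
    }

    public static void main(String[] args) {
        // Mixed group, interval not yet elapsed: still dispatched.
        System.out.println(shouldDispatch(List.of("firing", "resolved"), 1_000L, 2_000L, 60_000L));
        // Firing-only group, interval not yet elapsed: throttled.
        System.out.println(shouldDispatch(List.of("firing"), 1_000L, 2_000L, 60_000L));
    }
}
```

In this sketch the bypass is checked before the interval comparison, so a recovery can never be dropped by the throttle, at the cost of an extra notification inside the repeat window.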

Verification

  • JAVA_HOME=$(/usr/libexec/java_home -v 25 2>/dev/null || /usr/libexec/java_home) ./mvnw -pl hertzbeat-alerter -am test
  • git diff --check

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes two alert-group lifecycle edge cases in the alerter pipeline: ensuring recovery (resolved) events are not suppressed by the firing repeat-interval throttle, and ensuring persisted group status reflects the aggregate state of all known member alerts (not just the current push window).

Changes:

  • Bypass the group repeat-interval throttle when the group payload includes any resolved member alerts.
  • Recompute a group’s persisted status in the DB store layer from all member alert statuses (by fingerprint), and add regression tests for both reducer and store paths.
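
The status recomputation described in the last bullet could look roughly like the sketch below. It is a simplified stand-in, not the PR's `refreshGroupStatus`: the class `GroupStatusSketch`, the method `aggregate`, the in-memory map that replaces the DAO lookup, and the status strings are all assumptions made for illustration. The rule it shows is that a group stays firing while any member fingerprint still maps to a firing alert.

```java
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and status strings are assumptions, not HertzBeat code. */
public class GroupStatusSketch {

    // Assumed status values.
    static final String FIRING = "firing";
    static final String RESOLVED = "resolved";

    /**
     * Derives the group status from the stored status of every member fingerprint.
     * The map stands in for a lookup of stored alerts by fingerprint.
     * Fingerprints with no stored alert are ignored in this sketch.
     */
    public static String aggregate(List<String> groupFingerprints,
                                   Map<String, String> storedStatusByFingerprint) {
        boolean anyFiring = groupFingerprints.stream()
                .map(storedStatusByFingerprint::get)
                .anyMatch(FIRING::equals);
        return anyFiring ? FIRING : RESOLVED;
    }

    public static void main(String[] args) {
        Map<String, String> stored = Map.of("fp-a", RESOLVED, "fp-b", FIRING);
        // One member recovered, one still firing: prints "firing".
        System.out.println(aggregate(List.of("fp-a", "fp-b"), stored));
        // Only the recovered member is in the group: prints "resolved".
        System.out.println(aggregate(List.of("fp-a"), stored));
    }
}
```

Note that an empty or entirely unknown member list yields `resolved` here; a real store layer may prefer to leave the persisted status untouched in that situation.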

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/reduce/AlarmGroupReduce.java` | Adjusts repeat-throttle behavior so mixed firing groups with resolved members are still dispatched. |
| `hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/notice/impl/DbAlertStoreHandlerImpl.java` | Recomputes persisted group status from stored member alerts before saving. |
| `hertzbeat-alerter/src/test/java/org/apache/hertzbeat/alert/reduce/AlarmGroupReduceTest.java` | Adds a regression test for “resolved bypasses repeat throttle”. |
| `hertzbeat-alerter/src/test/java/org/apache/hertzbeat/alert/notice/impl/DbAlertStoreHandlerImplTest.java` | Adds a regression test ensuring group status stays firing when other members still fire. |

Comment on lines 294 to 296

    AlertGroupConverge ruleConfig = groupDefines.get(cache.getGroupDefineName());
    long repeatInterval = ruleConfig.getRepeatInterval() != null
            ? ruleConfig.getRepeatInterval() * MS_PER_SECOND : DEFAULT_REPEAT_INTERVAL;

Comment on lines +142 to +147

    List<SingleAlert> alerts = singleAlertDao.findSingleAlertsByFingerprintIn(alertFingerprints);
    if (alerts == null) {
        return;
    }
    boolean hasFiringAlert = alerts.stream()
            .anyMatch(alert -> CommonConstants.ALERT_STATUS_FIRING.equals(alert.getStatus()));

Comment on lines 128 to 131

    // Save alert group
    groupAlert.setAlertFingerprints(alertFingerprints.stream().toList());
    refreshGroupStatus(groupAlert);
    GroupAlert savedGroupAlert = groupAlertDao.save(groupAlert);

Development

Successfully merging this pull request may close these issues.

[BUG] Alarm group convergence: group wrongly flips to resolved and recovery events get silently dropped

2 participants