Alerting: Add backend support for keep_firing_for (#100750)

What is this feature?

This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation).

keep_firing_for prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state, which the Alertmanager does not treat as resolved. Once the recovery period ends (or at the next evaluation, if the evaluation interval is longer than keep_firing_for), the alert transitions to "Normal" unless it has started alerting again:

Before                                          

+----------+     +----------+                    
| Alerting |---->|  Normal  |                    
+----------+     +----------+                    

-----
After

+----------+      +------------+     +----------+
| Alerting |----->| Recovering |---->|  Normal  |
+----------+      +------------+     +----------+                                                 

Why do we need this feature?

This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert
Alexander Akhmetov
2025-03-18 11:24:48 +01:00
committed by GitHub
parent 9491fa1895
commit 695ac91290
31 changed files with 1280 additions and 53 deletions
@@ -5,6 +5,7 @@
{
"expr": "",
"for": "5m",
"keep_firing_for": "0s",
"labels": {
"label1": "test-label"
},
@@ -51,6 +52,7 @@
{
"expr": "",
"for": "5m",
"keep_firing_for": "0s",
"labels": {
"label1": "test-label"
},