Commit Graph

17 Commits

Author SHA1 Message Date
Alexander Akhmetov 149f02aebe Alerting: Add rule_group label to grafana_alerting_rule_group_rules metric (#88289)
* Alerting: Add rule_group label to grafana_alerting_rule_group_rules metric (#62361)

* Alerting: Delete rule group metrics when the rule group is deleted

This commit addresses the issue where the GroupRules metric (a GaugeVec)
keeps its value and is not deleted when an alert rule is removed from the rule registry.
Previously, when an alert rule with orgID=1 was active, the metric was:

  grafana_alerting_rule_group_rules{org="1",state="active"} 1

However, after deleting this rule, subsequent calls to updateRulesMetrics
did not update the gauge value, causing the metric to incorrectly remain at 1.

The fix ensures that when updateRulesMetrics is called it
also deletes the group rule metrics with the corresponding label values if needed.
2024-08-13 13:27:23 +02:00
Matthew Jacobson 3228b64fe6 Alerting: Resend resolved notifications for ResolvedRetention duration (#88938)
* Simple replace of State.Resolved with State.ResolvedAt

* Retain ResolvedAt time between Normal->Normal transition

* Introduce ResolvedRetention to keep sending recently resolved alerts

* Make ResolvedRetention configurable with resolved_alert_retention

* Tick-based LastSentAt for testing of ResendDelay and ResolvedRetention

* Do not reset ResolvedAt during Normal->Pending transition

Initially this was done to be inline with Prom ruler. However, Prom ruler
doesn't keep track of Inactive->Pending/Alerting using the same alert instance,
so it's more understandable that they choose not to retain ResolvedAt. In our
case, since we use the same cached instance to represent the transition, it
makes more sense to retain it.

This should help alleviate some odd situations where temporarily entering
Pending will stop future resolved notifications that would have happened
because of ResolvedRetention.

* Pointers for ResolvedAt & LastSentAt

To avoid awkward time.Time{}.Unix() defaults on persist
2024-06-20 16:33:03 -04:00
Diego Augusto Molina 9c29e1a783 Alerting: Fix data races and improve testing (#81994)
* Alerting: fix race condition in (*ngalert/sender.ExternalAlertmanager).Run

* Chore: Fix data races when accessing members of *ngalert/state.FakeInstanceStore

* Chore: Fix data races in tests in ngalert/schedule and enable some parallel tests

* Chore: fix linters

* Chore: add TODO comment to remove loopvar once we move to Go 1.22
2024-02-14 12:45:39 -03:00
Yuri Tseretyan 131c72d655 Alerting: Fix scheduler to group folders by the unique key (orgID and UID) (#81303) 2024-01-30 17:14:11 -05:00
Yuri Tseretyan bad4f28d0d Alerting: update test TestAlertingTicker to not rely on clock (#58544)
* extract method processTick
* make processTick return scheduled rules
* move state manager tests to state manager
* update test
* move all tests into one file
* remove unused fields
2022-11-09 15:08:57 -05:00
Alexander Weaver e6f99fc418 Alerting: Decouple schedule package from store (#55858)
* Separate out fake for scheduler tests

* Delete extracted methods from older fake
2022-09-27 13:48:12 -05:00
Yuriy Tseretyan 02f8e99ca1 Alerting: move fake stores to store package (#45428)
* make fake storage public
* move fake storages to store package
2022-02-15 17:24:39 -05:00
George Robinson 67a3e1d6fd Add context.Context to InstanceStore (#45049) 2022-02-08 13:49:04 +00:00
George Robinson a9399ab3cd Alerting: Add context.Context to RuleStore (#45004)
Alerting: Add context.Context to RuleStore
2022-02-08 08:52:03 +00:00
Yuriy Tseretyan ed5c664e4a Alerting: Stop firing of alert when it is updated (#39975)
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache. 
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry. 
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
2022-01-11 11:39:34 -05:00
gotjosh 357e9ed1ea Alerting: Fix Annotation Creation when the alerting state changes (#42479)
* Fix Annotation creation
- Remove validation of panelID, now annotations are created irrespective on whether they're attached to a panel or not.
- Alwasy attach the annotation to an AlertID

* Fix annotation creation

* fix tests
2021-12-01 11:04:54 +00:00
Yuriy Tseretyan 1b5b747885 Alerting: Additional Tests for State Manager (#41291)
* rename fakeInstanceStore to FakeInstanceStore

* update test for state manager to initialize instance store with FakeInstanceStore
2021-11-04 15:15:56 -04:00
Yuriy Tseretyan 6709359148 Alerting: Tests for rule evaluation routine (#40646)
* add fake stores to record queries
2021-10-26 13:22:07 -04:00
Marcus Efraimsson fa9857499b Chore: GetDashboardQuery should be dispatched using DispatchCtx (#36877)
* Chore: GetDashboardQuery should be dispatched using DispatchCtx

* Fix after merge

* Changes after review

* Various fixes

* Use GetDashboardCtx function instead of GetDashboard
2021-09-14 16:08:04 +02:00
David Parrott 7fbeefc090 Alerting: create wrapper for Alertmanager to enable org level isolation (#37320)
Introduces org-level isolation for the Alertmanager and its components.

Silences, Alerts and Contact points are not separated by org and are not shared between them.

Co-authored with @davidmparrott and @papagian
2021-08-24 11:28:09 +01:00
gotjosh f3f3fcc727 Alerting: Introduces /api/v1/ngalert/alertmanagers to expose discovered and dropped Alertmanager(s) (#37632)
* Alerting: Expose discovered and dropped Alertmanagers

Exposes the API for discovered and dropped Alertmanagers.

* make admin config poll interval configurable

* update after rebase

* wordsmith

* More wordsmithing

* change name of the config

* settings package too
2021-08-13 13:14:36 +01:00
gotjosh f83cd401e5 Alerting: Send alerts to external Alertmanager(s) (#37298)
* Alerting: Send alerts to external Alertmanager(s)

Within this PR we're adding support for registering or unregistering
sending to a set of external alertmanagers. A few of the things that are
going are:

- Introduce a new table to hold "admin" (either org or global)
  configuration we can change at runtime.
- A new periodic check that polls for this configuration and adjusts the
  "senders" accordingly.
- Introduces a new concept of "senders" that are responsible for
  shipping the alerts to the external Alertmanager(s). In a nutshell,
this is the Prometheus notifier (the one in charge of sending the alert)
mapped to a multi-tenant map.

There are a few code movements here and there but those are minor, I
tried to keep things intact as much as possible so that we could have an
easier diff.
2021-08-06 13:06:56 +01:00