Recording rule fields were not being copied correctly when duplicating an alert rule. This manifests as missing `TargetDataSourceUID` fields from the `Record` part of the rule when rules in a group are re-ordered.
Added some additional tests to ensure we cover the generation of recording rules in tests and fixed the copying logic to ensure all fields are copied correctly.
* Alerting: Add Extended List Query for Alert Rules w/pagination
This adds an extended query which allows filtering by the kind of rule (Recording or Alerting) and supports pagination.
Pagination tokens will allow us to list all rules paginated, regardless of the rule group.
---------
Co-authored-by: William Wernert <william.wernert@grafana.com>
This reintroduces store level pagination, without using it in the prometheus API yet.
Related to #108633
Co-authored-by: William Wernert <william.wernert@grafana.com>
* Ensure field validators return the proper type
This ensures correct error propagation through services up to
the API layer.
* Move error wrapping up to call site
What is this feature?
A follow-up for #101184, adds AlertRule.MissingSeriesEvalsToResolve to the APIs.
missing_series_evals_to_resolve must be specified too and it must be > 0.
POST /api/ruler/grafana/api/v1/rules/{folderUID} works in the following way:
If missing_series_evals_to_resolve is not sent or null, the rule keeps its existing value
If missing_series_evals_to_resolve > 0: updates to that value
If missing_series_evals_to_resolve = 0: resets to default (nil).
AlertRule.MissingSeriesEvalsToResolve can't be 0, so I used it to reset
In the Provisioning API, the value is just set if present and > 0. Otherwise it's reset:
PUT to /api/v1/provisioning/alert-rules/{UID}:
If missing_series_evals_to_resolve is nil, it's reset to the default value
If missing_series_evals_to_resolve > 0, it's updated
What is this feature?
This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation).
keep_firing_for prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state and are not considered resolved by the Alertmanager. Once the recovery period ends (or after the next evaluation if it is bigger than keep_firing_for), the alert transitions to "Normal" if it doesn't start alerting again:
Before
+----------+ +----------+
| Alerting |---->| Normal |
+----------+ +----------+
-----
After
+----------+ +------------+ +----------+
| Alerting |----->| Recovering |---->| Normal |
+----------+ +------------+ +----------+
Why do we need this feature?
This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert
Extend the recording rule definition to include the target data source, allowing
configuration of where the output of the recording rule is written to. Also
extends the relevant interfaces in preparation for the next set of changes.
* add column guid to alert rule table and rule_guid to rule version table
+ populate the new field with UUID
* update storage and domain models
* patch GUID
* ignore GUID in fingerprint tests
Initially, Metadata had only the EditorSettings, and HasMetadata was used to understand if the incoming update request had metadata in the body because it could be omitted if it was empty. For example, when the rule is updated via the provisioning API or has only false values. If it was in the request, we used that; if not, we used the metadata from the existing rule from the database. If the rule was updated via the AlertRuleService, we didn't change Metadata at all if the rule already existed.
But now, Metadata also has the Prometheus rule definition, and we always need to update it with the new version of the AlertRuleService when the rule exists in the DB and has the same UID. HasMetadata is renamed to HasEditorSettings to keep the old behaviour only for EditorSettings.
Now, the provisioning API and the conversion API will overwrite everything except EditorSettings with the new data.
* introduce new fields created_by in rule tables
* update domain model and compat layer to support UpdatedBy
* add alert rule generator mutators for UpdatedBy
* ignore UpdatedBy in diff and hash calculation
* Add user context to alert rule insert/update operations
Updated InsertAlertRules and UpdateAlertRules methods to accept a user context parameter. This change ensures auditability and better tracking of user actions when creating or updating alert rules. Adjusted all relevant calls and interfaces to pass the user context accordingly.
* set UpdatedBy in PreSave because this is where Updated is set
* Use nil userID for system-initiated updates
This ensures differentiation between system and user-initiated changes for better traceability and clarity in update origins.
---------
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Add health fields to rules and an aggregator method to the scheduler
* Move health, last error, and last eval time in together to minimize state processing
* Wire up a readonly scheduler to prom api
* Extract to exported function
* Use health in api_prometheus and fix up tests
* Rename health struct to status
* Fix tests one more time
* Several new tests
* Handle inactive rules
* Push state mapping into state manager
* rename to StatusReader
* Rectify cyclo complexity rebase
* Convert existing package local status implementation to models one
* fix tests
* undo RuleDefs rename
* introduce storage model for alert rule tables
* remove AlertRuleVersion from models because it's not used anywhere other than in storage
* update historian xorm store to use alerting store to fetch rules
* fix folder tests
---------
Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
* Alerting: Add rule_group label to grafana_alerting_rule_group_rules metric (#62361)
* Alerting: Delete rule group metrics when the rule group is deleted
This commit addresses the issue where the GroupRules metric (a GaugeVec)
keeps its value and is not deleted when an alert rule is removed from the rule registry.
Previously, when an alert rule with orgID=1 was active, the metric was:
grafana_alerting_rule_group_rules{org="1",state="active"} 1
However, after deleting this rule, subsequent calls to updateRulesMetrics
did not update the gauge value, causing the metric to incorrectly remain at 1.
The fix ensures that when updateRulesMetrics is called it
also deletes the group rule metrics with the corresponding label values if needed.
* add support of metadata to condition and adding it to request headers
* support for additional metadata when condition is built
* add additionall context to conditions: source and folder title
* add version
* use percent-encoding for header values
* Check if a time interval is used in alert rules before deleting it
* Add time interval to parameters of ListAlertRulesQuery and ListNotificationSettings of DbStore
== Refacorings ==
* refactor isMuteTimeInUse to accept a single route
* update getMuteTiming to not return err
* update delete to get the mute timing from config first
* add method CanReadAllRules to rule authorization service
* add alias type Namespace for Folder in ngalert's models package. It implements the Namespacer interface that is used by authz logic
* update state history's backends to authorize access to rules.
* update Loki to add folders UIDs to query.
* Update BuildLogQuery to drop filter by folders if it's too long and fall back to in-memory filtering.
* add test for the bug
* remove unused struct
* update db store to post process filters by group using go-lang's case-sensitive string comparison
--------
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* Support record struct in provisioning API
* Update api spec
* Use record field
* Restrict API endpoints following toggle
* Fix swagger spec
* Add recording rule validation to store validator
* Folders: Optionally include fullpath in service responses
* Alerting: Export folder fullpath instead of title
* Escape separator in folder title
* Add support for provisiong alret rules into subfolders
* Use FolderService for creating folders during provisioning
* Export WithFullpath() folder service function
---------
Co-authored-by: Tania B <yalyna.ts@gmail.com>
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Alerting: Add optional metadata to GET silence responses
- ruleMetadata: to request rule metadata.
- accesscontrol: to request access control metadata.