* add column guid to alert rule table and rule_guid to rule version table
+ populate the new field with UUID
* update storage and domain models
* patch GUID
* ignore GUID in fingerprint tests
update scheduler's aler rule to accept regular Evaluation in update channel
This makes it accept the full rule definition, which is required in reset state.
* introduce new fields created_by in rule tables
* update domain model and compat layer to support UpdatedBy
* add alert rule generator mutators for UpdatedBy
* ignore UpdatedBy in diff and hash calculation
* Add user context to alert rule insert/update operations
Updated InsertAlertRules and UpdateAlertRules methods to accept a user context parameter. This change ensures auditability and better tracking of user actions when creating or updating alert rules. Adjusted all relevant calls and interfaces to pass the user context accordingly.
* set UpdatedBy in PreSave because this is where Updated is set
* Use nil userID for system-initiated updates
This ensures differentiation between system and user-initiated changes for better traceability and clarity in update origins.
---------
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
* Add health fields to rules and an aggregator method to the scheduler
* Move health, last error, and last eval time in together to minimize state processing
* Wire up a readonly scheduler to prom api
* Extract to exported function
* Use health in api_prometheus and fix up tests
* Rename health struct to status
* Fix tests one more time
* Several new tests
* Handle inactive rules
* Push state mapping into state manager
* rename to StatusReader
* Rectify cyclo complexity rebase
* Convert existing package local status implementation to models one
* fix tests
* undo RuleDefs rename
* Add group and type labels to rule_group_rules metric
* Don't include group to avoid high cardinality
* Add comments
* Reset rule_group_rules before recording new values
* Edit description for rule_group_rules
* Include ruleGroup combo key in labels
* Fix lint
* Alerting: Add rule_group label to grafana_alerting_rule_group_rules metric (#62361)
* Alerting: Delete rule group metrics when the rule group is deleted
This commit addresses the issue where the GroupRules metric (a GaugeVec)
keeps its value and is not deleted when an alert rule is removed from the rule registry.
Previously, when an alert rule with orgID=1 was active, the metric was:
grafana_alerting_rule_group_rules{org="1",state="active"} 1
However, after deleting this rule, subsequent calls to updateRulesMetrics
did not update the gauge value, causing the metric to incorrectly remain at 1.
The fix ensures that when updateRulesMetrics is called it
also deletes the group rule metrics with the corresponding label values if needed.
* handle metadata map nil
* remove double context
* clean up logging in scheduler
* do not reuse loggers from previous ticks
* log the dropped tick
* log tick instead of ticknum
* replace with processing tick logs
* log sending notifications
* update logging in persister to fetch context
* logs to historian
moved them upstream to be able to log when store is overridden
* Add success case and tests for writer using metrics
* Use testable version of clock
* Assert a specific series was written
* Fix linter
* Fix manually constructed writer
* add support of metadata to condition and adding it to request headers
* support for additional metadata when condition is built
* add additionall context to conditions: source and folder title
* add version
* use percent-encoding for header values
* Create some integration testing infra for RRs
* whoops
* Require no error in responding
* fix linter
* Panic, no need to pass testing around
* Extend status test
* Simple replace of State.Resolved with State.ResolvedAt
* Retain ResolvedAt time between Normal->Normal transition
* Introduce ResolvedRetention to keep sending recently resolved alerts
* Make ResolvedRetention configurable with resolved_alert_retention
* Tick-based LastSentAt for testing of ResendDelay and ResolvedRetention
* Do not reset ResolvedAt during Normal->Pending transition
Initially this was done to be inline with Prom ruler. However, Prom ruler
doesn't keep track of Inactive->Pending/Alerting using the same alert instance,
so it's more understandable that they choose not to retain ResolvedAt. In our
case, since we use the same cached instance to represent the transition, it
makes more sense to retain it.
This should help alleviate some odd situations where temporarily entering
Pending will stop future resolved notifications that would have happened
because of ResolvedRetention.
* Pointers for ResolvedAt & LastSentAt
To avoid awkward time.Time{}.Unix() defaults on persist
* Make MakeDependencyError public for tests in another package
* Create tests for errors in eval results
* Extract logic to pull frame errors out into exported function
* Maybe we can drop cyclomatic complexity lint suppression now?
* extract frame errors and fail recording rules if frames contain error
* Fix up retry logic to actually work
* Do not retry non retryable errors
* Basic eval flow
* Wiring-up
* fix
* Extend todo
* Start with tests
* Include some relevant tests, skip ones that seem to have timing-based race conditions
* Some tests, touch up linter and todo
* Solve TODO
* Add tracing
* Tests to make sure an eval went through
* Wire up feature toggles
* Update pkg/services/ngalert/schedule/recording_rule.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Update pkg/services/ngalert/schedule/recording_rule_test.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Update pkg/services/ngalert/schedule/recording_rule_test.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Update pkg/services/ngalert/schedule/recording_rule_test.go
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
---------
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
* Add shim rule implementation for recording rules
* Give ruleFactory access to the original rule definition
* Schedule shim implementation if the rule is a recording rule
* Fix or suppress linter
* Fix nolint