grafana

Author	SHA1	Message	Date
Mariell Hoversholm	757be6365a	CI: Bump golangci-lint to 2.0.2 (#103572 )	2025-04-10 14:42:23 +02:00
Yuri Tseretyan	dc0083d879	Alerting: Sequential evaluation of rules in group (#98829 ) * introduce RulesGroupComparer * extract runJob method * implement sequential evaluation * Make sequence building testable & add comments * Also run callback in recording rules + add tests * Improve tests * Address PR comments --------- Co-authored-by: William Wernert <william.wernert@grafana.com>	2025-04-02 23:10:32 +03:00
Alexander Akhmetov	695ac91290	Alerting: Add backend support for keep_firing_for (#100750 ) What is this feature? This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation). keep_firing_for prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state and are not considered resolved by the Alertmanager. Once the recovery period ends (or after the next evaluation if it is bigger than keep_firing_for), the alert transitions to "Normal" if it doesn't start alerting again: Before +----------+ +----------+ \| Alerting \|---->\| Normal \| +----------+ +----------+ ----- After +----------+ +------------+ +----------+ \| Alerting \|----->\| Recovering \|---->\| Normal \| +----------+ +------------+ +----------+ Why do we need this feature? This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert	2025-03-18 11:24:48 +01:00
Alexander Akhmetov	7dd6f52630	Alerting: Add MissingSeriesEvalsToResolve option to the AlertRule (#101184 )	2025-03-11 22:12:06 +01:00
Steve Simpson	bbab62ce39	Alerting: Select remote write path dependent on metrics backend type. (#101891 ) The remote write path differs based on whether the data source is actually Prometheus, Mimir, Cortex, or an older version of Cortex. We do not want users to have to specify the path, so this change determines the path as best it can. It may be in the future we have to make this configurable per-datasource to cater for setups where it's impossible to determine the correct path.	2025-03-11 13:45:16 +01:00
Steve Simpson	cc80681beb	Alerting: Extend recording rules test to exercise writing with data sources. (#101775 ) The change to use WriteDatasource was done in a previous commit, this adds a test case using DatasourceWriter, in addition to the one using PrometheusWriter.	2025-03-07 13:51:50 +01:00
Steve Simpson	eed07cf503	Alerting: Refactor NewPrometheusWriter function. (#101706 ) * Alerting: Refactor NewPrometheusWriter function. In order to re-use PrometheusWriter, changing the function take a PrometheusWriterConfig instead of RecordingRulesSettings, and adapt the old interface onto the new interface. * Make linter happy	2025-03-06 16:13:22 +01:00
Steve Simpson	b7dcfcedcb	Alerting: Extend recording rule definitions/interfaces with data source. (#101678 ) Extend the recording rule definition to include the target data source, allowing configuration of where the output of the recording rule is written to. Also extends the relevant interfaces in preparation for the next set of changes.	2025-03-06 14:09:17 +01:00
Alexander Akhmetov	d44728f4e5	Alerting: Metric to count imported from Prometheus rules (#100847 )	2025-03-05 14:02:28 +01:00
Yuri Tseretyan	879b121136	Alerting: Add GUID to alert rule tables (#101321 ) * add column guid to alert rule table and rule_guid to rule version table + populate the new field with UUID * update storage and domain models * patch GUID * ignore GUID in fingerprint tests	2025-02-28 09:47:25 -05:00
Yuri Tseretyan	32fde6dba4	Alerting: Update scheduler to provide full specification to rule update channel (#101375 ) update scheduler's aler rule to accept regular Evaluation in update channel This makes it accept the full rule definition, which is required in reset state.	2025-02-26 14:39:39 -05:00
Yuri Tseretyan	4cac3158c7	Alerting: Fix alert rule copy to include metadata (#100212 ) * copy metadata * add tests for copy and generator * extract copy rule to a production method and update usages * fix tests	2025-02-11 09:46:02 -05:00
Yuri Tseretyan	33b11d5c76	Alerting: Remove ID and OrgID from hash calculation (#100140 )	2025-02-05 14:15:02 -05:00
Alexander Akhmetov	a0bf9202f5	Alerting: Clear the state cache when the alert routine stops (#99681 )	2025-01-28 21:15:19 +02:00
Alexander Akhmetov	a28328d764	Alerting: Call the deletion reason provider even if the rule is no longer scheduled (#99571 ) Alerting: Call the deletion reason provider even if the rule is not scheduled anymore	2025-01-28 11:34:26 +01:00
Alexander Akhmetov	12bda63871	Alerting: Optional function to find the rule deletion reason (#99422 )	2025-01-27 11:35:52 +01:00
Yuri Tseretyan	92d6762a3a	Alerting: Store information about user that created\updated alert rule (#99395 ) * introduce new fields created_by in rule tables * update domain model and compat layer to support UpdatedBy * add alert rule generator mutators for UpdatedBy * ignore UpdatedBy in diff and hash calculation * Add user context to alert rule insert/update operations Updated InsertAlertRules and UpdateAlertRules methods to accept a user context parameter. This change ensures auditability and better tracking of user actions when creating or updating alert rules. Adjusted all relevant calls and interfaces to pass the user context accordingly. * set UpdatedBy in PreSave because this is where Updated is set * Use nil userID for system-initiated updates This ensures differentiation between system and user-initiated changes for better traceability and clarity in update origins. --------- Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>	2025-01-24 12:09:17 -05:00
Santiago	ea6cb8f139	Alerting: Panic when rule being evaluated has unexpected key (#99002 )	2025-01-15 14:59:50 +02:00
Santiago	86e8147df3	Alerting: Use AlertRuleKey for comparison before rule evaluation (#98808 ) (WIP) Alerting: Use AlertRuleKey for comparison before rule evaluation	2025-01-10 15:31:03 +01:00
Yuri Tseretyan	f851379f7d	Alerting: Add traceID to rule evalutor logger (#98549 )	2025-01-06 15:00:25 -05:00
Alexander Akhmetov	bb713cf8e4	Alerting: Add simplified_notifications_section setting to grafana_alerting_simplified_editor_rules metric (#98053 )	2024-12-17 11:13:31 +01:00
Alexander Akhmetov	324503ee8b	Alerting: Add simplified_notifications_section field to the alert rule metadata (#95988 )	2024-11-14 12:55:54 +01:00
Will Browne	25abd57029	Plugins: Update to latest go plugin SDK (0.256.0) (#95065 ) * update to latest go plugin SDK * make update-workspace * update alerting tests	2024-10-22 15:44:53 +01:00
Alexander Akhmetov	0a4e6ff86b	Alerting: Add SaveAlertInstancesForRule instance store method (#94505 ) Alerting: Add SaveAlertInstancesForRule method to the InstanceStore interface	2024-10-11 13:47:44 +02:00
Alexander Weaver	393faa8732	Alerting: Move rule evaluation status logic out of prometheus API and into scheduler (#89141 ) * Add health fields to rules and an aggregator method to the scheduler * Move health, last error, and last eval time in together to minimize state processing * Wire up a readonly scheduler to prom api * Extract to exported function * Use health in api_prometheus and fix up tests * Rename health struct to status * Fix tests one more time * Several new tests * Handle inactive rules * Push state mapping into state manager * rename to StatusReader * Rectify cyclo complexity rebase * Convert existing package local status implementation to models one * fix tests * undo RuleDefs rename	2024-09-30 16:52:49 -05:00
Alexander Akhmetov	0ed70d0b2f	Alerting: Add a metric to track the number of rules with simplified editor settings (#93511 ) * Alerting: Add a metric to track the number of rules with simplified editor settings	2024-09-20 17:56:40 +02:00
Alexander Akhmetov	9f5b05f936	Alerting: Add metadata field with editor_settings to alert rule (#93245 )	2024-09-19 16:43:41 +02:00
William Wernert	efe62086f9	Alerting: Add `type` label `rule_group_rules` metric (#91425 ) * Add group and type labels to rule_group_rules metric * Don't include group to avoid high cardinality * Add comments * Reset rule_group_rules before recording new values * Edit description for rule_group_rules * Include ruleGroup combo key in labels * Fix lint	2024-09-12 17:27:09 +03:00
Alexander Akhmetov	152d3540db	Alerting: Log number of dimensions instead of all evaluation results (#92733 )	2024-08-30 12:35:02 +02:00
Alexander Weaver	490d6ba2fd	Alerting: Extend scheduler user with datasources:read (#92410 ) Add permission	2024-08-26 10:59:54 -05:00
Alexander Akhmetov	d32e1e009b	Alerting: Update prometheus/client_golang to v1.20 (#92070 ) Update prometheus/client_golang to v1.20	2024-08-20 11:26:06 +02:00
Alexander Weaver	ac5ebe6e4d	Alerting: Add enablement flag for recording rules (#92032 ) * Add enablement flag * Disable if toggle not enabled	2024-08-19 12:01:00 -05:00
Alexander Weaver	34ab5fe1f3	Alerting: Restart rule routines if the type changes (#90867 ) * Restart when types change * Wire up test hooks correctly * testing	2024-08-14 14:57:47 -05:00
Alexander Akhmetov	149f02aebe	Alerting: Add rule_group label to grafana_alerting_rule_group_rules metric (#88289 ) * Alerting: Add rule_group label to grafana_alerting_rule_group_rules metric (#62361) * Alerting: Delete rule group metrics when the rule group is deleted This commit addresses the issue where the GroupRules metric (a GaugeVec) keeps its value and is not deleted when an alert rule is removed from the rule registry. Previously, when an alert rule with orgID=1 was active, the metric was: grafana_alerting_rule_group_rules{org="1",state="active"} 1 However, after deleting this rule, subsequent calls to updateRulesMetrics did not update the gauge value, causing the metric to incorrectly remain at 1. The fix ensures that when updateRulesMetrics is called it also deletes the group rule metrics with the corresponding label values if needed.	2024-08-13 13:27:23 +02:00
Yuri Tseretyan	ee78bb653f	Alerting: Log rule evaluation error in scheduler (#91585 )	2024-08-06 19:27:02 +03:00
Alexander Weaver	72ecde5045	Alerting: Make orgID a direct arg of writer interface (#91422 ) make orgID a direct arg of writer interface	2024-08-02 09:37:28 -05:00
William Wernert	a1ee84f757	Alerting: Remove duplicate tracing middleware from prom writer (#91353 ) Remove duplicate tracing middleware from prom writer	2024-08-01 11:57:14 -04:00
Alexander Weaver	4c71cadd5f	Alerting: Detach condition validator from condition evaluator (#91150 ) * Detach validator from evaluator * Drop unnecessary interface and type	2024-07-30 10:55:37 -05:00
Yuri Tseretyan	8323b688c6	Alerting: Improve logging in scheduler and states (#91003 ) * handle metadata map nil * remove double context * clean up logging in scheduler * do not reuse loggers from previous ticks * log the dropped tick * log tick instead of ticknum * replace with processing tick logs * log sending notifications * update logging in persister to fetch context * logs to historian moved them upstream to be able to log when store is overridden	2024-07-29 16:01:48 -04:00
William Wernert	45f298120e	Alerting: Return error when writing recorded metrics instead of default writing NaN (#90743 ) * Return error instead of default writing NaN	2024-07-22 15:47:02 -04:00
Alexander Weaver	418b077c59	Alerting: Integration testing for recording rules including writes (#90390 ) * Add success case and tests for writer using metrics * Use testable version of clock * Assert a specific series was written * Fix linter * Fix manually constructed writer	2024-07-18 17:14:49 -05:00
Alexander Weaver	88ed77e7e8	Alerting: More graceful handling of NoData in recording rules (#90312 ) * Handle NoData as its own case * Debug * Scalars parseable by CollectionReader * fix linter * Orgit add pkg/git add pkg/ not and	2024-07-17 15:24:03 -05:00
Yuri Tseretyan	c3b9c9b239	Alerting: Send information about alert rule to data source in headers (#90344 ) * add support of metadata to condition and adding it to request headers * support for additional metadata when condition is built * add additionall context to conditions: source and folder title * add version * use percent-encoding for header values	2024-07-17 22:55:12 +03:00
Alexander Weaver	111ebd4fb2	Alerting: Create integration testing infra for recording rules (#90306 ) * Create some integration testing infra for RRs * whoops * Require no error in responding * fix linter * Panic, no need to pass testing around * Extend status test	2024-07-11 14:59:52 -05:00
Alexander Weaver	ab32183e18	Alerting: Track recording rule health and last eval info ephemerally (#90247 ) * Track health and last eval info * Read method for status * Minor tests	2024-07-11 14:05:09 -05:00
Yuri Tseretyan	c3b5cabb14	Alerting: Refactor scheduler's rule evaluator to store rule key (#89925 )	2024-07-01 16:43:23 -04:00
Yuri Tseretyan	655e477c20	Alerting: Fix flaky test in scheduler's tests (#89923 )	2024-07-01 13:31:03 -04:00
Matthew Jacobson	47c9259d75	Alerting: Ensure we update State.LastSentAt before persisting (#89427 )	2024-06-25 13:01:26 -04:00
William Wernert	fcfa89f864	Alerting: Implement Prometheus remote write for recording rules (#89189 ) * Fix timestamp recorded by rule * Implement prometheus remote write * Create http client instead of transport * Address PR comments * Remove status code label	2024-06-25 17:23:42 +03:00
Matthew Jacobson	3228b64fe6	Alerting: Resend resolved notifications for ResolvedRetention duration (#88938 ) * Simple replace of State.Resolved with State.ResolvedAt * Retain ResolvedAt time between Normal->Normal transition * Introduce ResolvedRetention to keep sending recently resolved alerts * Make ResolvedRetention configurable with resolved_alert_retention * Tick-based LastSentAt for testing of ResendDelay and ResolvedRetention * Do not reset ResolvedAt during Normal->Pending transition Initially this was done to be inline with Prom ruler. However, Prom ruler doesn't keep track of Inactive->Pending/Alerting using the same alert instance, so it's more understandable that they choose not to retain ResolvedAt. In our case, since we use the same cached instance to represent the transition, it makes more sense to retain it. This should help alleviate some odd situations where temporarily entering Pending will stop future resolved notifications that would have happened because of ResolvedRetention. * Pointers for ResolvedAt & LastSentAt To avoid awkward time.Time{}.Unix() defaults on persist	2024-06-20 16:33:03 -04:00

1 2 3 4 5 ...

279 Commits