grafana

Author	SHA1	Message	Date
George Robinson	19ebb079ba	Alerting: Add limits and filters to Prometheus Rules API (#66627 ) This commit adds support for limits and filters to the Prometheus Rules API. Limits: It adds a number of limits to the Grafana flavour of the Prometheus Rules API: - `limit` limits the maximum number of Rule Groups returned - `limit_rules` limits the maximum number of rules per Rule Group - `limit_alerts` limits the maximum number of alerts per rule It sorts Rule Groups and rules within Rule Groups such that data in the response is stable across requests. It also returns summaries (totals) for all Rule Groups, individual Rule Groups and rules. Filters: Alerts can be filtered by state with the `state` query string. An example of an HTTP request asking for just firing alerts might be `/api/prometheus/grafana/api/v1/rules?state=alerting`. A request can filter by two or more states by adding additional `state` query strings to the URL. For example `?state=alerting&state=normal`. Like the alert list panel, the `firing`, `pending` and `normal` states are first compared against the state of each alert rule. All other states are ignored. If the alert rule matches then its alert instances are filtered against states once more. Alerts can also be filtered by labels using the `matcher` query string. Like `state`, multiple matchers can be provided by adding additional `matcher` query strings to the URL. The match expression should be parsed using the existing regular expression and sent to the API as URL-encoded JSON in the format: { "name": "test", "value": "value1", "isRegex": false, "isEqual": true } The `isRegex` and `isEqual` options work as follows: isEqual=true, isRegex=false: `=`; isEqual=true, isRegex=true: `=~`; isEqual=false, isRegex=true: `!~`; isEqual=false, isRegex=false: `!=`.	2023-04-17 17:45:06 +01:00
gotjosh	2bbf0c9de4	Alerting: Allow Rules to Schedule to be filtered by Rule Group (#59990 ) * Alerting: Allow Rules to Schedule to be filtered by Rule Group	2023-04-13 12:55:42 +01:00
George Robinson	bd29071a0d	Revert "Alerting: Add limits to the Prometheus Rules API" (#65842 )	2023-04-03 15:20:37 +00:00
George Robinson	d96b0a71d3	Alerting: Add limits to the Prometheus Rules API (#65169 ) This commit adds a number of limits to the Grafana flavor of the Prometheus Rules API: 1. `limit` limits the maximum number of Rule Groups returned 2. `limit_rules` limits the maximum number of rules per Rule Group 3. `limit_alerts` limits the maximum number of alerts per rule It sorts Rule Groups and rules within Rule Groups such that data in the response is stable across requests. It also returns summaries (totals) for all Rule Groups, individual Rule Groups and rules.	2023-04-03 10:17:02 +01:00
Yuri Tseretyan	9eaffdf5a8	Alerting: Remove dependency on alerting package in definitions (#65390 ) * move export rules to definitions package * move provisioning contact point methods to provisioning package * move AlertRuleGroupWithFolderTitle to ngalert models and adapter functions to api's compat * move rule_types files back to where they were before.	2023-03-29 13:34:59 -04:00
Serge Zaitsev	0beb768427	Chore: Remove result fields from ngalert (#65410 ) * remove result fields from ngalert * remove duplicate imports	2023-03-28 10:34:35 +02:00
Yuri Tseretyan	f066e8cdcd	Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823 ) * add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager	2023-02-03 11:36:49 -05:00
Alex Moreno	53945afedf	Alerting: Allow alert rule pausing from API (#62326 ) * Add is_paused attr to the POST alert rule group endpoint * Add is_paused to alerting API POST alert rule group * Fixed tests * Add is_paused to alerting gettable endpoints * Fix integration tests * Alerting: allow to pause existing rules (#62401) * Display Pause Rule switch in Editing Rule form * add isPaused property to form interface and dto * map isPaused prop with is_paused value from DTO Also update test snapshots * Append '(Paused)' text on alert list state column when appropriate * Change Switch styles according to discussion with UX Also adding a tooltip with info what this means * Adjust styles * Fix alignment and isPaused type definition Co-authored-by: gillesdemey <gilles.de.mey@gmail.com> * Fix test * Fix test * Fix RuleList test --------- Co-authored-by: gillesdemey <gilles.de.mey@gmail.com> * wip * Fix tests and add comments to clarify AlertRuleWithOptionals * Fix one more test * Fix tests * Fix typo in comment * Fix alert rule(s) cannot be paused via API * Add integration tests for alerting api pausing flow * Remove duplicated integration test --------- Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com> Co-authored-by: gillesdemey <gilles.de.mey@gmail.com> Co-authored-by: George Robinson <george.robinson@grafana.com>	2023-02-01 13:15:03 +01:00
Serge Zaitsev	d6d4097567	Chore: Fix goimports grouping in alerting (#62424 ) * fix goimports * fix goimports order	2023-01-30 09:55:35 +01:00
Yuri Tseretyan	05bf241952	Alerting: Update state manager to return StateTransitions when Delete or Reset (#62264 ) * update Delete and Reset methods to return state transitions this will be used by notifier code to decide whether alert needs to be sent or not. * update scheduler to provide reason to delete states and use transitions * update FromAlertsStateToStoppedAlert to accept StateTransition and filter by old state * fixup * fix tests	2023-01-27 09:46:21 +01:00
Alex Moreno	531b439cf1	Alerting: Add alert pausing feature (#60734 ) * Add field in alert_rule model, add state to alert_instance model, and state to eval * Remove paused state from eval package * Skip paused alert rules in scheduler * Add migration to add is_paused field to alert_rule table * Convert to postable alerts only if not normal, pending, or paused * Handle paused eval results in state manager * Add Paused state to eval package * Add paused alerts logic in scheduler * Skip alert on scheduler * Remove paused status from eval package * Apply suggestions from code review Co-authored-by: George Robinson <george.robinson@grafana.com> * Remove state * Rethink schedule and manager for paused alerts * Change return to continue * Remove unused var * Rethink alert pausing * Paused alerts storing annotations * Only add one state transition * Revert boolean method renaming refactor * Revert take image refactor * Make registry errors public * Revert method extraction for getting a folder title * Revert variable renaming refactor * Undo unnecessary changes * Revert changes in test * Remove IsPause check in PatchPartialAlertRule function * Use SetNormal to set state * Fix text by returning to old behaviour on alert rule deletion * Add test in schedule_unit_test.go to test ticks with paused alerts * Add comment to clarify usage of context.Background() * Add comment to clarify resetStateByRuleUID method usage * Move rule get to a more limited scope * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * run gofmt on pkg/services/ngalert/schedule/schedule.go * Remove defer cancel for context * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/testing.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * skip scheduler rule state clean up on paused alert rule * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Fix mock in test * Add (hopefully) final suggestions * Use error channel from recordAnnotationsSync to cancel context * Run make gen-cue * Place pause alert check in channel update after version check * Reduce branching in update channel select * Add if for error and move code inside if in state manager ResetStateByRuleUID * Add reason to logs * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Do not delete alert rule routine, just exit on eval if is paused * Reduce branching and create-close a channel to avoid deadlocks * Separate state deletion and state reset (includes history saving) * Add current pause state in rule route in scheduler * Split clearState and bring errCh closer to RecordStatesAsync call * Change rule to ruleMeta in RecordStatesAsync * copy state to be able to modify it * Add timeout to context creation * Shorten the timeout * Use resetState if rule is paused and deleteState if rule is not paused * Remove Empty state reason * Save every rule change in historian * Add tests for DeleteStateByRuleUID and ResetStateByRuleUID * Remove useless line * Remove outdated comment Co-authored-by: George Robinson <george.robinson@grafana.com> Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>	2023-01-26 18:29:10 +01:00
George Robinson	2a291afbae	Alerting: Use consts from alerting package (#61241 )	2023-01-10 19:59:13 +00:00
Alex Moreno	174c61b949	Alerting: Set Dashboard and Panel IDs on rule group replacement (#60374 ) * Set Dashboard and Panel IDs on rule group replacement * fix comments and abbreviate test variable name * Update pkg/services/ngalert/provisioning/alert_rules.go Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com> Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>	2022-12-16 11:47:25 +01:00
George Robinson	76601f3ae7	Alerting: Better define how we set states (#59977 ) This commit better defines how we set states in resultNormal, resultAlerting, resultError and resultNoData. It changes the existing code to call methods such as SetAlerting, SetPending, SetNormal, SetError and NoData instead of assigning values to each individual field whenever the state is changed. This should make it easier to understand what fields should be set for which states and avoid cases where states are missing, or have additional unexpected fields.	2022-12-08 20:12:13 +00:00
Sofia Papagiannaki	9855e74b92	Chore: Refactor quota service (#58643 ) Chore: Refactor quota service (#57586) * Chore: refactor quota service * Apply suggestions from code review	2022-11-14 21:08:10 +02:00
George Robinson	c5ae1bcfe0	Alerting: Fix logging pointer address of DashboardUID and PanelID variables (#58539 )	2022-11-10 09:58:38 +00:00
Alexander Weaver	2bfdda5b68	Alerting: Break dependency between state and image packages (#58381 ) * Refactor state and manager to not depend directly on image interface * Move generic errors to models package * Move NotAvailableImageService to state as its only references are in state tests * Move NoopImageService to state package * Move mock to state package * Fix linter error * Fix comment styling * Fix a couple added references introduced by rebase * Empty commit to kick build	2022-11-09 15:06:49 -06:00
Kristin Laemmert	ef7145e4aa	feat(nested folders): Add CountAlertRulesInFolder to ngalert store (#58269 ) * chore: refactor CountDashboardsInFolder to use the more efficient Count() sql function * feat(nested folders): Add CountAlertRulesInFolder to ngalert store This commit adds CountAlertRulesInFolder and a new model for the CountAlertRulesQuery. It returns a count of alert rules associated with a given orgID and parent folder UID. (the namespace referenced inside alert rules is the parent folder). I'm not sure where this belongs in the ngalert service, so that will come in a future commit.	2022-11-08 11:51:00 +01:00
Sofia Papagiannaki	96cdf77995	Revert "Chore: Refactor quota service (#57586 )" (#58394 ) This reverts commit `326ea86a57`.	2022-11-08 11:52:07 +02:00
Sofia Papagiannaki	326ea86a57	Chore: Refactor quota service (#57586 ) * Chore: refactor quota service * Apply suggestions from code review	2022-11-08 10:25:34 +02:00
Neel	db1fd10ff1	Alerting: Append org ID to alert notification URLs (#57123 )	2022-11-07 16:03:25 +00:00
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
George Robinson	802d67eeca	Alerting: Support values in notification templates (#56457 ) We have received a lot of feedback regarding the ValueString in alert notifications. Perhaps one of the most frequent complaints about ValueString is that it is difficult to read because it contains a lot of information, and the information is shown as a JSON-like string. Users have often asked how it can be templated and the answer is that it can't. Until now users have been able to add custom annotations to their alert rules which contain values via the $values variable added in previous versions of Grafana. However, these custom annotations must be added for each of the user's alert rules, instead of once in a template that all of their alerts can be notified via. This commit adds the much requested feature to support values in notification templates. Users can then create a single template that prints the annotations, labels and values of their alerts in a format of their choice!	2022-10-10 13:40:21 +01:00
Alexander Weaver	d66ed6fe35	Alerting: Move stray model structs in store package to model package (#55968 ) * Move stray command structs to model package like the rest * Fix broken references	2022-09-29 15:47:56 -05:00
Alexander Weaver	d17ab82b98	Alerting: Break up store.RuleStore interface, delete dead code (#55776 ) * Refactor state manager to not depend on rule store interface * Refactor grafana and proxied ruler APIs to not depend on store.RuleStore * Refactor folder subscription logic to not use store.RuleStore * Delete dead code * Delete store.RuleStore	2022-09-27 08:56:30 -05:00
Yuriy Tseretyan	2d38664fe6	Alerting: Improve validation of query and expressions on rule submit (#53258 ) * Improve error messages of server-side expression * move validation of alert queries and a condition to eval package	2022-09-21 15:14:11 -04:00
Yuriy Tseretyan	199996cbf9	Alerting: Resolve stale state + add state reason to notifications (#49352 ) * adds a new reserved annotation `grafana_state_reason` * explicitly resolve stale states	2022-09-21 13:24:47 -04:00
Timur Olzhabayev	b5b41988cf	Docs: Deprecating packages_api and removing it from our pipelines (#54473 )	2022-09-01 18:15:44 +02:00
Yuriy Tseretyan	76ea0b15ae	Alerting: Scheduler to fetch folders along with rules (#52842 ) * Update GetAlertRulesForScheduling to query for folders (if needed) * Update scheduler's alertRulesRegistry to cache folder titles along with rules * Update rule eval loop to take folder title from the * Extract interface RuleStore * Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.	2022-08-31 11:08:19 -04:00
Yuriy Tseretyan	9f90a7b54d	Alerting: State manager to use InstanceStore (#53852 ) * move saving the state to state manager when scheduler stops * move saving state to ProcessEvalResults * add GetRuleKey to State * add LogContext to AlertRuleKey	2022-08-18 09:40:33 -04:00
Alexander Weaver	f093c249ac	Alerting: Fix incorrect embedded DTO being returned when handling rule groups (#53701 ) * Fix DTO embedding when getting/putting alert rule groups * Drop usage of word 'Domain' * Rename var as well	2022-08-12 16:36:50 -05:00
Jean-Philippe Quéméner	54217a2037	Alerting: set dashboard and panel id using annotations in provisioning api (#53221 )	2022-08-03 16:05:32 +02:00
Yuriy Tseretyan	5fb778814c	Alerting: Update rules version when folder title is updated (#53013 ) * remove support for bus from scheduler * rename event to FolderTitleUpdated and fire only if title has changed * add method to increase version of all rules that belong to a folder * update ngalert service to subscribe to folder title change event, call data store and update scheduler * add tests	2022-08-01 19:28:38 -04:00
Yuriy Tseretyan	a081764fd8	Alerting: Scheduler to use AlertRule (#52354 ) * update GetAlertRulesForSchedulingQuery to have result AlertRule * update fetcher utils and registry to support AlertRule * alertRuleInfo to use alert rule instead of version * update updateCh handler of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation * update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels * remove unused function in ruleRoutine * remove unused model SchedulableAlertRule * store rule version in ruleRoutine instead of rule * do not call the sender if nothing to send	2022-07-26 09:40:06 -04:00
Yuriy Tseretyan	6e1e4a4215	Alerting: Update DbStore to use disabled orgs from the config (#52156 ) * update DbStore to use UnifiedAlerting settings * remove disabled orgs from scheduler and use config in db store instead * remove test	2022-07-15 14:13:30 -04:00
Alexander Weaver	2d7389c34d	Alerting: Provisioning API respects global rule quota (#52180 ) * Inject interface for quota service and create mock * Check quota and return 403 if limit exceeded * Implement tests for quota being exceeded	2022-07-13 17:36:17 -05:00
Yuriy Tseretyan	554ebd647b	Alerting: Refactor Evaluator (#51673 ) * AlertRule to return condition * update ConditionEval to not return an error because it's always nil * make getExprRequest private * refactor executeCondition to just converter and move execution to the ConditionEval as this makes code more readable. * log error if results have errors * change signature of evaluate function to not return an error	2022-07-12 16:51:32 -04:00
George Robinson	6844ac9879	Alerting: Change __alertScreenshotToken__ to __alertImageToken__ (#50771 )	2022-07-04 06:05:36 -04:00
Yuriy Tseretyan	8b3b667a47	Alerting: Fix rule API to accept 0 duration of field `For` (#50992 ) * make 'for' pointer to distinguish between missing field and 0 * set 'for' to -1 if the value is missing, but do not allow negative values in the request, and patch -1 with the value from the original rule * update store validation to not allow negative 'for' * update usages to use pointer	2022-06-30 11:46:26 -04:00
Yuriy Tseretyan	4d02f73e5f	Alerting: Persist rule position in the group (#50051 ) Migrations: * add a new column alert_group_idx to alert_rule table * add a new column alert_group_idx to alert_rule_version table * re-index existing rules during migration API: * set group index on update. Use the natural order of items in the array as group index * sort rules in the group on GET * update the version of all rules of all affected groups. This will make optimistic lock work in the case of multiple concurrent requests touching the same groups. UI: * update UI to keep the order of alerts in a group	2022-06-22 10:52:46 -04:00
Matthew Jacobson	5dee2ed24c	Alerting: Add first Grafana reserved label grafana_folder (#50262 ) * Alerting: Add first Grafana reserved label g_label g_label holds the title of the folder containing the alert. The intention of this label is to use it as part of the new default notification policy groupBy. * Add nil check on updateRule labels map * Disable gocyclo lint on schedule.ruleRoutine will remove later in a separate refactoring PR to reduce complexity. * Address doc suggestions * Update g_folder for rules in folder when folder title changes * Remove global bus in FolderService * Modify tests to fit new common g_folder label * Add changelog entry * Fix merge conflicts * Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`	2022-06-17 13:10:49 -04:00
Yuriy Tseretyan	c314ce48c7	Alerting: Support for optimistic locking for alert rules (#50274 ) * add support for optimistic locking for alert_rule table * return 409 in the case of optimistic lock	2022-06-13 12:15:28 -04:00
Jean-Philippe Quéméner	862f51216b	Alerting: improve provisioning docs (#50347 ) * Alerting: improve provisioning docs * add new provisioning page * add api docs * fix formatting and add better descriptions * fix typo	2022-06-10 16:25:15 +02:00
Jean-Philippe Quéméner	cf684ed38f	Alerting: bump rule version when updating rule group interval (#50295 ) * Alerting: move group update to alert rule service * rename validateAlertRuleInterval to validateRuleGroupInterval * init baseinterval correctly * add seconds suffix * extract validation function for reusability * add context to err message	2022-06-09 09:28:32 +02:00
Yuriy Tseretyan	a89d4a5be7	Alerting: Scheduler to drop ticks if a rule's evaluation is too slow (#48885 ) * drop ticks if evaluation of a rule is too slow. * add metric schedule_rule_evaluations_missed_total	2022-06-08 12:50:44 -04:00
Yuriy Tseretyan	49d93fb67e	Alerting: Update alert rule diff to not see difference between nil and empty map (#50192 )	2022-06-03 21:27:29 +02:00
Yuriy Tseretyan	ad25e2a20c	Alerting: Update RBAC for alert rules to consider access to rule as access to group it belongs (#49033 ) * update authz to exclude entire group if user does not have access to rule * change rule update authz to not return changes because if user does not have access to any rule in group, they do not have access to the rule * a new query that returns alerts in group by UID of alert that belongs to that group * collect all affected groups during calculate changes * update authorize to check access to groups * update tests for calculateChanges to assert new fields * add authorization tests	2022-06-01 10:23:54 -04:00
Joe Blubaugh	1d724810de	Alerting: State Manager takes screenshots. (#49338 ) The State Manager will now take screenshots when an alert instance switches to an Alerting or Resolved state. Signed-off-by: Joe Blubaugh joe.blubaugh@grafana.com	2022-05-23 10:53:41 +08:00
George Robinson	43358c7248	Alerting: Keep private annotations across evaluations (#49080 )	2022-05-18 11:21:18 +02:00
Yuriy Tseretyan	952cb4fc0b	Alerting: introduce AlertRuleGroupKey and use it in API handlers (#48945 ) * create AlertGroupKey structure * update PrometheusSrv. - extract creation of RuleGroup to a separate method. Use group key for grouping * update RuleSrv - update calculateChanges to use groupKey - authorize to use groupkey	2022-05-16 15:45:45 -04:00

78 Commits