grafana

Author	SHA1	Message	Date
Will Browne	a8577c21ba	Plugins: Migrate PluginStore mock to pre-existing fakes package (#71664 ) * migrate to existing fakes package * fix imports	2023-07-17 10:21:44 +00:00
George Robinson	7edbe72483	Alerting: Support concurrent queries for saving alert instances (#70525 ) This commit adds support for concurrent queries when saving alert instances to the database. This is an experimental feature in response to some customers experiencing delays between rule evaluation and sending alerts to Alertmanager, resulting in flapping. It is disabled by default.	2023-06-23 11:36:07 +01:00
Matthew Jacobson	ba3994d338	Alerting: Repurpose rule testing endpoint to return potential alerts (#69755 ) * Alerting: Repurpose rule testing endpoint to return potential alerts This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations. The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent. This means that the endpoint will, among other things: - Attach static annotations and labels from the rule configuration to the alert instances. - Attach dynamic annotations from the datasource to the alert instances. - Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances. - Interpolate templated annotations / labels and accept allowed template functions.	2023-06-08 18:59:54 -04:00
Yuri Tseretyan	9eb10bee1f	Alerting: Scheduler use rule fingerprint instead of version (#66531 ) * implement calculation of fingerprint for ruleWithFolder * update scheduler to use fingerprint instead of rule's version	2023-04-28 10:42:16 -04:00
Kyle Brandt	840fb32ad8	SSE: (Instrumentation) Add Tracing (#66700 ) spans are prefixed `SSE.`	2023-04-18 08:04:51 -04:00
Kyle Brandt	2f13c851e4	SSE: (Chore/Instrumentation) Add ds_queries_total metric and move met… (#66695 ) * SSE: (Chore/Instrumentation) Add ds_queries_total metric and move metrics to service	2023-04-17 16:12:44 -07:00
Kyle Brandt	e78be44e1a	SSE: Dataplane Compliance (#65927 ) Takes a specific code path for data that identifies itself as dataplane instead of "guessing" what the data is. The data must identify itself by being in the dataplane by having both the following frame metadata properties: - TypeVersion property that is greater than 0.0 - 'Type' property The flag is disableSSEDataplane and disables this functionality and uses the old code for all queries regardless. See https://github.com/grafana/grafana-plugin-sdk-go/blob/main/data/contract_docs/contract.md for dataplane details.	2023-04-12 12:24:34 -04:00
Yuri Tseretyan	85a954cd81	Alerting: Update scheduler to get updates only from database (#64635 ) * stop using the scheduler's Update and Delete methods all communication must be via the database * update scheduler's registry to calculate diff before re-setting the cache * update fetcher to return the diff generated by registry * update processTick to update rule eval routine if the rule was updated and it is not going to be evaluated at this tick. * remove references to the scheduler from api package * remove unused methods in the scheduler	2023-03-14 18:02:51 -04:00
Alex Moreno	f60dc4441f	Alerting: Add status label to GroupRules metric (#63454 ) * Add status label to GroupRules metric * Add state (active and paused) label to GrouRules * Add active/paused metrics tests	2023-02-23 12:38:27 +01:00
Steve Simpson	4d1a2c3370	Alerting: Move `rule_groups_rules` metric from State to Scheduler. (#63144 ) The `rule_groups_rules` metric is currently defined and computed by `State`. It makes more sense for this metric to be computed off of the configured rule set, not based on the rule evaluation state. There could be an edge condition where a rule does not have a state yet, and so is uncounted. Additionally, we would like this metric (and others), to have a `rule_group` label, and this is much easier to achieve if the metric is produced from the `Scheduler` package.	2023-02-09 17:05:19 +01:00
Yuri Tseretyan	f066e8cdcd	Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823 ) * add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager	2023-02-03 11:36:49 -05:00
ismail simsek	91221bc436	Expressions: Fixes the issue showing expressions editor (#62510 ) * Use suggested value for uid * update the snapshot * use __expr__ * replace all -100 with __expr__ * update snapshot * more changes * revert redundant change * Use expr.DatasourceUID where it's possible * generate files	2023-01-31 18:50:10 +01:00
Alex Moreno	531b439cf1	Alerting: Add alert pausing feature (#60734 ) * Add field in alert_rule model, add state to alert_instance model, and state to eval * Remove paused state from eval package * Skip paused alert rules in scheduler * Add migration to add is_paused field to alert_rule table * Convert to postable alerts only if not normal, pernding, or paused * Handle paused eval results in state manager * Add Paused state to eval package * Add paused alerts logic in scheduler * Skip alert on scheduler * Remove paused status from eval package * Apply suggestions from code review Co-authored-by: George Robinson <george.robinson@grafana.com> * Remove state * Rethink schedule and manager for paused alerts * Change return to continue * Remove unused var * Rethink alert pausing * Paused alerts storing annotations * Only add one state transition * Revert boolean method renaming refactor * Revert take image refactor * Make registry errors public * Revert method extraction for getting a folder title * Revert variable renaming refactor * Undo unnecessary changes * Revert changes in test * Remove IsPause check in PatchPartiLAlertRule function * Use SetNormal to set state * Fix text by returning to old behaviour on alert rule deletion * Add test in schedule_unit_test.go to test ticks with paused alerts * Add coment to clarify usage of context.Background() * Add comment to clarify resetStateByRuleUID method usage * Move rule get to a more limited scope * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * rum gofmt on pkg/services/ngalert/schedule/schedule.go * Remove defer cancel for context * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/testing.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/schedule/schedule_unit_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Update pkg/services/ngalert/models/instance_test.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * skip scheduler rule state clean up on paused alert rule * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> * Fix mock in test * Add (hopefully) final suggestions * Use error channel from recordAnnotationsSync to cancel context * Run make gen-cue * Place pause alert check in channel update after version check * Reduce branching un update channel select * Add if for error and move code inside if in state manager ResetStateByRuleUID * Add reason to logs * Update pkg/services/ngalert/schedule/schedule.go Co-authored-by: George Robinson <george.robinson@grafana.com> * Do not delete alert rule routine, just exit on eval if is paused * Reduce branching and create-close a channel to avoid deadlocks * Separate state deletion and state reset (includes history saving) * Add current pause state in rule route in scheduler * Split clearState and bring errCh closer to RecordStatesAsync call * Change rule to ruleMeta in RecordStatesAsync * copy state to be able to modify it * Add timeout to context creation * Shorten the timeout * Use resetState is rule is paused and deleteState if rule is not paused * Remove Empty state reason * Save every rule change in historian * Add tests for DeleteStateByRuleUID and ResetStateByRuleUID * Remove useless line * Remove outdated comment Co-authored-by: George Robinson <george.robinson@grafana.com> Co-authored-by: Santiago <santiagohernandez.1997@gmail.com> Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>	2023-01-26 18:29:10 +01:00
Santiago	b5fa9e3501	Chore: Fix "manger" typo (#61649 ) fix mangers -> managers	2023-01-17 23:13:27 +00:00
Yuri Tseretyan	86b5fbbf60	Alerting: Introduce state manager config structure (#61249 )	2023-01-10 16:26:15 -05:00
George Robinson	2a291afbae	Alerting: Use consts from alerting package (#61241 )	2023-01-10 19:59:13 +00:00
Yuri Tseretyan	48f1db63ff	Alerting: Add support for tracing to alerting scheduler (#61057 )	2023-01-06 21:21:43 -05:00
Yuri Tseretyan	c5ee4e4ae1	Alerting: Improve rule validation to check if rule uses backend datasources (#58986 ) * validate if rule uses backend datasources * add backend datasource to test * fix tests * another forgotten import * remove unused var	2022-12-08 10:44:02 +01:00
Alexander Weaver	9977c7ea43	Alerting: Simplify scheduler configuration and remove dependency on Grafana-wide settings (#59735 ) * Make scheduler not depend directly on grafana-wide settings * Re-add missing interval	2022-12-02 16:02:07 -06:00
Alexander Weaver	2bfdda5b68	Alerting: Break dependency between state and image packages (#58381 ) * Refactor state and manager to not depend directly on image interface * Move generic errors to models package * Move NotAvailableImageService to state as its only references are in state tests * Move NoopImageService to state package * Move mock to state package * Fix linter error * Fix comment styling * Fix a couple added references introduced by rebase * Empty commit to kick build	2022-11-09 15:06:49 -06:00
Yuri Tseretyan	bad4f28d0d	Alerting: update test TestAlertingTicker to not rely on clock (#58544 ) * extract method processTick * make processTick return scheduled rules * move state manager tests to state manager * update test * move all tests into one file * remove unused fields	2022-11-09 15:08:57 -05:00
Yuri Tseretyan	3621cf5a12	Alerting: Update handling of stale state (#58276 ) * delete all stale states in one lock * do not use touched states to detect stale rely only on LastEvaluationTime maintained correctly * fix tests to use correct eval time * delete unused method	2022-11-07 11:03:53 -05:00
Yuri Tseretyan	dce8879145	Alerting: Update state manager to accept rule store as Warm method argument (#58244 )	2022-11-04 14:23:08 -04:00
Yuriy Tseretyan	e3a4bde622	Alerting: Condition evaluator with cached pipeline (#57479 ) * create rule evaluator * load header from the context * init one factory * update scheduler	2022-11-02 10:13:39 -04:00
Yuriy Tseretyan	0a4121cef8	Alerting: Contextual log provider for rule key (#57476 ) * create contextual log context provider * use contextual provider in scheduler * init logger in the package * use context for log context * use context in state manager	2022-10-26 19:16:02 -04:00
Alexander Weaver	de46c1b002	Alerting: Improve logs in state manager and historian (#57374 ) * Touch up log statements, fix casing, add and normalize contexts * Dedicated logger for dashboard resolver * Avoid injecting logger to historian * More minor log touch-ups * Dedicated logger for state manager * Use rule context in annotation creator * Rename base logger and avoid redundant contextual loggers	2022-10-21 16:16:51 -05:00
Yuriy Tseretyan	f3c219a980	Alerting: update format of logs in scheduler (#57302 ) * Change the severity level of the log messages	2022-10-20 13:43:48 -04:00
George Robinson	52965de369	Alerting: Add doc comments to state struct and normalize fields (#56647 )	2022-10-11 09:30:33 +01:00
Joe Blubaugh	b476ae62fb	Alerting: Write and Delete multiple alert instances. (#55350 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. These new transactions are off by default, guarded by the feature toggle "alertingBigTransactions" Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances.	2022-10-06 14:22:58 +08:00
Alexander Weaver	8df830557a	Alerting: Move annotation functionality behind a history persistence interface (#56133 ) * Move annotation functionality behind a history persistence interface * Rename to RecordState * Fix lint error in import aliasing * One more import linter error	2022-10-05 15:32:20 -05:00
Alexander Weaver	e6f99fc418	Alerting: Decouple schedule package from store (#55858 ) * Separate out fake for scheduler tests * Delete extracted methods from older fake	2022-09-27 13:48:12 -05:00
Alexander Weaver	a00879ae21	Alerting: Refactor store to not export its own interface for InstanceStore, delete dead dependency injection (#55772 ) * Add consumer-side store interface to state manager * Remove dead dependency * Delete dead dependency in API struct * Delete store-layer InstanceStore interface * Move fake for state's InstanceStore interface to state package	2022-09-26 13:55:05 -05:00
Sofia Papagiannaki	754eea20b3	Chore: SQL store split for annotations (#55089 ) * Chore: SQL store split for annotations * Apply suggestion from code review	2022-09-19 10:54:37 +03:00
Joe Blubaugh	22c937340e	Revert "Alerting: Write and Delete multiple alert instances. (#54072 )" (#54885 ) This reverts commit `5e4fd94413`.	2022-09-09 17:44:06 +02:00
Joe Blubaugh	5e4fd94413	Alerting: Write and Delete multiple alert instances. (#54072 ) Prior to this change, all alert instance writes and deletes happened individually, in their own database transaction. This change batches up writes or deletes for a given rule's evaluation loop into a single transaction before applying it. Before: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 398 2991381 ns/op 1133537 B/op 27703 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: FovKXiRVzm} with title: "an alert definition FTvFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: foDFXmRVkm} with title: "an alert definition fovFXmRVkz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: VQvFuigVkm} with title: "an alert definition VwDKXmR4kz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.619s ``` After: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/grafana/pkg/services/ngalert/store BenchmarkAlertInstanceOperations-8 1440 816484 ns/op 352297 B/op 6529 allocs/op --- BENCH: BenchmarkAlertInstanceOperations-8 util.go:127: alert definition: {orgID: 1, UID: 302r_igVzm} with title: "an alert definition q0h9lmR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: 71hrlmR4km} with title: "an alert definition nJ29_mR4zz" interval: 60 created util.go:127: alert definition: {orgID: 1, UID: Cahr_mR4zm} with title: "an alert definition ja2rlmg4zz" interval: 60 created PASS ok github.com/grafana/grafana/pkg/services/ngalert/store 1.383s ``` So we cut time by about 75% and memory allocations by about 60% when storing and deleting 100 instances. This change also updates some of our tests so that they run successfully against postgreSQL - we were using random Int64s, but postgres integers, which our tables use, max out at 2^31-1	2022-09-02 11:17:20 +08:00
Yuriy Tseretyan	76ea0b15ae	Alerting: Scheduler to fetch folders along with rules (#52842 ) * Update GetAlertRulesForScheduling to query for folders (if needed) * Update scheduler's alertRulesRegistry to cache folder titles along with rules * Update rule eval loop to take folder title from the * Extract interface RuleStore * Pre-fetch the rule keys with the version to detect changes, and query the full table only if there are changes.	2022-08-31 11:08:19 -04:00
Marcus Efraimsson	87afd9cadc	Plugins: Remove various custom headers logic (#54146 ) Removes various custom headers logic sprinkled around in the backend. It should automatically be applied to outgoing HTTP requests via the CustomHeadersMiddleware. This also removes decryption of SecureJSONData to populate custom headers in ngalert which seemed to have caused a ton of CPU usage.	2022-08-26 11:56:10 +02:00
Yuriy Tseretyan	03e746d9df	Alerting: Delete state from the database on reset (#53919 ) * make ResetStatesByRuleUID return states * delete rule states when reset * rule eval routine to clean up the state only when rule is deleted	2022-08-25 14:12:22 -04:00
Yuriy Tseretyan	9f90a7b54d	Alerting: State manager to use InstanceStore (#53852 ) * move saving the state to state manager when scheduler stops * move saving state to ProcessEvalResults * add GetRuleKey to State * add LogContext to AlertRuleKey	2022-08-18 09:40:33 -04:00
Yuriy Tseretyan	5fb778814c	Alerting: Update rules version when folder title is updated (#53013 ) * remove support for bus from scheduler * rename event to FolderTitleUpdated and fire only if title has changed * add method to increase version of all rules that belong to a folder * update ngalert service to subscribe to folder title change event call data store and update scheduler * add tests	2022-08-01 19:28:38 -04:00
Yuriy Tseretyan	a081764fd8	Alerting: Scheduler to use AlertRule (#52354 ) * update GetAlertRulesForSchedulingQuery to have result AlertRule * update fetcher utils and registry to support AlertRule * alertRuleInfo to use alert rule instead of version * update updateCh hanlder of ruleRoutine to just clean up the state. The updated rule will be provided at the next evaluation * update evalCh handler of ruleRoutine to use rule from the message and clear state as well as update extra labels * remove unused function in ruleRoutine * remove unused model SchedulableAlertRule * store rule version in ruleRoutine instead of rule * do not call the sender if nothing to send	2022-07-26 09:40:06 -04:00
Yuriy Tseretyan	054fe54b03	Alerting: Split Scheduler and AlertRouter tests (#52416 ) * move fake FakeExternalAlertmanager to sender package * move tests from scheduler to router * update alerts router to have all fields private * update scheduler tests to use sender mock	2022-07-19 09:32:54 -04:00
Yuriy Tseretyan	a7509ba18b	Alerting: rule evaluation loop's update channel to provide version (#52170 ) * handler for update message in rule evaluation routine ignores the message if its version greater or equal. * replace messages to update the channel if it is not empty	2022-07-15 12:32:52 -04:00
Yuriy Tseretyan	e5e8747ee9	Alerting: Update state manager to accept reserved labels (#52189 ) * add tests for cache getOrCreate * update ProcessEvalResults to accept extra lables * extract to getRuleExtraLabels * move populating of constant rule labels to extra labels	2022-07-14 15:59:59 -04:00
Yuriy Tseretyan	a6b1090879	Alerting: refactor scheduler and separate notification logic (#48144 ) * Introduce AlertsRouter in the sender package, and move all fields and methods related to notifications out of the scheduler to this router. * Introduce a new interface AlertsSender in the schedule package and replace calls of anonymous function `notify` inside the ruleRoutine to calling methods of that interface. * Rename interface Scheduler in api package to ExternalAlertmanagerProvider, and replace scheduler with AlertRouter as struct that implements the interface.	2022-07-12 15:13:04 -04:00
Matthew Jacobson	28dd413c1d	Alerting: Add config disabled_labels to disable reserved labels (#51832 ) * Alerting: Add config disabled_labels to disable reserved labels [unified_alerting.reserved_labels] disabled_labels * Replace IsGrafanaFolderDisabled with more generic IsReservedLabelDisabled * Simplify SchedulerCfg by including UnifiedAlertingSettings	2022-07-11 12:41:40 -04:00
Yuriy Tseretyan	94e709fdcb	Alerting: Simplify eval.Evaluator interface (#51463 ) * remove ExpressionService from argument list of Evaluator's methods	2022-06-27 17:40:44 -04:00
Yuriy Tseretyan	4b42cd3c1d	Alerting: State manager to use clock (#51219 ) * manager to use clock, to be able to mock real time	2022-06-22 12:18:42 -04:00
Yuriy Tseretyan	157c12211d	Alerting: State manager to use tick time to determine stale states (#50991 ) * use correct stale timestamp * calculate stale using tick time instead of time.now * remove unused dependency on sql store	2022-06-22 00:16:53 +02:00
Matthew Jacobson	5dee2ed24c	Alerting: Add first Grafana reserved label grafana_folder (#50262 ) * Alerting: Add first Grafana reserved label g_label g_label holds the title of the folder container the alert. The intention of this label is to use it as part of the new default notification policy groupBy. * Add nil check on updateRule labels map * Disable gocyclo lint on schedule.ruleRoutine will remove later in a separate refactoring PR to reduce complexity. * Address doc suggestions * Update g_folder for rules in folder when folder title changes * Remove global bus in FolderService * Modify tests to fit new common g_folder label * Add changelog entry * Fix merge conflicts * Switch GrafanaReservedLabelPrefix from `g_` to `grafana_`	2022-06-17 13:10:49 -04:00

1 2

85 Commits