Commit Graph

353 Commits

Author SHA1 Message Date
Grot (@grafanabot) 787940f32e Rename evalCtx to avoid confusion with context.Context (#45144) (#45893)
(cherry picked from commit 2ca79ca0c7)

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-25 11:20:35 +01:00
George Robinson cdf8bab022 Alerting: Create annotation if Firing alert is removed (#45703) (#45865)
This commit changes staleResultsHandler to create an annotation if the current state is Alerting and the result is being removed from the state cache as it has not been updated since 2x the evaluation interval.

(cherry picked from commit feae959c9d)
2022-02-25 07:38:31 +01:00
George Robinson 37b6fc7067 Alerting: Use expanded labels in dashboard annotations (#45726) (#45858) 2022-02-24 17:55:37 +00:00
Grot (@grafanabot) f443777309 Alerting: add field for custom slack endpoint (#45751) (#45812)
* add field for custom slack endpoint

* add test for using custom endpoint

* Update pkg/services/ngalert/notifier/channels/slack.go

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>

* specify description for endpoint

* remove brittle string constants

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
(cherry picked from commit f9701d78b1)

Co-authored-by: Nathan Rodman <nathanrodman@gmail.com>
2022-02-24 11:05:44 +01:00
Grot (@grafanabot) daf7c5fe93 Add context.Context to AlertingStore (#45069) (#45121)
(cherry picked from commit 4e3a72fc2a)

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-09 09:40:05 +00:00
Grot (@grafanabot) 62a3b5a94d Add context.Context to InstanceStore (#45049) (#45065)
(cherry picked from commit 67a3e1d6fd)

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-08 15:08:36 +01:00
George Robinson 8291389f6c Alerting: Add context.Context to RuleStore (#45004) (#45046) 2022-02-08 13:48:15 +00:00
Grot (@grafanabot) 08ad99c36e API: Extract OpenAPI specification from source code using go-swagger (#40528) (#45061)
* API: Using go-swagger for extracting OpenAPI specification from source code

* Merge Grafana Alerting spec

* Include enterprise endpoints (if enabled)

* Serve SwaggerUI under feature flag

* Fix building dev docker images

* Configure swaggerUI

* Add missing json tags

Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
(cherry picked from commit 35fe58de37)
2022-02-08 13:52:05 +01:00
Grot (@grafanabot) ce185ce6a3 Fix evaluation of alert rules for datasources with custom headers (#44862) (#44912)
* Fix evaluation of alert rules for datasources with custom headers

* Fix unit tests

* Fix integration tests

* Evaluator fields should be package private

(cherry picked from commit 9df43abbb5)

Co-authored-by: George Robinson <george.robinson@grafana.com>
2022-02-04 18:32:57 +01:00
Grot (@grafanabot) e031568aa4 Do not store EvaluationString in Evaluation. (#44606) (#44795)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state

(cherry picked from commit 984c95de63)

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>
2022-02-02 19:29:13 +01:00
George Robinson 924deda589 Fix Discord Webhook URL for invalid template (#44763)
This commit fixes an issue where an invalid template for Discord would change the Webhook URL to "" and cause "unsupported protocol scheme" errors.
2022-02-02 14:28:41 +01:00
Santiago 04d93751b8 Alerting: send alerts to external, internal, or both alertmanagers (#40341)
* (WIP) send alerts to external, internal, or both alertmanagers

* Modify admin configuration endpoint, update swagger docs

* Integration test for admin config updated

* Code review changes

* Fix alertmanagers choice not changing bug, add unit test

* Add AlertmanagersChoice as enum in swagger, code review changes

* Fix API and tests errors

* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary

* Fix tests to reflect last changes

* Keep senders running when alerts are handled just internally

* Check if any external AM has been discovered before sending alerts, update tests

* remove duplicate data from logs

* update comment

* represent alertmanagers choice as an int instead of a string

* default alertmanagers choice to all alertmanagers, test cases

* update definitions and generate spec
2022-02-01 20:36:55 -03:00
George Robinson 5e2280ceee Add metrics to ngalert scheduler (#44602)
This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.
2022-01-31 16:56:43 +00:00
idafurjes 12420260ef Remove bus from org invite api (#44530)
* Remove bus from org invite api

* Fix lint

* Remove comment
2022-01-31 17:24:52 +01:00
Serge Zaitsev 84a5910e56 Chore: Remove bus from ngalert (#44465)
* pass notification service down to the notifiers

* add ns to all notifiers

* remove bus from ngalert notifiers

* use smaller interfaces for notificationservice

* attempt to fix the tests

* remove unused struct field

* simplify notification service mock

* trying to resolve issues in the tests

* make linter happy

* make linter even happier

* linter, you are annoying
2022-01-26 16:42:40 +01:00
Jean-Philippe Quéméner 8ee3f59cd4 Alerting: recognize Cortex datasources correctly in the frontend (#44316)
* Alerting: always use msg field for user facing errors

* fix: revert front-end Cortex detection

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2022-01-21 15:44:11 +01:00
ying-jeanne 7422789ec7 Remove Macaron ParamsInt64 function from code base (#43810)
* draft commit

* change all calls

* Compilation errors
2022-01-15 00:55:57 +08:00
Yuriy Tseretyan ed5c664e4a Alerting: Stop firing of alert when it is updated (#39975)
* Update API to call the scheduler to remove/update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache.
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry.
* Move alert rule version from the internal registry (structure alertRuleInfo) closer to the rule evaluation loop (to the evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
2022-01-11 11:39:34 -05:00
Yuriy Tseretyan ea478dec22 Alerting: Remove bridge between log15 and go-kit logger (#43769)
* remove bridge between log15 and go-kit logger.

* fix tests
2022-01-07 09:40:09 +01:00
ying-jeanne a8eef45a44 Logger migration from log15 to gokit/log (#41636)
* migrate log15 to gokit/log

* fix console log

* update some unittest

* fix all unittest

* fix the build

* Update pkg/infra/log/log.go

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>

* general type vector

* correct the level key

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>
2022-01-06 22:28:05 +08:00
Alexander Weaver fd583a0e3b Alerting: Allow customization of Google chat message (#43568)
* Allow customizable googlechat message via optional setting

* Add optional message field in googlechat contact point configurator

* Fix strange error message on send if template fails to fully evaluate

* Elevate template evaluation failure logs to Warn level

* Extract default.title template embed from all channels to shared constant
2022-01-05 09:47:08 -06:00
idafurjes 8e6d6af744 Rename DispatchCtx to Dispatch (#43563) 2021-12-28 17:36:22 +01:00
idafurjes 7936c4c522 Rename AddHandlerCtx to AddHandler (#43557) 2021-12-28 16:08:07 +01:00
idafurjes 56c3875bb9 Chore: Remove context.TODO (#43458)
* Remove context.TODO() from services

* Fix live test
2021-12-28 10:26:18 +01:00
Alexander Weaver 56b3dc5445 Alerting: Allow configuration of non-ready alertmanagers (#43063)
* Create API test for overwriting invalid alertmanager config

* Avoid requiring alertmanager readiness for config changes

* AlertmanagerSrv depends on functionality rather than concrete types

* Add test for non-ready alertmanagers

* Additional cleanup and polish

* Back out previous integration test changes

* Refactor of tests incorrectly caused a test to become redundant

* Use pre-existing fake secret service

* Drop unused interface

* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests

* Fix linter error

* Empty commit to rerun checks
2021-12-27 17:01:17 -06:00
Alexander Weaver 9abdaf251f Alerting: Fix global state sensitivity in notifier channel tests (#43508) 2021-12-27 11:58:17 -06:00
idafurjes b8852ef6a3 Chore: Remove context.TODO() (#43409)
* Remove context.TODO() from services

* Fix live test

* Remove context.TODO
2021-12-22 11:02:42 +01:00
Jean-Philippe Quéméner ffc72aa255 Alerting: fix gosec warning that is not valid (#43425) 2021-12-21 19:47:47 +01:00
idafurjes ff3cf94b56 Chore: Remove context.TODO() from services (#42555)
* Remove context.TODO() from services

* Fix live test
2021-12-20 17:05:33 +01:00
Yuriy Tseretyan 1a762083d7 Alerting: make alert rule routine evaluation control be thread-safe (#41220)
* change registry.delete to return deleted struct
* use pointer to alertRuleInfo instead of copying.
* do not access evaluation channel when routine is stopped
* remove stopCh and use context cancellation
* do not return ctx.Err when channel is cancelled because it cancels all other routines
* make alertRuleInfo fields and functions package private
2021-12-16 14:52:47 -05:00
Ryan McKinley 2754e4fdf0 Expressions: use datasource model from the query (#41376)
* refactor datasource loading

* refactor datasource loading

* pass uid

* use dscache in alerting to get DS

* remove expr/translate package

* remove dup injection entry

* fix DS type on metrics endpoint, remove SQL DS lookup inside SSE

* update test and adapter

* comment fix

* Make eval run as admin when getting datasource info

Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>

* fmt and comment

* remove unnecessary/redundant code

Co-authored-by: Kyle Brandt <kyle@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2021-12-16 13:51:46 -03:00
Jean-Philippe Quéméner b605340668 Alerting: log errors happening in the API on server side (#43192)
* Alerting: log errors happening in the API on server side

* adapt tests to reflect changed payload
2021-12-16 13:33:10 +01:00
Gilles De Mey bb3b5c10e7 Alerting: fix WeCom channel notifier test assertion (#43173) 2021-12-15 19:45:12 +01:00
Gilles De Mey cbbbb505b4 Alerting: use HTML-safe characters for the default template (#43148) 2021-12-15 17:57:08 +01:00
smallpath aec14cba42 Alerting: Support WeCom as a contact point type (#40975)
* add wecom notifier

* fix backend lint

* fix alerting channel test

* update wecom doc

* update notifiers

* update wecom notifier test

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* unify wecom alerting

* fix backend lint

* fix front lint

* fix wecom test

* update docs

* Update pkg/services/ngalert/notifier/channels/wecom.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Update docs/sources/alerting/old-alerting/notifications.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* remove old wecom notifier

* remove old notifier doc

* fix backend test

* Update docs/sources/alerting/unified-alerting/contact-points.md

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* fix doc style

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2021-12-15 16:42:03 +00:00
Yuriy Tseretyan 1db9b1e6a9 Improve bridge for Alertmanager logger (#42958)
* Implement go-kit/log.Logger for internal logger.
2021-12-13 09:41:53 -05:00
Sofia Papagiannaki c6483cd8ed Alerting: Refactor API handlers to use web.Bind (#42600)
* Alerting: Refactor API handlers to use web.Bind

* lint
2021-12-13 09:22:57 +01:00
gotjosh bdab1d1f1f Fix flaky tests in several notifiers (#42668)
* Fix flaky tests in several notifiers

- Non-mocked time in sensu go tests
- Close server in Slack tests
- Use a mutex for writing responses in the fake slack server

* Remove mutex at the fake slack server
2021-12-03 12:34:31 +00:00
George Robinson c932dc959c Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630) 2021-12-03 09:55:16 +00:00
gotjosh 5b64c4f684 Alerting: Fix panic while proxying 4xx responses of requests to cortex/loki (#42570)
Fixes a panic that would occur as we proxy 4xx responses. When this happens and the content type of the response is JSON, we try to check if the response has a "message" key. Then, we assume that the key will contain a string value, but we don't take into account that this value can potentially be `null`.

This adds a type assertion check to this assumption so that we can keep the original JSON body as the response if we're unable to extract a `message`.
2021-12-01 13:53:29 +00:00
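The kind of check described in the commit above can be illustrated with a short sketch. It is not the proxy code itself: `extractMessage` is a hypothetical helper, and the only behaviour taken from the commit message is that a non-string `message` (such as `null`) must not panic and should leave the original body in use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractMessage returns the string stored under the "message" key of a
// JSON object. ok is false when the body is not a JSON object, the key is
// missing, or the value is not a string (for example null).
func extractMessage(body []byte) (msg string, ok bool) {
	var m map[string]interface{}
	if err := json.Unmarshal(body, &m); err != nil {
		return "", false
	}
	// The two-value form of the type assertion never panics; an unchecked
	// m["message"].(string) would panic on a null value.
	msg, ok = m["message"].(string)
	return msg, ok
}

func main() {
	for _, body := range []string{`{"message":"bad request"}`, `{"message":null}`} {
		if msg, ok := extractMessage([]byte(body)); ok {
			fmt.Println("extracted:", msg)
		} else {
			fmt.Println("keeping original body:", body)
		}
	}
}
```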
gotjosh 357e9ed1ea Alerting: Fix Annotation Creation when the alerting state changes (#42479)
* Fix Annotation creation
- Remove validation of panelID, now annotations are created irrespective of whether they're attached to a panel or not.
- Always attach the annotation to an AlertID

* Fix annotation creation

* fix tests
2021-12-01 11:04:54 +00:00
Sofia Papagiannaki 9c7b52fd36 Alerting: Fix API specification (#42282)
* Alerting: Fix API specification
2021-11-30 20:55:54 +01:00
Santiago a21d1e50f1 avoid template execution errors on missing values (#41617) 2021-11-29 15:26:51 -03:00
gotjosh dd5a2e5128 Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386)
* Alerting: Clear alerting rule evaluation errors after intermittent failures

When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.
2021-11-26 17:58:19 +00:00
George Robinson 1b26d4d88e Alerting: Create DatasourceError alert if evaluation returns error (#41869)
* Alerting: Create DatasourceError alert if evaluation returns error

* Alerting: Add docs for DatasourceError alert

* Alerting: Fix DatasourceError alert does not have dashboard_uid label

* Alerting: Add break when datasource_uid found

* Alerting: Update TestProcessEvalResults
2021-11-25 11:46:47 +01:00
George Robinson 1e5b0e64ac Alerting: Add comments to ScheduleService interface (#42228) 2021-11-25 10:12:04 +00:00
Armand Grillet 6523486122 Alerting: Make Unified Alerting enabled by default for those who do not use legacy alerting (#42200)
* update AlertingEnabled and UnifiedAlertingSettings.Enabled to be pointers
* add a pseudo migration to fix the AlertingEnabled and UnifiedAlertingSettings.Enabled if the latter is not defined
* update the default configuration file to make default value for both 'enabled' flags be undefined

Misc
* update Migrator to expose DB engine. This is needed for a ualert migration to access the database while the list of migrations is created.
* add more verbose failure when migrations do not match

Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2021-11-24 14:56:07 -05:00
Jean-Philippe Quéméner cec2d965ec Alerting: validate mute timings in the alertmanager configuration (#42125)
* Alerting: check for uniqueness of mutetime names

* add some testing

* add name validation

* add root route validation

* add tests for validation

* add check for root route mute_time_intervals

* add duplicate test

* remove useless yaml test

* refactor table test
2021-11-23 16:25:20 +01:00
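The checks listed in the commit above (mute timing names present and unique, plus a check on the root route's `mute_time_intervals`) could be sketched as follows. The types, function name, and error strings are hypothetical simplifications, and the rule that the root route may not reference any mute timings is an assumption about what the root route check enforces.

```go
package main

import (
	"errors"
	"fmt"
)

// muteTiming and route are hypothetical, simplified stand-ins for the
// Alertmanager configuration types.
type muteTiming struct{ Name string }

type route struct {
	MuteTimeIntervals []string
}

// validateMuteTimings checks that every mute timing has a unique,
// non-empty name and that the root route references none of them
// (the latter rule is an assumption, see the note above).
func validateMuteTimings(timings []muteTiming, root route) error {
	seen := make(map[string]struct{}, len(timings))
	for _, mt := range timings {
		if mt.Name == "" {
			return errors.New("mute timing is missing a name")
		}
		if _, dup := seen[mt.Name]; dup {
			return fmt.Errorf("mute timing %q is not unique", mt.Name)
		}
		seen[mt.Name] = struct{}{}
	}
	if len(root.MuteTimeIntervals) > 0 {
		return errors.New("root route must not have any mute time intervals")
	}
	return nil
}

func main() {
	timings := []muteTiming{{Name: "weekends"}, {Name: "weekends"}}
	fmt.Println(validateMuteTimings(timings, route{}))
}
```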
George Robinson 9122e7f647 Alerting: Check for nil model.Settings and models.SecureSettings (#37738) 2021-11-22 11:56:18 +00:00
Peter Holmberg 97978a7c02 Alerting: Add value to notifier template (#41951)
* add value to email template

* add value to default template

* update test string

* test: fix ngalert test suite

* test: run CI

Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
2021-11-22 08:45:44 +01:00