This commit changes staleResultsHandler to create an annotation if the current state is Alerting and the result is being removed from the state cache as it has not been updated since 2x the evaluation interval.
(cherry picked from commit feae959c9d)
* add field for custom slack endpoint
* add test for using custom endpoint
* Update pkg/services/ngalert/notifier/channels/slack.go
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
* specify description for endpoint
* remove brittle string constants
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
(cherry picked from commit f9701d78b1)
Co-authored-by: Nathan Rodman <nathanrodman@gmail.com>
* API: Using go-swagger for extracting OpenAPI specification from source code
* Merge Grafana Alerting spec
* Include enterprise endpoints (if enabled)
* Serve SwaggerUI under feature flag
* Fix building dev docker images
* Configure swaggerUI
* Add missing json tags
Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
(cherry picked from commit 35fe58de37)
* Fix evaluation of alert rules for datasources with custom headers
* Fix unit tests
* Fix integration tests
* Evaluator fields should be package private
(cherry picked from commit 9df43abbb5)
Co-authored-by: George Robinson <george.robinson@grafana.com>
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
(cherry picked from commit 984c95de63)
Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>
* (WIP) send alerts to external, internal, or both alertmanagers
* Modify admin configuration endpoint, update swagger docs
* Integration test for admin config updated
* Code review changes
* Fix alertmanagers choice not changing bug, add unit test
* Add AlertmanagersChoice as enum in swagger, code review changes
* Fix API and tests errors
* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary
* Fix tests to reflect last changes
* Keep senders running when alerts are handled just internally
* Check if any external AM has been discovered before sending alerts, update tests
* remove duplicate data from logs
* update comment
* represent alertmanagers choice as an int instead of a string
* default alertmanagers choice to all alertmanagers, test cases
* update definitions and generate spec
* pass notification service down to the notifiers
* add ns to all notifiers
* remove bus from ngalert notifiers
* use smaller interfaces for notificationservice
* attempt to fix the tests
* remove unused struct field
* simplify notification service mock
* trying to resolve issues in the tests
* make linter happy
* make linter even happier
* linter, you are annoying
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache.
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry.
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
* Allow customizable googlechat message via optional setting
* Add optional message field in googlechat contact point configurator
* Fix strange error message on send if template fails to fully evaluate
* Elevate template evaluation failure logs to Warn level
* Extract default.title template embed from all channels to shared constant
* Create API test for overwriting invalid alertmanager config
* Avoid requiring alertmanager readiness for config changes
* AlertmanagerSrv depends on functionality rather than concrete types
* Add test for non-ready alertmanagers
* Additional cleanup and polish
* Back out previous integration test changes
* Refactor of tests incorrectly caused a test to become redundant
* Use pre-existing fake secret service
* Drop unused interface
* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests
* Fix linter error
* Empty commit to rerun checks
* change registry.delete to return deleted struct
* use pointer to alertRuleInfo instead copying.
* do not access evaluation channel when routine is stopped
* remove stopCh and use context cancellation
* do not return ctx.Err when channel is cancelled because it cancels all other routines
* make alertRuleInfo fields and functions package private
* refactor datasource loading
* refactor datasource loading
* pass uid
* use dscache in alerting to get DS
* remove expr/translate pacakge
* remove dup injection entry
* fix DS type on metrics endpoint, remove SQL DS lookup inside SSE
* update test and adapter
* comment fix
* Make eval run as admin when getting datasource info
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* fmt and comment
* remove unncessary/redundant code
Co-authored-by: Kyle Brandt <kyle@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
* Fix flaky tests in several notifiers
- Non-mocked time in sensu go tests
- Close server in Slack tests
- Use a mutex for writing responses in the fake slack server
* Remove mutex at the fake slack server
Fixes a panic that would ocurr as we proxy 4xx responses. When this happens and the content type of the response is JSON we try to check if the response has a "message" key. Then, we assume that the key will contain a value of string but we don't take into account that this value can potentially be `null`.
This adds a type assertion check to to this assumption so that we can keep the original JSON body as the response if we're unable to extract an `message`.
* Fix Annotation creation
- Remove validation of panelID, now annotations are created irrespective on whether they're attached to a panel or not.
- Alwasy attach the annotation to an AlertID
* Fix annotation creation
* fix tests
* Alerting: Clear alerting rule evaluation errors after intermittent failures
When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.
* update AlertingEnabled and UnifiedAlertingSettings.Enabled to be pointers
* add a pseudo migration to fix the AlertingEnabled and UnifiedAlertingSettings.Enabled if the latter is not defined
* update the default configuration file to make default value for both 'enabled' flags be undefined
Misc
* update Migrator to expose DB engine. This is needed for a ualert migration to access the database while the list of migrations is created.
* add more verbose failure when migrations do not match
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* add value to email template
* add value to default template
* update test string
* test: fix ngalert test suite
* test: run CI
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>