Commit Graph

83 Commits

Author SHA1 Message Date
Joe Blubaugh 56f40bd413 Alerting: Add Go error message to warning log for screenshots. (#49870)
Makes debugging problems with alert screenshotting easier.
2022-05-31 20:56:22 +08:00
Joe Blubaugh 9e8efaa459 Alerting: Add stored screenshot utilities to the channels package. (#49470)
Adds three functions:
`withStoredImages` iterates over a list of models.Alerts, extracting a stored image's data from storage, if available, and executing a user-provided function.
`withStoredImage` does this for an image attached to a specific alert.
`openImage` finds and opens an image file on disk.

Moves `store.Image` to `models.Image`
Simplifies `channels.ImageStore` interface and updates notifiers that use it to use the simpler methods.
Updates all pkg/alert/notifier/channels to use withStoredImage routines.
2022-05-26 13:29:56 +08:00
Joe Blubaugh 1cc034d960 Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Joe Blubaugh 1d724810de Alerting: State Manager takes screenshots. (#49338)
The State Manager will now take screenshots when an alert instance
switches to an Alerting or Resolved state.

Signed-off-by: Joe Blubaugh joe.blubaugh@grafana.com
2022-05-23 10:53:41 +08:00
Joe Blubaugh 687e79538b Alerting: Add a general screenshot service and alerting-specific image service. (#49293)
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.

The screenshot package has the following services, most of which can be composed:

BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)

The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296

This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
2022-05-22 22:33:49 +08:00
George Robinson 43358c7248 Alerting: Keep private annotations across evaluations (#49080) 2022-05-18 11:21:18 +02:00
Kristin Laemmert 1df340ff28 backend/services: Move GetDashboard from sqlstore to dashboard service (#48971)
* rename folder to match package name
* backend/sqlstore: move GetDashboard into DashboardService

This is a stepping-stone commit which copies the GetDashboard function - which lets us remove the sqlstore from the interfaces in dashboards - without changing any other callers.
* checkpoint: moving GetDashboard calls into dashboard service
* finish refactoring api tests for dashboardService.GetDashboard
2022-05-17 14:52:22 -04:00
Yuriy Tseretyan 4b417c8f3e use NaN if condition value is nil (#48370) 2022-04-27 15:59:13 -03:00
George Robinson c5547123bc Remove redundant queries in GetAlertRules and GetOrgAlertRules and replace with ListAlertRules (#48108) 2022-04-25 11:42:42 +01:00
Yuriy Tseretyan 884c885289 Alerting: Support OK option for Error state (#47670)
* support OK state for Error
2022-04-13 14:45:29 -04:00
gotjosh cb6124c921 Alerting: Accurately set value for prom-compatible APIs (#47216)
* Alerting: Accurately set value for prom-compatible APIs

Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames.

* Fix an extra test

* Ensure a consitent ordering

* Address review comments

* address review comments
2022-04-05 19:36:42 +01:00
George Robinson 79769132c0 Alerting: Alert rule should wait For duration when execution error state is Alerting (#47052)
Alerting: Alert rule should wait For duration when execution error state is Alerting
2022-03-31 09:57:58 +01:00
gotjosh 84e5f336fe Alerting: Classic conditions can now display multiple values (#46971)
* Alerting: Extract classic condition values by RefID

* uncapitalise function

* update documentation

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/state/state.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/state/state.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update pkg/services/ngalert/eval/extract_md.go

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Run prettier

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2022-03-29 20:33:03 +01:00
gotjosh a338c78ca8 Alerting: Remove internal labels from prometheus compatible API responses (#46548)
* Alerting: Remove internal labels from prometheus compatible API responses

* Appease the linter

* Fix integration tests

* Fix API documentation & linter

* move removal of internal labels to the models
2022-03-16 16:04:19 +00:00
gotjosh 8d4a0a0396 Alerting: Include annotations in prometheus Alert response. (#45970)
* Alerting: Include annotations in prometheus Alert response.

* add tests

* re-order depedencies
2022-03-09 18:20:29 +00:00
George Robinson 789cfc31e3 Alerting: Fix use of > instead of >= when checking the For duration (#46011) 2022-03-01 17:06:42 +00:00
George Robinson feae959c9d Alerting: Create annotation if Firing alert is removed (#45703)
This commit changes staleResultsHandler to create an annotation if the current state is Alerting and the result is being removed from the state cache as it has not been updated since 2x the evaluation interval.
2022-02-24 16:25:28 +00:00
George Robinson 8d57318941 Alerting: Use expanded labels in dashboard annotations (#45726) 2022-02-24 10:58:54 +00:00
Yuriy Tseretyan 02f8e99ca1 Alerting: move fake stores to store package (#45428)
* make fake storage public
* move fake storages to store package
2022-02-15 17:24:39 -05:00
George Robinson 67a3e1d6fd Add context.Context to InstanceStore (#45049) 2022-02-08 13:49:04 +00:00
George Robinson a9399ab3cd Alerting: Add context.Context to RuleStore (#45004)
Alerting: Add context.Context to RuleStore
2022-02-08 08:52:03 +00:00
idafurjes 7a23700e1a Remove unused GetDashboard method (#44890)
* Remove unused GetDashboard method

* Uncomment test

* Fix dashboard service integration test

* Remove comment
2022-02-04 17:21:06 +01:00
Yuriy Tseretyan 984c95de63 Do not store EvaluationString in Evaluation. (#44606)
* do not store evaluation string in Evaluation.
* reduce number of buckets to store for a single state
2022-02-02 19:18:20 +01:00
George Robinson 5e2280ceee Add metrics to ngalert scheduler (#44602)
This pull request adds metrics to the ngalert scheduler so we can see how long it takes to evaluate a tick.
2022-01-31 16:56:43 +00:00
idafurjes 56c3875bb9 Chore: Remove context.TODO (#43458)
* Remove context.TODO() from services

* Fix live test
2021-12-28 10:26:18 +01:00
George Robinson c932dc959c Alerting: Add Ref ID to DatasourceNoData and DatasourceError alerts (#42630) 2021-12-03 09:55:16 +00:00
gotjosh 357e9ed1ea Alerting: Fix Annotation Creation when the alerting state changes (#42479)
* Fix Annotation creation
- Remove validation of panelID, now annotations are created irrespective on whether they're attached to a panel or not.
- Alwasy attach the annotation to an AlertID

* Fix annotation creation

* fix tests
2021-12-01 11:04:54 +00:00
Santiago a21d1e50f1 avoid template execution errors on missing values (#41617) 2021-11-29 15:26:51 -03:00
gotjosh dd5a2e5128 Alerting: Clear alerting rule evaluation errors after intermittent failures (#42386)
* Alerting: Clear alerting rule evaluation errors after intermittent failures

When an alert transitioned in a way that `alerting -> error -> (alerting|nodata)`, the error provided by the `error` state would never be cleared thus the API and UI would show the health as an error.
2021-11-26 17:58:19 +00:00
George Robinson 1b26d4d88e Alerting: Create DatasourceError alert if evaluation returns error (#41869)
* Alerting: Create DatasourceError alert if evaluation returns error

* Alerting: Add docs for DatasourceError alert

* Alerting: Fix DatasourceError alert does not have dashboard_uid label

* Alerting: Add break when datasource_uid found

* Alerting: Update TestProcessEvalResults
2021-11-25 11:46:47 +01:00
Santiago a45e4ff73f graphLink and tableLink template functions (#41369)
* graphLink and tableLink functions, docs updated

* Code review changes

* extract query struct outside of graphLink and tableLink functions

* Fix docs

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Fix linting errors

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-11-10 10:36:03 -03:00
Yuriy Tseretyan 610643a668 Alerting: Special alert instance if rule is in state NoData (#40540)
* do not suppress NoData state
* extract conversion of state to postable alert + tests
* create a special alert instance if nodata 
* use NoData when converting from Keep Last State instead of Alerting
* add silence during migration if NoData is mapped to KeepLastState.
2021-11-04 16:42:34 -04:00
Yuriy Tseretyan 1b5b747885 Alerting: Additional Tests for State Manager (#41291)
* rename fakeInstanceStore to FakeInstanceStore

* update test for state manager to initialize instance store with FakeInstanceStore
2021-11-04 15:15:56 -04:00
Yuriy Tseretyan 5836def6c2 Alerting: declare constants for __dashboardUid__ and __panelId__ literals (#39976) 2021-10-07 17:30:06 -04:00
idafurjes 2759b16ef5 Chore: Add context for dashboards (#39844)
* Add context for dashboards

* Remove GetDashboardCtx

* Remove ctx.TODO
2021-10-05 13:26:24 +02:00
Santiago 562cd9e44e Alerting template functions (#39261)
* Alerting: (wip) add template funcs

* Alerting: (wip) numeric template functions

* Alerting: (wip) template functions

* Test for the "args" function

* Alerting: (wip) Documentation for template functions

* Alerting: template functions - refactor

* code review changes

* disable linter error

* Use Prometheus implementation of TemplateExpander

* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* change templateCaptureValue to support using template functions

* Update pkg/services/ngalert/state/template.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>

* Test and documentation added for reReplaceAll template function

* complete missing functions, documentation and tests

* Use the alert instance's evaluation time for expanding the template

* strvalue graphlink and tablelink functions

* delete duplicate test

* make strvalue return an empty string

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2021-10-04 15:04:37 -03:00
Sofia Papagiannaki 012d4f0905 Alerting: Remove ngalert feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746)
* Remove `ngalert` feature toggle

* Update frontend

Remove all references of ngalert feature toggle

* Update docs

* Disable unified alerting for specific orgs

* Add backend tests

* Apply suggestions from code review

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Disabled unified alerting by default

* Ensure backward compatibility with old ngalert feature toggle

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-29 16:16:40 +02:00
George Robinson 27609dc2c5 Fix alerts with evaluation interval more than 30 seconds resolving in Alertmanager (#39513) 2021-09-22 14:55:46 +01:00
gotjosh fcbcfd232b Alerting: Move spammy log line to debug in the state manager (#39410) 2021-09-20 16:05:55 +01:00
Santiago c3cf95f383 Revert "Alerting: add template funcs (#38404)" (#39258)
This reverts commit d6fb0181fb.
2021-09-15 19:47:22 -03:00
Santiago d6fb0181fb Alerting: add template funcs (#38404)
* Alerting: (wip) add template funcs

* Alerting: (wip) numeric template functions

* Alerting: (wip) template functions

* Test for the "args" function

* Alerting: (wip) Documentation for template functions

* Alerting: template functions - refactor

* code review changes

* disable linter error

* Use Prometheus implementation of TemplateExpander

* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-09-15 18:48:29 -03:00
Marcus Efraimsson fa9857499b Chore: GetDashboardQuery should be dispatched using DispatchCtx (#36877)
* Chore: GetDashboardQuery should be dispatched using DispatchCtx

* Fix after merge

* Changes after review

* Various fixes

* Use GetDashboardCtx function instead of GetDashboard
2021-09-14 16:08:04 +02:00
gotjosh a2f4344bf2 Alerting: Refactor & fix unified alerting metrics structure (#39151)
* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
2021-09-14 12:55:01 +01:00
George Robinson 5caf6cb369 Change templateCaptureValue to support using template functions (#38766)
* Change templateCaptureValue to support using template functions

This commit changes templateCaptureValue to use float64 for the value
instead of *float64. This change means that annotations and labels can
use the float64 value with functions such as printf and avoid having to
check for nil. It also means that absent values are now printed as 0.

* Use math.NaN() instead of 0 for absent value
2021-09-08 10:46:15 +01:00
gotjosh dd502f22eb Alerting: Fix alert flapping in the internal alertmanager (#38648)
* Alerting: Fix alert flapping in the alertmanager

fixes a bug that caused Alerts that are evaluated at low intervals (sub 1 minute), to flap in the Alertmanager.
Mostly due to a combination of `EndsAt` and resend delay.

The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard
back from the alert generation system.

Because grafana sent the alert with an `EndsAt` which is equal to the `For` of the alert itself,
and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts) this meant that a firing alert would resolve in the Alertmanager before we re-notify that it still firing.

This commit, increases the `EndsAt` by 3x the the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.
2021-09-02 16:22:59 +01:00
Arve Knudsen 78596a6756 Migrate to Wire for dependency injection (#32289)
Fixes #30144

Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
2021-08-25 15:11:22 +02:00
Kyle Brandt aef67994a1 Annotations: Fix alerting annotation coloring (#37412)
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
2021-08-12 09:37:54 -07:00
Kyle Brandt aa904a5a04 NGAlert: Send resolve signal to alertmanager on alerting -> Normal (#37363) 2021-07-29 20:29:17 +02:00
David Parrott b5f464412d Alerting: automatically remove stale alerting states (#36767)
* initial attempt at automatic removal of stale states

* test case, need espected states

* finish unit test

* PR feedback

* still multiply by time.second

* pr feedback
2021-07-26 18:12:04 +02:00
George Robinson 2f4c893cf3 Expand the value string in annotations and labels of alerts (#37051)
This commit makes it possible to use the value string in
annotations and labels for alerts with "{{ $value }}"
2021-07-22 15:20:44 +01:00