Commit Graph

271 Commits

Author SHA1 Message Date
Grot (@grafanabot) 3110e11330 fix: check lotex endpoint URL (#41429) (#41585)
* fix: check lotex endpoint URL

* Add validation for data sources URLs

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
(cherry picked from commit dbe78e47b1)

Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2021-11-11 08:19:44 +01:00
Grot (@grafanabot) 5f5e962b2d Alerting: Special alert instance if rule is in state NoData (#40540) (#41525)
* do not suppress NoData state
* extract conversion of state to postable alert + tests
* create a special alert instance if nodata
* use NoData when converting from Keep Last State instead of Alerting
* add silence during migration if NoData is mapped to KeepLastState.

(cherry picked from commit 610643a668)

Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
2021-11-10 12:15:23 +01:00
Grot (@grafanabot) d80013022e Alerting: Parse App URL only once (#39855) (#41522)
(cherry picked from commit 2b4e51f478)

Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
2021-11-10 10:44:35 +00:00
Grot (@grafanabot) 3b8be57b4f Alerting: fix bug where user is able to access rules from namespaces user is not part of (#41403) (#41406)
* Add fix
* Add tests
(cherry picked from commit 6220872633)

Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: gotjosh <josue@grafana.com>
2021-11-08 18:57:51 +01:00
gotjosh abd1050f98 [8.2.x] Alerting: Validate contact point configuration during migration to Unified Alerting (#40717) (#40801)
* Alerting: Validate contact point configuration during migration to Unified Alerting (#40717)

* Alerting: Validate contact point configuration during the migration

This minimises the chances of generating broken configuration as part of the migration. Originally, we wanted to generate it and not produce a hard stop in Grafana but this strategy has the chance to avoid delivering notifications for our users.

We now think it's better to hard stop the migration and let the user take care of resolving the configuration manually.

(cherry picked from commit 74fb491b6a)
2021-10-22 12:16:36 +01:00
Armand Grillet f3b8c1a89d Fix panic when Slack API sends unexpected response (#40721) (#40741)
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2021-10-21 10:49:07 +03:00
Grot (@grafanabot) d67b6e23ea Alerting: delete orphaned records from kvstore (#40337) (#40450)
(cherry picked from commit 153c356993)

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
2021-10-14 12:32:57 +02:00
Grot (@grafanabot) a652ffa9b4 [v8.2.x] Alerting: Fixes a bug when trying to sync broken alertmanager config (#40342)
* Alerting: Fixes a bug when trying to sync broken alertmanager config (#40338)

* Alerting: Fixes a bug when trying to sync broken alertmanager config

Broken alertmanager configuration has the potential to be introduced as part of a migration e.g. due to incompatible data between what grafana accepts and what the Alertmanager expects. When this happens, we expect an eventually consistent behaviour where we'll keep trying to apply the configuration until it works.

As part of change in https://github.com/grafana/grafana/pull/39237 we introduced a regression that modified this behaviour and instead tried to create a new Alertmanager for that organization everytime, which eventually ended up in a panic due to a duplicate metrics being registered.

This PR fixes that and introduces a test to catch further regressions.

* Remove disable orgs

(cherry picked from commit 48d73cb148)

* remove decryptFn that is not known in 8.2 branch

Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
2021-10-12 15:26:00 -04:00
George Robinson e775fba146 Alerting: Fix error message in ngalert when notifications cannot be sent to alertmanager (#40158) (#40317)
(cherry picked from commit 8318e45452)
2021-10-12 15:28:45 +01:00
Grot (@grafanabot) 128981fb21 Alerting: cleanup alert resources on org removal (#39938) (#40321)
(cherry picked from commit e1dfec49f9)

Co-authored-by: Jean-Philippe Quéméner <JohnnyQQQQ@users.noreply.github.com>
2021-10-12 12:46:03 +02:00
George Robinson 29638a485b Panel ID annotation cannot be set without Dashboard UID (#40019) (#40063)
(cherry picked from commit 935bd34a30)
2021-10-06 12:17:29 +01:00
George Robinson 265714866b You can now get alert rules for a dashboard or a panel using /api/v1/rules endpoints. (#39476) (#40008)
Get alert rules for a dashboard and panel in /api/v1/rules

(cherry picked from commit 2a4c1b1aa6)
2021-10-06 11:38:26 +01:00
Domas b5521a9eaf Alerting: Alertmanager datasource support for upstream Prometheus AM implementation (#39775) (#39989)
(cherry picked from commit a1d4be0700)
2021-10-05 12:13:11 +03:00
Grot (@grafanabot) 32481e75c5 Alerting: make /api/prometheus/grafana/api/v1/rules faster (#39660) (#39986)
(cherry picked from commit e343b62665)

Co-authored-by: Domas <domas.lapinskas@grafana.com>
2021-10-05 11:11:15 +03:00
Kyle Brandt 4be5dd6391 Alerting: Allow more characters in label names so notifications are sent (#38629) (#39965)
Remove validation for labels to be accepted in the Alertmanager, This helps with datasources that produce non-compatible labels.

Adds an "object_matchers" to alert manager routers so we can support labels names with extended characters beyond prometheus/openmetrics. It only does this for the internal Grafana managed Alert Manager.

This requires a change to alert manager, so for now we use grafana/alertmanager which is a slight fork, with the intention of going back to upstream.

The frontend handles the migration of "matchers" -> "object_matchers" when the route is edited and saved. Once this is done, downgrades will not work old versions will not recognize the "object_matchers".

Co-authored-by: Kyle Brandt <kyle@grafana.com>
Co-authored-by: Nathan Rodman <nathanrodman@gmail.com>
(cherry picked from commit 6572017ec7)

cleanup

Co-authored-by: gotjosh <josue@grafana.com>
2021-10-04 09:07:04 -07:00
Sofia Papagiannaki 368742ab04 Alerting: Remove ngalert feature toggle and introduce two new settings for enabling Grafana 8 alerts and disabling them for specific organisations (#38746) (#39793)
* Remove `ngalert` feature toggle

* Update frontend

Remove all references of ngalert feature toggle

* Update docs

* Disable unified alerting for specific orgs

* Add backend tests

* Apply suggestions from code review

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Disabled unified alerting by default

* Ensure backward compatibility with old ngalert feature toggle

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
(cherry picked from commit 012d4f0905)
2021-09-29 11:21:25 -04:00
Grot (@grafanabot) 2e86425ed9 Alerting: Move alertmanager default config to UnifiedAlertingSettings (#39597) (#39714)
(cherry picked from commit 05eb30e323)

Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Sofia Papagiannaki <sofia@grafana.com>
2021-09-28 11:01:18 -04:00
Grot (@grafanabot) 5aaef25a33 Alerting: Optimization of fetching data in multiorg alertmanager (#39237) (#39720)
* Add method GetAllLatestAlertmanagerConfiguration to DBStore
* add method ApplyConfig to AlertManager
* update multiorg alert manager to load all alertmanager configs at once

(cherry picked from commit 1910d85ae0)

Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
2021-09-28 09:05:22 -04:00
Grot (@grafanabot) 35dad9c267 Provide reader to alertmanager silence instead of file path (#39305) (#39721)
(cherry picked from commit e1aae0549e)

Co-authored-by: Yuriy Tseretyan <tceretian@gmail.com>
2021-09-28 09:04:22 -04:00
Sofia Papagiannaki c89e1236fe Alerting: tune rule evaluation via configuration (#35623) (#39712)
* Alerting: Configure max evaluation retries

* Alerting: Enforce minimum rule evaluation interval

* Alerting: Disable rule evaluation from configuration

* Update docs

* Alerting: Configure rule evaluation timeout

* Move options on unified_alerting config section

* Apply suggestions from code review

Co-authored-by: gotjosh <josue@grafana.com>
(cherry picked from commit f6f3a54742)
2021-09-28 14:58:31 +03:00
Grot (@grafanabot) 06cb288848 Alerting: Move spammy log line to debug in the state manager (#39410) (#39434)
(cherry picked from commit fcbcfd232b)

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-27 15:36:52 +01:00
Grot (@grafanabot) 4ffa29d959 Fix alerts with evaluation interval more than 30 seconds resolving in Alertmanager (#39513) (#39523)
(cherry picked from commit 27609dc2c5)

Co-authored-by: George Robinson <george.robinson@grafana.com>
2021-09-23 13:06:27 +02:00
Grot (@grafanabot) 10c44b4f8d Alerting: Move the unified alerting settings to its own struct (#39350) (#39400)
(cherry picked from commit 2ad82b9354)

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-20 11:02:07 +01:00
Grot (@grafanabot) 6f52226c66 Alerting: Metrics should have the label org instead of user (#39353) (#39365)
An user within Grafana has a completely different meaning. Multi-tenancy is done via Organizations as a top-level concept.

(cherry picked from commit 35e5bfce40)

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-17 18:52:54 +02:00
Grot (@grafanabot) 8a369feb63 Alerting: Support Unified Alerting with Grafana HA (#37920) (#39342)
* Alerting: Support Unified Alerting in Grafana's HA mode.

(cherry picked from commit 7db97097c9)

Co-authored-by: gotjosh <josue@grafana.com>
2021-09-17 13:23:51 +01:00
Santiago c3cf95f383 Revert "Alerting: add template funcs (#38404)" (#39258)
This reverts commit d6fb0181fb.
2021-09-15 19:47:22 -03:00
Santiago 0d2e68537c Alerting: Cleanup template, silence and notification files created du… (#39007)
* Alerting: Cleanup template, silence and notification files created during tests

* Create tempdir for testing, delete afterwards and check for errors

* Refactoring error checks

* Update docs/sources/enterprise/access-control/fine-grained-access-control-references.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update docs/sources/administration/configuration.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

* Update docs/sources/enterprise/access-control/fine-grained-access-control-references.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-09-15 18:48:52 -03:00
Santiago d6fb0181fb Alerting: add template funcs (#38404)
* Alerting: (wip) add template funcs

* Alerting: (wip) numeric template functions

* Alerting: (wip) template functions

* Test for the "args" function

* Alerting: (wip) Documentation for template functions

* Alerting: template functions - refactor

* code review changes

* disable linter error

* Use Prometheus implementation of TemplateExpander

* Update docs/sources/alerting/unified-alerting/alerting-rules/create-grafana-managed-rule.md

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>

Co-authored-by: achatterjee-grafana <70489351+achatterjee-grafana@users.noreply.github.com>
2021-09-15 18:48:29 -03:00
Serge Zaitsev 063160aae2 Chore: pass url parameters through context.Context (#38826)
* pass url parameters through context.Context

* fix url param names without colon prefix

* change context params to vars

* replace url vars in tests using new api

* rename vars to params

* add some comments

* rename seturlvars to seturlparams
2021-09-14 18:34:56 +02:00
Marcus Efraimsson fa9857499b Chore: GetDashboardQuery should be dispatched using DispatchCtx (#36877)
* Chore: GetDashboardQuery should be dispatched using DispatchCtx

* Fix after merge

* Changes after review

* Various fixes

* Use GetDashboardCtx function instead of GetDashboard
2021-09-14 16:08:04 +02:00
gotjosh 2b1d3d27e4 Alerting: Fix bug not creating filepath for silences/nflog if it does not exist (#39174)
We created this filepath just as we're about persist the templates - with the latest change, we now need to create it sooner.
2021-09-14 14:40:59 +01:00
gotjosh a2f4344bf2 Alerting: Refactor & fix unified alerting metrics structure (#39151)
* Alerting: Refactor & fix unified alerting metrics structure

Fixes and refactors the metrics structure we have for the ngalert service. Now, each component has its own metric struct that includes the JUST the metrics it uses. Additionally, I have fixed the configuration metrics and added new metrics to determine if we have discovered and started all the necessary configurations of an instance.

This allows us to alert on `grafana_alerting_discovered_configurations - grafana_alerting_active_configurations != 0` to know whether an alertmanager instance did not start successfully.
2021-09-14 12:55:01 +01:00
Marcus Efraimsson 2cc0788187 Chore: Disable backend test for now since it adds 10 minutes extra in CI (#39150)
Ref #38586
2021-09-13 19:37:26 +02:00
Sofia Papagiannaki 7af329f385 Alerting: Fix API specification (#38753)
* Alerting: Fix API spec

* Add missing status codes
2021-09-10 12:46:02 +03:00
gotjosh 39a3bb8a1c Alerting: Persist notification log and silences to the database (#39005)
* Alerting: Persist notification log and silences to the database

This removes the dependency of having persistent disk to run grafana alerting. Instead of regularly flushing the notification log and silences to disk we now flush the binary content of those files to the database encoded as a base64 string.
2021-09-09 17:25:22 +01:00
Todd Treece 6e667cacee Alerting: Skip query cache for alert queries (#39010) 2021-09-09 16:16:05 +02:00
Yuriy Tseretyan 6c2884ac37 Alerting: Fix notifier tests to close the temp file (#38992) 2021-09-09 09:56:42 -04:00
George Robinson 5caf6cb369 Change templateCaptureValue to support using template functions (#38766)
* Change templateCaptureValue to support using template functions

This commit changes templateCaptureValue to use float64 for the value
instead of *float64. This change means that annotations and labels can
use the float64 value with functions such as printf and avoid having to
check for nil. It also means that absent values are now printed as 0.

* Use math.NaN() instead of 0 for absent value
2021-09-08 10:46:15 +01:00
Sofia Papagiannaki c19d65b1ad Alerting: some fixes for updating rules via the API (#38764)
* Alerting: Allow updating rules if quota are exceeded

* Check for rule UID uniqueness in POST request
2021-09-02 19:38:42 +03:00
gotjosh dd502f22eb Alerting: Fix alert flapping in the internal alertmanager (#38648)
* Alerting: Fix alert flapping in the alertmanager

fixes a bug that caused Alerts that are evaluated at low intervals (sub 1 minute), to flap in the Alertmanager.
Mostly due to a combination of `EndsAt` and resend delay.

The Alertmanager uses `EndsAt` as a heuristic to know whenever it should resolve a firing alert, in the case that it hasn't heard
back from the alert generation system.

Because grafana sent the alert with an `EndsAt` which is equal to the `For` of the alert itself,
and we had a hard-coded 1 minute re-send delay (only applicable to firing alerts) this meant that a firing alert would resolve in the Alertmanager before we re-notify that it still firing.

This commit, increases the `EndsAt` by 3x the the resend delay or alert interval (depending on which one is higher). The resendDelay has been decreased to 30 seconds.
2021-09-02 16:22:59 +01:00
Serge Zaitsev 643c7fa0cb Chore: update all +build statements (#38782) 2021-09-01 17:38:56 +03:00
Serge Zaitsev c3ab2fdeb7 Macaron: remove custom Request type (#37874)
* remove macaron.Request, use http.Request instead

* remove com dependency from bindings module

* fix another c.Req.Request
2021-09-01 11:18:30 +02:00
Arve Knudsen 78596a6756 Migrate to Wire for dependency injection (#32289)
Fixes #30144

Co-authored-by: dsotirakis <sotirakis.dim@gmail.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Ida Furjesova <ida.furjesova@grafana.com>
Co-authored-by: Jack Westbrook <jack.westbrook@gmail.com>
Co-authored-by: Will Browne <wbrowne@users.noreply.github.com>
Co-authored-by: Leon Sorokin <leeoniya@gmail.com>
Co-authored-by: Andrej Ocenas <mr.ocenas@gmail.com>
Co-authored-by: spinillos <selenepinillos@gmail.com>
Co-authored-by: Karl Persson <kalle.persson@grafana.com>
Co-authored-by: Leonard Gram <leo@xlson.com>
2021-08-25 15:11:22 +02:00
gotjosh 2f27a5240b Alerting: Fix flake on test receiver tests (#38511)
* Alerting: Fix flake on test receiver tests

* Make the actual result from the API be sorted

* Use the correct letters
2021-08-24 17:22:11 +01:00
David Parrott 7fbeefc090 Alerting: create wrapper for Alertmanager to enable org level isolation (#37320)
Introduces org-level isolation for the Alertmanager and its components.

Silences, Alerts and Contact points are not separated by org and are not shared between them.

Co-authored with @davidmparrott and @papagian
2021-08-24 11:28:09 +01:00
Domas cb9912ec0a Alerting: button to test contact point (#37475) 2021-08-18 10:16:35 +03:00
George Robinson 3ca00f90b5 Contact point testing (#37308)
This commit adds contact point testing to ngalerts via a new API
endpoint. This endpoint accepts JSON containing a list of
receiver configurations which are validated and then tested
with a notification for a test alert. The endpoint returns JSON
for each receiver with a status and error message. It accepts
a configurable timeout via the Request-Timeout header (in seconds)
up to a maximum of 30 seconds.
2021-08-17 13:49:05 +01:00
Sofia Papagiannaki 7a01fb369d Alerting: Fix API spec generation (#37852)
* Alerting: Fix API spec generation

* Apply suggestion from code review

Co-authored-by: gotjosh <josue@grafana.com>
2021-08-13 16:15:53 +03:00
gotjosh f3f3fcc727 Alerting: Introduces /api/v1/ngalert/alertmanagers to expose discovered and dropped Alertmanager(s) (#37632)
* Alerting: Expose discovered and dropped Alertmanagers

Exposes the API for discovered and dropped Alertmanagers.

* make admin config poll interval configurable

* update after rebase

* wordsmith

* More wordsmithing

* change name of the config

* settings package too
2021-08-13 13:14:36 +01:00
Kyle Brandt aef67994a1 Annotations: Fix alerting annotation coloring (#37412)
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
2021-08-12 09:37:54 -07:00