Commit Graph

174 Commits

Author SHA1 Message Date
Steve Simpson 73873f5a8a Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568)
* Alerting: Optimize rule status gathering APIs when a limit is applied.

The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When
there are a very large number of alert instances present, this API is quite
slow to respond, and profiling suggests that a big part of the problem is
sorting the alerts by importance, in order to select the first 16.

This changes the application of the limit to use a more efficient heap-based
top-k algorithm. This maintains a slice of only the highest ranked items whilst
iterating the full set of alert instances, which substantially reduces the
number of comparisons needed. This is particularly effective, as the
`AlertsByImportance` comparison is quite complex.

I've included a benchmark to compare the new TopK function to the existing
Sort/limit strategy. It shows that for small limits, the new approach is
much faster, especially at high numbers of alerts, e.g.

100K alerts / limit 16: 1.91s vs 0.02s (-99%)

For situations where there is no effective limit, sorting is marginally faster,
therefore in the API implementation, if there is either a) no limit or b) no
effective limit, then we just sort the alerts as before. There is also a space
overhead using a heap which would matter for large limits.

* Remove commented test cases

* Make linter happy
2024-04-19 11:51:22 +02:00
Yuri Tseretyan b9abb8cabb Alerting: Update provisioning API to support regular permissions (#77007)
* allow users with regular actions access provisioning API paths
* update methods that read rules
skip new authorization logic if user CanReadAllRules to avoid performance impact on file-provisioning
update all methods to accept identity.Requester that contains all permissions and is required by access control.

* create deltas for single rul e 

* update modify methods
skip new authorization logic if user CanWriteAllRules to avoid performance impact on file-provisioning
update all methods to accept identity.Requester that contains all permissions and is required by access control.

* implement RuleAccessControlService in provisioning

* update file provisioning user to have all permissions to bypass authz

* update provisioning API to return errutil errors correctly

---------

Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2024-03-22 15:37:10 -04:00
Yuri Tseretyan e138ae3eb9 Alerting: Improve openAPI specification and docs for export endpoints (#85008) 2024-03-22 18:25:27 +02:00
Santiago c9bb18101c Alerting: Decrypt secrets before sending configuration to the remote Alertmanager (#83640)
* (WIP) Alerting: Decrypt secrets before sending configuration to the remote Alertmanager

* refactor, fix tests

* test decrypting secrets

* tidy up

* test SendConfiguration, quote keys, refactor tests

* make linter happy

* decrypt configuration before comparing

* copy configuration struct before decrypting

* reduce diff in TestCompareAndSendConfiguration

* clean up remote/alertmanager.go

* make linter happy

* avoid serializing into JSON to copy struct

* codeowners
2024-03-19 12:12:03 +01:00
Gilles De Mey 8765c48389 Alerting: Remove legacy alerting (#83671)
Removes legacy alerting, so long and thanks for all the fish! 🐟

---------

Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
Co-authored-by: Sonia Aguilar <soniaAguilarPeiron@users.noreply.github.com>
Co-authored-by: Armand Grillet <armandgrillet@users.noreply.github.com>
Co-authored-by: William Wernert <rwwiv@users.noreply.github.com>
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2024-03-14 15:36:35 +01:00
gotjosh 948c8c45d6 Alerting: Use Alertmanager types extracted into grafana/alerting (#83824)
* Alerting: Use Alertmanager types extracted into grafana/alerting

We're in the process of exporting all Alertmanager types into grafana/alerting so that they can be imported in the Mimir Alertmanager, without a neeed to import Grafana directly.

This change introduces type aliasing for all Alertmanager types based on their 1:1 copy that now live in grafana/alerting.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-03-06 20:48:32 +00:00
Matthew Jacobson 2e8c514cfd Alerting: Stop persisting user-defined templates to disk (#83456)
Updates Grafana Alertmanager to work with new interface from grafana/alerting#161. This change stops passing user-defined templates to the Grafana Alertmanager by persisting them to disk and instead passes them by string.
2024-03-04 20:12:49 +02:00
Joe Blubaugh b905777ba9 Alerting: Support deleting rule groups in the provisioning API (#83514)
* Alerting: feat: support deleting rule groups in the provisioning API

Adds support for DELETE to the provisioning API's alert rule groups route, which allows deleting the rule group with a
single API call. Previously, groups were deleted by deleting rules one-by-one.

Fixes #81860

This change doesn't add any new paths to the API, only new methods.

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2024-02-28 10:19:02 -05:00
George Robinson a0353b237a Alerting: Update swagger specs (#83260) 2024-02-23 11:25:43 +00:00
George Robinson a564c8c439 Alerting: Keep order of time and mute time intervals consistent (#83257) 2024-02-22 16:57:20 +00:00
George Robinson 1ed1242358 Alerting: Basic support for time_intervals (#83216)
This commit adds basic support for time_intervals, as mute_time_intervals
is deprecated in Alertmanager and scheduled to be removed before 1.0.
It does not add support for time_intervals in API or file provisioning,
nor does it support exporting time intervals. This will be added in
later commits to keep the changes as simple as possible.
2024-02-22 15:58:56 +00:00
Yuri Tseretyan 1eebd2a4de Alerting: Support for simplified notification settings in rule API (#81011)
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated

* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.

* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings

* update GET API so only admins can see auto-gen
2024-02-15 09:45:10 -05:00
Gokhan cf601fab09 Alerting: Enable sending notifications to a specific topic on Telegram (#79546)
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2024-02-06 17:19:22 +02:00
William Wernert 2ab7d3c725 Alerting: Receivers API (read only endpoints) (#81751)
* Add single receiver method

* Add receiver permissions

* Add single/multi GET endpoints for receivers

* Remove stable tag from time intervals

See end of PR description here: https://github.com/grafana/grafana/pull/81672
2024-02-05 20:12:15 +02:00
Yuri Tseretyan d1073deefd Alerting: Time intervals API (read only endpoints) (#81672)
* declare new API and models GettableTimeIntervals, PostableTimeIntervals
* add new actions alert.notifications.time-intervals:read and alert.notifications.time-intervals:write.
* update existing alerting roles with the read action. Add to all alerting roles.
* add integration tests
2024-02-01 15:17:13 -05:00
Julien Duchesne 40312c527b ngalert openapi: Fix ObjectMatchers definition (#79477)
These don't get marshalled and unmarshalled in the same way as they are represented in Go
This PR changes the OpenAPI spec to reflect what the API accepts and sends back
2024-01-19 14:37:11 -05:00
Julien Duchesne c9211fbd69 ngalert openapi: Use same basePath as rest of Grafana (#79025)
* ngalert openapi: Use same `basePath` as rest of Grafana
Currently, there are two issues that prevent easily merging `ngalert` and grafana openapi specs:
- The basePath is different. `grafana` has `/api` and `ngalert` has `/api/v1`. I changed `ngalert` to use `/api`
- The `ngalert` endpoints have their basePath in the each operation path. The basePath should actually be omitted
---------

Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>
2024-01-17 11:53:16 -05:00
Sofia Papagiannaki d1dab5828d Alerting: Update rule API to address folders by UID (#74600)
* Change ruler API to expect the folder UID as namespace

* Update example requests

* Fix tests

* Update swagger

* Modify FIle field in /api/prometheus/grafana/api/v1/rules

* Fix ruler export

* Modify folder in responses to be formatted as <parent UID>/<title>

* Add alerting test with nested folders

* Apply suggestion from code review

* Alerting: use folder UID instead of title in rule API (#77166)

Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>

* Drop a few more latent uses of namespace_id

* move getNamespaceKey to models package

* switch GetAlertRulesForScheduling to use folder table

* update GetAlertRulesForScheduling to return folder titles in format `parent_uid/title`.

* fi tests

* add tests for GetAlertRulesForScheduling when parent uid

* fix integration tests after merge

* fix test after merge

* change format of the namespace to JSON array

this is needed for forward compatibility, when we migrate to full paths

* update EF code to decode nested folder

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Virginia Cepeda <virginia.cepeda@grafana.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
Co-authored-by: Alex Weaver <weaver.alex.d@gmail.com>
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2024-01-17 11:07:39 +02:00
Julien Duchesne 2fb03dfa56 fix(swagger): Mute Timing PUT OK status is 202 (#80459) 2024-01-12 16:58:20 -05:00
Yuri Tseretyan 4479e7218d Alerting: MuteTiming service return errutil + GetTiming by name (#79772)
* add get mute timing by name to MuteTimingService
* update get mute timing request handler to use the service method

* replace validation, uniqueness and used errors with errutils
* update mute timing methods return errutil responses
* use the term "time interval" in errors bevause mute timings are deprecated in Alertmanager and will be replaced by time intervals in the future.

* update create and update methods to return struct instead of pointer
2024-01-12 21:23:44 +02:00
Matthew Jacobson aa03b8f8a7 Alerting: Guided legacy alerting upgrade dry-run (#80071)
This PR has two steps that together create a functional dry-run capability for the migration.

By enabling the feature flag alertingPreviewUpgrade when on legacy alerting it will:
    a. Allow all Grafana Alerting background services except for the scheduler to start (multiorg alertmanager, state manager, routes, …).
    b. Allow the UI to show Grafana Alerting pages alongside legacy ones (with appropriate in-app warnings that UA is not actually running).
    c. Show a new “Alerting Upgrade” page and register associated /api/v1/upgrade endpoints that will allow the user to upgrade their organization live without restart and present a summary of the upgrade in a table.
2024-01-05 18:19:12 -05:00
Alexander Weaver cf8e8852c3 Alerting: Drop NamespaceID from responses on unstable ngalert API endpoints in favor of NamespaceUID (#79359)
* Drop from API response

* Drop from swagger docs

* Drop from integration tests

* regenerate public swagger docs

* Drop from frontend

* Drop asserts for namespaceID field
2023-12-15 11:06:53 -06:00
Julien Duchesne 884e0427e6 ngalert openapi: Add X-Disable-Provenance to missing operations (#79278)
Swagger(ngalert): Add `X-Disable-Provenance` to missing operations
I added all functions that call the `determineProvenance` function

Schema changes are from:
`make` in `pkg/services/ngalert/api/tooling`
`make swagger-clean && make openapi3-gen` in root
2023-12-13 10:55:59 -05:00
Julien Duchesne f977e3faf5 ngalert swagger: Fix status code (#79415)
This endpoint returns a 202, not a 204
Let me know if we should instead change the response of the API
2023-12-12 13:40:36 -05:00
Yuri Tseretyan 8af08d0df2 Alerting: Add export of mute timings to file provisioning formats (#79225)
* add export of mute timings to file provisioning formats
* support export of mute timings to HCL
2023-12-11 21:36:51 -05:00
Yuri Tseretyan 2be7605794 Alerting: Fix fine-grained rule access control to use 403 for authorization error (#79239)
* use 403 for authorization error
* update silences API
* add ForbiddenError to rule API responses
2023-12-07 13:43:58 -05:00
Yuri Tseretyan 7e331c8507 Alerting: Support for condition field in /api/v1/eval (#79032)
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
2023-12-06 11:28:43 -05:00
Rodrigo Villablanca ab83bc7346 Alerting: Fix export of notification policy to JSON (#78021)
* Export Notification Policy correctly (#78020)

The JSON version of an exported Notification Policy now
inline correctly the policy in the same way the Yaml version
does.

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2023-12-04 16:57:37 -05:00
Yuri Tseretyan 48b55f39bf Alerting: Add support for responders to Opsgenie integration (#77159)
* add support for responders in opsgenie UI config
* update export model

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2023-10-27 13:06:46 -04:00
Steve Simpson a0476741f2 Alerting: Fix HCL export for alerts with non-zero "for" field. (#76739)
* Alerting: Fix HCL export for alerts with non-zero "for" field.

Fixes #76734

* fix tests

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2023-10-20 11:09:08 +02:00
Yuri Tseretyan 372082d254 Alerting: Export of contact points to HCL (#75849)
* add compat layer to convert from Export model to "new" API models
2023-10-12 22:33:57 +01:00
Yuri Tseretyan c4ac4eb41b Alerting: Export of notification policies to HCL (#76411) 2023-10-12 12:10:08 -04:00
Yuri Tseretyan 2497db4bd6 Alerting: Add UID of rules to response that were affected by update group request (#75985)
* update storage's method InstertRules to return ids of added rules as slice to keep the same order as rules in the argument
* schematize response of update rule group endpoint, add created, updated, deleted fields that contain UID of affected rules.
* update integration tests to use the new fields
2023-10-07 01:11:24 +03:00
Yuri Tseretyan 51499d7763 Alerting: Update alert rule export models to omit default values (#75918)
* do not include rule uid in response if it's empty
* make some fields of export models nillable
2023-10-05 15:16:44 -04:00
Yuri Tseretyan 027bd9356f Alerting: Rule Modify Export APIs (#75322)
* extend RuleStore interface to get namespace by UID
* add new export API endpoints
* implement request handlers
* update authorization and wire handlers to paths
* add folder error matchers to errorToResponse
* add tests for export methods
2023-10-02 11:47:59 -04:00
William Wernert 925f12d0ea Alerting: Add support for keep_firing_for field from external rulers (#75163)
* Add support for `keep_firing_for` in ruler proxy

* Don't delete `keep_firing_for` when editing a rule with the field set

Co-Authored-By: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>

---------

Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
2023-09-21 16:02:53 -04:00
Yuri Tseretyan 6f785f7269 Alerting: Support for single rule and multi-folder rule export (#74625) 2023-09-11 13:13:02 -04:00
Yuri Tseretyan dce492642a Alerting: Export of alert rules in HCL format (#73166)
* import hashicopr/hcl/v2
* add hcl package and export to HCL
* annotate export structs
---------

Co-authored-by: Konrad Lalik <konrad.lalik@grafana.com>
2023-09-11 11:48:23 -04:00
Yuri Tseretyan 99fd7b8141 Alerting: Update provisioning to validate user-defined UID on create (#73793)
* add ValidateUID to util
* provisioning to validate UID on rule creation

---------

Co-authored-by: brendamuir <100768211+brendamuir@users.noreply.github.com>
Co-authored-by: Alexander Weaver <weaver.alex.d@gmail.com>
2023-09-08 15:09:35 -04:00
Yuri Tseretyan 0df3647367 Alerting: extend rules export API to filter by folder and group (#74423)
update endpoint `GET /api/v1/provisioning/alert-rules/export` to accept query parameters `folderUid` and `group`
2023-09-07 17:34:32 -04:00
Ryan McKinley 025b2f3011 Chore: use any rather than interface{} (#74066) 2023-08-30 18:46:47 +03:00
Matthew Jacobson d31d175109 Alerting: Fix contact point testing with secure settings (#72235)
* Alerting: Fix contact point testing with secure settings

Fixes double encryption of secure settings during contact point testing and removes code duplication
that helped cause the drift between alertmanager and test endpoint. Also adds integration tests to cover
the regression.

Note: provisioningStore is created to remove cycle and the unnecessary dependency.
2023-07-25 10:04:27 -04:00
Matthew Jacobson cfb1656968 Alerting: Add notification policy provisioning file export (#70009)
* Alerting: Add notification policy provisioning file export

- Add provisioning API endpoint for exporting notification policies.
- Add option in notification policy view ellipsis dropdown for exporting.
- Update various provisioning documentation.
2023-07-24 17:56:53 -04:00
Matthew Jacobson 13121d3234 Alerting: Add contact point provisioning file export (#71692)
* Add contact point provisioning file export apis

* Regenerate api

* docs

* frontend

* add mock to tests

* Fix missing row-level export button on viewer role w/ prov. read

* Address review comments

---------

Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2023-07-20 14:35:56 -04:00
Matthew Jacobson e3787de470 Alerting: Fix Alertmanager change detection for receivers with secure settings (#71307)
* Alerting: Make ApplyAlertmanagerConfiguration only decrypt/encrypt new/changed secure settings

Previously, ApplyAlertmanagerConfiguration would decrypt and re-encrypt all secure settings. However, this caused re-encrypted secure settings to be included in the raw configuration when applied to the embedded alertmanager, resulting in changes to the hash. Consequently, even if no actual modifications were made, saving any alertmanager configuration triggered an apply/restart and created a new historical entry in the database.

To address the issue, this modifies ApplyAlertmanagerConfiguration, which is called by POST `api/alertmanager/grafana/config/api/v1/alerts`, to decrypt and re-encrypt only new and updated secure settings. Unchanged secure settings are loaded directly from the database without alteration.

We determine whether secure settings have changed based on the following (already in-use) assumption: Only new or updated secure settings are provided via the POST `api/alertmanager/grafana/config/api/v1/alerts` request, while existing unchanged settings are omitted.

* Ensure saving a grafana-managed contact point will only send new/changed secure settings

Previously, when saving a grafana-managed contact point, empty string values were transmitted for all unset secure settings. This led to potential backend issues, as it assumed that only newly added or updated secure settings would be provided.

To address this, we now exclude empty ('', null, undefined) secure settings, unless there was a pre-existing entry in secureFields for that specific setting. In essence, this means we only transmit an empty secure setting if a previously configured value was cleared.

* Fix linting

* refactor omitEmptyUnlessExisting

* fixup

---------

Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2023-07-11 08:23:07 +02:00
Matthew Jacobson ba3994d338 Alerting: Repurpose rule testing endpoint to return potential alerts (#69755)
* Alerting: Repurpose rule testing endpoint to return potential alerts

This feature replaces the existing no-longer in-use grafana ruler testing API endpoint /api/v1/rule/test/grafana. The new endpoint returns a list of potential alerts created by the given alert rule, including built-in + interpolated labels and annotations.

The key priority of this endpoint is that it is intended to be as true as possible to what would be generated by the ruler except that the resulting alerts are not filtered to only Resolved / Firing and ready to be sent.

This means that the endpoint will, among other things:

- Attach static annotations and labels from the rule configuration to the alert instances.
- Attach dynamic annotations from the datasource to the alert instances.
- Attach built-in labels and annotations created by the Grafana Ruler (such as alertname and grafana_folder) to the alert instances.
- Interpolate templated annotations / labels and accept allowed template functions.
2023-06-08 18:59:54 -04:00
Yuri Tseretyan ab5a3820d5 Alerting: Fix status code of successful response POST /api/alertmanager/grafana/api/v2/silences in swagger specs (#67951)
* update status code to reflect reality

* update docs
2023-05-15 11:23:30 -04:00
Matthew Jacobson 91471ac7ae Alerting: Template Testing API (#67450) 2023-04-28 15:56:59 +01:00
Matthew Jacobson eddd4f4508 Alerting: Add totalsFiltered to RuleResponse for hidden by filters count (#66883)
Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count

Currently, when both a limit_alerts and a matcher/state filter is applied, there is not enough information to determine how many alert instances were hidden by the filters. Only enough to determine the total hidden by the limit and filter combined.

This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.
2023-04-21 09:35:12 +01:00
George Robinson 19ebb079ba Alerting: Add limits and filters to Prometheus Rules API (#66627)
This commit adds support for limits and filters to the Prometheus Rules
API.

Limits:

It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:

- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.

Filters:

Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.

A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.

Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.

Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.

The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:

{
    "name": "test",
    "value": "value1",
    "isRegex": false,
    "isEqual": true
}

The `isRegex` and `isEqual` options work as follows:

| IsEqual | IsRegex  | Operator |
| ------- | -------- | -------- |
| true    | false    |    =     |
| true    | true     |    =~    |
| false   | true     |    !~    |
| false   | false    |    !=    |
2023-04-17 17:45:06 +01:00