Files

Tom Ratcliffe fc51ec70ba Alerting: Add manage permissions UI logic for Contact Points (#92885)

* Add showPolicies prop

* Add manage permissions component for easier reuse within alerting

* Add method for checking whether to show access control within alerting

* Remove accidental console.log from main

* Tweak styling for contact point width and add manage permissions drawer

* Improve typing for access control type response

* Add basic test for manage permissions on contact points list

* Only show manage permissions if grafana AM and alertingApiServer is enabled

* Update i18n

* Add test utils for turning features on and back off

* Add access control handlers

* Update tests with new util

* Pass AM in and add tests

* Receiver OSS resource permissions

There is a complication that is not fully addressed: Viewer defaults to read:*
and Editor defaults to read+write+delete:*

This is different to other resource permissions where non-admin are not granted
any global permissions and instead access is handled solely by resource-specific
permissions that are populated on create and removed on delete.

This allows them to easily remove permission to view or edit a single resource
from basic roles.

The reason this is tricky here is that we have multiple APIs that can
create/delete receivers: config api, provisioning api, and k8s receivers api.
Config api in particular is not well-equipped to determine when creates/deletes
are happening and thus ensuring that the proper resource-specific permissions
are created/deleted is finicky.

We would also have to create a migration to populate resource-specific
permissions for all current receivers. This migration would need to be reset so
it can run again if the flag is disabled.

* Add access control permissions

* Pass in contact point ID to receivers form

* Temporarily remove access control check for contact points

* Include access control metadata in k8s receiver List & Get

GET: Always included.
LIST: Included by adding a label selector with value `grafana.com/accessControl`

* Include new permissions for contact points navbar

* Fix receiver creator fixed role to not give global read

* Include in-use metadata in k8s receiver List & Get

GET: Always included.
LIST: Included by adding a label selector with value `grafana.com/inUse`

* Add receiver creator permission to receiver writer

* Add receiver creator permission to navbar

* Always allow listing receivers, don't return 403

* Remove receiver read precondition from receiver create

Otherwise, Creator role will not be able to create their first receiver

* Update routes permissions

* Add further support for RBAC in contact points

* Update routes permissions

* Update contact points header logic

* Back out test feature toggle refactor

Not working atm, not sure why

* Tidy up imports

* Update mock permissions

* Revert more test changes

* Update i18n

* Sync inuse metadata pr

* Add back canAdmin permissions after main merge

* Split out check for policies navtree item

* Tidy up utils and imports and fix rules in use

* Fix contact point tests and act warnings

* Add missing ReceiverPermissionAdmin after merge conflict

* Move contact points permissions

* Only show contact points filter when permissions are correct

* Move to constants

* Fallback to empty array and remove labelSelectors (not needed)

* Allow `toAbility` to take multiple actions

* Show builtin alertmanager if contact points permission

* Add empty state and hide templates if missing permissions

* Translations

* Tidy up mock data

* Fix tests and templates permission

* Update message for unused contact points

* Don't return 403 when user lists receivers and has access to none

* Fix receiver create not adding empty uid permissions

* Move SetDefaultPermissions to ReceiverPermissionService

* Have SetDefaultPermissions use uid from string

Fixes circular dependency

* Add FakeReceiverPermissionsService and fix test wiring

* Implement resource permission handling in provisioning API and renames

Create: Sets to default permissions
Delete: Removes permissions
Update: If receiver name is modified and the new name doesn't exist, it copies
the permissions from the old receiver to the newly created one. If old receiver
is now empty, it removes the old permissions as well.

* Split contact point permissions checks for read/modify

* Generalise getting annotation values from k8s entities

* Proxy RouteDeleteAlertingConfig through MultiOrgAlertmanager

* Cleanup permissions on config api reset and restore

* Cleanup permissions on config api POST

note this is still not available with feature flag enabled

* Gate the permission manager behind FF until initial migration is added

* Sync changes from config api PR

* Switch to named export

* Revert unnecessary changes

* Revert Filter auth change and implement in k8s api only

* Don't allow new scoped permissions to give access without FF

Prevents complications around mixed support for the scoped permissions causing
oddities in the UI.

* Fix integration tests to account for list permission change

* Move to `permissions` file

* Add additional tests for contact points

* Fix redirect for viewer on edit page

* Combine alerting test utils and move to new file location

* Allow new permissions to access provisioning export paths with FF

* Always allow exporting if it's grafana flavoured

* Fix logic for showing auto generated policies

* Fix delete logic for contact point only referenced by a rule

* Suppress warning message when renaming a contact point

* Clear team and role perm cache on receiver rename

Prevents temporarily broken UI permissions after rename when a user's source of
elevated permissions comes from a cached team or basic role permission.

* Debug log failed cache clear on CopyPermissions

---------

Co-authored-by: Matt Jacobson <matthew.jacobson@grafana.com>

2024-09-27 19:56:32 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| accesscontrol | RBAC: Add required component to perform access control checks for user api when running single tenant (#93104) | 2024-09-23 11:26:44 +02:00 |
| api | Alerting: Add manage permissions UI logic for Contact Points (#92885) | 2024-09-27 19:56:32 +01:00 |
| backtesting | Alerting: Send information about alert rule to data source in headers (#90344) | 2024-07-17 22:55:12 +03:00 |
| client | Alerting: Instrument outbound requests for Loki Historian and Remote Alertmanager with tracing (#89185) | 2024-06-14 13:24:12 -05:00 |
| eval | refactor(alerting): remove transformation that is now done by the querier (#93660) | 2024-09-24 14:46:03 +03:00 |
| image | Chore: Remove public vars in setting package (#81018) | 2024-01-23 12:36:22 +01:00 |
| metrics | Alerting: Add a metric to track the number of rules with simplified editor settings (#93511) | 2024-09-20 17:56:40 +02:00 |
| models | Alerting: Copy alert rule metadata when the rule is updated via provisioning API (#93723) | 2024-09-25 22:31:02 +02:00 |
| notifier | Alerting: Managed receiver resource permission in config api (#93632) | 2024-09-25 09:39:36 -04:00 |
| provisioning | Alerting: Update GetTemplates to return sorted list of templates (#93933) | 2024-09-27 18:49:37 +01:00 |
| remote | Check is config is default by comparing hashes (#92296) | 2024-08-23 11:22:06 +02:00 |
| schedule | Alerting: Add a metric to track the number of rules with simplified editor settings (#93511) | 2024-09-20 17:56:40 +02:00 |
| sender | Alerting: Managed receiver resource permission in config api (#93632) | 2024-09-25 09:39:36 -04:00 |
| state | Alerting: Fix logging for failed annotations writing. (#93856) | 2024-09-26 23:27:40 +02:00 |
| store | Revert read replica POC (#93551) | 2024-09-25 15:21:39 -08:00 |
| tests | Revert read replica POC (#93551) | 2024-09-25 15:21:39 -08:00 |
| testutil | Revert read replica POC (#93551) | 2024-09-25 15:21:39 -08:00 |
| writer | Alerting: Don't suppress translation errors in PointsFromFrames (#93747) | 2024-09-26 16:30:50 -05:00 |
| accesscontrol.go | Alerting: Notifications Templates API (#91349) | 2024-09-25 09:31:57 -04:00 |
| CHANGELOG.md | Update Alerting changelog (#56684) | 2022-10-11 10:55:18 +00:00 |
| limits_test.go | Alerting: Decouple quota configuration logic from API interfaces and add tests (#78930) | 2023-12-01 10:47:19 -06:00 |
| limits.go | Alerting: Guided legacy alerting upgrade dry-run (#80071) | 2024-01-05 18:19:12 -05:00 |
| ngalert_test.go | Alerting: update rule versions on folder move (#88376) | 2024-08-13 12:26:26 +02:00 |
| ngalert.go | Alerting: Managed receiver resource permission in config api (#93632) | 2024-09-25 09:39:36 -04:00 |
| README.md | Alerting: Decouple rule routine from scheduler (#84018) | 2024-03-06 13:44:53 -06:00 |

README.md

Next generation alerting (ngalert) in Grafana 8

Ngalert (Next generation alert) is the next generation of alerting in Grafana 8.

Overview

The ngalert package can be found in pkg/services/ngalert and has the following sub-packages:

- api
- eval
- logging
- metrics
- models
- notifier
- schedule
- sender
- state
- store
- tests

Scheduling and evaluation of alert rules

The scheduling of alert rules happens in the schedule package. This package is responsible for managing the evaluation of alert rules including checking for new alert rules and stopping the evaluation of deleted alert rules.

The scheduler runs at a fixed interval, called its heartbeat, in which it does a number of tasks:

1. Fetch the alert rules for all organizations (excluding disabled)
2. Start a goroutine (if this is a new alert rule or the scheduler has just started) to evaluate the alert rule
3. Send an *evaluation event to the goroutine for each alert rule if its interval has elapsed
4. Stop the goroutines for all alert rules that have been deleted since the last heartbeat
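The heartbeat tasks above can be sketched in Go as follows. This is only an illustration, not the code in the schedule package: the `rule`, `evaluation`, `scheduler` and `tick` names are hypothetical, the rules passed in are assumed to be already fetched and filtered, and "interval has elapsed" is modelled as the heartbeat time being a multiple of the rule interval.

```go
package main

import "fmt"

// rule is a hypothetical, trimmed-down alert rule.
type rule struct {
	UID             string
	IntervalSeconds int64
}

// evaluation stands in for the *evaluation event described above.
type evaluation struct {
	ScheduledAt int64 // heartbeat time, in Unix seconds
}

// scheduler keeps one channel per running rule routine.
type scheduler struct {
	routines map[string]chan evaluation
}

func newScheduler() *scheduler {
	return &scheduler{routines: make(map[string]chan evaluation)}
}

// tick performs one heartbeat over the fetched rules and reports which rule
// routines were started, sent an evaluation event, or stopped.
func (s *scheduler) tick(now int64, rules []rule) (started, notified, stopped []string) {
	seen := make(map[string]bool, len(rules))
	for _, r := range rules {
		seen[r.UID] = true
		ch, ok := s.routines[r.UID]
		if !ok {
			ch = make(chan evaluation)
			s.routines[r.UID] = ch
			// Placeholder routine: a real one would evaluate the rule per event.
			go func() {
				for range ch {
				}
			}()
			started = append(started, r.UID)
		}
		// Assumption: a rule is due when the tick time is a multiple of its interval.
		if r.IntervalSeconds > 0 && now%r.IntervalSeconds == 0 {
			ch <- evaluation{ScheduledAt: now}
			notified = append(notified, r.UID)
		}
	}
	// Stop routines whose rules no longer exist.
	for uid, ch := range s.routines {
		if !seen[uid] {
			close(ch)
			delete(s.routines, uid)
			stopped = append(stopped, uid)
		}
	}
	return started, notified, stopped
}

func main() {
	s := newScheduler()
	rules := []rule{{UID: "a", IntervalSeconds: 10}, {UID: "b", IntervalSeconds: 20}}
	for now := int64(10); now <= 20; now += 10 {
		started, notified, stopped := s.tick(now, rules)
		fmt.Println(now, started, notified, stopped)
	}
}
```

Closing the channel is used here as the stop signal for a routine; a context or a dedicated stop channel would work equally well.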

The function that evaluates each alert rule is called run. It waits for an *evaluation event (sent each time the rule's interval, which is configurable per alert rule, has elapsed) and then evaluates the alert rule. To ensure that it is evaluating the latest version of the alert rule, it compares its local version of the alert rule with the version in the *evaluation event and fetches the latest version from the database if the version numbers do not match. It then invokes the Evaluator, which evaluates any queries, classic conditions or expressions in the alert rule and passes the results of this evaluation to the State Manager. An evaluation can return no results in the case of NoData or Error, a single result in the case of classic conditions, or more than one result if the alert rule is multi-dimensional (i.e. one result per label set). For multi-dimensional alert rules, the results of an evaluation should never contain more than one result per label set.
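The version check described above might look roughly like this. It is a minimal sketch: `alertRule`, `evaluation`, `fetchFunc` and `latestRule` are hypothetical names, and the real rule model and storage interface are considerably richer.

```go
package main

import (
	"errors"
	"fmt"
)

// alertRule is a hypothetical, trimmed-down rule.
type alertRule struct {
	UID     string
	Version int64
}

// evaluation stands in for the *evaluation event; it carries the rule version
// the scheduler saw when it sent the event.
type evaluation struct {
	Version int64
}

// fetchFunc loads the latest rule from storage (assumed signature).
type fetchFunc func(uid string) (alertRule, error)

// latestRule returns the rule to evaluate: the local copy when the versions
// match, otherwise a fresh copy from storage.
func latestRule(local alertRule, ev evaluation, fetch fetchFunc) (alertRule, error) {
	if local.Version == ev.Version {
		return local, nil
	}
	fresh, err := fetch(local.UID)
	if err != nil {
		return alertRule{}, fmt.Errorf("fetch rule %s: %w", local.UID, err)
	}
	return fresh, nil
}

func main() {
	store := map[string]alertRule{"r1": {UID: "r1", Version: 3}}
	fetch := func(uid string) (alertRule, error) {
		r, ok := store[uid]
		if !ok {
			return alertRule{}, errors.New("not found")
		}
		return r, nil
	}
	r, err := latestRule(alertRule{UID: "r1", Version: 2}, evaluation{Version: 3}, fetch)
	fmt.Println(r.Version, err)
}
```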

The State Manager is responsible for determining the current state of the alert rule (normal, pending, firing, etc.) by comparing each evaluation result to the previous evaluations of the same label set in the state cache. Given a label set, it updates the state cache with the new current state and the evaluation time of the current evaluation, and appends the current evaluation to the slice of previous evaluations. If the alert changes state (e.g. pending to firing) then it also creates an annotation to mark it on the dashboard and panel for this alert rule.
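A simplified state cache along these lines could be sketched as below. The names (`manager`, `entry`, `process`) are hypothetical, only three states are modelled, and the rule that a label set must keep alerting for `forSeconds` before moving from pending to firing is an assumption of this sketch, not something stated above.

```go
package main

import "fmt"

type state string

const (
	normal  state = "Normal"
	pending state = "Pending"
	firing  state = "Firing"
)

// entry is what the cache holds for one label set.
type entry struct {
	Current      state
	PendingSince int64  // Unix seconds at which the label set started alerting
	LastEvalAt   int64  // Unix seconds of the most recent evaluation
	Evals        []bool // previous evaluation outcomes (true = condition met)
}

// manager is a hypothetical state cache keyed by a label-set fingerprint.
type manager struct {
	cache      map[string]*entry
	forSeconds int64
}

func newManager(forSeconds int64) *manager {
	return &manager{cache: make(map[string]*entry), forSeconds: forSeconds}
}

// process records one evaluation result for a label set and returns the new
// state plus whether the state changed (the point at which an annotation
// would be created).
func (m *manager) process(labelSet string, alerting bool, at int64) (state, bool) {
	e, ok := m.cache[labelSet]
	if !ok {
		e = &entry{Current: normal}
		m.cache[labelSet] = e
	}
	prev := e.Current
	if !alerting {
		e.Current = normal
	} else {
		if prev == normal {
			e.Current = pending
			e.PendingSince = at
		}
		if e.Current == pending && at-e.PendingSince >= m.forSeconds {
			e.Current = firing
		}
	}
	e.LastEvalAt = at
	e.Evals = append(e.Evals, alerting)
	return e.Current, e.Current != prev
}

func main() {
	m := newManager(60)
	for i, alerting := range []bool{true, true, false} {
		s, changed := m.process(`{host="a"}`, alerting, int64(i)*60)
		fmt.Println(s, changed)
	}
}
```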

You might have noticed that so far we have avoided using the word "Alert" and instead talked about evaluation results and the current state of an alert rule. The reason is that at this point in the evaluation of an alert rule the State Manager does not know about alerts; it just knows, for each label set, the state of the alert rule, the current evaluation and the previous evaluations.

Notification of alerts

When an evaluation transitions the state of an alert rule for a given label set from pending to firing or from firing to normal, the scheduler creates an alert instance and passes it to Alertmanager. When a label set transitions from pending to firing, the state of the alert instance is "Firing"; when it transitions from firing to normal, the state of the alert instance is "Normal".
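The two transitions that produce an alert instance can be expressed as a small function. This is a sketch under the assumption that only the transitions named above create an alert; `alert` and `alertForTransition` are hypothetical names.

```go
package main

import "fmt"

type state string

const (
	normal  state = "Normal"
	pending state = "Pending"
	firing  state = "Firing"
)

// alert is a hypothetical alert instance handed to Alertmanager.
type alert struct {
	Labels map[string]string
	State  string
}

// alertForTransition returns an alert instance for the two transitions that
// produce one, and false for every other transition.
func alertForTransition(prev, next state, labels map[string]string) (alert, bool) {
	switch {
	case prev == pending && next == firing:
		return alert{Labels: labels, State: "Firing"}, true
	case prev == firing && next == normal:
		return alert{Labels: labels, State: "Normal"}, true
	default:
		return alert{}, false
	}
}

func main() {
	a, ok := alertForTransition(pending, firing, map[string]string{"host": "a"})
	fmt.Println(a.State, a.Labels, ok)
}
```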

Which Alertmanager?

In ngalert it is possible to send alerts to the internal Alertmanager, an external Alertmanager, or both.

The internal Alertmanager is called MultiOrgAlertmanager. It creates one Alertmanager per organization in Grafana to preserve isolation between organizations. The MultiOrgAlertmanager receives alerts from the scheduler and forwards each alert to the Alertmanager of the organization it belongs to.
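The per-organization dispatch can be pictured with the following sketch. The types and methods here (`multiOrg`, `orgAlertmanager`, `deliver`) are hypothetical and are not the MultiOrgAlertmanager API; the point is only that each organization has its own instance and alerts never cross between them.

```go
package main

import (
	"fmt"
	"sync"
)

// alert is a hypothetical, minimal alert.
type alert struct {
	Labels map[string]string
}

// orgAlertmanager stands in for the Alertmanager of a single organization.
// It is not safe for concurrent use in this sketch.
type orgAlertmanager struct {
	received []alert
}

// PutAlerts stores the alerts in memory.
func (am *orgAlertmanager) PutAlerts(alerts ...alert) {
	am.received = append(am.received, alerts...)
}

// multiOrg holds one Alertmanager per organization ID.
type multiOrg struct {
	mu  sync.RWMutex
	ams map[int64]*orgAlertmanager
}

func newMultiOrg(orgIDs ...int64) *multiOrg {
	m := &multiOrg{ams: make(map[int64]*orgAlertmanager)}
	for _, id := range orgIDs {
		m.ams[id] = &orgAlertmanager{}
	}
	return m
}

// deliver forwards alerts to the Alertmanager of the given organization only.
func (m *multiOrg) deliver(orgID int64, alerts ...alert) error {
	m.mu.RLock()
	am, ok := m.ams[orgID]
	m.mu.RUnlock()
	if !ok {
		return fmt.Errorf("no Alertmanager for org %d", orgID)
	}
	am.PutAlerts(alerts...)
	return nil
}

func main() {
	m := newMultiOrg(1, 2)
	err := m.deliver(1, alert{Labels: map[string]string{"alertname": "HighCPU"}})
	fmt.Println(err, len(m.ams[1].received), len(m.ams[2].received))
}
```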

When Grafana is configured to send alerts to an external Alertmanager, it does so via the sender, which provides an abstraction over the notification of alerts and the discovery of external Alertmanagers in Prometheus. The sender receives alerts via the SendAlerts function and then passes them to Prometheus.

How does Alertmanager turn alerts into notifications?

Alertmanager receives alerts via the PutAlerts function. Each alert is validated and its annotations and labels are normalized, then the alerts are put in an in-memory structure. The dispatcher iterates over the alerts and matches each one to a route in the configuration, as described in the Prometheus Alertmanager documentation on route configuration.

The alert is then matched to an alert group depending on the configuration in the route. The alert is then sent through a number of stages, including silencing and inhibition (which is currently not supported), and finally the receiver, which can include wait, de-duplication and retry stages.
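The idea of passing alerts through a chain of stages can be illustrated with plain functions. This is a loose sketch and not the Alertmanager pipeline: `stage`, `chain`, `silence`, `dedup` and `receiver` are hypothetical, de-duplication is reduced to "drop names already seen", and the inhibition, wait and retry stages are left out.

```go
package main

import "fmt"

// alert is a hypothetical, minimal alert identified by name only.
type alert struct {
	Name string
}

// stage takes the alerts that reached it and returns the ones that continue.
type stage func(in []alert) []alert

// chain runs the stages in order, feeding each the output of the previous one.
func chain(stages ...stage) stage {
	return func(in []alert) []alert {
		out := in
		for _, s := range stages {
			out = s(out)
		}
		return out
	}
}

// silence drops alerts whose name is in the silenced set.
func silence(silenced map[string]bool) stage {
	return func(in []alert) []alert {
		var out []alert
		for _, a := range in {
			if !silenced[a.Name] {
				out = append(out, a)
			}
		}
		return out
	}
}

// dedup drops alerts whose name has already passed through this stage.
func dedup() stage {
	seen := make(map[string]bool)
	return func(in []alert) []alert {
		var out []alert
		for _, a := range in {
			if !seen[a.Name] {
				seen[a.Name] = true
				out = append(out, a)
			}
		}
		return out
	}
}

// receiver is the final stage: it records the name of every alert it is given.
func receiver(sent *[]string) stage {
	return func(in []alert) []alert {
		for _, a := range in {
			*sent = append(*sent, a.Name)
		}
		return in
	}
}

func main() {
	var sent []string
	p := chain(silence(map[string]bool{"b": true}), dedup(), receiver(&sent))
	p([]alert{{Name: "a"}, {Name: "b"}, {Name: "a"}})
	fmt.Println(sent)
}
```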

What are notification channels?

Notification channels receive alerts and turn them into notifications. A notification channel is often the last callback in the receiver, after wait, de-duplication and retry.