Commit Graph

110 Commits

Author SHA1 Message Date
Serge Zaitsev d6d4097567 Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
Matthew Jacobson 8379a29b53 Alerting: Improve comments on alert table migration immutability (#62161)
* Alerting: Improve comments around alert migration immutability

* Reunite alerting config history migrations
2023-01-26 16:13:08 -05:00
Alex Moreno 531b439cf1 Alerting: Add alert pausing feature (#60734)
* Add field in alert_rule model, add state to alert_instance model, and state to eval

* Remove paused state from eval package

* Skip paused alert rules in scheduler

* Add migration to add is_paused field to alert_rule table

* Convert to postable alerts only if not normal, pernding, or paused

* Handle paused eval results in state manager

* Add Paused state to eval package

* Add paused alerts logic in scheduler

* Skip alert on scheduler

* Remove paused status from eval package

* Apply suggestions from code review

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Remove state

* Rethink schedule and manager for paused alerts

* Change return to continue

* Remove unused var

* Rethink alert pausing

* Paused alerts storing annotations

* Only add one state transition

* Revert boolean method renaming refactor

* Revert take image refactor

* Make registry errors public

* Revert method extraction for getting a folder title

* Revert variable renaming refactor

* Undo unnecessary changes

* Revert changes in test

* Remove IsPause check in PatchPartiLAlertRule function

* Use SetNormal to set state

* Fix text by returning to old behaviour on alert rule deletion

* Add test in schedule_unit_test.go to test ticks with paused alerts

* Add coment to clarify usage of context.Background()

* Add comment to clarify resetStateByRuleUID method usage

* Move rule get to a more limited scope

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* rum gofmt on pkg/services/ngalert/schedule/schedule.go

* Remove defer cancel for context

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/testing.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/schedule/schedule_unit_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/models/instance_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* skip scheduler rule state clean up on paused alert rule

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Fix mock in test

* Add (hopefully) final suggestions

* Use error channel from recordAnnotationsSync to cancel context

* Run make gen-cue

* Place pause alert check in channel update after version check

* Reduce branching un update channel select

* Add if for error and move code inside if in state manager ResetStateByRuleUID

* Add reason to logs

* Update pkg/services/ngalert/schedule/schedule.go

Co-authored-by: George Robinson <george.robinson@grafana.com>

* Do not delete alert rule routine, just exit on eval if is paused

* Reduce branching and create-close a channel to avoid deadlocks

* Separate state deletion and state reset (includes history saving)

* Add current pause state in rule route in scheduler

* Split clearState and bring errCh closer to RecordStatesAsync call

* Change rule to ruleMeta in RecordStatesAsync

* copy state to be able to modify it

* Add timeout to context creation

* Shorten the timeout

* Use resetState is rule is paused and deleteState if rule is not paused

* Remove Empty state reason

* Save every rule change in historian

* Add tests for DeleteStateByRuleUID and ResetStateByRuleUID

* Remove useless line

* Remove outdated comment

Co-authored-by: George Robinson <george.robinson@grafana.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: Armand Grillet <2117580+armandgrillet@users.noreply.github.com>
2023-01-26 18:29:10 +01:00
Kristin Laemmert e8b8a9e276 chore: move dashboard_acl models into dashboard service (#62151) 2023-01-26 08:46:30 -05:00
Kristin Laemmert 40feee0d17 chore: move alert-related models (#61716)
* chore: move alert notification models into the alerting service (alerting/models)
2023-01-23 08:19:25 -05:00
idafurjes b573b19ca3 Chore: Remove dashboards from models pkg (#61578)
* Copy dashboard models to dashboard pkg

* Use some models from current pkg instead of models

* Adjust api pkg

* Adjust pkg services

* Fix lint

* Chore: Remove dashboards models

* Remove dashboards from models pkg

* Fix lint in tests

* Fix lint in tests 2

* Fix for import in auth

* Remove newline

* Revert unused fix
2023-01-18 13:52:41 +01:00
Matthew Jacobson 63ba3ccb58 Alerting: Improve legacy migration to include send reminder & frequency (#60275)
* Alerting: Improve legacy migration to include send reminder & frequency

Legacy channel frequency is migrated to the channel's migrated route's
repeat interval if send reminder is true. If send reminder is false, we
pseudo-disable the repeat interval by setting it to a large value (1y).

If there were no default channels, the root notification policy is still
created with the default 4h repeat interval.
2023-01-10 23:01:43 -05:00
Alexander Weaver 0e7640475f Alerting: Store alertmanager configuration history in a separate table in the database (#60492)
* Update config store to split between active and history tables

* Migrations to fix up indexes

* Implement migration from old format to new

* Move add migrations call

* Delete duplicated rows

* Explicitly map fields

* Quote the column name because it's a reserved word

* Lift migrations to top

* Use XORM for nearly everything, avoid any non trivial raw SQL

* Touch up indexes and zero out IDs on move

* Drop TODO that's already completed

* Fix assignment of IDs
2023-01-04 10:43:26 -06:00
idafurjes bb35f37b66 Chore: Delete org model duplicates (#60940)
* Delete org model duplicates

* Fix lint

* Move OrgDetailsDTO to org pkg
2023-01-04 16:20:26 +01:00
Matthew Jacobson 570b62091c Alerting: Prevent uid collision in migration when db is case-insensitive (#60494)
* Alerting: Prevent short uid collision in legacy migration when db is case-insensitive

Two factors come into play that cause sporadic uid conflicts during legacy alert migration:
- MySQL and MySQL-compatible backends use case-insensitive collation.
- Our short uid generator is not a uniform RNG and generates uids in such a way that generations in quick succession have a higher probability of creating similar uids.

Normally we would be guaranteed unique short uid generation, however if the source alphabet contains
duplicate characters (for example, if we use case-insensitive comparison) this guarantee is void.

Generating even ~1000 uids in quick succession is nearly guaranteed to create a case-insensitive
duplicate.
2022-12-29 15:15:29 -05:00
Yuri Tseretyan f990be58cb Alerting: Use all notifiers from alerting repository (#60655) 2022-12-22 09:27:18 -05:00
Yuri Tseretyan 35090c376c Alerting: Replace VictorOps receiver with the one from alerting repository (#60543)
* replace victorops with one from alerting

* update other usages
2022-12-20 10:55:41 +01:00
Yuri Tseretyan f0cabe14d5 Alerting: import Grafana alerting package and update usages (#60490)
* update remaining notifiers to use alerting package
2022-12-19 10:53:58 -05:00
Yuri Tseretyan 9ad45aedcf Alerting: replace usage of simplejson to json.RawMessage in NotificationChannelConfig (#60423)
* introduce alias for json.RawMessage with name RawMessage. This is needed to keep raw JSON and implement a marshaler for YAML, which does not seem to be used but there are tests that fail.
* replace usage of simplejson with RawMessage in NotificationChannelConfig
* remove usage of simplejson in tests
* change migration code to convert simplejson to raw message
2022-12-16 13:01:06 -05:00
Alexander Weaver 91bd1cdb41 Revert "Alerting: Store alertmanager configuration history in a separate table in the database" (#60470)
Revert "Alerting: Store alertmanager configuration history in a separate table in the database (#60197)"

This reverts commit ec80f38c34.
2022-12-16 10:07:44 -05:00
Alexander Weaver ec80f38c34 Alerting: Store alertmanager configuration history in a separate table in the database (#60197)
* Update config store to split between active and history tables

* Migrations to fix up indexes

* Implement migration from old format to new

* Move add migrations call

* Delete duplicated rows

* Explicitly map fields

* Quote the column name because it's a reserved word

* Lift migrations to top
2022-12-15 17:35:00 -06:00
Yuri Tseretyan 6637333748 Alerting: refactor notifiers to use package specific Logger interface (#60361)
* introduce Logger interface local to channles + implementaton that wraps the Grafana logger
* make NewFactoryConfig accept LoggerFactory
* add logger field to FactoryConfig
* update usages of log.Logger to internal interface
2022-12-15 11:10:31 -05:00
Alexander Weaver 3bdffc92cf Alerting: Create alertmanager config history table (#60103)
Create config history table
2022-12-09 11:42:40 -06:00
Ryan McKinley 5b71a16acf Slugify: Replace gosimple/slug with a simple function (#59517) 2022-11-30 11:12:56 -05:00
Denis Limarev 4d8287b319 Performance: add preallocation for some slice/map (#57860)
This change preallocates slices and maps where the size of the data is known before the object is created.

Co-authored-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-11-22 20:24:36 +08:00
Alexander Weaver 0dfd78c88c Attempt to preserve UID from migrated channel (#57639) 2022-10-31 11:29:07 -05:00
Matthew Jacobson 0db339d82f Alerting: Improve notification policies created during migration (#52071)
* Alerting: Improve notification policies created during migration

Previously, migrated legacy alerts were connected to notification policies through
a `rule_uid` label in a 1:1 fashion. While this correctly mimicked pre-migration routing,
it didn't create a notification policy structure that is easy to view/modify. In addition,
having one policy per migrated alert is, in some ways, counter to the recommended approach of
Unified Alerting.

This change replaces `rule_uid`-based migrated notification policies with a private
label called `__contacts__`. This label stores a list of double quoted strings containing the names of
all contact points an AlertRule should route to (based on legacy notification channels). Finally,
one notification policy is created per contact point with each matching AlertRules via regex on this
`__contacts__` label.

The result is a simpler, clearer, and easier to modify notification policy structure, with the
added benefit that you can see which contact points an AlertRule is being routed to from the
AlertRule creation page.
2022-10-18 00:47:39 -04:00
Yuriy Tseretyan 3487e68d15 Alerting: Fix migration to create rules with group index 1 (#56511) 2022-10-07 17:20:01 -04:00
Yuriy Tseretyan e2f1201382 Alerting: Fix migration to not add label "alertname" (#56509)
* do not add label alertname because it is overridden in state manager anyway
* update state manager to not consider labels with same value as dupe
2022-10-07 15:06:53 -04:00
Jean-Philippe Quéméner f3a307778a Alerting: cache general folder in migration based on org id (#55620) 2022-09-23 18:22:45 +02:00
Emil Tullstedt 647997cc4c Alerting: Fix flaky test (#55551)
The length of the identifier from the underlying library is 9 or more characters depending on the rate at which the identifiers are generated. See https://pkg.go.dev/github.com/teris-io/shortid

The test previously made the assumption that the length will always be 10, which would intermittently fail.
2022-09-21 14:17:25 +02:00
Alexander Weaver 9f45e2e706 Alerting: Fix legacy migration crash when rule name is too long (#55053)
* Extract standardized UID field length to constant

* Extract default length to constant

* Truncate rule names that are too long

* Add tests for name normalization

* Fix whitespace lint error

* Another linter fix

* Empty commit to kick build
2022-09-13 13:53:09 -05:00
Emil Tullstedt b287047052 Chore: Upgrade Go to 1.19.1 (#54902)
* WIP

* Set public_suffix to a pre Ruby 2.6 version

* we don't need to install python

* Stretch->Buster

* Bump versions in lib.star

* Manually update linter

Sort of messy, but the .mod-file need to contain all dependencies that
use 1.16+ features, otherwise they're assumed to be compiled with
-lang=go1.16 and cannot access generics et al.

Bingo doesn't seem to understand that, but it's possible to manually
update things to get Bingo happy.

* undo reformatting

* Various lint improvements

* More from the linter

* goimports -w ./pkg/

* Disable gocritic

* Add/modify linter exceptions

* lint + flatten nested list

Go 1.19 doesn't support nested lists, and there wasn't an obvious workaround.
https://go.dev/doc/comment#lists
2022-09-12 12:03:49 +02:00
Valério Valério b5142832fa Alerting: Fix saving of screenshots uploaded with a signed url (#53933)
The URL of screenshots uploaded to external image storages can be optionally signed, resulting in a long string (800+ chars).
2022-08-24 12:40:50 +01:00
idafurjes 6afad51761 Move SignedInUser to user service and RoleType and Roles to org (#53445)
* Move SignedInUser to user service and RoleType and Roles to org

* Use go naming convention for roles

* Fix some imports and leftovers

* Fix ldap debug test

* Fix lint

* Fix lint 2

* Fix lint 3

* Fix type and not needed conversion

* Clean up messages in api tests

* Clean up api tests 2
2022-08-10 11:56:48 +02:00
Sofia Papagiannaki ae101bf935 Alerting: Fix migration (#53253) 2022-08-03 11:41:18 -04:00
idafurjes f5cace8bbd Rename Acl to ACL (#52342)
* Rename Acl to ACL

* Fix yaml files

* Add xorm tags and fix test
2022-07-18 15:14:58 +02:00
Matthew Jacobson 434e94ef2b Alerting: Update default route groupBy to [grafana_folder, alertname] (#50052)
* Alerting: Update default route groupBy to [grafana_folder, alertname]

Default group by for new routes and migrations is now [grafana_folder, alertname]
2022-07-11 12:24:43 -04:00
Kristin Laemmert 9de00c8eb2 chore/backend: move dashboard errors to dashboard service (#51593)
* chore/backend: move dashboard errors to dashboard service

Dashboard-related models are slowly moving out of the models package and into dashboard services. This commit moves dashboard-related errors; the rest will come in later commits.

There are no logical code changes, this is only a structural (package) move.

* lint lint lint
2022-06-30 09:31:54 -04:00
Emil Tullstedt 7d815a1db5 Alerting: Use google/uuid instead of gofrs/uuid (#51242) 2022-06-28 11:57:24 +02:00
Kristin Laemmert 945f015770 backend/datasources: move datasources models into the datasources service package (#51267)
* backend/datasources: move datasources models into the datasources service pkg
2022-06-27 12:23:15 -04:00
gotjosh 90646e7f41 Alerting: Don't stop the migration when alert rule tags are invalid (#51253)
* Alerting: Don't stop the migration when alert rule tags are invalid

As we migrate we expect the `alertRuleTags` on a dashboard alert to be a JSON object. However, it seems this is not really validated by Grafana and an user can change the format to something else that the JSON parser is not able to marshal into a `map[string]string`.

Let's do a bit better by "attempting" to parse the tags and if we can't we'll simple return an empty map. The data is still there so if the user wishes they can go back, fix the data and attemp the migration again.
2022-06-22 17:39:17 +01:00
Yuriy Tseretyan 4d02f73e5f Alerting: Persist rule position in the group (#50051)
Migrations:
* add a new column alert_group_idx to alert_rule table
* add a new column alert_group_idx to alert_rule_version table
* re-index existing rules during migration

API:
* set group index on update. Use the natural order of items in  the array as group index
* sort rules in the group on GET
* update the version of all rules of all affected groups. This will make optimistic lock work in the case of multiple concurrent request touching the same groups.

UI:
* update UI to keep the order of alerts in a group
2022-06-22 10:52:46 -04:00
George Robinson 0bd12ab531 Alerting: Fix force_migration when alerting is disabled (#50431)
* Alerting: Fix force_migration when alerting is disabled

This commit fixes a bug where force_migration must be set to true
when both unified and legacy alerting is disabled.

* Update comment

* Fix typo in comment

Co-authored-by: Armand Grillet <armand.grillet@outlook.com>
2022-06-09 10:10:11 +02:00
Joe Blubaugh 30f035ca34 Alerting: Improve Unified Alerting Rollback Warning (#50470)
After migrating to unified alerting, users must explicitly allow rolling
back to legacy alerting by setting force_migration = true in config.
This updates the panic message to clarify why that's required and what
the consequences of rolling back will be.

Fixes #50469
2022-06-09 13:31:49 +08:00
idafurjes e9f8d582c8 Chore: Remove dashboard version from models (#50287)
* Remove dashbpard version from models

* Fix lint

* Fix api & sqlstore tests

* Remove integration tags

* Fix lint again

* Add integration test to correct namespace

* Lont fix 2

* Change Id to ID in dashVersionMeta
2022-06-08 12:22:55 +02:00
George Robinson 3b7f871bf4 Alerting: Enable Unified Alerting for open source and enterprise (#49834)
This commit enables Unified Alerting for open source and enterprise unless disabled in configuration.
2022-05-30 16:47:15 +01:00
Joe Blubaugh 1cc034d960 Alerting: Add a "Reason" to Alert Instances to show underlying cause of state. (#49259)
This change adds a field to state.State and models.AlertInstance
that indicate the "Reason" that an instance has its current state. This
helps us account for cases where the state is "Normal" but the
underlying evaluation returned "NoData" or "Error", for example.

Fixes #42606

Signed-off-by: Joe Blubaugh <joe.blubaugh@grafana.com>
2022-05-23 16:49:49 +08:00
Joe Blubaugh 12c25759da Alerting: Attach screenshot data to Slack notifications. (#49374)
This change extracts screenshot data from alert messages via a private annotation `__alertScreenshotToken__` and attaches a URL to a Slack message or uploads the data to an image upload endpoint if needed.

This change also implements a few foundational functions for use in other notifiers.
2022-05-23 14:24:20 +08:00
Joe Blubaugh 687e79538b Alerting: Add a general screenshot service and alerting-specific image service. (#49293)
This commit adds a pkg/services/screenshot package for taking and uploading screenshots of Grafana dashboards. It supports taking screenshots of both dashboards and individual panels within a dashboard, using the rendering service.

The screenshot package has the following services, most of which can be composed:

BrowserScreenshotService (Takes screenshots with headless Chrome)
CachableScreenshotService (Caches screenshots taken with another service such as BrowserScreenshotService)
NoopScreenshotService (A no-op screenshot service for tests)
SingleFlightScreenshotService (Prevents duplicate screenshots when taking screenshots of the same dashboard or panel in parallel)
ScreenshotUnavailableService (A screenshot service that returns ErrScreenshotsUnavailable)
UploadingScreenshotService (A screenshot service that uploads taken screenshots)

The screenshot package does not support wire dependency injection yet. ngalert constructs its own version of the service. See https://github.com/grafana/grafana/issues/49296

This PR also adds an ImageScreenshotService to ngAlert. This is used to take screenshots with a screenshotservice and then store their location reference for use by alert instances and notifiers.
2022-05-22 22:33:49 +08:00
Yuriy Tseretyan d87fdc1037 Alerting: Update migration to migrate only alerts that belong to existing org\dashboard (#49192)
* Update migration to migrate only alerts that belong to existing org\dashboard
2022-05-18 16:00:08 -04:00
Matthew Jacobson 5c32a6b6f6 Alerting: Fix flaky migration test (#48595)
* Fix flaky migration test
2022-05-18 13:23:13 -04:00
Yuriy Tseretyan 00ef1acb93 Alerting: Create folder for alerting when start from the scratch (#48866)
* create folderHelper struct
2022-05-13 11:49:04 -04:00
Yuriy Tseretyan f85e758972 unhide alert rule's data sources during migraiton (#48559) 2022-05-04 09:31:05 -04:00
Jean-Philippe Quéméner 0a87ef06af Alerting: add safeguard for migrations that might cause dataloss (#48526)
* Alerting: add safeguard for migrations that might cause dataloss

* add test for panic

* add documentation
2022-05-02 10:38:42 +02:00