Commit Graph

148 Commits

Author SHA1 Message Date
Yuri Tseretyan 5097dd5c7d Alerting: Send templates from extra configuration to remote Alertmanager (#107981)
* extract logging of MergedResult into method

* convert GetMergedTemplateDefinitions to return PostableApiTemplate

* update mergeExtracConfigs to return GrafanaAlertmanagerConfig

* pass by value, not pointer

* add template definition to payload

* update tests

* rename to Templates

* log merge results

* fix reference in workspace
2025-07-18 15:16:17 -04:00
Vadim Stepanov bccc980b90 Alerting: Notifiication history (#107644)
* Add unified_alerting.notification_history to ini files

* Parse notification history settings

* Move Loki client to a separate package

* Loki client: add params for metrics and traces

* add NotificationHistorian

* rm writeDuration

* remove RangeQuery stuff

* wip

* wip

* wip

* wip

* pass notification historian in tests

* unify loki settings

* unify loki settings

* add test

* update grafana/alerting

* make update-workspace

* add feature toggle

* fix configureNotificationHistorian

* Revert "add feature toggle"

This reverts commit de7af8f7

* add feature toggle

* more tests

* RuleUID

* fix metrics test

* met.Info.Set(0)
2025-07-17 14:26:26 +01:00
Alexander Akhmetov 5a78ce8a4b Alerting: Support saving extra Mimir configurations (#106721) 2025-06-17 23:33:16 +02:00
Yuri Tseretyan b0ff51a903 Alerting: Support for Mimir configuration in Grafana Alertmanager (#106402) 2025-06-13 16:32:23 +02:00
Kevin Minehart 910eb1dd9e Security: apply patch 428 (#106710)
* declare dingding url as secret

patch raw settings before parsing because DingDing's config parser does not know about secrets

* fix integration test

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-06-13 15:56:26 +02:00
Yuri Tseretyan d019b4ff1b Alerting: Change logging in Alertmanager (#105704)
* change logger to ngalert.notifier and use component label
* update alerting module

---------

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-05-21 10:13:18 -04:00
Yuri Tseretyan 7dc13d63b9 Alerting: Refactor Grafana Alertmanager (#105568)
* update alerting module

* use NotificationsConfiguration
* update to use opts and configure new fields
* use TenantID instead of orgId
2025-05-20 04:40:51 +03:00
Matthew Jacobson 5795ba34f9 Alerting: Update grafana/alerting from de176b4a0309 to 83b6de6b0a35 (#105157)
* Update grafana alerting from de176b4a0309 to 83b6de6b0a35

Includes:
- https://github.com/grafana/alerting/pull/319
- https://github.com/grafana/alerting/pull/317

* Remove unused SendWebhook method from sender struct

grafana/alerting hasn't used the grafana webhook sender for a while now,
so this method is no longer used anywhere.

- Removed SendWebhook from the sender struct and rename it to emailSender
so that its use is clearer.
- Also, for similar reasons, the Webhook method on Grafana's
webhook sender `sendWebRequestSync` should not call grafana/alerting code for
NewTLSClient. The previous grafana/alerting function is vendored into grafana.

* Use BuildReceiverIntegrations new func signature
2025-05-09 12:26:20 -04:00
Sonia Aguilar 0ceea29787 Alerting: Remove alertingSimplifiedRouting feature toggle (#104980)
Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>
2025-05-09 16:30:56 +03:00
Yuri Tseretyan c8d92ee06a Alerting: Refactor applyConfig in Alertmanager (#104970)
* refactor: remove applyAndMarkConfig

---------

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-05-06 09:36:22 -04:00
Steve Simpson 870d401f75 Alerting: Update grafana/alerting (#103607)
* Alerting: Update grafana/alerting

* Run make update-workspace
2025-04-10 00:12:02 +02:00
alerting-team[bot] 6df99a6224 Alerting: Update alerting module to 58ba6c617ff05eb1d6f65c59d369a6a16923dff6 (#102812)
* remove feature flag alertingAlertmanagerExtraDedupStage
* use most recent version of fork of alertmanager

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-03-26 15:15:10 -04:00
Matthew Jacobson 75c4c5ca0f Alerting: Upgrade grafana/alerting to 92d5f29 (#100982)
* Alerting: Upgrade grafana/alerting to 92d5f29

Includes:
- Add more context to log in PipelineAndStateTimestampCoordinationStage (#277)
- Update Alertmanager fork to latest commit (#279)
- Copy http client from Grafana (#281)

* Satisfy signature change from grafana/alerting #281 (http client)
2025-02-19 18:49:46 +02:00
Yuri Tseretyan 0be6e1bb86 Alerting: Extra dedup stage in Grafana Alertmanager (#99825)
* add feature flags

* update alerting module

* update grafana alertmanager to configure the extra dedup stage

---------

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2025-01-31 11:12:38 -05:00
Matthew Jacobson a6dffd7552 Upgrade grafana/alerting to 209e052dba64 (#99118)
Update grafana/alerting to 209e052dba64

Includes:
- Add NoopDecode function for non-base64-encoded secrets (#264)
- Log duplicated receivers (#265)
2025-01-17 21:53:41 +02:00
Santiago aa77023008 Alerting: Fix panics when attempting to create an Alertmanager after failing (#94023) 2024-09-30 13:50:35 -03:00
Tito Lins 539d363d2c Create histogram and observe grafana config size (#93028) 2024-09-06 18:25:16 +02:00
Santiago b79b38f02c Alertmanager: Support limits for silences (#90826)
* Alertmanager: support limits for silences

* update grafana/alerting to latest main
2024-07-24 14:22:29 +02:00
Matthew Jacobson f79dd7c7f9 Alerting: Persist silence state immediately on Create/Delete (#84705)
* Alerting: Persist silence state immediately on Create/Delete

Persists the silence state to the kvstore immediately instead of waiting for the
 next maintenance run. This is used after Create/Delete to prevent silences from
 being lost when a new Alertmanager is started before the state has persisted.
 This can happen, for example, in a rolling deployment scenario.

* Fix test that requires real data

* Don't error if silence state persist fails, maintenance will correct
2024-04-09 13:39:34 -04:00
Santiago 2e7cc68394 Alerting: Remove CleanUp method from the Alertmanager (#85650)
Alerting: Remove Cleanup method from the Alertmanager
2024-04-09 12:13:27 +02:00
Santiago c7573bb0f7 Alerting: Make retention period configurable for the notification log (#85605)
* Alerting: Make retention period configurable for the notification log

* update sample.ini

* fix outdated comment (on disk -> kvstore)

* skip checking cyclomatic complexity for ReadUnifiedAlertingSettings
2024-04-05 12:25:43 +02:00
Matthew Jacobson 0c3c5c5607 Alerting: Stop persisting silences and nflog to disk (#84706)
With this change, we no longer need to persist silence/nflog states to disk in addition to the kvstore
2024-03-23 00:37:33 +02:00
Matthew Jacobson 2e8c514cfd Alerting: Stop persisting user-defined templates to disk (#83456)
Updates Grafana Alertmanager to work with new interface from grafana/alerting#161. This change stops passing user-defined templates to the Grafana Alertmanager by persisting them to disk and instead passes them by string.
2024-03-04 20:12:49 +02:00
Yuri Tseretyan 1eebd2a4de Alerting: Support for simplified notification settings in rule API (#81011)
* Add notification settings to storage\domain and API models. Settings are a slice to workaround XORM mapping
* Support validation of notification settings when rules are updated

* Implement route generator for Alertmanager configuration. That fetches all notification settings.
* Update multi-tenant Alertmanager to run the generator before applying the configuration.

* Add notification settings labels to state calculation
* update the Multi-tenant Alertmanager to provide validation for notification settings

* update GET API so only admins can see auto-gen
2024-02-15 09:45:10 -05:00
Matthew Jacobson 0ce1ccd6f9 Alerting: Fix inconsistent AM raw config when applied via sync vs API (#81655)
AM config applied via API would use the PostableUserConfig as the AM raw
config and also the hash used to decide when the AM config has changed.
However, when applied via the periodic sync the PostableApiAlertingConfig would
be used instead.

This leads to two issues:
- Inconsistent hash comparisons when modifying the AM causing redundant applies.
- GetStatus assumed the raw config was PostableUserConfig causing the endpoint
to return correctly after a new config is applied via API and then nothing once
 the periodic sync runs.

Note: Technically, the upstream GrafanaAlertamanger GetStatus shouldn't be
returning PostableUserConfig or PostableApiAlertingConfig, but instead
GettableStatus. However, this issue required changes elsewhere and is out of
scope.
2024-01-31 21:05:30 +02:00
George Robinson 85b9edcd28 Alerting: Fix incorrect initialization of logger (#81099) 2024-01-23 17:29:38 +02:00
Santiago 3afd94185c Alerting: Add metric to check for default AM configurations (#80225)
* Alerting: Add metric to check for default AM configurations

* Use a gauge for the config hash

* don't go out of bounds when converting uint64 to float64

* expose metric for config hash

* update metrics after applying config
2024-01-16 17:12:24 +01:00
Santiago a77ba40ed4 Alerting: Use the forked Alertmanager for remote secondary mode (#79646)
* (WIP) Alerting: Use the forked Alertmanager for remote secondary mode

* fall back to using internal AM in case of error

* remove TODOs, clean up .ini file, add orgId as part of remote AM config struct

* log warnings and errors, fall back to remoteSecondary, fall back to internal AM only

* extract logic to decide remote Alertmanager mode to a separate function, switch on mode

* tests

* make linter happy

* remove func to decide remote Alertmanager mode

* refactor factory function and options

* add default case to switch statement

* remove ineffectual assignment
2023-12-21 15:26:31 +01:00
Santiago 73776f37eb Alerting: Send state to the remote Alertmanager (#78538)
* Alerting: Introduce a Mimir client as part of the Remote Alertmanager

Mimir client that understands the new APIs developed for mimir. Very much a WIP still.

* more wip

* appease the linter

* more linting

* add more code

* get state from kvstore, encode, send

* send state to the remote Alertmanager, extract fullstate logic into its own function

* pass kvstore to remote.NewAlertmanager()

* refactor

* add fake kvstore to tests

* tests

* use FileStore to get state

* always log 'completed state upload'

* refactor compareRemoteConfig

* base64-encode the state in the file store

* export silences and nflog filenames, refactor

* log 'completed state/config upload...' regardless of outcome

* add values to the state store in tests

* address code review comments

* log error from filestore

---------

Co-authored-by: gotjosh <josue.abreu@gmail.com>
2023-11-29 12:49:39 +01:00
Santiago a6b9b27673 Alerting: Remove OrgID() from the Alertmanager interface (#77398) 2023-10-31 10:58:47 +01:00
Santiago f9fc2e4568 Alerting: Remove ConfigHash() from the Alertmanager interface (#77134) 2023-10-25 17:11:53 +02:00
Santiago 322a9c0b15 Alerting: Replace FileStore() for CleanUp() in the Alertmanager interface (#77126)
Alerting: Remplace FileStore() for CleanUp() in the Alertmanager interface
2023-10-25 13:58:28 +02:00
Santiago 61cb26711e Alerting: Fetch alerts from a remote Alertmanager (#75844)
* Alerting: post alerts to the remote Alertmanager and fetch them

* fix broken tests

* Alerting: Add Mimir Backend image to devenv (blocks)

* add alerting as code owner for mimir_backend block

* Alerting: Use Mimir image to run integration tests for the remote Alertmanager

* skip integration test when running all tests

* skipping integration test when no Alertmanager URL is provided

* fix bad host for mimir_backend

* remove basic auth testing until we have an nginx image in our CI

* add integration tests for alerts

* fix tests

* change SendCtx -> Send, add context.Context to Send, fix CI

* add reover() for functions from the Prometheus Alertmanager HTTP client that could panic

* add TODO to implement PutAlerts in a way that mimicks what Prometheus does

* fix log format
2023-10-19 11:27:37 +02:00
Santiago 93b9f9b537 Alerting: Use interfaces for the Alertmanager (#73900) 2023-09-06 07:59:29 -03:00
Santiago ff9eff49bd Alerting: Bump grafana/alerting and refactor the ImageStore/Provider to provide image URL/bytes (#70182)
* implement alerting.images.Provider interface in our ImageStore

* add URLExists() method to fakeConfigStore

* make linter happy

* update integration tests
2023-06-21 20:53:30 -03:00
George Robinson f085e99d3c Alerting: Add matchers metrics to Alertmanager (#69855) 2023-06-15 09:18:01 +01:00
Matthew Jacobson c16f1f5e99 Alerting: Fix provisioned templates being ignored by alertmanager (#69485)
* Alerting: Fix provisioned templates being ignored by alertmanager

Template provisioning sets the template in cfg.TemplateFiles while a recent change
made it so that alertmanager reads cfg.AlertmanagerConfig.Templates instead.

This change fixes the issue on both ends, by having provisioning set boths fields and
reverts the change on the alertmanager side so that it uses cfg.TemplateFiles.
2023-06-02 15:47:43 -04:00
Alexander Weaver 0ed5d3bdf2 Revert "Alerting: Refactor the ImageStore/Provider to provide image URL/bytes" (#69265)
Revert "Alerting: Refactor the ImageStore/Provider to provide image URL/bytes (#67693)"

This reverts commit 72a187b0be.
2023-05-30 11:33:33 -05:00
Santiago 72a187b0be Alerting: Refactor the ImageStore/Provider to provide image URL/bytes (#67693)
* (WIP) Refactor the ImageStore interface to work with our latest alerting repository

* update alerting package

* refactor, new URLExists method in ImageProvider

* tests for the new methods

* fix linter warnings

* use alertingImages as an alias for grafana/alerting/images

* logs about image uris and not found images

* nerf image not found logs

* extract duplicated code to getImageFromURI() method

* refactor getImageFromURI()

* add index on url

* add comment about migration log

* sync generated files
2023-05-30 11:25:55 -03:00
Sladyn a06a5a7393 Alerting: Improve log messages (#67688)
* Rename base logger and capatilize messages
* Remove cflogger from config.go
2023-05-25 18:55:01 +03:00
Matthew Jacobson 91471ac7ae Alerting: Template Testing API (#67450) 2023-04-28 15:56:59 +01:00
Yuri Tseretyan a8b4a4bb45 Alerting: Update alerting module to 20230418161049-5f374e58cb32 + refactoring (#66622)
* update to alerting 20230418161049-5f374e58cb32
* rename renamed structs in https://github.com/grafana/alerting/pull/73
* update ValidateContactPoint to use BuildReceiverConfiguration
* update logger factory according to changes
* rewrite integration builder
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
2023-04-25 13:39:46 -04:00
Yuri Tseretyan afd52d0866 Alerting: use alerting GrafanaReceiver and BuildReceiverConfiguration in Grafana (#65224)
* replace receiver errors with one from alerting
* add the converter to alerting models
* update buildReceiverIntegration to accept GrafanaReceiver
---------

Co-authored-by: George Robinson <george.robinson@grafana.com>
2023-04-13 12:25:32 -04:00
Yuri Tseretyan e760f22402 Alerting: Use background context for maintenance function (#64065) 2023-03-02 14:19:52 -05:00
Yuri Tseretyan f066e8cdcd Alerting: Update to alerting 20230203015918-0e4e2675d7aa (after refactoring) (#62823)
* add alerting prefix to some packages from alerting that have similar names in prometheus alertmanager
2023-02-03 11:36:49 -05:00
Santiago ba731f7865 Alerting: Mark AM configuration as applied (#61330)
* Mark AM configuration as applied

* add missing checks, make linter happy

* fix deadlock, mark as valid on save and on load

* mark configurations only if needed

* check error after applyConfig()

* code review comments

* code review changes

* more code review changes

* clean HistoricConfigFromAlertConfig function
2023-02-02 14:45:17 -03:00
Serge Zaitsev d6d4097567 Chore: Fix goimports grouping in alerting (#62424)
* fix goimports

* fix goimports order
2023-01-30 09:55:35 +01:00
gotjosh 0be920e61c Alerting: Remove unused code after importing from grafana/alerting (#61869)
* Alerting: Remove unused code after importing from grafana/alerting
2023-01-23 10:30:10 +00:00
Denis Limarev e6dee8a723 Perfomance: Preallocate slices (#61580) 2023-01-17 11:50:17 +00:00
gotjosh e7cd6eb13c Alerting: Use alerting.GrafanaAlertmanager instead of initialising Alertmanager components directly (#61230)
* Alerting: Use `alerting.GrafanaAlertmanager` instead of initialising Alertmanager components directly
2023-01-13 12:54:38 -04:00