Commit Graph

75 Commits

Author SHA1 Message Date
Mariell Hoversholm 757be6365a CI: Bump golangci-lint to 2.0.2 (#103572) 2025-04-10 14:42:23 +02:00
Matthew Jacobson 371ea5cda7 Alerting: Fix loss of TimeInterval location on remote AM apply (#102510)
* Alerting: Fix loss of TimeInterval location on remote AM apply

deepcopy.Copy does not correctly copy PostableUserConfig because it ignores
unexported fields. As a result, TimeInterval locations default to UTC instead
of retaining their original values.

* make update-workspace
2025-03-20 09:54:33 +01:00
Yuri Tseretyan c3f00eb403 Alerting: log body of unexpected response from Mimir (#102382)
log body of unexpected response

Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-03-19 10:14:05 -04:00
Santiago 7d4895c3c9 Alerting: Use exponential backoff in the remote Alertmanager readiness check (#99756)
* Alerting: Use exponential backoff in the remote Alertmanager readiness check

* fix capitalized error

* remove unnecessary 'for'

* refactor, use time.After() instead of channel
2025-01-29 18:53:30 +01:00
Santiago 361312bbd7 Alerting: Expect 406s from the remote Alertmanager during the readiness check (#99507)
* Alerting: Expect 406s from the remote Alertmanager during the readiness check

* make it clear in the warning logs that we'll attempt to send the confgiuration/state without comparing in case of error pulling the current state/config
2025-01-24 16:26:57 +01:00
Santiago 9e408f842c Alerting: Skip sanitizing labels when sending alerts to the remote Alertmanager (#96251)
* Alerting: Skip sanitizing labels when sending alerts to the remote Alertmanager

* fix drone image name
2024-11-13 11:21:44 -03:00
Santiago 4f8f82f5f1 Alerting: Fix remote Alertmanager readiness check path (#95063) 2024-10-21 17:24:51 +02:00
Santiago 4c15266a77 Alerting: Add X-Remote-Alertmanager header to the remote AM client (#94913) 2024-10-17 22:38:13 +02:00
Santiago 80611b381c Alerting: Decrypt secure settings when testing receivers in the remote Alertmanager (#93864)
* Alerting: Decrypt secure settings when testing receivers in the remote Alertmanager

* go work sync

* make update-workspace

* point to latest main in grafana/alerting

* unit test

* import definitions only once
2024-09-30 13:28:30 -03:00
Tito Lins 4a124469fa Check is config is default by comparing hashes (#92296) 2024-08-23 11:22:06 +02:00
Fayzal Ghantiwala 8d725a641c Alerting: Integration test for testing template via remote alertmanager (#92147)
* Add integration test for testing template

* Update drone signature
2024-08-21 13:06:01 +01:00
Fayzal Ghantiwala e321dbb690 Alerting: Use remote Alertmanager to test templates and receivers when enabled (#91570)
* Initial impl

* Add code to test templates and receivers

* Fix linter

* Fix forked am tests

* Update mimir client

* Remove trailing whitespace

* re-trigger CI
2024-08-15 16:56:14 +01:00
Santiago f852bf684a Alerting: Fix duplicated silences in remote primary mode bug (#91902)
* Alerting: Fix duplicated silences in remote primary mode bug

* test that a new silence id returned by calling CreateSilence() on the internal Alertmanager is ignored
2024-08-15 17:14:55 +02:00
Fayzal Ghantiwala 25dbb32cea Alerting: Vendor in latest grafana/alerting package (#91786)
* temp

* vendor

* Remove dead code

* Vendoring
2024-08-12 15:37:15 +01:00
Santiago 5487ea444a Alerting: Update remote Alertmanager config marshalling in test (#91791) 2024-08-12 13:45:22 +02:00
Santiago e097ffc771 Alerting: Update grafana/alerting dependency (#90365)
* update grafana/alerting to latest main
* update alertmanager to  66ec17e3aa45
2024-07-12 14:05:17 -04:00
Santiago 3bb861b9f0 Alerting: Remove empty/namespace labels when sending alerts to the remote Alertmanager (#90284)
* Alerting: Remove empty/namespace labels when sending alerts to the remote Alertmanager

* update tests

* fix typo in comment
2024-07-11 15:20:12 +02:00
Santiago fce03cd724 Alerting: Send static headers to the remote Alertmanager (#89846) 2024-07-01 17:48:40 +02:00
Santiago fe1309dd96 Alerting: Send external URL to the remote Alertmanager (#89701)
* Alerting: Send external URL to the remote Alertmanager

* test that the URL is sent to the remote Alertmanager

* AppURL -> ExternalURL
2024-06-26 14:02:02 +02:00
Steve Simpson 8eabef1f91 Alerting: Update remote alertmanager to use extended receivers API. (#89253)
* Alerting: Update remote alertmanager to use extended receivers API.

* Update integration test and Mimir image

* Update Mimir image in more places
2024-06-19 12:40:22 +03:00
Alexander Weaver 8491e02caf Alerting: Instrument outbound requests for Loki Historian and Remote Alertmanager with tracing (#89185)
* Add TracedClient

* Handle errors and status codes

* Wire up tracing to normal ASH and loki annotation mapping

* Add tracing to remote alertmanager

* one more spot

* and not or

* More consistency with other grafana traces, lower cardinality name
2024-06-14 13:24:12 -05:00
Dave Henderson 6262c56132 chore(perf): Pre-allocate where possible (enable prealloc linter) (#88952)
* chore(perf): Pre-allocate where possible (enable prealloc linter)

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

* fix TestAlertManagers_buildRedactedAMs

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

* prealloc a slice that appeared after rebase

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>

---------

Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
2024-06-14 14:16:36 -04:00
Steve Simpson dd3c3b5857 Alerting: Update grafana/alerting. (#88914) 2024-06-14 09:19:04 +02:00
Santiago 12d5251c12 Alerting: Alertmanager configuration sync loop (#88822)
* make the config sync happen on each call to ApplyConfig(), fix tests

* send autogen config

* add fake autogen function for tests

* update stale comments, tidy things up, make linter happy

* add auto-gen routes only if the feature toggle is enabled

* remove unnecessary fake autogen function

* throttle configuration syncs

* restore pkg/services/store/entity/sqlstash/sql_storage_server.go

* test sync loop in ApplyConfig, skip invalid autogen routes

* restore conf/defaults.ini

* restore conf/defaults.ini

* avoid skipping invalid auto-gen routes in SaveAndApplyConfig

* test that autogenFn is called and its errors are returned

* add debug message about the sync interval not having elapsed

* collapse two log lines into one
2024-06-12 10:13:34 +02:00
Santiago cdbc9d801f Alerting: Use the internal Alertmanager to test templates and receivers (remote primary) (#88988) 2024-06-11 11:06:07 +02:00
Santiago 5f4d07bb75 Alerting: Enable remote primary mode using feature toggles (#88976) 2024-06-10 17:07:13 +02:00
Santiago 9f9928d41a Alerting: Update grafana/alerting (#88363)
* Alerting: Update grafana/alerting

* make tests pass by implementing yaml unmarshallers and deleting fields with omitempty in their yaml tags

* go mod tidy

* fix tests by implementing not calling GettableApiAlertingConfig.UnmarshalYAML from GettableApiAlertingConfig.UnmarshalJSON

* cleanup, reduce diff

* fix more tests

* update grafana/alerting to latest commit, delete global section from configs in tests

* bring back YAML unmarshaller for GettableApiAlertingConfig

* update alerting package dependency to point to main

* skip test for sns notifier
2024-06-04 20:29:37 +02:00
Kyle Brandt a738cb42d8 Prometheus: Update dependency to v0.52.0 (#87809)
* Prometheus: Update dependency to v0.52.0

* go work sync

* fix panics in tests

* go work sync

* prometheus v0.52.0

* handle errors

* Update pkg/services/ngalert/sender/sender_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

* Update pkg/services/ngalert/sender/sender_test.go

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>

---------

Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Co-authored-by: ismail simsek <ismailsimsek09@gmail.com>
2024-05-28 15:22:20 +02:00
Steve Simpson ed42119907 Alerting: Pass metrics Registerer into NewExternalAlertmanagerSender. (#88313)
* Alerting: Pass metrics Registerer into NewExternalAlertmanagerSender.

I will work on a separate change to export the metrics from Grafana, this
is a little more complicated.

* Typo
2024-05-24 23:03:34 +02:00
Steve Simpson 8bcf589301 Alerting: Pass logger into NewExternalAlertmanagerSender (#88256) 2024-05-24 20:11:26 +02:00
Santiago e41434c332 Alerting: Promote configuration in the remote Alertmanager (#87388) 2024-05-16 12:06:03 +02:00
Matthew Jacobson babfa2beac Alerting: Hook up GMA silence APIs to new authentication handler (#86625)
This PR connects the new RBAC authentication service to existing alertmanager API silence endpoints.
2024-05-03 15:32:30 -04:00
Santiago b76a9e4d31 Alerting: Implement GetStatus in the remote Alertmanager struct (#84887)
* Alerting: Implement GetStatus in the remote Alertmanager struct

* update tests

* fix tests, extract AlertmanagerConfig from PostableConfig

* get the remote AM config instead of the Grafana one from the remote AM

* pass grafana AM config in test

* return error in GetStatus instead of logging it (internal AM)
2024-05-03 13:59:02 +02:00
Santiago 36a0499128 Alerting: Implement CreateSilence in the forked Alertmanager (remote primary mode) (#85716) 2024-04-29 18:47:25 +02:00
Santiago 1af2e69625 Alerting: Implement DeleteSilence in the forked AM (remote primary) (#85721) 2024-04-29 17:23:41 +02:00
Santiago a6be12c037 Alerting: Implement SaveAndApplyConfig in the forked Alertmanager (remote primary) (#84659)
* Alerting: Implement SaveAndApplyConfiguration in the forked Alertmanager struct

* call SaveAndApplyConfig on the remote first, log errors for the internal

* add comments explaining why we ignore errors in the internal AM

* restore go.work.sum
2024-04-23 15:45:35 +02:00
Santiago c77ab53819 Alerting: implement SaveAndApplyConfig in the remote Alertmanager struct (#84642)
* implement SaveAndApplyConfig in the remote Alertmanager struct

* remove ID from CreateGrafanaAlertmanagerConfig call

* decrypt, test that we decrypt, refactor

* fix duplicated declaration in test

* rephrase comment, remove unnecessary conversion to slice of bytes

* fix test
2024-04-23 14:37:10 +02:00
Santiago 8b7c2a459b Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary mode) (#85668)
* Alerting: Implement SaveAndApplyDefaultConfig in the forked Alertmanager (remote primary)

* log the error for the internal AM instead of returning it
2024-04-23 14:36:40 +02:00
Santiago 309a7e7684 Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct (#85005)
* Alerting: Implement SaveAndApplyDefaultConfig in the remote Alertmanager struct

* send the hash of the encrypted configuration

* tests, default config hash in AM struct

* add missing default config to test

* restore build directory

* go work file...

* fix broken test

* remove unnecessary conversion to []byte

* go work again...

* make things work again with latest main branch changes

* update error messages in tests for decrypting config
2024-04-19 15:11:07 +02:00
Santiago a2ce8fefed Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager (#86451)
* Alerting: Use a struct when sending a Grafana AM configuration to the remote Alertmanager

* remove '-distroless' from mimir image name
2024-04-19 13:04:18 +02:00
Matthew Jacobson f79dd7c7f9 Alerting: Persist silence state immediately on Create/Delete (#84705)
* Alerting: Persist silence state immediately on Create/Delete

Persists the silence state to the kvstore immediately instead of waiting for the
 next maintenance run. This is used after Create/Delete to prevent silences from
 being lost when a new Alertmanager is started before the state has persisted.
 This can happen, for example, in a rolling deployment scenario.

* Fix test that requires real data

* Don't error if silence state persist fails, maintenance will correct
2024-04-09 13:39:34 -04:00
Santiago 2e7cc68394 Alerting: Remove CleanUp method from the Alertmanager (#85650)
Alerting: Remove Cleanup method from the Alertmanager
2024-04-09 12:13:27 +02:00
Matthew Jacobson 0c3c5c5607 Alerting: Stop persisting silences and nflog to disk (#84706)
With this change, we no longer need to persist silence/nflog states to disk in addition to the kvstore
2024-03-23 00:37:33 +02:00
Santiago a2facbecd4 Alerting: Implement ApplyConfig for remote primary mode (forked AM) (#84811)
* Alerting: Implement ApplyConfig for remote primary mode (forked AM)

* add TODO for saving the config hash in other config-related methods

* fix bad method receiver name (m -> am)

* tests

* add mutex

* remove sync loop
2024-03-22 15:17:41 +01:00
Santiago 4ad6d66479 Alerting: Remove ID from UserGrafanaConfig struct (#84602)
* Alerting: Remove ID from UserGrafanaConfig struct

* user custom mimir image withoud id in grafana config

* change mimir image name
2024-03-19 12:47:13 +01:00
Santiago c9bb18101c Alerting: Decrypt secrets before sending configuration to the remote Alertmanager (#83640)
* (WIP) Alerting: Decrypt secrets before sending configuration to the remote Alertmanager

* refactor, fix tests

* test decrypting secrets

* tidy up

* test SendConfiguration, quote keys, refactor tests

* make linter happy

* decrypt configuration before comparing

* copy configuration struct before decrypting

* reduce diff in TestCompareAndSendConfiguration

* clean up remote/alertmanager.go

* make linter happy

* avoid serializing into JSON to copy struct

* codeowners
2024-03-19 12:12:03 +01:00
Santiago fbbda6c05e Alerting: Retry readiness check to the remote Alertmanager on 5xx status code responses (#81174) 2024-01-24 21:39:06 +01:00
Santiago 3217a0dc05 Alerting: Fix state sync errors counter increment (#80702) 2024-01-17 11:04:27 +01:00
Santiago 6c87d9a1e7 Alerting: Stop retries on 4xx status code responses (remote Alertmanager readiness check) (#80350) 2024-01-11 12:12:35 +01:00
Santiago 9e78faa7ba Alerting: Add metrics to the remote Alertmanager struct (#79835)
* Alerting: Add metrics to the remote Alertmanager struct

* rephrase http_requests_failed description

* make linter happy

* remove unnecessary metrics

* extract timed client to separate package

* use histogram collector from dskit

* remove weaveworks dependency

* capture metrics for all requests to the remote Alertmanager (both clients)

* use the timed client in the MimirAuthRoundTripper

* HTTPRequestsDuration -> HTTPRequestDuration, clean up mimir client factory function

* refactor

* less git diff

* gauge for last readiness check in seconds

* initialize LastReadinesCheck to 0, tweak metric names and descriptions

* add counters for sync attempts/errors

* last config sync and last state sync timestamps (gauges)

* change latency metric name

* metric for remote Alertmanager mode

* code review comments

* move label constants to metrics package
2024-01-10 11:18:24 +01:00