1530 Commits

Author SHA1 Message Date
Yuri Tseretyan ee78bb653f Alerting: Log rule evaluation error in scheduler (#91585) 2024-08-06 19:27:02 +03:00
Matthew Jacobson 53cfdf0ef8 Alerting: Remove option to return settings from api/v1/receivers and restrict provisioning action access (#90861)
* Remove provisioning action access to v1/receivers api

* Separate ListOnly functionality to its own method without decryption
2024-08-05 11:49:23 -04:00
AvivGuiser 93aa5a56ad Alerting: Use stable identifier of a group, contact point, mute timing when exporting to HCL (#90917)
---------

Signed-off-by: Aviv Guiser <avivguiser@gmail.com>
2024-08-05 09:56:17 -04:00
Matthew Jacobson a397bca02e Alerting: Fix panic with nil annotations & Nodata=alerting/ok/keep (#91506) 2024-08-02 22:15:57 +03:00
Alexander Weaver 72ecde5045 Alerting: Make orgID a direct arg of writer interface (#91422)
make orgID a direct arg of writer interface
2024-08-02 09:37:28 -05:00
Alexander Akhmetov 3952f627eb Alerting: Parse secret fields case-insensitively when creating or updating a contact point (#90968)
* Alerting: Handle case-insensitive secret fields in contact point settings
2024-08-01 19:03:47 +02:00
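
The case-insensitive handling of secret fields described in #90968 above can be illustrated with a minimal Go sketch. This is not the code from that commit: the function `canonicalSecretKeys`, the field name `password`, and the flat `map[string]string` settings shape are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalSecretKeys renames any settings key that matches a known secret
// field name case-insensitively to that field's canonical spelling.
// The function and the field names passed to it are hypothetical.
func canonicalSecretKeys(settings map[string]string, secretFields []string) map[string]string {
	out := make(map[string]string, len(settings))
	for key, value := range settings {
		canonical := key
		for _, field := range secretFields {
			if strings.EqualFold(key, field) {
				canonical = field
				break
			}
		}
		out[canonical] = value
	}
	return out
}

func main() {
	settings := map[string]string{"url": "https://example.com/hook", "PASSWORD": "s3cret"}
	// "PASSWORD" is stored under the canonical key "password"; "url" is untouched.
	fmt.Println(canonicalSecretKeys(settings, []string{"password"}))
}
```

If two keys in the input differ only by case, this sketch keeps whichever one the map iteration visits last; a real implementation would likely need to reject or resolve that conflict explicitly.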
William Wernert a1ee84f757 Alerting: Remove duplicate tracing middleware from prom writer (#91353)
Remove duplicate tracing middleware from prom writer
2024-08-01 11:57:14 -04:00
Ieva 2e2ddc5c42 Folders: Allow folder editors and admins to create subfolders without any additional permissions (#91215)
* separate permissions for root level folder creation and subfolder creation

* fix tests

* fix tests

* fix tests

* frontend fix

* Update pkg/api/accesscontrol.go

Co-authored-by: Eric Leijonmarck <eric.leijonmarck@gmail.com>

* fix frontend when action sets are disabled

---------

Co-authored-by: Eric Leijonmarck <eric.leijonmarck@gmail.com>
2024-08-01 18:20:38 +03:00
Yuri Tseretyan 537f1fb857 Alerting: Fix persisting result fingerprint that is used by recovery threshold (#91224)
* fix persister to save result fingerprint

* revert change

* fmt
2024-07-30 18:07:13 -04:00
Nihal 9ad9b4989b Alerting: Include a list of ref_Id and aggregated datasource UIDs to alerts when state reason is NoData (#88819)
* include a list of ref_Id and datasource UID to alerts when state reason is NoData. 

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-07-30 12:55:59 -04:00
Alexander Weaver 4c71cadd5f Alerting: Detach condition validator from condition evaluator (#91150)
* Detach validator from evaluator

* Drop unnecessary interface and type
2024-07-30 10:55:37 -05:00
github-actions[bot] 66b1a219f4 Alerting: Update Swagger spec (#79850)
* chore: update alerting swagger spec
* update public swagger

---------

Co-authored-by: rwwiv <rwwiv@users.noreply.github.com>
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2024-07-30 18:17:23 +03:00
Yuri Tseretyan 2023821100 Alerting: update Loki backend of state history to batch requests by folder (#89865)
* refactor `selectorString` and remove Selector struct

* move code from selector string to BuildLogQuery

* batch requests by folder UID

* update historian annotation store to handle multiple queries

* sort folder uids to make consistent queries

* add logs to loki http

* log batch size but not content. content is logged by the client
2024-07-30 11:07:10 -04:00
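
The "sort folder uids to make consistent queries" and "batch requests by folder UID" steps in #89865 above can be sketched roughly as follows. The helper `batchFolderUIDs` and the fixed batch size are assumptions; the actual commit may split batches by a different criterion, such as query length.

```go
package main

import (
	"fmt"
	"sort"
)

// batchFolderUIDs returns the folder UIDs sorted and split into batches of at
// most size entries. Sorting first means the same set of folders always
// produces the same batches, regardless of input order.
// The name and the fixed-size batching rule are hypothetical.
func batchFolderUIDs(uids []string, size int) [][]string {
	if size <= 0 || len(uids) == 0 {
		return nil
	}
	// Copy before sorting so the caller's slice is not reordered.
	sorted := append([]string(nil), uids...)
	sort.Strings(sorted)

	var batches [][]string
	for start := 0; start < len(sorted); start += size {
		end := start + size
		if end > len(sorted) {
			end = len(sorted)
		}
		batches = append(batches, sorted[start:end])
	}
	return batches
}

func main() {
	for i, batch := range batchFolderUIDs([]string{"folder-c", "folder-a", "folder-b"}, 2) {
		// Log the batch size rather than its content, as the commit notes suggest.
		fmt.Printf("batch %d: %d folder(s)\n", i, len(batch))
	}
}
```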
Yuri Tseretyan 8323b688c6 Alerting: Improve logging in scheduler and states (#91003)
* handle metadata map nil

* remove double context

* clean up logging in scheduler

* do not reuse loggers from previous ticks

* log the dropped tick

* log tick instead of ticknum

* replace with processing tick logs

* log sending notifications

* update logging in persister to fetch context

* logs to historian

moved them upstream to be able to log when store is overridden
2024-07-29 16:01:48 -04:00
Matthew Jacobson 62f67e38b8 Alerting: Implement receiver auth service (#90857) 2024-07-29 15:49:10 -04:00
Yuri Tseretyan 34dbfefc86 Alerting: Template service to check for provenance status of update/delete (#90688) 2024-07-29 14:10:03 -04:00
Matthew Jacobson a1f0b599a7 Alerting: Refactor receiver_svc and provisioning config store into legacy_storage package (#90856)
* Add more receivers api tests

* Move provisioning config store to new legacy_storage package
2024-07-26 17:45:33 -04:00
Yuri Tseretyan 6b0d20c96a Alerting: time interval service to support addressing intervals by Base64 encoded name (#90563)
* rename to getMuteTimingByName

* add UID to api model of MuteTiming

* update GetMuteTiming to search by UID

* update UpdateMuteTiming to support search by UID

* update DeleteMuteTiming to support uid

* make sure UID is populated

* update usages

* use base64 url-safe, no padding encoding for UID
2024-07-26 16:43:40 -04:00
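
The "base64 url-safe, no padding encoding for UID" note in #90563 above corresponds to Go's standard `base64.RawURLEncoding`. A minimal sketch follows; the helper names `nameToUID` and `uidToName` are hypothetical and not taken from the commit.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// nameToUID encodes a time interval name with URL-safe, unpadded Base64,
// so the result can be used in a URL path without further escaping.
// The helper name is hypothetical.
func nameToUID(name string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(name))
}

// uidToName reverses nameToUID and reports an error for malformed input.
func uidToName(uid string) (string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(uid)
	if err != nil {
		return "", fmt.Errorf("decode UID %q: %w", uid, err)
	}
	return string(raw), nil
}

func main() {
	uid := nameToUID("weekends & holidays")
	name, err := uidToName(uid)
	fmt.Println(uid, name, err)
}
```

The URL-safe alphabet replaces `+` and `/` with `-` and `_`, and dropping the `=` padding avoids characters that would need escaping in a path segment.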
Alexander Weaver b7220b532e Alerting: Fix bug where patching recording rule queries wouldn't apply (#91011)
* the fix

* tests
2024-07-26 11:02:54 -05:00
Ryan McKinley be7b1ce2df Chore: Replace appcontext.User(ctx) with identity.GetRequester(ctx) (#91030) 2024-07-26 16:39:23 +03:00
Ryan McKinley 9db3bc926e Identity: Rename "namespace" to "type" in the requester interface (#90567) 2024-07-25 12:52:14 +03:00
Sven Grossmann 94dd4105e2 Loki: Allow alert headers to be forwarded (#90890)
* Loki: Allow alert headers to be forwarded

* Loki: fix tests

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2024-07-25 07:39:34 +02:00
Santiago b79b38f02c Alertmanager: Support limits for silences (#90826)
* Alertmanager: support limits for silences

* update grafana/alerting to latest main
2024-07-24 14:22:29 +02:00
William Wernert 45f298120e Alerting: Return error when writing recorded metrics instead of default writing NaN (#90743)
* Return error instead of default writing NaN
2024-07-22 15:47:02 -04:00
AvivGuiser 96c3e9c550 Alerting: Use stable identifier of a group when exporting to HCL (#90196)
* change the rule-group to be hashed when exporting to HCL

Signed-off-by: Aviv Guiser <avivguiser@gmail.com>

---------

Signed-off-by: Aviv Guiser <avivguiser@gmail.com>
2024-07-19 18:13:26 +03:00
Alexander Weaver 418b077c59 Alerting: Integration testing for recording rules including writes (#90390)
* Add success case and tests for writer using metrics

* Use testable version of clock

* Assert a specific series was written

* Fix linter

* Fix manually constructed writer
2024-07-18 17:14:49 -05:00
Alexander Weaver 0e269db8a9 Alerting: Expose recordingWriter on ngalert (#90573)
Expose recordingWriter on ngalert
2024-07-18 13:24:06 -05:00
Yuri Tseretyan 09e10ae9e0 Alerting: Update State history API Open API documentation (#89795) 2024-07-18 10:37:05 -04:00
Alexander Weaver 88ed77e7e8 Alerting: More graceful handling of NoData in recording rules (#90312)
* Handle NoData as its own case

* Debug

* Scalars parseable by CollectionReader

* fix linter

* Or not and
2024-07-17 15:24:03 -05:00
Yuri Tseretyan c3b9c9b239 Alerting: Send information about alert rule to data source in headers (#90344)
* add support of metadata to condition and adding it to request headers
* support for additional metadata when condition is built
* add additional context to conditions: source and folder title
* add version
* use percent-encoding for header values
2024-07-17 22:55:12 +03:00
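
The "use percent-encoding for header values" step in #90344 above can be illustrated with Go's `net/url` package. This is a sketch under assumptions: the header name `X-Example-Folder-Title` and the helpers `encodeHeaderValue` and `decodeHeaderValue` are hypothetical, and the commit may use a different escaping function.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// encodeHeaderValue percent-encodes a value so that spaces, slashes and
// non-ASCII characters (for example in a folder title) are safe to send
// in an HTTP header. The helper name is hypothetical.
func encodeHeaderValue(value string) string {
	return url.PathEscape(value)
}

// decodeHeaderValue is what a receiving data source could do to recover
// the original value.
func decodeHeaderValue(encoded string) (string, error) {
	return url.PathUnescape(encoded)
}

func main() {
	h := http.Header{}
	// The header name is illustrative only.
	h.Set("X-Example-Folder-Title", encodeHeaderValue("Prod / Überwachung"))
	decoded, err := decodeHeaderValue(h.Get("X-Example-Folder-Title"))
	fmt.Println(h.Get("X-Example-Folder-Title"), decoded, err)
}
```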
Yuri Tseretyan 970cafa20f Alerting: Time interval Delete API to check for usages in alert rules (#90500)
* Check if a time interval is used in alert rules before deleting it
* Add time interval to parameters of ListAlertRulesQuery and ListNotificationSettings of DbStore

== Refactorings ==
* refactor isMuteTimeInUse to accept a single route
* update getMuteTiming to not return err
* update delete to get the mute timing from config first
2024-07-17 10:53:54 -04:00
Matthew Jacobson b7f422b68d Alerting: Receiver API Get+List+Delete (#90384) 2024-07-16 10:02:16 -04:00
Yuri Tseretyan 9c05b30489 Chore: Add more logs and tracing to hysteresis flows (#90369) 2024-07-15 13:38:20 -04:00
Santiago e097ffc771 Alerting: Update grafana/alerting dependency (#90365)
* update grafana/alerting to latest main
* update alertmanager to 66ec17e3aa45
2024-07-12 14:05:17 -04:00
Matthew Jacobson ba800692c6 Alerting: Persist AlertInstance ResolvedAt & LastSentAt (#89135)
* Alerting: Persist AlertInstance ResolvedAt & LastSentAt

* Fix test

* Modify existing tests

* Fix merge conflicts from nullable LastSentAt & ResolvedAt
2024-07-12 12:26:58 -04:00
Matthew Jacobson b7767c79e7 Alerting: Fix contact point export 500 error and notifications/receivers missing settings (#90342)
* Regression test

* Fix 500 error when exporting redacted receivers

* Fix tests to check permissions
2024-07-12 11:42:22 -04:00
Kristin Laemmert 8a6107cd35 DashboardStore: Use ReplDB and get dashboard quotas from the ReadReplica (#90235)
* Use ReplDB in dashboard store and update all fixtures - no other changes

* just moving dashboard counts for now

* find the missing test fixture
2024-07-12 10:47:49 -04:00
Alexander Weaver 111ebd4fb2 Alerting: Create integration testing infra for recording rules (#90306)
* Create some integration testing infra for RRs

* whoops

* Require no error in responding

* fix linter

* Panic, no need to pass testing around

* Extend status test
2024-07-11 14:59:52 -05:00
Alexander Weaver ab32183e18 Alerting: Track recording rule health and last eval info ephemerally (#90247)
* Track health and last eval info

* Read method for status

* Minor tests
2024-07-11 14:05:09 -05:00
Santiago 3bb861b9f0 Alerting: Remove empty/namespace labels when sending alerts to the remote Alertmanager (#90284)
* Alerting: Remove empty/namespace labels when sending alerts to the remote Alertmanager

* update tests

* fix typo in comment
2024-07-11 15:20:12 +02:00
Yuri Tseretyan 5ae5fa3a7a Alerting: Support field selectors in time interval API (#90022)
* fix kind of TimeInterval
* register custom fields for selectors
* support field selectors in legacy storage
* support selectors in storage

===== Misc
* refactor conversions to build in one place
* hide implementation of provenance status behind accessors to use the key in selectors
* fix provenance error
2024-07-08 22:45:30 +03:00
Alexander Weaver 3b6a8775bb Alerting: Fix stale values associated with states that have gone to NoData, unify values calculation (#89807)
* Unify values

* Fix with latest changes on main

* Fix up NaN test

* Keep refIDs with -1 as value

* Test that refIDs are preserved on Normal to Error transition

* Alerting to err test too

* Add a blurb to docs about this behavior
2024-07-08 12:30:23 -05:00
Steve Simpson e9fd191065 Alerting: Fix some status codes returned from provisioning API. (#90117)
The contact point deletion API was returning 500 when it should have been
returning a 4xx error, when the contact point is in use:

- When in use by a notification policy, we were missing
  the `.Errorf("")` to convert `errutil.Base` into `errutil.Error`.
- When in use by an alert rule, a regular error was returned.
2024-07-05 19:06:37 +02:00
Alexander Zobnin 87d86e81ce Zanzana: Evaluate permissions alongside with RBAC engine (#90064)
* Zanzana: Evaluate permissions if feature flag enabled

* Fix tests

* adjust logs

* fix spelling

* remove unused

* only evaluate implemented resources

* refactor
2024-07-05 11:31:23 +02:00
Yuri Tseretyan 411bab6d44 Alerting: Lower severity of logs about duplicates to debug (#89971)
lower severity of logs about duplicates to debug
2024-07-03 16:46:28 -04:00
Yuri Tseretyan c3b5cabb14 Alerting: Refactor scheduler's rule evaluator to store rule key (#89925) 2024-07-01 16:43:23 -04:00
Yuri Tseretyan 655e477c20 Alerting: Fix flaky test in scheduler's tests (#89923) 2024-07-01 13:31:03 -04:00
Santiago fce03cd724 Alerting: Send static headers to the remote Alertmanager (#89846) 2024-07-01 17:48:40 +02:00
Yuri Tseretyan 559738ce6a Alerting: Fix flaky test in historian (#89913) 2024-07-01 16:59:06 +03:00
Gabriel MABILLE 71d31397e5 Fix flaky tests (#89910) 2024-07-01 14:39:51 +02:00