* make legacy store expose only model.Receiver
* use integration as provenance type provider
* use revision RenameReceiverInRoutes
* introduce function GetReceiversNames in config revision
---------
Co-authored-by: Matthew Jacobson <matthew.jacobson@grafana.com>
* add VersionedNotifierPlugin and method that converts NotifierPlugin to it
* return new schema if query parameter version=2
* add version to k8s model of integration
* fix open api snapshot
* add version to IntegrationConfig
* use current version on conversion
* create versioned integrations for test
Recording rule fields were not being copied correctly when duplicating an alert rule. This manifests as missing `TargetDataSourceUID` fields from the `Record` part of the rule when rules in a group are re-ordered.
Added some additional tests to ensure we cover the generation of recording rules in tests and fixed the copying logic to ensure all fields are copied correctly.
* Alerting: Add Extended List Query for Alert Rules w/pagination
This adds an extended query which allows filtering by the kind of rule (Recording or Alerting) and supports pagination.
Pagination tokens will allow us to list all rules paginated, regardless of the rule group.
---------
Co-authored-by: William Wernert <william.wernert@grafana.com>
This reintroduces store level pagination, without using it in the prometheus API yet.
Related to #108633
Co-authored-by: William Wernert <william.wernert@grafana.com>
* Add to available channels
* Export
* Fix bug in deeply nested secrets
BE: Slice re-use bug when traversing deeply.
FE: Only at most one level of nesting was being taken into account
when determining secureFields keys. This change adds a new field on
NotificationChannelOption: secureFieldKey. This is populated on API GET via
transform. This change gives us the option to hardcode secureFieldKey in the
backend and no longer calculate the key via settings topology.
* Update grafana/alerting to 3e20fda3b872
* Prettier
* Linting
* Fix IntegrationConfig test to catch secure field mismatch
* add active_time_intervals to route model
* update k8s compat layer
* update notification policies service to validate active time intervals
* update integration tests
* update openapi
* add active time interval to model
* update route generator to include active time interval
* Update storage list and rename methods to handle active intervals
* update api model
* update provisioning and export models
* update ui to allow active timing config
* update i18n
* fix snapshots for ui tests
* run prettier
* Alerting: Active time intervals UI naming (#104402)
* update naming in UI
* update naming in the edit page title
* update translations
* update alerting module
---------
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>
* Ensure field validators return the proper type
This ensures correct error propagation through services up to
the API layer.
* Move error wrapping up to call site
What is this feature?
A follow-up for #101184, adds AlertRule.MissingSeriesEvalsToResolve to the APIs.
missing_series_evals_to_resolve must be specified too and it must be > 0.
POST /api/ruler/grafana/api/v1/rules/{folderUID} works in the following way:
If missing_series_evals_to_resolve is not sent or null, the rule keeps its existing value
If missing_series_evals_to_resolve > 0: updates to that value
If missing_series_evals_to_resolve = 0: resets to default (nil).
AlertRule.MissingSeriesEvalsToResolve can't be 0, so I used it to reset
In the Provisioning API, the value is just set if present and > 0. Otherwise it's reset:
PUT to /api/v1/provisioning/alert-rules/{UID}:
If missing_series_evals_to_resolve is nil, it's reset to the default value
If missing_series_evals_to_resolve > 0, it's updated
What is this feature?
Prometheus conversion: ensures that AlertRule.Data queries always have default parameters set (intervalMs, maxDataPoints). Without this, updates of the same rule can cause version increments.
Why do we need this feature?
Currently, when converting Prometheus rules to Grafana alerts, some default parameters are not explicitly set in the query model. This creates a problem during rule updates:
When a user updates a rule that hasn't changed, we still detect differences in the AlertQuery.Model because the newly converted rules are missing the default fields, such as intervalMs and maxDataPoints. This causes unnecessary version increments of alert rules.
What is this feature?
This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation).
keep_firing_for prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state and are not considered resolved by the Alertmanager. Once the recovery period ends (or after the next evaluation if it is bigger than keep_firing_for), the alert transitions to "Normal" if it doesn't start alerting again:
Before
+----------+ +----------+
| Alerting |---->| Normal |
+----------+ +----------+
-----
After
+----------+ +------------+ +----------+
| Alerting |----->| Recovering |---->| Normal |
+----------+ +------------+ +----------+
Why do we need this feature?
This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert
Extend the recording rule definition to include the target data source, allowing
configuration of where the output of the recording rule is written to. Also
extends the relevant interfaces in preparation for the next set of changes.
* add column guid to alert rule table and rule_guid to rule version table
+ populate the new field with UUID
* update storage and domain models
* patch GUID
* ignore GUID in fingerprint tests
Initially, Metadata had only the EditorSettings, and HasMetadata was used to understand if the incoming update request had metadata in the body because it could be omitted if it was empty. For example, when the rule is updated via the provisioning API or has only false values. If it was in the request, we used that; if not, we used the metadata from the existing rule from the database. If the rule was updated via the AlertRuleService, we didn't change Metadata at all if the rule already existed.
But now, Metadata also has the Prometheus rule definition, and we always need to update it with the new version of the AlertRuleService when the rule exists in the DB and has the same UID. HasMetadata is renamed to HasEditorSettings to keep the old behaviour only for EditorSettings.
Now, the provisioning API and the conversion API will overwrite everything except EditorSettings with the new data.
What is this feature?
Adds an API endpoint to create alert rules with mimirtool:
- POST /convert/prometheus/config/v1/rules/{NamespaceTitle} - Accepts a single rule group in a Prometheus YAML format and creates or updates a Grafana rule group from it.
The endpoint uses the conversion package from #100224.
Key parts
The API works similarly to the provisioning API. If the rule does not exist, it will be created, otherwise updated. Any rules not present in the new group will be deleted, ensuring the group is fully synchronized with the provided configuration.
Since the API works with namespace titles (folders), the handler automatically creates a folder in the root based on the provided title if it does not exist. It also requires a special header, X-Grafana-Alerting-Datasource-UID. This header specifies which datasource to use for the new rules.
If the rule group's evaluation interval is not specified, it uses the DefaultRuleEvaluationInterval from settings.
* Send new annotation containing image url
* Use new image TokenProvider with TokenStore
New abstraction GetImage no longer needs to support parsing both token and
url from annotations, as remote AM will use the new URLProvider. Instead, we
use the new generic TokenProvider and give it a TokenStore backed by the
grafana database.
That means we revert back to always using token simplifying code and security
considerations.
* Upgrade grafana/alerting to merged commit SHA
* Alerting: Fix Alertmanager configuration updates
Alertmanager configuration updates would behave inconsistently when performing no-op updates with `mysql` as the store.
In particular this bug manifested as a failure to reload the provisioned alertmanager configuration components with no changes to the configuration itself. This would result in a 500 error with mysql store only.
The core issue is that we were relying on the number of rows affected by the update query to determine if the configuration was found in the db or not.
While this behavior works for certain sql dialects, mysql does not return the number of rows matched by the update query but rather the number of rows actually updated.
Also discovered and fixed the mismatched `xorm` tag for the `CreatedAt` field to match the actual column name in the db.
References: https://dev.mysql.com/doc/refman/8.4/en/update.html
* introduce new fields created_by in rule tables
* update domain model and compat layer to support UpdatedBy
* add alert rule generator mutators for UpdatedBy
* ignore UpdatedBy in diff and hash calculation
* Add user context to alert rule insert/update operations
Updated InsertAlertRules and UpdateAlertRules methods to accept a user context parameter. This change ensures auditability and better tracking of user actions when creating or updating alert rules. Adjusted all relevant calls and interfaces to pass the user context accordingly.
* set UpdatedBy in PreSave because this is where Updated is set
* Use nil userID for system-initiated updates
This ensures differentiation between system and user-initiated changes for better traceability and clarity in update origins.
---------
Signed-off-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>