* Alerting: Use default_datasource_uid as the default target for recording rules
* Add tests
---------
Co-authored-by: Konrad Lalik <konradlalik@gmail.com>
pkgs/tsdb/[grafana-pyroscope-datasource|parca]: Fix use of request headers in responses
In the parca and the grafana-pyroscope-datasource we were wrongly using the request headers instead of the response
header when communication the results to the backend.
This PR fixes this bug.
Was reported by an user via community slack, who faced issues, with a request header of `content-length: 0` being
inserted by a intermediate proxy.
What is this feature?
Ensures that resolved notifications are sent when alert states transition from Error to Normal after the configured number of evaluation intervals: Missing series evaluations to resolve.
Why do we need this feature?
Before this change, when an alert was transitioning from Error to Normal, in case when the labels on the new Normal alert instance are the same, Grafana would not send resolved notifications for the Error alert state. The alert would be resolved after a few evaluation intervals automatically in the alertmanager, following the endsAt.
With this change the resolved notification is sent after the configured number of evaluation intervals: Missing series evaluations to resolve.
* SCIM: fix provisioned user role assignment from SAML assertion
* revert org_sync_test changes
* clean up tests
* skip user lookup during org sync
* sanitize log output
* only log non-sensitive fields
What is this feature?
Fixes a bug when group-level query_offset and labels parameters are ignored and not saved
Why do we need this feature?
In the import API Prometheus YAML rule definitions are supported:
groups:
- name: group-1
interval: 1m
query_offset: 10m
labels:
severity: "warning"
rules:
- alert: Alert 0 > 0
expr: vector(0) > 0
But applying group-level labels and query_offset is broken and they are not saved right now because during the conversion of the API model to PrometheusRuleGroup they aren't saved to the new structure.
* Zanzana: Split client and server logs
* Zanzana: Improve error handling and logging
* log internal error at the server side
* refactor
* improve errors for list request
* update go modules
* handle errors for read and write
* refactor
* reset go.mod changes
* Alerting: Optimize prometheus api permission checks
This improves the performance of the Prometheus API by performing the permission checks for rule read permission in a folder upfront, rather than checking permissions for each rule group individually. This reduces the number of permission checks and should speed up the API response time.
* refactor vars
---------
Co-authored-by: Konrad Lalik <konradlalik@gmail.com>
When a rule configured with `ExecErrState` state of `Alerting`, has an instance which is Alerting then has a data source error, then successfully evaluates and continues to be Alerting, the cached instance keeps the error cached until it is no longer firing.
This is unexpected and leads to misleading results.
* ds-querier: add new metric for the total request
Co-authored-by: Sarah Zinger <sarah.zinger@grafana.com>
* fix logger and trace
Co-authored-by: Sarah Zinger <sarah.zinger@grafana.com>
* ds-querier: rewrite downstream 500s to 400
---------
Co-authored-by: Sarah Zinger <sarah.zinger@grafana.com>
What is this feature?
This PR fixes the MissingSeriesEvalsToResolve behavior when it's set to more than 4 evaluation intervals.
Why do we need this feature?
The MissingSeriesEvalsToResolve setting was not working correctly due to alerts being auto-resolved by Alertmanager after 4 evaluation intervals (via the endsAt field).
Before we had deleteStaleStatesFromCache method that was returning only stale states that had to be resolved. Non-stale states for which the current evaluation does not have a series never had endsAt updated and were never resend to the Alertmanager, so they were automatically resolved after 4 evaluations regardless of the setting.
The new processMissingSeriesStates returns state for each missing series on every evaluation, and resolves the stale ones. This guarantees that alerts without series still alert for the configured number of evaluations.