Commit Graph

7 Commits

Author SHA1 Message Date
Steve Simpson 43a246f431 Alerting: Improve performance of /api/prometheus for large numbers of alerts. (#89268)
* Alerting: Optimize sorting alert instances.

* Also change other Labels fields for consistency
2024-06-17 12:25:47 +02:00
Steve Simpson f07f48616a Alerting: Fix panic when limit_alerts=0. (#86640)
Oversight in the TopK function meant if k=0, then we'd panic when checking
element zero in the heap, because no items are ever allowed into the heap.
2024-04-22 10:14:19 +02:00
Steve Simpson 6ea97e41fb Alerting: Consistently return Prometheus-style responses from rules APIs. (#86600)
* Alerting: Consistently return Prometheus-style responses from rules APIs.

This commit is part refactor and part fix. The /rules API occasionally returns
error responses which are inconsistent with other error responses. This fixes
that, and adds a function to map from Prometheus error type and HTTP code.

* Fix integration tests

* Linter happiness

* Make linter more happy

* Fix up one more place returning non-Prometheus responses
2024-04-19 21:03:20 +02:00
Steve Simpson 73873f5a8a Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568)
* Alerting: Optimize rule status gathering APIs when a limit is applied.

The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When
there are a very large number of alert instances present, this API is quite
slow to respond, and profiling suggests that a big part of the problem is
sorting the alerts by importance, in order to select the first 16.

This changes the application of the limit to use a more efficient heap-based
top-k algorithm. This maintains a slice of only the highest ranked items whilst
iterating the full set of alert instances, which substantially reduces the
number of comparisons needed. This is particularly effective, as the
`AlertsByImportance` comparison is quite complex.

I've included a benchmark to compare the new TopK function to the existing
Sort/limit strategy. It shows that for small limits, the new approach is
much faster, especially at high numbers of alerts, e.g.

100K alerts / limit 16: 1.91s vs 0.02s (-99%)

For situations where there is no effective limit, sorting is marginally faster,
therefore in the API implementation, if there is either a) no limit or b) no
effective limit, then we just sort the alerts as before. There is also a space
overhead using a heap which would matter for large limits.

* Remove commented test cases

* Make linter happy
2024-04-19 11:51:22 +02:00
George Robinson 19ebb079ba Alerting: Add limits and filters to Prometheus Rules API (#66627)
This commit adds support for limits and filters to the Prometheus Rules
API.

Limits:

It adds a number of limits to the Grafana flavour of the Prometheus Rules
API:

- `limit` limits the maximum number of Rule Groups returned
- `limit_rules` limits the maximum number of rules per Rule Group
- `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals)
for all Rule Groups, individual Rule Groups and rules.

Filters:

Alerts can be filtered by state with the `state` query string. An example
of an HTTP request asking for just firing alerts might be
`/api/prometheus/grafana/api/v1/rules?state=alerting`.

A request can filter by two or more states by adding additional `state`
query strings to the URL. For example `?state=alerting&state=normal`.

Like the alert list panel, the `firing`, `pending` and `normal` state are
first compared against the state of each alert rule. All other states are
ignored. If the alert rule matches then its alert instances are filtered
against states once more.

Alerts can also be filtered by labels using the `matcher` query string.
Like `state`, multiple matchers can be provided by adding additional
`matcher` query strings to the URL.

The match expression should be parsed using existing regular expression
and sent to the API as URL-encoded JSON in the format:

{
    "name": "test",
    "value": "value1",
    "isRegex": false,
    "isEqual": true
}

The `isRegex` and `isEqual` options work as follows:

| IsEqual | IsRegex  | Operator |
| ------- | -------- | -------- |
| true    | false    |    =     |
| true    | true     |    =~    |
| false   | true     |    !~    |
| false   | false    |    !=    |
2023-04-17 17:45:06 +01:00
George Robinson bd29071a0d Revert "Alerting: Add limits to the Prometheus Rules API" (#65842) 2023-04-03 15:20:37 +00:00
George Robinson d96b0a71d3 Alerting: Add limits to the Prometheus Rules API (#65169)
This commit adds a number of limits to the Grafana flavor of the
Prometheus Rules API:

1. `limit` limits the maximum number of Rule Groups returned
2. `limit_rules` limits the maximum number of rules per Rule Group
3. `limit_alerts` limits the maximum number of alerts per rule

It sorts Rule Groups and rules within Rule Groups such that data in the
response is stable across requests. It also returns summaries (totals) for
all Rule Groups, individual Rule Groups and rules.
2023-04-03 10:17:02 +01:00