grafana

Author	SHA1	Message	Date
Moustafa Baiou	74e800e427	Alerting: Add provenance to Prometheus API (#106596 ) This commit adds provenance information to the Prometheus API in the ngalert service to enable compatibility with the new alert list page.	2025-06-17 23:31:42 +02:00
Moustafa Baiou	9f07e49cdd	Alerting: Add extended definition to prometheus alert rules api (#103320 ) * Alerting: Add extended definition to prometheus alert rules api This adds `isPaused` and `notificationSettings` to the paginated rules api to enable the paginated view of GMA rules. refactor: make alert rule status and state retrieval extensible This lets us get status from other sources than the local ruler. * update swagger spec * add safety checks in test	2025-04-23 21:14:09 +01:00
Konrad Lalik	0a8dccc19a	Alerting: New alert list filter improvements (#103107 ) * Move filtering code to generators for performance reasons Discarding rules and groups early in the iterable chain limits the number of promises we need to wait for which improves performance significantly * Add error handling for generators * Add support for data source filter for GMA rules * search WIP fix * Fix datasource filter * Move filtering back to filtered rules hook, use paged groups for improved performance * Add queriedDatasources field to grafana managed rules and update filtering logic to rely on it - Introduced a new field `queriedDatasources` in the AlertingRule struct to track data sources used in rules. - Updated the Prometheus API to populate `queriedDatasources` when creating alerting rules. - Modified filtering logic in the ruleFilter function to utilize the new `queriedDatasources` field for improved data source matching. - Adjusted related tests to reflect changes in rule structure and filtering behavior. * Add FilterView performance logging * Improve GMA Prometheus types, rename queried datasources property * Use custom generator helpers for flattening and filtering rule groups * Fix lint errors, add missing translations * Revert test condition * Refactor api prom changes * Fix lint errors * Update backend tests * Refactor rule list components to improve error handling and data source management - Enhanced error handling in FilterViewResults by logging errors before returning an empty iterable. - Simplified conditional rendering in GrafanaRuleLoader for better readability. - Updated data source handling in PaginatedDataSourceLoader and PaginatedGrafanaLoader to use new individual rule group generator. - Renamed toPageless function to toIndividualRuleGroups for clarity in prometheusGroupsGenerator. - Improved filtering logic in useFilteredRulesIterator to utilize a dedicated function for data source type validation. - Added isRulesDataSourceType utility function for better data source type checks. - Removed commented-out code in PromRuleDTOBase for cleaner interface definition. * Fix abort controller on FilterView * Improve generators filtering * fix abort controller * refactor cancelSearch * make states exclusive * Load full page in one loadResultPage call * Update tests, update translations * Refactor filter status into separate component * hoist hook * Use the new function for supported rules source type --------- Co-authored-by: Gilles De Mey <gilles.de.mey@gmail.com>	2025-04-11 10:02:34 +02:00
Alexander Akhmetov	695ac91290	Alerting: Add backend support for keep_firing_for (#100750 ) What is this feature? This PR introduces a new alert rule configuration option, keep_firing_for (Prometheus documentation). keep_firing_for prevents alerts from resolving immediately after the alert condition returns to normal. Instead, they transition into a "Recovering" state and are not considered resolved by the Alertmanager. Once the recovery period ends (or after the next evaluation if it is bigger than keep_firing_for), the alert transitions to "Normal" if it doesn't start alerting again: Before +----------+ +----------+ \| Alerting \|---->\| Normal \| +----------+ +----------+ ----- After +----------+ +------------+ +----------+ \| Alerting \|----->\| Recovering \|---->\| Normal \| +----------+ +------------+ +----------+ Why do we need this feature? This feature prevents flapping alerts by adding a recovery period. This helps avoid false resolutions caused by brief alert	2025-03-18 11:24:48 +01:00
Konrad Lalik	5aeaccadff	Alerting: Add read-only GMA rules to the new list view (#98116 ) * Reuse prom groups generator between GMA, external DS and list view * Improve generators, add initial support for GMA in grouped view components * Improve handling of GMA rules * Split componentes into files * Improve error handling, simplify groups grouping * Extract grafana rules component * Reset yarn.lock * Reset yarn.lock 2 * Update filters, adjust file names, add folder display name to GMA rules * Re-enable filtering for cloud rules * Rename AlertRuleLoader * Add missing translations, fix lint errors * Remove unused imports, update translations * Fix responses in BE tests * Update backend tests * Update integration test * Tidy up group page size constants * Add error throwing to getGroups endpoint to prevent grafana usage * Refactor FilterView to remove exhaustive check * Refactor common props for grafana rule rendering * Unify identifiers' discriminators, add comments, minor refactor * Update translations * Remove unnecessary prev page condition, add a few explanations --------- Co-authored-by: fayzal-g <fayzal.ghantiwala@grafana.com> Co-authored-by: Tom Ratcliffe <tom.ratcliffe@grafana.com>	2025-01-15 11:36:32 +01:00
Fayzal Ghantiwala	5a143be653	Alerting: Add pagination to /api/prometheus/grafana/api/v1/rules (#95959 ) * Intermediate step before refactoring * Sort groups to paginate on them * Formatting and improved test * Address comments * Update tests	2024-11-08 16:58:14 +00:00
Steve Simpson	43a246f431	Alerting: Improve performance of /api/prometheus for large numbers of alerts. (#89268 ) * Alerting: Optimize sorting alert instances. * Also change other Labels fields for consistency	2024-06-17 12:25:47 +02:00
Alexander Weaver	58fdb24b0b	Alerting: Recording rules appear as type=recording in Prometheus API + better abstraction for type (#88805 ) * Wire status through to prom API * Regenerate swagger	2024-06-07 11:24:06 -05:00
Steve Simpson	f07f48616a	Alerting: Fix panic when limit_alerts=0. (#86640 ) Oversight in the TopK function meant if k=0, then we'd panic when checking element zero in the heap, because no items are ever allowed into the heap.	2024-04-22 10:14:19 +02:00
Steve Simpson	6ea97e41fb	Alerting: Consistently return Prometheus-style responses from rules APIs. (#86600 ) * Alerting: Consistently return Prometheus-style responses from rules APIs. This commit is part refactor and part fix. The /rules API occasionally returns error responses which are inconsistent with other error responses. This fixes that, and adds a function to map from Prometheus error type and HTTP code. * Fix integration tests * Linter happiness * Make linter more happy * Fix up one more place returning non-Prometheus responses	2024-04-19 21:03:20 +02:00
Steve Simpson	73873f5a8a	Alerting: Optimize rule status gathering APIs when a limit is applied. (#86568 ) * Alerting: Optimize rule status gathering APIs when a limit is applied. The frontend very commonly calls the `/rules` API with `limit_alerts=16`. When there are a very large number of alert instances present, this API is quite slow to respond, and profiling suggests that a big part of the problem is sorting the alerts by importance, in order to select the first 16. This changes the application of the limit to use a more efficient heap-based top-k algorithm. This maintains a slice of only the highest ranked items whilst iterating the full set of alert instances, which substantially reduces the number of comparisons needed. This is particularly effective, as the `AlertsByImportance` comparison is quite complex. I've included a benchmark to compare the new TopK function to the existing Sort/limit strategy. It shows that for small limits, the new approach is much faster, especially at high numbers of alerts, e.g. 100K alerts / limit 16: 1.91s vs 0.02s (-99%) For situations where there is no effective limit, sorting is marginally faster, therefore in the API implementation, if there is either a) no limit or b) no effective limit, then we just sort the alerts as before. There is also a space overhead using a heap which would matter for large limits. * Remove commented test cases * Make linter happy	2024-04-19 11:51:22 +02:00
Julien Duchesne	c9211fbd69	ngalert openapi: Use same `basePath` as rest of Grafana (#79025 ) * ngalert openapi: Use same `basePath` as rest of Grafana Currently, there are two issues that prevent easily merging `ngalert` and grafana openapi specs: - The basePath is different. `grafana` has `/api` and `ngalert` has `/api/v1`. I changed `ngalert` to use `/api` - The `ngalert` endpoints have their basePath in the each operation path. The basePath should actually be omitted --------- Co-authored-by: Yuriy Tseretyan <yuriy.tseretyan@grafana.com>	2024-01-17 11:53:16 -05:00
Matthew Jacobson	eddd4f4508	Alerting: Add totalsFiltered to RuleResponse for hidden by filters count (#66883 ) Alerting: Add totalsFiltered to RuleResponse to facilitate hidden by filters count Currently, when both a limit_alerts and a matcher/state filter is applied, there is not enough information to determine how many alert instances were hidden by the filters. Only enough to determine the total hidden by the limit and filter combined. This change adds a separate totalsFiltered field alongside the AlertRule totals that will contain the count of instances after filters but before limits.	2023-04-21 09:35:12 +01:00
George Robinson	19ebb079ba	Alerting: Add limits and filters to Prometheus Rules API (#66627 ) This commit adds support for limits and filters to the Prometheus Rules API. Limits: It adds a number of limits to the Grafana flavour of the Prometheus Rules API: - `limit` limits the maximum number of Rule Groups returned - `limit_rules` limits the maximum number of rules per Rule Group - `limit_alerts` limits the maximum number of alerts per rule It sorts Rule Groups and rules within Rule Groups such that data in the response is stable across requests. It also returns summaries (totals) for all Rule Groups, individual Rule Groups and rules. Filters: Alerts can be filtered by state with the `state` query string. An example of an HTTP request asking for just firing alerts might be `/api/prometheus/grafana/api/v1/rules?state=alerting`. A request can filter by two or more states by adding additional `state` query strings to the URL. For example `?state=alerting&state=normal`. Like the alert list panel, the `firing`, `pending` and `normal` state are first compared against the state of each alert rule. All other states are ignored. If the alert rule matches then its alert instances are filtered against states once more. Alerts can also be filtered by labels using the `matcher` query string. Like `state`, multiple matchers can be provided by adding additional `matcher` query strings to the URL. The match expression should be parsed using existing regular expression and sent to the API as URL-encoded JSON in the format: { "name": "test", "value": "value1", "isRegex": false, "isEqual": true } The `isRegex` and `isEqual` options work as follows: \| IsEqual \| IsRegex \| Operator \| \| ------- \| -------- \| -------- \| \| true \| false \| = \| \| true \| true \| =~ \| \| false \| true \| !~ \| \| false \| false \| != \|	2023-04-17 17:45:06 +01:00
George Robinson	bd29071a0d	Revert "Alerting: Add limits to the Prometheus Rules API" (#65842 )	2023-04-03 15:20:37 +00:00
George Robinson	d96b0a71d3	Alerting: Add limits to the Prometheus Rules API (#65169 ) This commit adds a number of limits to the Grafana flavor of the Prometheus Rules API: 1. `limit` limits the maximum number of Rule Groups returned 2. `limit_rules` limits the maximum number of rules per Rule Group 3. `limit_alerts` limits the maximum number of alerts per rule It sorts Rule Groups and rules within Rule Groups such that data in the response is stable across requests. It also returns summaries (totals) for all Rule Groups, individual Rule Groups and rules.	2023-04-03 10:17:02 +01:00
Yuriy Tseretyan	718620c197	Alerting: Update forking request handlers to use the same errors (#52965 ) * generalize error handling in forking request handlers * remove MatchesBackend and change test to test Can * add 404 to route specs * change backendTypeByUID to getDatasourceByUID of expected type * use common errors in api testing * handle 401 in errorToResponse * replace backend type error with "unexpected datasource type" * update swagger spec	2022-08-02 09:33:59 -04:00
Sofia Papagiannaki	bb66c03f9a	Alerting: modify prometheus endpoints for proxying using the datasource UID (#48052 ) * Modify prometheus endpoints to expect the data source UID * Update frontend	2022-05-06 15:05:02 -04:00
Sofia Papagiannaki	54962c2f0c	Alerting: Rename Recipient path parameter to DatasourceID (#47949 )	2022-04-20 16:20:17 +03:00
gotjosh	a338c78ca8	Alerting: Remove internal labels from prometheus compatible API responses (#46548 ) * Alerting: Remove internal labels from prometheus compatible API responses * Appease the linter * Fix integration tests * Fix API documentation & linter * move removal of internal labels to the models	2022-03-16 16:04:19 +00:00
Yuriy Tseretyan	ddfe2dce74	Alerting: Split grafana and lotex routes (#44742 ) * split Lotex and Grafana routes * update template to use authorize function for every route	2022-02-04 12:42:04 -05:00
gotjosh	6572017ec7	Alerting: Allow more characters in label names so notifications are sent (#38629 ) Remove validation for labels to be accepted in the Alertmanager, This helps with datasources that produce non-compatible labels. Adds an "object_matchers" to alert manager routers so we can support labels names with extended characters beyond prometheus/openmetrics. It only does this for the internal Grafana managed Alert Manager. This requires a change to alert manager, so for now we use grafana/alertmanager which is a slight fork, with the intention of going back to upstream. The frontend handles the migration of "matchers" -> "object_matchers" when the route is edited and saved. Once this is done, downgrades will not work old versions will not recognize the "object_matchers". Co-authored-by: Kyle Brandt <kyle@grafana.com> Co-authored-by: Nathan Rodman <nathanrodman@gmail.com>	2021-10-04 15:06:40 +02:00
Owen Diehl	e065e19583	Fix/ngalert generation (#33172 ) * fixes pkg names & alerting openapi generation * cleans up api generation, uses docker & removes python	2021-04-20 13:12:32 -04:00
Owen Diehl	e37a780e14	Inhouse alerting api (#33129 ) * init * autogens AM route * POST dashboards/db spec * POST alert-notifications spec * fix description * re inits vendor, updates grafana to master * go mod updates * alerting routes * renames to receivers * prometheus endpoints * align config endpoint with cortex, include templates * Change grafana receiver type * Update receivers.go * rename struct to stop swagger thrashing * add rules API * index html * standalone swagger ui html page * Update README.md * Expose GrafanaManagedAlert properties * Some fixes - /api/v1/rules/{Namespace} should return a map - update ExtendedUpsertAlertDefinitionCommand properties * am alerts routes * rename prom swagger section for clarity, remove example endpoints * Add missing json and yaml tags * folder perms * make folders POST again * fix grafana receiver type * rename fodler->namespace for perms * make ruler json again * PR fixes * silences * fix Ok -> Ack * Add id to POST /api/v1/silences (#9) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add POST /api/v1/alerts (#10) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * fix silences * Add testing endpoints * removes grpc replace directives * [wip] starts validation * pkg cleanup * go mod tidy * ignores vendor dir * Change response type for Cortex/Loki alerts * receiver unmarshaling tests * ability to split routes between AM & Grafana * api marshaling & validation * begins work on routing lib * [hack] ignores embedded field in generation * path specific datasource for alerting * align endpoint names with cloud * single route per Alerting config * removes unused routing pkg * regens spec * adds datasource param to ruler/prom route paths * Modifications for supporting migration * Apply suggestions from code review * hack for cleaning circular refs in swagger definition * generates files * minor fixes for prom endpoints * decorate prom apis with required: true where applicable * Revert "generates files" This reverts commit `ef7e975584`. * removes server autogen * Update imported structs from ngalert * Fix listing rules response * Update github.com/prometheus/common dependency * Update get silence response * Update get silences response * adds ruler validation & backend switching * Fix GET /alertmanager/{DatasourceId}/config/api/v1/alerts response * Distinct gettable and postable grafana receivers * Remove permissions routes * Latest JSON specs * Fix testing routes * inline yaml annotation on apirulenode * yaml test & yamlv3 + comments * Fix yaml annotations for embedded type * Rename DatasourceId path parameter * Implement Backend.String() * backend zero value is a real backend * exports DiscoveryBase * Fix GO initialisms * Silences: Use PostableSilence as the base struct for creating silences * Use type alias instead of struct embedding * More fixes to alertmanager silencing routes * post and spec JSONs * Split rule config to postable/gettable * Fix empty POST /silences payload Recreating the generated JSON specs fixes the issue without further modifications * better yaml unmarshaling for nested yaml docs in cortex-am configs * regens spec * re-adds config.receivers * omitempty to align with prometheus API behavior * Prefix routes with /api * Update Alertmanager models * Make adjustments to follow the Alertmanager API * ruler: add for and annotations to grafana alert (#45) * Modify testing API routes * Fix grafana rule for field type * Move PostableUserConfig validation to this library * Fix PostableUserConfig YAML encoding/decoding * Use common fields for grafana and lotex rules * Add namespace id in GettableGrafanaRule * Apply suggestions from code review * fixup * more changes * Apply suggestions from code review * aligns structure pre merge * fix new imports & tests * updates tooling readme * goimports * lint * more linting!! * revive lint Co-authored-by: Sofia Papagiannaki <papagian@gmail.com> Co-authored-by: Domas <domasx2@gmail.com> Co-authored-by: Sofia Papagiannaki <papagian@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Co-authored-by: gotjosh <josue@grafana.com> Co-authored-by: David Parrott <stomp.box.yo@gmail.com> Co-authored-by: Kyle Brandt <kyle@grafana.com>	2021-04-19 14:26:04 -04:00

24 Commits