937 Commits

Author SHA1 Message Date
Yuri Tseretyan 47f7b3e095 Alerting: Dedicated permission for Template testing API (#115032) 2025-12-10 10:56:29 -05:00
Pepe Cano a8d174ccef docs(alerting): add new Examples of trace-based alerts (#114511)
* docs(alerting): add new Examples of trace-based alerts

* fix vale issues
2025-12-05 08:28:16 +01:00
Johnny Kartheiser 0aeb4feef3 update documentation to mention protected fields (#114809)
* update documentation to mention protected fields

* alerting docs: add protected field info for grafana cloud

add protected field info for grafana cloud

* prettier

* link fix

---------

Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
2025-12-03 19:47:23 -05:00
Sonia Aguilar e7377a8842 Alerting: Update docs for ash AI helper button (#114229)
Update docs for ash AI helper button
2025-11-21 09:31:22 +01:00
Johnny Kartheiser 441556d1a3 alerts docs: triage page (#113869)
* alerts docs: triage page

documentation for the alert triage page

* public preview note added

* Update alert-triage.md

* prettier

* removed alerts triage references

* rename

* image

* prettier
2025-11-17 15:15:46 -06:00
Johnny Kartheiser a67fad4734 alerting: best practices docs update (#113188)
* alerting: best practices docs update

best practices docs update re: recording rules

* Update _index.md
2025-11-12 19:26:51 +00:00
Johnny Kartheiser 1f14f1447f alerting docs: target data source clarification (#113126)
adding information to create grafana managed recording rules doc per support request 16892
2025-11-12 10:51:13 -06:00
Johnny Kartheiser 67f811b6d8 alerting: mute timing clarification (#113129)
* alerting: mute timing clarification

clarify that mute timing takes precedence over active timing

* alerting docs: best practices addition

draft content for best practices re: recording rules

* wrong branch

* alerting: best practices docs

best practices addition re: recording rules

* smh
2025-11-12 10:49:50 -06:00
Seunghun Shin c784de6ef5 Alerting: Add compressed periodic save for alert instances (#111803)
What is this feature?

This PR implements compressed periodic save for alert state storage, a more efficient alternative to regular periodic saves. When enabled via the state_compressed_periodic_save_enabled configuration option, the system groups alert instances by alert rule UID, serializes each group with protobuf and compresses it with snappy, and writes all rules within a single database transaction at the configured interval instead of syncing after every alert evaluation cycle.
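
The grouping and compression step described above can be sketched as follows. This is a minimal illustration, not the code from the PR: it uses JSON and zlib from the Python standard library as stand-ins for protobuf and snappy, and the instance fields (`rule_uid`, `labels`, `state`) and function names are hypothetical. The single database transaction is not shown.

```python
import json
import zlib
from collections import defaultdict


def compress_by_rule(instances):
    """Group alert instances by rule UID and compress each group into one blob."""
    groups = defaultdict(list)
    for inst in instances:
        groups[inst["rule_uid"]].append(inst)
    # One compressed blob per rule. JSON + zlib are stand-ins here;
    # the PR describes protobuf serialization and snappy compression.
    return {
        uid: zlib.compress(json.dumps(group, sort_keys=True).encode("utf-8"))
        for uid, group in groups.items()
    }


def decompress_rule(blob):
    """Restore the list of alert instances stored in one rule's blob."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical alert instances: two for rule-a, one for rule-b.
instances = [
    {"rule_uid": "rule-a", "labels": {"host": "web-1"}, "state": "Alerting"},
    {"rule_uid": "rule-a", "labels": {"host": "web-2"}, "state": "Normal"},
    {"rule_uid": "rule-b", "labels": {"host": "db-1"}, "state": "Normal"},
]
blobs = compress_by_rule(instances)
print(sorted(blobs))  # ['rule-a', 'rule-b']
```

Storing one blob per rule means the number of rows written scales with the number of rules rather than the number of alert instances, which is presumably where the storage and write savings come from.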

Why do we need this feature?

During discussions in PR #111357, we identified the need for a compressed approach to periodic alert state storage that could further reduce database load beyond the jitter mechanism. While the jitter feature distributes database operations over time, this compressed periodic save approach reduces the frequency of database operations by batching alert state updates at explicitly declared intervals rather than syncing after every alert evaluation cycle.
This approach provides several key benefits:

- Reduced database write frequency: Instead of frequent sync operations tied to alert evaluation cycles, updates occur only at configured intervals
- Storage Efficiency: Rule-based grouping with protobuf and snappy compression significantly reduces storage requirements

The compressed periodic save complements the existing jitter mechanism by providing an alternative strategy focused on reducing overall database interaction frequency while maintaining data integrity through compression and batching.

Who is this feature for?

- Platform/Infrastructure teams managing large-scale Grafana deployments with high alert cardinality
- Organizations looking to optimize storage costs and database performance for alerting workloads
- Production environments with 1000+ alert rules where database write frequency is a concern
2025-11-07 11:51:48 +01:00
Pepe Cano ffa5e41bec docs(alerting): add note about invalid numeric identifiers in templates (#113269) 2025-11-04 17:56:41 +01:00
Pepe Cano 7eb8a9af99 docs(alerting): clarify notification group deletion after group interval elapses (#113160) 2025-10-29 16:08:08 +01:00
Pepe Cano 86bf99aaaa docs(alerting): add additional migration details (#112383) 2025-10-29 13:58:13 +01:00
ksemtinimahmoud d25f5199c7 Docs: Fix incorrect label in recording rules documentation (#111464)
* Fix incorrect label: 'New Grafana recording rule' → 'New Data source recording rule'

* lowercase
2025-10-27 15:29:12 +00:00
Pepe Cano fb5c5411f8 docs(alerting): clarify usage of templates in webhook custom payloads (#112672)
* docs(alerting): clarify usage of templates in webhook custom payloads

* Update docs/sources/alerting/configure-notifications/template-notifications/manage-notification-templates.md

Co-authored-by: Johnny Kartheiser <140559259+JohnnyK-Grafana@users.noreply.github.com>

---------

Co-authored-by: Johnny Kartheiser <140559259+JohnnyK-Grafana@users.noreply.github.com>
2025-10-21 13:32:01 -05:00
Pepe Cano 9e505ea2de docs(alerting) Add examples of high-cardinality alerts (#112311)
* docs(alerting) Add examples of high-cardinality alerts

* minor intro edits
2025-10-20 12:17:58 +02:00
Seunghun Shin 7bc97d5fa8 Docs: Add contact point specific text formatting examples (#112276)
* Docs: Add contact point specific text formatting examples to notification templates
* Add Slack formatting examples with *bold* and _italic_ syntax
* Clarify that text formatting depends on contact point type

* Docs: Add contact point specific text formatting examples to notification templates
* Address review feedback on the docs
2025-10-15 17:28:10 +00:00
Johnny Kartheiser f150c56e00 alerting docs: ai templating .md swap (#112233)
moving the file to the proper templating section
2025-10-09 16:57:50 +00:00
Pepe Cano 5438df01a1 docs(alerting): Email contact point enhancements (#111944)
* docs(alerting): Email contact point enhancements

* minor content tweaks
2025-10-07 17:20:52 +02:00
Johnny Kartheiser 277fc492bd alerting docs: new AI template helper (#111856)
* alerting docs: new AI tools

docs for alert history and template ai tools

* Update docs/sources/alerting/alerting-rules/templates/_index.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* edit word

* Update view-alert-state-history.md

* Update docs/sources/alerting/alerting-rules/templates/_index.md

Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>

* prettier

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Sonia Aguilar <33540275+soniaAguilarPeiron@users.noreply.github.com>
2025-10-03 14:43:18 -05:00
Pepe Cano 6248971d1e docs(alerting): add settings for alert evaluation backoff retries (#111891)
* docs(alerting): add settings for alert evaluation backoff retries

* docs(alerting): add mention of backoff settings on the Error state docs

* fix vale prose
2025-10-03 11:08:14 +02:00
Pepe Cano 7ed46fd321 docs(alerting): alertingSaveStateCompressed is enabled by default (#111897) 2025-10-03 09:02:58 +02:00
Seunghun Shin 512c292e04 Alerting: Add jitter support for periodic alert state storage to reduce database load spikes (#111357)
What is this feature?

This PR implements a jitter mechanism for periodic alert state storage to distribute database load over time instead of processing all alert instances simultaneously. When enabled via the state_periodic_save_jitter_enabled configuration option, the system spreads batch write operations across 85% of the save interval window, preventing database load spikes in high-cardinality alerting environments.

Why do we need this feature?

In production environments with high alert cardinality, the current periodic batch storage can cause database performance issues by processing all alert instances simultaneously at fixed intervals. Even when using periodic batch storage to improve performance, concentrating all database operations at a single point in time can overwhelm database resources, especially in resource-constrained environments.

Rather than performing all INSERT operations at once during the periodic save, distributing these operations across the time window until the next save cycle can maintain more stable service operation within limited database resources. This approach prevents resource saturation by spreading the database load over the available time interval, allowing the system to operate more gracefully within existing resource constraints.

For example, with 200,000 alert instances using a 5-minute interval and 4,000 batch size, instead of executing 50 batch operations simultaneously, the jitter mechanism distributes these operations across approximately 4.25 minutes (85% of 5 minutes), with each batch executed roughly every 5.2 seconds.
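
The arithmetic in this example can be reproduced with a short sketch. It is not the scheduler from the PR: the function name and the even spacing are assumptions, and the gap is computed over the 49 intervals between 50 batches (first batch at offset 0, last batch at the end of the 85% window), which is the reading that matches the "roughly 5.2 seconds" figure.

```python
import math


def jittered_batch_offsets(total_instances, batch_size, interval_seconds,
                           window_fraction=0.85):
    """Return the start offset (in seconds) of each batch within one save interval.

    Assumption: batches are spread evenly, with the first at offset 0 and the
    last at the end of the jitter window.
    """
    batches = math.ceil(total_instances / batch_size)
    window = interval_seconds * window_fraction
    if batches <= 1:
        return [0.0] * batches
    gap = window / (batches - 1)
    return [i * gap for i in range(batches)]


offsets = jittered_batch_offsets(200_000, 4_000, 300)
print(len(offsets))           # 50 batches
print(round(offsets[1], 1))   # 5.2 seconds between consecutive batches
print(round(offsets[-1], 1))  # 255.0 seconds, i.e. 4.25 minutes
```

Dividing the window by 50 instead of 49 would give 5.1 seconds per batch; either way the writes are spread over about 4.25 minutes instead of landing at once.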

This PR provides system-level protection against such load spikes by distributing operations across time, reducing peak resource usage while maintaining the benefits of periodic batch storage. The jitter mechanism is particularly valuable in resource-constrained environments where maintaining consistent database performance is more critical than precise timing of state updates.
2025-09-29 11:22:36 +02:00
Johnny Kartheiser 82e5019333 alerting docs: update list view filtering description (#110885)
* alerting docs: update list view filtering description

update the description of the list view filter on the view alert rules page.

* img

* Update view-alert-rules.md

* Update view-alert-rules.md

* edit via lauren
2025-09-17 14:51:35 +02:00
Johnny Kartheiser 73a241da86 alerting docs: update contact point emails (SE #17651) (#110780) 2025-09-10 19:47:33 +00:00
Alex Bikfalvi 4d55358fd2 docs: Add Grafana-managed recording rules documentation for Tempo (#110097)
Creates comprehensive guide for configuring recording rules with TraceQL metrics queries,
including Tempo-specific considerations for time ranges, evaluation delays, and examples.

Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>
Co-authored-by: Kim Nylander <kim.nylander@grafana.com>
Co-authored-by: Jack Baldry <jack.baldry@grafana.com>
2025-08-28 10:22:56 +02:00
Johnny Kartheiser fe6985f2ac docs: alerting list view UI changes (#108876) 2025-08-06 15:42:33 +02:00
Alexander Akhmetov 8b5b9b68c2 Alerting: Document Accept header in Prometheus conversion API (#109080) 2025-08-01 22:39:54 +02:00
Alexander Akhmetov b36a8e84cc Alerting: Document "Get rule group" Prometheus conversion API endpoint (#109075) 2025-08-01 21:15:33 +02:00
Eve Meelan 147df3de08 Pricing update: no more Cloud Advanced (#109056)
* scrub Cloud Advanced

* prettier edit
2025-08-01 16:23:58 +00:00
Jack Baldry 0faa03edbe Add snippets for 'Create log alert rules with Grafana Alerting' learning journey (#109059) 2025-08-01 15:57:18 +00:00
Alexander Akhmetov c827ddf790 Alerting: Add meta-monitoring documentation for GRAFANA_ALERTS (#108785) 2025-07-29 21:54:44 +02:00
Alexander Akhmetov 412704c9de Alerting: Document Mimir compatibility for Prometheus conversion API endpoints (#108741)
Document Mimir compatibility for Prometheus conversion API endpoints
2025-07-29 21:53:24 +02:00
renovate[bot] c94f930950 Update dependency prettier to v3.6.2 (#108689)
* Update dependency prettier to v3.6.2

* run prettier

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Ashley Harrison <ashley.harrison@grafana.com>
2025-07-25 17:47:44 +01:00
Johnny Kartheiser a49b35a7ec Update _index.md (#108192) 2025-07-18 15:10:16 +01:00
Johnny Kartheiser 08e8a71ad6 alerting docs: activate active timing time (#107928)
* alerting docs: activate active timing time

rough draft: documentation for active timing feature.

* purdier

* more tweaks

* edits via yuri advice

* prettier
2025-07-17 10:56:25 -05:00
Blueswen 1cd6ef6b84 Docs: Align Jira documentation with contact point options (#105069)
Align with option names in contact point.

To avoid confusion, ensure that the documentation aligns with the option names used in the Jira contact point.
2025-07-10 21:15:04 +00:00
Pepe Cano 61efc8b609 docs(alerting): clarify usage of different Alertmanagers and fix misleading details (#107498)
* docs(alerting): clarify usage of different Alertmanagers and fix misleading details

* address review changes
2025-07-02 21:46:29 +02:00
Pepe Cano f5b79fca55 docs(alerting): performance considerations minor clarifications (#107333) 2025-06-30 07:42:26 +00:00
Alexander Akhmetov f4b0e793aa Alerting: Document label sanitization in GRAFANA_ALERTS (#107285)
* Alerting: Document label sanitization in GRAFANA_ALERTS
2025-06-27 23:33:42 +02:00
Pepe Cano 2f1a6ae171 docs(alerting): Add Detect missing series in Prometheus section to the MissingData guide (#107329) 2025-06-27 21:13:13 +02:00
Pepe Cano ff4d7d35c6 docs(alerting): add new guidelines to the missing data alerting guide (#107232) 2025-06-26 21:06:34 +02:00
Pepe Cano 5c1b263664 docs(alerting): simplify Intro to Grafana Alerting docs (#106944)
* docs(alerting): improve `Intro > Alert rule evaluation` docs

* Update Introduction to Grafana Alerting

* Simplify `Intro > Alert rules` and related docs

* minor copy change phrasing GMA and DS differences

* fix vale error
2025-06-26 12:25:32 +00:00
Pepe Cano f14492baf8 docs(alerting): prom backend to write ALERTS metric (#107006)
* docs(alerting): prom backend to write ALERTS metric

* add enterprise label

* Update docs/sources/alerting/set-up/configure-alert-state-history/index.md

Co-authored-by: Alexander Akhmetov <me@alx.cx>

* Update docs/sources/alerting/set-up/meta-monitoring.md

Co-authored-by: Alexander Akhmetov <me@alx.cx>

---------

Co-authored-by: Alexander Akhmetov <me@alx.cx>
2025-06-24 10:38:52 +02:00
Jack Baldry 244ffad99d Fix all the old usage of admonition syntax (#106984) 2025-06-19 17:31:13 +01:00
Pepe Cano 286a6638b8 docs(alerting): fix Grafana Play links due to provisioning (#106816) 2025-06-17 23:08:46 +02:00
Bryan Boreham fca89d0d4c Docs: Typo: mediam->median (#106305) 2025-06-16 12:35:29 +03:00
Pepe Cano 493e7ba75f docs(alerting): enhancements for MQTT docs (#106566)
* docs(alerting): enhancements for MQTT docs

* Update docs/sources/alerting/configure-notifications/manage-contact-points/integrations/configure-mqtt.md

Co-authored-by: Simon Prickett <simon@crudworks.org>

---------

Co-authored-by: Simon Prickett <simon@crudworks.org>
2025-06-13 10:27:08 +02:00
Pepe Cano 79ff67268f docs(alerting): Add Tutorials directory page under Best Practices (#106159)
* docs(alerting): Add Tutorials directory page under Best Practices

* run prettier

* Include latest tutorials

* fix tutorial list
2025-06-10 16:10:20 +00:00
Pepe Cano f76e4f8fda docs(alerting): Import to Grafana-managed rules (#106384)
* docs(alerting): Import to Grafana-managed rules

* apply latest evaluation changes

* Add additional conversion details to How it works section

* fix ref link

* fix Data source input name

* more details about the `Target data source` input
2025-06-10 17:22:48 +02:00
Pepe Cano 0d0aa35ba7 docs(alerting): add a short new guideline for handling NoData scenarios. (#106412)
docs(alerting): add consideration for handling NoData scenarios
2025-06-06 21:19:21 +02:00