* update documentation to mention protected fields
* alerting docs: add protected field info for grafana cloud
add protected field info for grafana cloud
* prettier
* link fix
---------
Co-authored-by: Yuri Tseretyan <yuriy.tseretyan@grafana.com>
What is this feature?
This PR implements compressed periodic save for alert state storage, providing a more efficient alternative to regular periodic saves by grouping alert instances by rule UID and storing them using protobuf and snappy compression. When enabled via the state_compressed_periodic_save_enabled configuration option, the system groups alert instances by their alert rule, compresses each group using protobuf serialization and snappy compression, and processes all rules within a single database transaction at specified intervals instead of syncing after every alert evaluation cycle.
Why do we need this feature?
During discussions in PR #111357, we identified the need for a compressed approach to periodic alert state storage that could further reduce database load beyond the jitter mechanism. While the jitter feature distributes database operations over time, this compressed periodic save approach reduces the frequency of database operations by batching alert state updates at explicitly declared intervals rather than syncing after every alert evaluation cycle.
This approach provides several key benefits:
- Reduced Database Frequency: Instead of frequent sync operations tied to alert evaluation cycles, updates occur only at configured intervals
- Storage Efficiency: Rule-based grouping with protobuf and snappy compression significantly reduces storage requirements
The compressed periodic save complements the existing jitter mechanism by providing an alternative strategy focused on reducing overall database interaction frequency while maintaining data integrity through compression and batching.
Who is this feature for?
- Platform/Infrastructure teams managing large-scale Grafana deployments with high alert cardinality
- Organizations looking to optimize storage costs and database performance for alerting workloads
- Production environments with 1000+ alert rules where database write frequency is a concern
* Docs: Add contact point specific text formatting examples to notification templates
* Add Slack formatting examples with *bold* and _italic_ syntax
* Clarify that text formatting depends on contact point type
* Docs: Add contact point specific text formatting examples to notification templates
* Reflect review that fix docs
* docs(alerting): add settings for alert evaluation backoff retries
* docs(alerting): add mention of backoff settings on the Error state docs
* fix vale prose
What is this feature?
This PR implements a jitter mechanism for periodic alert state storage to distribute database load over time instead of processing all alert instances simultaneously. When enabled via the state_periodic_save_jitter_enabled configuration option, the system spreads batch write operations across 85% of the save interval window, preventing database load spikes in high-cardinality alerting environments.
Why do we need this feature?
In production environments with high alert cardinality, the current periodic batch storage can cause database performance issues by processing all alert instances simultaneously at fixed intervals. Even when using periodic batch storage to improve performance, concentrating all database operations at a single point in time can overwhelm database resources, especially in resource-constrained environments.
Rather than performing all INSERT operations at once during the periodic save, distributing these operations across the time window until the next save cycle can maintain more stable service operation within limited database resources. This approach prevents resource saturation by spreading the database load over the available time interval, allowing the system to operate more gracefully within existing resource constraints.
For example, with 200,000 alert instances using a 5-minute interval and 4,000 batch size, instead of executing 50 batch operations simultaneously, the jitter mechanism distributes these operations across approximately 4.25 minutes (85% of 5 minutes), with each batch executed roughly every 5.2 seconds.
This PR provides system-level protection against such load spikes by distributing operations across time, reducing peak resource usage while maintaining the benefits of periodic batch storage. The jitter mechanism is particularly valuable in resource-constrained environments where maintaining consistent database performance is more critical than precise timing of state updates.
* alerting docs: update list view filtering description
update the description of the list view filter on the view alert rules page.
* img
* Update view-alert-rules.md
* Update view-alert-rules.md
* edit via lauren
Creates comprehensive guide for configuring recording rules with TraceQL metrics queries,
including Tempo-specific considerations for time ranges, evaluation delays, and examples.
Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com>
Co-authored-by: Kim Nylander <kim.nylander@grafana.com>
Co-authored-by: Jack Baldry <jack.baldry@grafana.com>
* alerting docs: activate active timing time
rough draft: documentation for active timing feature.
* purdier
* more tweaks
* edits via yuri advice
* prettier
Align with option names in contact point.
To avoid confusion, ensure that the documentation aligns with the option names used in the Jira contact point.
* docs(alerting): Import to Grafana-managed rules
* apply latest evaluation changes
* Add additional conversion details to How it works section
* fix ref link
* fix Data source input name
* more details about the `Target data source` input