* remove unused compat functions
* update to alerting module from pr
* replace IntegrationConfig with IntegrationSchemaVersion
* safely resolve a string into integration type
* change usages of integration config
* devenv: fix volumes section when sources don't contain one
* wip
* Working correctly with improvedExternalSessionHandling on
* Remove not needed lines
* Working with the old flow, tests
* Handle compatibility with the feature toggle, tests wip
* Tests
* Cleanup
* Address feedback
* Align tests
* Add comment
* Fix issue with session removal after the invalidation of tokens
* Remove commented out code
* clean up
* Don't error on recording rules without conditions
The provisioning model doesn't include conditions for recording rules.
Only return an error for a missing condition if the rule isn't a
recording rule.
Also, added a test case to show the failure.
This resolves#109398
* Run `go fmt` to fix whitespace issues
* update alerting module
* replace compat with ones from alerting
* update type references Receiver and Integration to *Status
* update route in provisioning test that is invalid after recent change
* use right type for LINE ingtegration
* Revert "Alerting: Generate simplified routing routes with old fingerprint function (#111893)"
This reverts commit 0da9d49896.
* Add alertingUseNewSimplifiedRoutingHashAlgorithm flag
* Alerting: Add feature toggle to use the old simplified routing hash generation
* disable cgo by default for local builds, also set cgo variable in either case
* actually do not set the default value
* disable cgo for darwin, display sqlite driver in logs
* fix linter warning, although I do not fully agree with it
* Add feature flag for multiple scopes endpoint usage
* Add method to API client for fetching multiple ScopeNodes at the same time
* Add parent title to tree
* Get nodes from cache too
* Update test with new functionality
* Update test
* Fix linting issue
* Remove unapplied scope parents
* adding some logs to better understand what might be happening
* only focus this PR on improve logging in finalizer handling
* debug log before calling finalizers
* working on finalizers
* removing last todos, adding unit tests
* better use SupportedFinalizers name
* addressing comments
* wip: fix tests and add delete error in status
* chore: codegen
* chore: codegen openapi
* Merge remote-tracking branch 'origin/main' into provisioning-finalisers-fix-2
* update frontend client
* fix: errors in testing
* fix: breaking test
---------
Co-authored-by: Daniele Ferru <daniele.ferru@grafana.com>
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
What is this feature?
This PR implements a jitter mechanism for periodic alert state storage to distribute database load over time instead of processing all alert instances simultaneously. When enabled via the state_periodic_save_jitter_enabled configuration option, the system spreads batch write operations across 85% of the save interval window, preventing database load spikes in high-cardinality alerting environments.
Why do we need this feature?
In production environments with high alert cardinality, the current periodic batch storage can cause database performance issues by processing all alert instances simultaneously at fixed intervals. Even when using periodic batch storage to improve performance, concentrating all database operations at a single point in time can overwhelm database resources, especially in resource-constrained environments.
Rather than performing all INSERT operations at once during the periodic save, distributing these operations across the time window until the next save cycle can maintain more stable service operation within limited database resources. This approach prevents resource saturation by spreading the database load over the available time interval, allowing the system to operate more gracefully within existing resource constraints.
For example, with 200,000 alert instances using a 5-minute interval and 4,000 batch size, instead of executing 50 batch operations simultaneously, the jitter mechanism distributes these operations across approximately 4.25 minutes (85% of 5 minutes), with each batch executed roughly every 5.2 seconds.
This PR provides system-level protection against such load spikes by distributing operations across time, reducing peak resource usage while maintaining the benefits of periodic batch storage. The jitter mechanism is particularly valuable in resource-constrained environments where maintaining consistent database performance is more critical than precise timing of state updates.
* update tests to assert against snapshot
* remove channel_config package replaced by schemas from alerting module
* update references to use new schema