* add check for access to rule's data source in GET APIs
* use more general method GetAlertRules instead of GetNamespaceAlertRules.
* remove unused GetNamespaceAlertRules.
Tests:
* create a method to generate permissions for rules
* extract method to create RuleSrv
* add tests for RouteGetNamespaceRulesConfig
* move validation at the beginning of method
* remove usage of GetOrgRuleGroups because it is not necessary. All information is already available in memory.
* remove unused method
* Base-line API for provisioning notification policies
* Wire API up, some simple tests
* Return provenance status through API
* Fix missing call
* Transactions
* Clarity in package dependencies
* Unify receivers in definitions
* Fix issue introduced by receiver change
* Drop unused internal test implementation
* FGAC hooks for provisioning routes
* Polish, swap names
* Asserting on number of exposed routes
* Don't bubble up updated object
* Integrate with new concurrency token feature in store
* Back out duplicated changes
* Remove redundant tests
* Regenerate and create unit tests for API layer
* Integration tests for auth
* Address linter errors
* Put route behind toggle
* Use alternative store API and fix feature toggle in tests
* Fixes, polish
* Fix whitespace
* Re-kick drone
* Rename services to provisioning
* Alerting: Accurately set value for prom-compatible APIs
Sets the value fields for the prometheus compatible API based on a combination of condition `refID` and the values extracted from the different frames.
* Fix an extra test
* Ensure a consitent ordering
* Address review comments
* address review comments
* Add basic UI for custom ruler URL
* Add build info fetching for alerting data sources
* Add keeping data sources build info in the store
* Use data source build info to construct data source urls
* Remove unused code
* Add custom ruler support in prometheus api calls
* Migrate actions
* Use thunk condition to prevent multiple data source buildinfo fetches
* Unify prom and ruler rules loading
* Upgrade RuleEditor tests
* Upgrade RuleList tests
* Upgrade PanelAlertTab tests
* Upgrade actions tests
* Build info refactoring
* Get rid of lotex ruler support action
* Add prom ruler availability checking when the buildinfo is not available
* Add rulerUrlBuilder tests
* Improve prometheus data source validation, small build info refactoring
* Change prefix based on Prometheus subtype
* Use the correct path
* Revert config routing
* Add deprecation notice for /api/prom prefix
* Add tests to the datasource subtype
* Remove custom ruler support
* Remove deprecation notice
* Prevent fetching ruler rules when ruler api is not available
* Add build info tests
* Unify naming of ruler methods
* Fix test
* Change buildinfo data source validation
* Use strings for subtype params and unveil mimir
* organise imports
* frontend changes and wordsmithing
* fix test suite
* add a nicer verbose message for prometheus datasources
* detect Mimir datasource
* fix test
* fix buildinfo test for Mimir
* shrink vectors
* add some code documentation
* DRY prepareRulesFilterQueryParams
* clarify that Prometheus does not support managing rules
* Improve buildinfo error handling
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: gillesdemey <gilles.de.mey@gmail.com>
* make eval.Evaluator an interface
* inject Evaluator to TestingApiSrv
* move conditionEval to RouteTestGrafanaRuleConfig because it is the only place where it is used
* update rule test api to check data source permissions
* Use alert:create action for folder search with edit permissions. This matches the action that is used to query dashboards (the update will be addressed later)
* Update rule store to use FindDashboards instead of folder service to list folders the user has access to view alerts. Folder service does not support query type and additional filters.
* Do not check whether the user can save to folder if FGAC is enabled because it is checked on API level.
* require legacy Editor for post, put, delete endpoints
* require user to be signed in on group level because handler that checks that user has role Editor does not check it is signed in
* verify that the user has access to all data sources used by the rule that needs to be deleted from the group
* if a user is not authorized to access the rule, the rule is removed from the list to delete
* rename GetRuleGroupAlertRules to GetAlertRules
* make rule group optional in GetAlertRulesQuery
* simplify FakeStore. the current structure did not support optional rule group
update method getEvaluatorForAlertRule to accept permissions evaluator and exit on the first negative result, which is more effective than returning an evaluator that in fact is a bunch of slices.
* Alerting: Remove internal labels from prometheus compatible API responses
* Appease the linter
* Fix integration tests
* Fix API documentation & linter
* move removal of internal labels to the models
* move alerting actions to accesscontrol to avoid cycledeps
* define new actions and fixed roles for alerting
* add folder permission to alert reader role
* Remove InTransaction from RuleStore and make it its own interface
* Ensure that ctx-based is clear from name
* Resolve merge conflicts
* Refactor tests to work in terms of the introduced abstraction rather than concrete dbstore
* add actions, roles and route mapping for rule permission
* add instance\notification actions
* do not declare alerting roles if no feature flag is set (temporary)
* do not include update if no diff
* refactor calculate changes to include diff (and log)
Co-authored-by: George Robinson <george.robinson@grafana.com>
* Add missing OK option to models
* add ok to legacy legacy UI does not support it but it is possible to do so via provisioning.
* use enums in migration so linter would catch missing cases
* Update API controller
- add validation of rules API model
- add function to calculate changes between the submitted alerts and existing alerts
- update RoutePostNameRulesConfig to validate input models, calculate changes and apply in a transaction
* Update DBStore
- delete unused storage method. All the logic is moved upstream.
- upsert to not modify fields of new by values from the existing alert
- if rule has UID do not try to pull it from db. (it is done upstream)
* Add rule generator
* API: Using go-swagger for extracting OpenAPI specification from source code
* Merge Grafana Alerting spec
* Include enterprise endpoints (if enabled)
* Serve SwaggerUI under feature flag
* Fix building dev docker images
* Configure swaggerUI
* Add missing json tags
Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* Fix evaluation of alert rules for datasources with custom headers
* Fix unit tests
* Fix integration tests
* Evaluator fields should be package private
* (WIP) send alerts to external, internal, or both alertmanagers
* Modify admin configuration endpoint, update swagger docs
* Integration test for admin config updated
* Code review changes
* Fix alertmanagers choice not changing bug, add unit test
* Add AlertmanagersChoice as enum in swagger, code review changes
* Fix API and tests errors
* Change enum from int to string, use 'SendAlertsTo' instead of 'AlertmanagerChoice' where necessary
* Fix tests to reflect last changes
* Keep senders running when alerts are handled just internally
* Check if any external AM has been discovered before sending alerts, update tests
* remove duplicate data from logs
* update comment
* represent alertmanagers choice as an int instead of a string
* default alertmanagers choice to all alertmanagers, test cases
* update definitions and generate spec
* pass notification service down to the notifiers
* add ns to all notifiers
* remove bus from ngalert notifiers
* use smaller interfaces for notificationservice
* attempt to fix the tests
* remove unused struct field
* simplify notification service mock
* trying to resolve issues in the tests
* make linter happy
* make linter even happier
* linter, you are annoying
* Update API to call the scheduler to remove\update an alert rule. When a rule is updated by a user, the scheduler will remove the currently firing alert instances and clean up the state cache.
* Update evaluation loop in the scheduler to support one more channel that is used to communicate updates to it.
* Improved rule deletion from the internal registry.
* Move alert rule version from the internal registry (structure alertRuleInfo) closer rule evaluation loop (to evaluation task structure), which will make the registry values immutable.
* Extract notification code to a separate function to reuse in update flow.
* Create API test for overwriting invalid alertmanager config
* Avoid requiring alertmanager readiness for config changes
* AlertmanagerSrv depends on functionality rather than concrete types
* Add test for non-ready alertmanagers
* Additional cleanup and polish
* Back out previous integration test changes
* Refactor of tests incorrectly caused a test to become redundant
* Use pre-existing fake secret service
* Drop unused interface
* Test against concrete MultiOrgAlertmanager re-using fake infra from other tests
* Fix linter error
* Empty commit to rerun checks
* refactor datasource loading
* refactor datasource loading
* pass uid
* use dscache in alerting to get DS
* remove expr/translate pacakge
* remove dup injection entry
* fix DS type on metrics endpoint, remove SQL DS lookup inside SSE
* update test and adapter
* comment fix
* Make eval run as admin when getting datasource info
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
* fmt and comment
* remove unncessary/redundant code
Co-authored-by: Kyle Brandt <kyle@grafana.com>
Co-authored-by: Marcus Efraimsson <marcus.efraimsson@gmail.com>
Co-authored-by: Santiago <santiagohernandez.1997@gmail.com>
Fixes a panic that would ocurr as we proxy 4xx responses. When this happens and the content type of the response is JSON we try to check if the response has a "message" key. Then, we assume that the key will contain a value of string but we don't take into account that this value can potentially be `null`.
This adds a type assertion check to to this assumption so that we can keep the original JSON body as the response if we're unable to extract an `message`.