To better observe and identify performance bottlenecks in the Prometheus rules API,
the following spans were added:
- `api.prometheus.RouteGetRuleStatuses`
- `api.prometheus.PrepareRuleGroupStatusesV2`
The `api.prometheus.PrepareRuleGroupStatusesV2` span includes attributes that capture the request parameters, to help with debugging and performance analysis.
* Deprecating features.IsEnabled
* add one more nolint
* Give better hints to devs in the deprecation message of IsEnabledGlobally
* adding more doc strings
* fix linter after rebase
* Extend deprecation message
* Unified Storage: allow rebuilding indexes for resource from a new grpc endpoint
* remove log line
* fix trace def
* lint
* fix after rebase
* addressing code review changes
* update with one channel per rebuild request
* other review suggestions
* update with review suggestions
* run mockery generate for MockResourceClient
* update tests
* update tests and lint
* fix test
* Secrets manager: create secure value using the active keeper
* SecureValueService.Update: fetch secure value from db to get the keeper
* fix keeper_store_test.go
* SecureValueService: fix bug in update where the current version keeper wasn't being passed to the createNewVersion method
* make gofmt
* remove outdated test
* update TestModel
* undo enterprise_imports changes
* use xkube.Namespace
* migrator: set secret_secure_value.keeper to 'system' when the column is null
* indent cue
* fix tests
* fix enterprise imports
* properly fix enterprise imports
* make update-workspace
* go mod tidy
---------
Co-authored-by: Matheus Macabu <macabu.matheus@gmail.com>
* refactor: extract logic
* directly use the setting.cfg in the middleware
* more granular config handling, per section
* fixed unit test
* refactor code to avoid lint error
* Add library panel repeat options to v2 schema during conversion
* use any instead of interface{}
* change to common.Unstructured instead of byte[] for model field
* Fix the tests and let the library panel behavior fetch repeat options in public and scripted dashboards
* fix library panel differences between backend and frontend conversion
* feat(toggle): new feature toggle for time range zoom keyboard shortcuts
* feat(keyboard-shortcuts): handle `t +` and `t -` key combinations
* test(keyboard-shortcuts): validate time range zoom with `t +`, and `t -`
* chore(i18n): update keyboard shortcut translations
* refactor(time-range): no-op when timespan is zero instead of defaulting
* `grafana-iam`: Use the UniStore as the default store
* Refactor all instantiations
* Remove enableDualWriter
* Nit. dw is clear enough
* Use the correct access control client
* feat: add retry logic for transient errors in Kubernetes client
Add retry wrapper for dynamic.ResourceInterface that automatically retries
transient errors using Kubernetes' wait.ExponentialBackoff utility.
- Implements retry logic with exponential backoff for all Kubernetes API operations
- Detects transient errors: ServiceUnavailable, ServerTimeout, TooManyRequests,
InternalError, Timeout, and network errors
- Uses wait.ExponentialBackoff from k8s.io/apimachinery/pkg/util/wait
- Respects context cancellation
- Includes comprehensive unit tests
Part of https://github.com/grafana/git-ui-sync-project/issues/634
* docs: add detailed documentation for defaultRetryBackoff
Document when retry attempts will happen, what errors trigger retries,
and the retry behavior (attempts, delays, exponential backoff, jitter).
* feat: add logging and increase retry attempts for Kubernetes client
- Add context logger to track retry attempts (Info for retries, Warn for exhaustion)
- Increase retry attempts from 5 to 8 steps (~10 seconds total retry window)
- Document when all retry attempts will fail:
  * API server completely unavailable/unreachable
  * Network connectivity issues persist beyond retry window
  * Consistent transient errors for entire retry duration
  * Context cancellation before retries complete
* chore: update retry client documentation
* fix: resolve linting issues in retry client
- Replace type assertions with errors.As for wrapped errors
- Remove deprecated Temporary() check (deprecated since Go 1.18)
- Update tests to remove temporary error test case
* fix: resolve staticcheck S1008 linting issue in retry_client.go
Simplify return statement to use errors.As directly instead of if-return pattern
* do not create sa with no expiration when day limit is set
* disable No expiration option if day limit is set
* make i18n-extract
* run prettier
* address feedback
* Moving things around
* Copy parseURL function to where it's used
* Update test
* Remove experimental-strip-types
* Revert "Remove experimental-strip-types"
This reverts commit 70fbc1c0cd.
* Trigger build
This adds a `message` column to the `alert_rule_version` table, following the
pattern established for dashboards as closely as possible. Internally, a new type
is introduced for passing the `message` field around in a type-safe manner.
Doing the same for the API types becomes very messy, so there a new field is
added with `omitempty` instead.
Note that this PR covers only:
- The column addition
- The "read" path; API for listing versions
Subsequent PRs will add code to actually set the message when updating rules.
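As a rough illustration of the split described above, the sketch below pairs an internal named type with an API struct whose field is tagged `omitempty`. All names here (`versionMessage`, `apiRuleVersion`, `toAPI`, `encode`) are hypothetical and chosen for the example; they are not the identifiers used in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versionMessage is a hypothetical internal type: a distinct named type
// keeps the message from being mixed up with other strings.
type versionMessage string

// apiRuleVersion is a hypothetical API type. The message is a plain
// string with omitempty, so versions without a message serialize
// exactly as they did before the field existed.
type apiRuleVersion struct {
	Version int64  `json:"version"`
	Message string `json:"message,omitempty"`
}

// toAPI converts the internal representation to the API type.
func toAPI(version int64, msg versionMessage) apiRuleVersion {
	return apiRuleVersion{Version: version, Message: string(msg)}
}

// encode marshals v to JSON and returns it as a string.
func encode(v apiRuleVersion) string {
	b, err := json.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(toAPI(1, "")))                // {"version":1}
	fmt.Println(encode(toAPI(2, "raise threshold"))) // {"version":2,"message":"raise threshold"}
}
```

Because the field is omitted when empty, existing clients of the version-listing API see no change until a message is actually set.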
* chore: bump authlib/types to v0.0.0-20251119142549-be091cf2f4d4
Signed-off-by: Maicon Costa <maiconscosta@gmail.com>
* Update Go Workspace
Signed-off-by: Maicon Costa <maiconscosta@gmail.com>
* Stop supporting deprecated namespace format in TestExtendedJWT_Authenticate
Signed-off-by: Maicon Costa <maiconscosta@gmail.com>
* Update go mod
Signed-off-by: Maicon Costa <maiconscosta@gmail.com>
---------
Signed-off-by: Maicon Costa <maiconscosta@gmail.com>
* Chore: Add html_handler_requests metric for tracking requests handled by index.go
* make a member of HttpServer
* make it a histogram instead
* update test
* OAuth: Optionally make refresh tokens required if use_refresh_token is enabled
* make linter happy
* feedback: log missing refresh token during token refresh
* feedback: tweak wording in the message & change level
* remove feature toggle dataplaneFrontendFallback because GA
* remove feature toggle logic
* fix import
* fix toggle ownership from merging main
* merge main
* fix extra feature toggle