Commit Graph

1143 Commits

Author SHA1 Message Date
Ezequiel Victorero 227b596a46 Snapshots: Migrate API as dashboards k8s subresource (#113552) 2025-12-02 16:26:45 -03:00
Liza Detrick a112c6c169 Logs Explore: logsdrilldown authorizer permissions, rtkq (#114320)
* Logs Explore: logsdrilldown app platform authorizer permissions, rtkq
---------

Co-authored-by: Austin Pond <austin.pond@grafana.com>
2025-12-02 09:07:36 -08:00
Roberto Jiménez Sánchez f2694ce72f Provisioning: add generic version handling for dashboard export (#114691)
* feat(provisioning): add generic version handling for dashboard export

- Update export job to handle any dashboard version generically (v0, v1, v2, v3, etc.)
- Dynamically construct GroupVersionResource for any stored version
- Cache version-specific clients for efficiency
- Add comprehensive table-driven unit tests for multiple versions
- Add integration test to verify version handling end-to-end
- Remove unnecessary version shim from clean operation (deletion works by name)

* test: add unit test for v2 dashboard version (no suffix)
2025-12-02 16:44:24 +01:00
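Note on f2694ce72f: the message describes building a GroupVersionResource from whatever version a dashboard is stored in, and caching one client per version. The following is a minimal standard-library sketch of that idea, not the code from the PR. `GVR`, `client`, `clientCache`, `gvrFor`, and the `dashboard.grafana.app/...` version strings are illustrative assumptions; the real change would presumably use `schema.GroupVersionResource` and a dynamic Kubernetes client.

```go
package main

import (
	"fmt"
	"strings"
)

// GVR is a hypothetical stand-in for schema.GroupVersionResource.
type GVR struct {
	Group, Version, Resource string
}

// client is a hypothetical placeholder for a version-specific client.
type client struct{ gvr GVR }

// clientCache builds one client per stored version and reuses it afterwards.
type clientCache struct {
	clients map[string]*client
	built   int // number of clients actually constructed
}

func newClientCache() *clientCache {
	return &clientCache{clients: map[string]*client{}}
}

// gvrFor derives the GVR from an apiVersion string of the form
// "group/version", without hardcoding a list of known versions.
func gvrFor(apiVersion, resource string) (GVR, error) {
	group, version, ok := strings.Cut(apiVersion, "/")
	if !ok || group == "" || version == "" {
		return GVR{}, fmt.Errorf("invalid apiVersion %q", apiVersion)
	}
	return GVR{Group: group, Version: version, Resource: resource}, nil
}

// forVersion returns the cached client for apiVersion, creating it on first use.
func (c *clientCache) forVersion(apiVersion string) (*client, error) {
	if cl, ok := c.clients[apiVersion]; ok {
		return cl, nil
	}
	gvr, err := gvrFor(apiVersion, "dashboards")
	if err != nil {
		return nil, err
	}
	cl := &client{gvr: gvr}
	c.clients[apiVersion] = cl
	c.built++
	return cl, nil
}

func main() {
	cache := newClientCache()
	versions := []string{
		"dashboard.grafana.app/v1beta1",
		"dashboard.grafana.app/v2beta1",
		"dashboard.grafana.app/v1beta1", // repeated: served from the cache
	}
	for _, v := range versions {
		cl, err := cache.forVersion(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %+v\n", v, cl.gvr)
	}
	fmt.Println("clients built:", cache.built)
}
```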
Todd Treece bdf529c545 Plugins: Support MT app registration (#113348) 2025-12-02 09:59:46 -05:00
Serge Zaitsev c15b1b6f10 Chore: Annotation store interface (#114100)
* annotation legacy store with api server, read only

* annotations are not addressable by ID for read operations

* add ownership for an app

* typo, of course

* fix go workspace

* update workspace

* copy annotation app in dockerfile

* experimenting with store interface

* finalising interfaces

* add tags as custom handler

* implement tags handler

* add missing config file

* mute linter

* update generated files

* update workspace
2025-12-01 12:28:46 +01:00
Steve Simpson eafc8ab1cd Alerting: Foundations of historian app. (#114463)
We have two historians in alerting - alert state and notification. The intention
of this app is to provide query capabilities for both.

In this initial commit, the existing /history API is simply cloned to the new
app. It is identical except that it will send Kubernetes-style error responses
instead of Grafana-style.

This approach was taken to implement the new app more iteratively - ideally we
would define a new API, but this requires quite a significant overhaul of the
backend code.
2025-11-28 11:51:56 +01:00
Alexander Zobnin 725df38dad Zanzana: Use team bindings write APIs on the client side (#114503)
* Zanzana: Use team bindings write APIs on the client side

* fix linter

* remove unused code

* Apply suggestions from code review

Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>

* fix syntax

---------

Co-authored-by: Gabriel MABILLE <gamab@users.noreply.github.com>
2025-11-28 11:45:14 +01:00
Daniele Stefano Ferru 8e4be891c5 Provisioning: add URL and Path in setting response (#114534)
* Provisioning: add URL and Path in setting response

* linting

* marking fields as non-required
2025-11-27 16:06:03 +01:00
Haris Rozajac 763067f8e1 Dashboard Schema V2: Force v2 when dashboardNewLayouts or v2DashboardAPI are enabled (#113548)
* SchemaV2: Conversion from v1beta1 to v2beta1

* Compare backend-frontend v1 conversion

* Compare backend-frontend v1 conversion

* Fix fe be diff

* Resolve DS issues

* Fix ds inconsistencies

* fix legacy string value issues

* fix ds test

* fix layout issue

* update test

* Fix tests and issue with defaultConfig

* Update output

* Fix viz config conversion

* wip

* Fix v1 to v2 dashboard transformation differences

Major fixes implemented:
- Backend function names in conversion.go
- Backend group field logic for queries, annotations, and vizConfig
- Backend datasource resolution with map-based lookup
- Backend timezone handling (empty string vs browser)
- Backend annotation processing (empty array vs default annotation)
- Backend default values (editable, liveNow)
- Backend variable processing (definition, defaultKeys, refresh, refId)
- Backend panel layout (y position calculations)
- Backend VizConfig (Kind and Group fields, default values)
- Frontend snapshot issue (annotations not processing)
- Frontend datasource references (only when original has valid datasource)

Test results:
- annotation-conversions: PASSING (0 differences)
- dashboard-properties: 3 expected architectural differences
- panel-conversions: Multiple expected architectural differences
- variable-conversions: 7 expected architectural differences

All remaining differences are expected architectural choices between
backend persistence optimization and frontend UI consumption optimization.

* fix issues with panel and annotation queries with no datasource

* definition and regex

* Use proper v1beta1 resource when testing

* remove misc file

* fix ds provider test

* fix def ds test in response transformer

* fix remaining ResponseTransformers test

* timesettings, variable refresh, editable, liveNow, definition

* fix transformSceneToSaveModelSchemaV2 test

* revert legacyRow changes

* fix go lint issues

* normalize y coordinates when serializing a row

* clean up

* update tests

* use GetStringValue from schemaversion

* fix go lint - cyclomatic complexity

* update open api snapshot

* add migrated dashboards

* fix default panel type when panel type is not provided

* revert dash link changes for now

* fix

* fix nested panel issue and default ref in v1

* apply defaults to nested panels too

* update snapshots

* fix issues with annotations

* matchers, showLegend, annotations

* when converting also don't process queries that have only a refId

* fix issues with text var

* fix dash links

* default to collapse: false when serializing

* fix: filter refId from variable query specs in backend migration

- Add buildDataQueryKindForVariable function to filter refId for variables
- Remove default refId "A" in transformSingleQuery
- Only include __legacyStringValue for non-empty string queries
- Remove refId addition in transformSaveModelSchemaV2ToScene.getDataQueryForVariable
- Handle undefined queries gracefully in frontend and backend
- Ensure backend matches frontend behavior for query variable serialization

* fix: default variable refresh to 'never' to match frontend behavior

Change backend default for missing refresh field from 'onDashboardLoad'
to 'never' to match frontend defaultVariableRefresh() schema default

* fix: only include iconColor in annotations when it exists

- Frontend: Use defaultAnnotationQuerySpec().iconColor as fallback to match schema defaults
- Backend: Only set iconColor if it exists in v1 input (not using GetStringValue)
- Ensures iconColor is only included when present in original dashboard

* fix: use schema defaults for annotation enable, hide, and iconColor

- Use defaultAnnotationQuerySpec() to get schema defaults instead of hardcoded values
- Default enable to false (schema default) to match frontend behavior
- Use schema default for iconColor and hide fields
- Ensures consistency with frontend which uses defaultAnnotationQuerySpec() defaults

* fix: set collapse for hidden-header rows to match first explicit row

- When panels appear before the first explicit row, the hidden-header row's
  collapse should match the first explicit row's collapsed value
- Matches frontend behavior where collapse: panel.collapsed uses the next
  row panel's collapsed value
- Ensures consistency between frontend and backend when converting rows layout

* fix: handle constant variables with missing query value

- Frontend: Fix bug where undefined value was converted to string 'undefined'
  - Now defaults to empty string when value is undefined: value ? String(value) : ''
- Backend: Match frontend fix - default to empty string for text/value when query is missing
- Ensures consistency when constant variable query is missing from v1 dashboard

* Fix interval variable handling when query is missing

- Extract intervals from options when query is missing/empty (matches backend behavior)
- Handle undefined/null query in getIntervalsFromQueryString
- Handle missing current object/value in getCurrentValueForOldIntervalModel
- Update interval variable refresh to use literal 'onTimeRangeChanged' in schema
- Use defaultIntervalVariableSpec() for interval variable serialization
- Backend: Generate query string from options when query is missing

* Fix corrupted dashboard with systemRef override

* don't resolve types for template variables in datasource refs on the backend

* fix annotation and ds issues

* fix range and special mappings

* fix datasource var pluginId and regex

* add __systemRef to schema

* update v15 migration annotation to have a ds type: v2 tracks whether the type is present in the initial save model and removes it if it is not, but for frontendOuput we run transformSaveModelToScene, which then assigns the type

* add migration fields since the backend applies automigrations in collapsed rows

* filter out queries in ResponseTransformer that only have refId field

* lint

* v2: add default query if queries are empty to match v1 behavior

* fix single migration test

* tracking test should have a defined spec otherwise datasource is removed and won't be tracked

* initialize default with default ds ref

* wip

* Do not assign DS if ds group is empty

* cleanup

* revert change in setupTests.ts

* clean up TODO

* query with only refId should not expect to have a group

* refactor: extract v0alpha1 to v1beta1 conversion logic into atomic function

- Extract ConvertDashboard_V0_to_V1beta1 into v0alpha1_to_v1beta1.go
- Extract prepareV0ConversionContext and migrateV0Dashboard helper functions
- Standardize v0.go to match v1.go pattern with inline multi-step conversions
- Implement Convert_V0_to_V2alpha1 using atomic functions (v0->v1beta1->v2alpha1)
- Implement Convert_V0_to_V2beta1 using atomic functions (v0->v1beta1->v2alpha1->v2beta1)
- Remove non-atomic v0alpha1_to_v2alpha1.go file

* test: add version-specific test files for conversion error handling

- Extract v0 conversion tests into v0_test.go
- Extract v1 conversion tests into v1_test.go
- Add v2 conversion tests in v2_test.go
- Ensure all error handling paths in conversion functions are covered
- Add tests for Convert_V0_to_V2alpha1 and Convert_V0_to_V2beta1 error paths
- Add tests for Convert_V1beta1_to_V2alpha1 and Convert_V1beta1_to_V2beta1 error paths
- Add tests for Convert_V2alpha1_to_V2beta1 error handling

* Fix tests

* Fix linter

* Clean up

* feat(dashboard): Add automatic data loss detection for dashboard conversions

Implements comprehensive data loss detection for all dashboard API version conversions.

Components Tracked:
• Panels (visualization + library panels)
• Queries (data source queries, excludes row panel queries)
• Annotations
• Links
• Variables (template variables)

Features:
• Automatic detection via withConversionMetrics wrapper (zero code changes)
• Error type: 'conversion_data_loss_error'
• Logs: panelsLost, queriesLost, annotationsLost, linksLost, variablesLost

Bugs Found:
• Fixed critical bug: metrics.go was silently swallowing ALL errors (return nil → return err)

Testing:
• TestDataLossDetectionOnAllInputFiles - runs all conversions with detailed logging
• V2→V0/V1 downgrades write output for debugging then skip (not yet implemented)
• All tests passing

* Run dashboards on schema v2 E2Es

* revert unintended changes

* cleanup

* Reset active manager correctly according to toggles config

* Fix new dashboard being serialized as v1

* Rename toggle

---------

Co-authored-by: Ivan Ortega <ivanortegaalba@gmail.com>
Co-authored-by: Dominik Prokop <dominik.prokop@grafana.com>
2025-11-27 15:52:42 +01:00
Alexander Zobnin 80fc87339a Zanzana: Role binding hooks (#114470)
* Zanzana: Role bindings hooks WIP

* Empty hooks for role bindings

* implement hooks for role bindings

* add tests

* apply review suggestions
2025-11-27 15:11:34 +01:00
beejeebus ca8cad68c8 Add a metric to track usage of datasource configuration CRUD
This PR adds `ds_config_handler_requests_duration_seconds` metric to help us
track the release of the new datasource configuration CRUD api.

Fixes https://github.com/grafana/grafana-enterprise/issues/10309
2025-11-26 10:49:11 -05:00
Mustafa Sencer Özcan 4130bd9cd3 Revert "K8s: read resource configs from API Enablement for API Builders" (#114475)
Revert "K8s: read resource configs from API Enablement for API Builders (#114…"

This reverts commit 0c2707bbc4.
2025-11-26 16:15:24 +01:00
Matheus Macabu 21c1d9aedd Secrets: Remove unused methods and dependencies from secure value service (#114467) 2025-11-26 12:58:00 +01:00
Gabriel MABILLE 8c7170727b grafana-iam: Prevent crashloops of the standalone IAM server (#114473)
* `grafana-iam`: Prevent crashloops of the standalone IAM server
2025-11-26 12:54:50 +01:00
Tom Ratcliffe cef4449f14 Folders: Send permissions query param with app platform for folder picker (#114158) 2025-11-26 11:16:47 +00:00
Ezequiel Victorero da374527f2 ShortURL: K8s Implement custom authorizer (#114192) 2025-11-25 16:34:10 -03:00
Misi 93ec32dd6a IAM: Add teams/{id}/groups as a custom endpoint to Teams API (#114228)
* Add teams/{id}/groups as a custom endpoint

* TeamGroupsHandler OSS and Ent registration

* Update OpenAPI spec

* Add indexer tests for external group mapping

* Remove noopsearch

* Remove unnecessary interface declaration, fixes

* Chores

* fmt

* Rename constant

* Align the rest to the changes of main

* Update workspace

* Add missing file
2025-11-25 14:19:57 +01:00
Jean-Philippe Quéméner b57e6383e4 refactor(unified-storage): move builders in their own package (#114375) 2025-11-25 10:58:03 +01:00
Charandas 0c2707bbc4 K8s: read resource configs from API Enablement for API Builders (#114329) 2025-11-24 11:09:49 -08:00
Will Browne f1dbbcbe00 Plugins: Add /meta and /metas APIs to plugins app (#113775)
* add /meta and /metas APIs

* wrapped storage route

* format file

* fix switch statement lint issue

* fix plugininstaller test

---------

Co-authored-by: Todd Treece <todd.treece@grafana.com>
2025-11-24 18:20:11 +00:00
Mustafa Sencer Özcan 2f6836e78a fix: make graceful handling the default for malformed dashboard JSONs during unified migration (#114295)
* fix: graceful handling by default

* fix: make fallback the default behavior
2025-11-24 14:12:52 +01:00
Denis Vodopianov 0e460a267e chore: Deprecating FeatureToggles.IsEnabled (#113062)
* Deprecating features.IsEnabled

* add one more nolint

* add one more nolint

* Give better hints to devs in the deprecation message of IsEnabledGlobally

* adding more doc strings

* fix linter after rebase

* Extend deprecation message
2025-11-21 18:43:42 +01:00
João Calisto c2c443757d Unified Storage: allow rebuilding indexes for resource with a new grpc endpoint (#113748)
* Unified Storage: allow rebuilding indexes for resource from a new grpc endpoint

* remove log line

* fix trace def

* lint

* fix after rebase

* addressing code review changes

* update with one channel per rebuild request

* other review suggestions

* update with review suggestions

* run mockery generate for MockResourceClient

* update tests

* update tests and lint

* fix test
2025-11-21 16:42:15 +00:00
Matheus Macabu 5e949fc955 Secrets: Fix secure value creation timestamp changing when updating it (#114290) 2025-11-21 16:31:17 +01:00
Bruno 0d67442f1a Secrets manager: create secure value using the active keeper (#114039)
* Secrets manager: create secure value using the active keeper

* SecureValueService.Update: fetch secure value from db to get the keeper

* fix keeper_store_test.go

* SecureValueService: fix bug in update where the current version keeper wasn't being passed to the createNewVersion method

* make gofmt

* remove outdated test

* update TestModel

* undo enterprise_imports changes

* use xkube.Namespace

* migrator: set secret_secure_value.keeper to 'system' when the column is null

* indent cue

* fix tests

* fix enterprise imports

* properly fix enterprise imports

* make update-workspace

* go mod tidy

---------

Co-authored-by: Matheus Macabu <macabu.matheus@gmail.com>
2025-11-21 11:20:16 -03:00
Oscar Kilhed e09905df35 SchemaV2: Add library panel repeat options to v2 schema during conversion (#114109)
* Add library panel repeat options to v2 schema during conversion

* use any instead of interface{}

* change to common.Unstructured instead of byte[] for model field

* Fix the tests and let the library panel behavior fetch repeat options in public and scripted dashboards

* fix library panel differences between backend and frontend conversion
2025-11-21 11:41:03 +01:00
Peter Štibraný 534ed3421b Fix search by both tags and folders. (#114246)
* Fix search by both tags and folders.

* Move // nolint:gocyclo to the new method.

* Revert unnecessary change.
2025-11-20 17:09:49 +01:00
Mustafa Sencer Özcan 30c04ab3fc feat: inject unified data migrations in dual writer (#114138)
* feat: draft changes for on-prem unified migration

* feat: further draft changes for on-prem unified migration

* fix: remove some tbis

* refactor: rename

* fix: another approach

* fix: background service related issues

* fix: address comments

* fix: make gen-go

* fix: background service related issues

* feat: refactor dual writer and legacy migrator

* fix: minor issues

* feat: working version in oss

* fix: wire

* fix: revert test data override

* fix: enterprise related issues

* chore: add todo

* fix: revert dual writer method

* fix: lint

* chore: logger format

* fix: reduce log level

* fix: log change

* fix: disable

* fix: address comments

* fix: return error on dual writer service

* fix: merge conflict

---------

Co-authored-by: Rafael Paulovic <rafael.paulovic@grafana.com>
2025-11-20 16:40:20 +01:00
Gabriel MABILLE b5a50e7772 grafana-iam: Use the UniStore as the default store (#113614)
* `grafana-iam`: Use the UniStore as the default store

* Refactor all instantiations

* Remove enableDualWriter

* Nit. dw is clear enough

* Use the correct access control client
2025-11-20 15:51:50 +01:00
Roberto Jiménez Sánchez 41276676eb Provisioning: add retry logic for transient errors in Kubernetes client (#114215)
* feat: add retry logic for transient errors in Kubernetes client

Add retry wrapper for dynamic.ResourceInterface that automatically retries
transient errors using Kubernetes' wait.ExponentialBackoff utility.

- Implements retry logic with exponential backoff for all Kubernetes API operations
- Detects transient errors: ServiceUnavailable, ServerTimeout, TooManyRequests,
  InternalError, Timeout, and network errors
- Uses wait.ExponentialBackoff from k8s.io/apimachinery/pkg/util/wait
- Respects context cancellation
- Includes comprehensive unit tests

Part of https://github.com/grafana/git-ui-sync-project/issues/634

* docs: add detailed documentation for defaultRetryBackoff

Document when retry attempts will happen, what errors trigger retries,
and the retry behavior (attempts, delays, exponential backoff, jitter).

* feat: add logging and increase retry attempts for Kubernetes client

- Add context logger to track retry attempts (Info for retries, Warn for exhaustion)
- Increase retry attempts from 5 to 8 steps (~10 seconds total retry window)
- Document when all retry attempts will fail:
  * API server completely unavailable/unreachable
  * Network connectivity issues persist beyond retry window
  * Consistent transient errors for entire retry duration
  * Context cancellation before retries complete

* chore: update retry client documentation

* fix: resolve linting issues in retry client

- Replace type assertions with errors.As for wrapped errors
- Remove deprecated Temporary() check (deprecated since Go 1.18)
- Update tests to remove temporary error test case

* fix: resolve staticcheck S1008 linting issue in retry_client.go

Simplify return statement to use errors.As directly instead of if-return pattern
2025-11-20 15:12:07 +01:00
Charandas 3c0d5745fe chore: remove remaining references to singular namespace (#114208) 2025-11-20 11:23:36 +01:00
Dave Henderson 7264803016 chore(deps): Switch to maintained gopkg.in/yaml fork (#114131)
Signed-off-by: Dave Henderson <dave.henderson@grafana.com>
2025-11-19 15:20:32 -05:00
Jean-Philippe Quéméner 0e4b701bd6 fix(dashboards): apply permission search filter if provided (#114147) 2025-11-19 13:05:07 +01:00
Misi 56c2c1cfe2 IAM: Add validation to ExternalGroupMapping (#113957)
* Add validation before ExternalGroupMapping creation

* Add FIXME to implement team lookup

* Lint
2025-11-19 09:48:09 +01:00
Mustafa Sencer Özcan 0d4ad01b65 feat: add unified data migrations for dashboard and folders (#113853) 2025-11-19 09:09:08 +01:00
Ryan McKinley 00329cab14 Stars: Move stars from preferences apiserver to a new collections apiserver (#114006) 2025-11-19 08:28:39 +03:00
owensmallwood 8dddff3ce4 Unified Storage: Pass ns, group, resource to GetResourceStats instead of just namespace (#114050)
* passes nsr to GetResourceStats instead of just namespace

* removes ns check

* fixes failing tests

* make update-workspace

* pass group and resource from rebuild request when getting resource stats
2025-11-18 13:05:21 -06:00
Gábor Farkas a1a73dde66 datasources: querier: forward more headers (#113897) 2025-11-18 10:33:59 +01:00
Costa Alexoglou faabe2e46d feat: add library elements to dash service (#114016) 2025-11-18 09:21:05 +01:00
Jean-Philippe Quéméner 64c61c6916 fix(dashboards): use index for schema migration datasource lookups (#113911) 2025-11-17 08:56:54 +01:00
Dave Henderson 1f1c75f817 Revert "OFREP: add ST builder for the authorizer, APIEnabled=false needs it" (#113983)
Revert "OFREP: add ST builder for the authorizer, APIEnabled=false needs it (…"

This reverts commit 9ca6ad3b49.
2025-11-14 21:54:48 +00:00
Andres Martinez Gotor bfa7ce9d78 Advisor: Remove legacy app register (#113773) 2025-11-14 12:25:30 +01:00
Roberto Jiménez Sánchez 1cc21a0705 Provisioning: Make image renderer note optional in PR comments (#113837)
* Provisioning: Remove image renderer note from PR comment template

Removes the 'NOTE: The image renderer is not configured' message from
the pull request comment template when image renderer is unavailable.
This addresses issue #656 in git-ui-sync-project.

- Updated commentTemplateMissingImageRenderer to be empty
- Updated testdata to reflect the change
- All unit tests pass

* Provisioning: Make image renderer note optional in PR comments

Make the image renderer note in pull request comments optional based on
the allowImageRendering configuration flag. When enabled, the note now
includes a link to the setup documentation.

- Add showImageRendererNote boolean field to commenter struct
- Update NewCommenter to accept showImageRendererNote parameter
- Update template to conditionally show note with documentation link
- Pass allowImageRendering from APIBuilder to commenter in register.go
- Update ProvidePullRequestWorker to use cfg.ProvisioningAllowImageRendering
- Add tests to verify note appears/disappears based on flag

Fixes https://github.com/grafana/git-ui-sync-project/issues/656

* Format code with go fmt

* Remove redundant text from image renderer note

Remove 'The image renderer is not configured.' from the note message.
The note now focuses on actionable guidance with the documentation link.

* Fix compilation error: use cfg.ProvisioningAllowImageRendering directly

Cannot access unexported field allowImageRendering from webhooks package.
Use cfg.ProvisioningAllowImageRendering directly since we have access to cfg.
2025-11-14 10:33:28 +01:00
Charandas 9ca6ad3b49 OFREP: add ST builder for the authorizer, APIEnabled=false needs it (#113809) 2025-11-13 09:19:16 -08:00
Roberto Jiménez Sánchez c1485ecf5f Provisioning: detect stale sync status and trigger resync (#113826)
* provisioning: detect stale sync status and trigger resync

When sync jobs expire and are cleaned up by the expired job cleanup
controller, the Repository sync status remains stuck in Pending or
Working state. This prevents new sync jobs from being queued because
shouldResync() blocks on these states.

This change adds detection logic in shouldResync() to check if a sync
job referenced in the sync status still exists. If the job doesn't exist
(NotFound), we trigger a resync to reconcile the stale state.

Fixes grafana/git-ui-sync-project#626

* test: remove unused mocks and fix test case

- Remove unused mockRepositoryLister and mockRepositoryNamespaceLister types
- Remove unused imports (labels, listers)
- Remove test case for sync disabled scenario as we don't care about sync enabled state when detecting stale status
2025-11-13 16:58:33 +00:00
Roberto Jiménez Sánchez 73657be5e7 Provisioning: Fix history write for expired jobs (#113764)
* refactor: Move job cleanup to separate controller and fix history write

- Created JobCleanupController in apps/provisioning/pkg/controller
- Separated cleanup logic from ConcurrentJobDriver
- Fixed bug where expired jobs were not written to history
- Added comprehensive tests with 93.8% coverage
- Removed cleanup interval parameter from ConcurrentJobDriver
- Cleanup now properly follows Complete + WriteJob pattern

Fixes expired jobs being lost instead of archived

* refactor: Update lease renewal interval to use jobExpiry variable

- Changed the lease renewal interval in the GetPostStartHooks method to utilize the jobExpiry variable for improved clarity and maintainability.

* Format code

* Fix Unix milliseconds

* fix: correct Unix timestamp assertions and remove duplicate test expectations

- Changed Unix() to UnixMilli() for correct millisecond timestamp validation
- Removed duplicate store.AssertExpectations(t) calls throughout tests
2025-11-13 10:54:07 +01:00
Charandas cbd794d0b8 Provisioning: fix regression with webhook authz failing in MT (#113793) 2025-11-12 13:21:28 -08:00
Andres Martinez Gotor d83c35fd71 Advisor: App installer setup (#113525) 2025-11-12 15:32:21 +01:00
Roberto Jiménez Sánchez cdc6a6114c Provisioning: Improve logging and tracing in job processing (#113454)
* Provisioning: Improve logging and tracing in job processing

- Add comprehensive tracing with OpenTelemetry spans across all job operations
- Enhance logging with consistent style: lowercase, concise messages, appropriate log levels
- Use past tense for completed lifecycle events (e.g., 'stopped' vs 'stop')
- Add structured logging with contextual attributes for better searchability
- Handle graceful shutdowns without throwing errors on context cancellation
- Refactor Cleanup method into listExpiredJobs and cleanUpExpiredJob for better code quality
- Avoid double logging by only logging errors when handled locally
- Add tracing and logging to historyjob controller cleanup operations

Files modified:
- pkg/registry/apis/provisioning/jobs/driver.go: Add tracing spans and improve error handling for graceful shutdown
- pkg/registry/apis/provisioning/jobs/concurrent_driver.go: Add tracing and consistent logging
- pkg/registry/apis/provisioning/jobs/persistentstore.go: Add comprehensive tracing and logging to all public methods, refactor cleanup
- apps/provisioning/pkg/controller/historyjob.go: Add tracing and improve logging consistency

* Update pkg/registry/apis/provisioning/jobs/persistentstore.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Refactor logging in persistentstore.go

- Remove debug log statements at the start of job operations for cleaner output
- Maintain structured logging with contextual attributes for improved traceability

Files modified:
- pkg/registry/apis/provisioning/jobs/persistentstore.go: Clean up logging for job operations

* Enhance logging and tracing in provisioning job operations

- Introduce OpenTelemetry spans for better observability in job processing and webhook handling
- Improve structured logging with contextual attributes for key operations
- Remove unnecessary tracing spans in long-running functions to streamline performance
- Update error handling to record errors in spans for better traceability

Files modified:
- pkg/registry/apis/provisioning/controller/repository.go: Add tracing and structured logging to sync job operations
- pkg/registry/apis/provisioning/jobs/concurrent_driver.go: Remove tracing span from long-running function
- pkg/registry/apis/provisioning/jobs/driver.go: Enhance logging and tracing in job processing
- pkg/registry/apis/provisioning/webhooks/webhook.go: Implement tracing and structured logging for webhook connections

* Update pkg/registry/apis/provisioning/jobs/driver.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Improve error handling in ConcurrentJobDriver to differentiate between graceful shutdown and unexpected stops

* Remove unused import in driver.go to clean up code

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-12 14:59:27 +01:00
beejeebus 6d64c373ce Allow FlagQueryServiceWithConnections to enable datasource config CRUD
The FlagGrafanaAPIServerWithExperimentalAPIs is only available when
`app_mode=development`. We have a more specific flag that is usable in
production, so use that.

Also, there was some old code constraining these APIs to a static list
of datasources. We don't need that anymore, so this PR removes it.

The FlagQueryServiceWithConnections is left as is, because there are
multiple existing tests that rely on this development-only, experimental
flag. I don't want to understand why that is.
2025-11-12 08:48:46 -05:00