* chore(deps): update nanogit to v0.3.0 in go.mod and go.sum files
* Add retry logic for nanogit client operations
- Configure retry logic in withGitContext to ensure all Git operations have retry support
- Use nanogit's ExponentialBackoffRetrier with 8 attempts (~10s retry window)
- Retry transient network errors and HTTP-specific server errors (5xx for GET/DELETE, 429 for all)
- Rename logger function to withGitContext to better reflect its responsibilities
* fix: resolve staticcheck S1008 linting issue in retry_client.go
Simplify return statement to use errors.As directly instead of if-return pattern
* Revert "fix: resolve staticcheck S1008 linting issue in retry_client.go"
This reverts commit bd367b5629.
* Provisioning: Improve logging and tracing in job processing
- Add comprehensive tracing with OpenTelemetry spans across all job operations
- Enhance logging with consistent style: lowercase, concise messages, appropriate log levels
- Use past tense for completed lifecycle events (e.g., 'stopped' vs 'stop')
- Add structured logging with contextual attributes for better searchability
- Handle graceful shutdowns without throwing errors on context cancellation
- Refactor Cleanup method into listExpiredJobs and cleanUpExpiredJob for better code quality
- Avoid double logging by only logging errors when handled locally
- Add tracing and logging to historyjob controller cleanup operations
Files modified:
- pkg/registry/apis/provisioning/jobs/driver.go: Add tracing spans and improve error handling for graceful shutdown
- pkg/registry/apis/provisioning/jobs/concurrent_driver.go: Add tracing and consistent logging
- pkg/registry/apis/provisioning/jobs/persistentstore.go: Add comprehensive tracing and logging to all public methods, refactor cleanup
- apps/provisioning/pkg/controller/historyjob.go: Add tracing and improve logging consistency
* Update pkg/registry/apis/provisioning/jobs/persistentstore.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Refactor logging in persistentstore.go
- Remove debug log statements at the start of job operations for cleaner output
- Maintain structured logging with contextual attributes for improved traceability
Files modified:
- pkg/registry/apis/provisioning/jobs/persistentstore.go: Clean up logging for job operations
* Enhance logging and tracing in provisioning job operations
- Introduce OpenTelemetry spans for better observability in job processing and webhook handling
- Improve structured logging with contextual attributes for key operations
- Remove unnecessary tracing spans in long-running functions to streamline performance
- Update error handling to record errors in spans for better traceability
Files modified:
- pkg/registry/apis/provisioning/controller/repository.go: Add tracing and structured logging to sync job operations
- pkg/registry/apis/provisioning/jobs/concurrent_driver.go: Remove tracing span from long-running function
- pkg/registry/apis/provisioning/jobs/driver.go: Enhance logging and tracing in job processing
- pkg/registry/apis/provisioning/webhooks/webhook.go: Implement tracing and structured logging for webhook connections
* Update pkg/registry/apis/provisioning/jobs/driver.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Improve error handling in ConcurrentJobDriver to differentiate between graceful shutdown and unexpected stops
* Remove unused import in driver.go to clean up code
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Validate Job Specs
* Add comprehensive unit test coverage for job validator
- Added 8 new test cases to improve coverage from 88.9% to ~100%
- Tests for migrate action without options
- Tests for delete/move actions with resources (missing kind)
- Tests for move action with valid resources
- Tests for move/delete with both paths and resources
- Tests for move action with invalid source paths
- Tests for push action with valid paths
Now covers all validation paths including resource validation and
edge cases for all job action types.
* Add integration tests for job validation
Added comprehensive integration tests that verify the job validator properly
rejects invalid job specifications via the API:
- Test job without action (required field)
- Test job with invalid action
- Test pull job without pull options
- Test push job without push options
- Test push job with invalid branch name (consecutive dots)
- Test push job with path traversal attempt
- Test delete job without paths or resources
- Test delete job with invalid path (path traversal)
- Test move job without target path
- Test move job without paths or resources
- Test move job with invalid target path (path traversal)
- Test migrate job without migrate options
- Test valid pull job to ensure validation doesn't block legitimate requests
These tests verify that the admission controller properly validates job specs
before they are persisted, ensuring security (path traversal prevention) and
data integrity (required fields/options).
* Remove valid job test case from integration tests
Removed the positive test case as it's not necessary for validation testing.
The integration tests now focus solely on verifying that invalid job specs
are properly rejected by the admission controller.
* Fix movejob_test to expect validation error at creation time
Updated the 'move without target path' test to expect the job creation
to fail with a validation error, rather than expecting the job to be
created and then fail during execution.
This aligns with the new job validation logic which rejects invalid
job specs at the API admission control level (422 Unprocessable Entity)
before they can be persisted.
This is better behavior as it prevents invalid jobs from being created
in the first place, rather than allowing them to be created and then
failing during execution.
* Simplify action validation using slices.Contains
Replaced manual loop with slices.Contains for cleaner, more idiomatic Go code.
This reduces code complexity while maintaining the same validation logic.
- Added import for 'slices' package
- Replaced 8-line loop with 1-line slices.Contains call
- All unit tests pass
* Refactor job action validation in ValidateJob function
Removed the hardcoded valid actions array and simplified the validation logic. The function now directly appends an error for invalid actions, improving code clarity and maintainability. This change aligns with the recent updates to job validation, ensuring that invalid job specifications are properly handled.
* adding some logs to better understand what might be happening
* only focus this PR on improve logging in finalizer handling
* debug log before calling finalizers
* working on finalizers
* removing last todos, adding unit tests
* better use SupportedFinalizers name
* addressing comments
* wip: fix tests and add delete error in status
* chore: codegen
* chore: codegen openapi
* Merge remote-tracking branch 'origin/main' into provisioning-finalisers-fix-2
* update frontend client
* fix: errors in testing
* fix: breaking test
---------
Co-authored-by: Daniele Ferru <daniele.ferru@grafana.com>
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* Configurable repository types in monolith and operator
* Default to Github in operators
* Regenerate wire
* Fix and implement unit tests
* Same types for enterprise tests
* Remove unnecessary conversion
* Remove the issue with import cycles
* Move repository package to apps
* Move operators to grafana/grafana
* Go mod tidy
* Own package by git sync team for now
* Merged
* Do not use settings in local extra
* Remove dependency on webhook extra
* Hack to work around issue with secure contracts
* Sync Go modules
* Revert "Move operators to grafana/grafana"
This reverts commit 9f19b30a2e.
* Add URLs to Job spec
* Rename them as RefURLs
* Implement RefURLs for Github
* Add Ref URLs to Jobs
* Worker Test
* Create the branch in the staged writer
* Regenerate Git mock
* Format code
* Consolidate ResourceURLs and RefURLs into one
* Fix broken tests
* Add HistoryJob back to Spec
* Generate client
* Put back History Jobs
* Add a controller to remove historic jobs after some minutes
* Start only if Loki is not used
* Format code
* Update OpenAPI spec
* Change log level
* Fix condition
* Fix staticcheck
* Use provisioning identity
* Fix registration APIs
* Fix readonly issue
* Update OpenAPI