7e45a300b9
* Deprecate Legacy Storage Migration in Backend * Change the messaging around legacy storage * Disable cards to connect * Commit import changes * Block repository creation if resources are in legacy storage * Update error message * Prettify * chore: uncomment unified migration * chore: adapt and fix tests * Remove legacy storage migration from frontend * Refactor provisioning job options by removing legacy storage and history fields - Removed the `History` field from `MigrateJobOptions` and related references in the codebase. - Eliminated the `LegacyStorage` field from `RepositoryViewList` and its associated comments. - Updated tests and generated OpenAPI schema to reflect these changes. - Simplified the `MigrationWorker` by removing dependencies on legacy storage checks. * Refactor OpenAPI schema and tests to remove deprecated fields - Removed the `history` field from `MigrateJobOptions` and updated the OpenAPI schema accordingly. - Eliminated the `legacyStorage` field from `RepositoryViewList` and its associated comments in the schema. - Updated integration tests to reflect the removal of these fields. * Fix typescript errors * Refactor provisioning code to remove legacy storage dependencies - Eliminated references to `dualwrite.Service` and related legacy storage checks across multiple files. - Updated `APIBuilder`, `RepositoryController`, and `SyncWorker` to streamline resource handling without legacy storage considerations. - Adjusted tests to reflect the removal of legacy storage mocks and dependencies, ensuring cleaner and more maintainable code. * Fix unit tests * Remove more references to legacy * Enhance provisioning wizard with migration options - Added a checkbox for migrating existing resources in the BootstrapStep component. - Updated the form context to track the new migration option. - Adjusted the SynchronizeStep and useCreateSyncJob hook to incorporate the migration logic. - Enhanced localization with new descriptions and labels for migration features. * Remove unused variable and dualwrite reference in provisioning code - Eliminated an unused variable declaration in `provisioning_manifest.go`. - Removed the `nil` reference for dualwrite in `repo_operator.go`, aligning with the standalone operator's assumption of unified storage. * Update go.mod and go.sum to include new dependencies - Added `github.com/grafana/grafana-app-sdk` version `0.48.5` and several indirect dependencies including `github.com/getkin/kin-openapi`, `github.com/hashicorp/errwrap`, and others. - Updated `go.sum` to reflect the new dependencies and their respective versions. * Refactor provisioning components for improved readability - Simplified the import statement in HomePage.tsx by removing unnecessary line breaks. - Consolidated props in the SynchronizeStep component for cleaner code. - Enhanced the layout of the ProvisioningWizard component by streamlining the rendering of the SynchronizeStep. * Deprecate MigrationWorker and clean up related comments - Removed the deprecated MigrationWorker implementation and its associated comments from the provisioning code. - This change reflects the ongoing effort to eliminate legacy components and improve code maintainability. * Fix linting issues * Add explicit comment * Update useResourceStats hook in BootstrapStep component to accept selected target - Modified the BootstrapStep component to pass the selected target to the useResourceStats hook. - Updated related tests to reflect the change in expected arguments for the useResourceStats hook. * fix(provisioning): Update migrate tests to match export-then-sync behavior for all repository types Updates test expectations for folder-type repositories to match the implementation changes where both folder and instance repository types now run export followed by sync. Only the namespace cleaner is skipped for folder-type repositories. Changes: - Update "should run export and sync for folder-type repositories" test to include export mocks - Update "should fail when sync job fails for folder-type repositories" test to include export mocks - Rename test to clarify that both export and sync run for folder types - Add proper mock expectations for SetMessage, StrictMaxErrors, Process, and ResetResults All migrate package tests now pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Update provisioning wizard text and improve resource counting display - Enhanced descriptions for migrating existing resources to clarify that unmanaged resources will also be included. - Refactored BootstrapStepResourceCounting component to simplify the rendering logic and ensure both external storage and unmanaged resources are displayed correctly. - Updated alert messages in SynchronizeStep to reflect accurate information regarding resource management during migration. - Adjusted localization strings for consistency with the new descriptions. * Update provisioning wizard alert messages for clarity and accuracy - Revised alert points to indicate that resources can still be modified during migration, with a note on potential export issues. - Clarified that resources will be marked as managed post-provisioning and that dashboards remain accessible throughout the process. * Fix issue with trigger wrong type of job * Fix export failure when folder already exists in repository When exporting resources to a repository, if a folder already exists, the Read() method would fail with "path component is empty" error. This occurred because: 1. Folders are identified by trailing slash (e.g., "Legacy Folder/") 2. The Read() method passes this path directly to GetTreeByPath() 3. GetTreeByPath() splits the path by "/" creating empty components 4. This causes the "path component is empty" error The fix strips the trailing slash before calling GetTreeByPath() to avoid empty path components, while still using the trailing slash convention to identify directories. The Create() method already handles this correctly by appending ".keep" to directory paths, which is why the first export succeeded but subsequent exports failed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Fix folder tree not updated when folder already exists in repository When exporting resources and a folder already exists in the repository, the folder was not being added to the FolderManager's tree. This caused subsequent dashboard exports to fail with "folder NOT found in tree". The fix adds the folder to fm.tree even when it already exists in the repository, ensuring all folders are available for resource lookups. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Revert "Merge remote-tracking branch 'origin/uncomment-unified-migration-code' into cleanup/deprecate-legacy-storage-migration-in-provisioning" This reverts commit6440fae342, reversing changes made toec39fb04f2. * fix: handle empty folder titles in path construction - Skip folders with empty titles in dirPath to avoid empty path components - Skip folders with empty paths before checking if they exist in repository - Fix unit tests to properly check useResourceStats hook calls with type annotations * Update workspace * Fix BootstrapStep tests after reverting unified migration merge Updated test expectations to match the current component behavior where resource counts are displayed for both instance and folder sync options. - Changed 'Empty' count expectation from 3 to 4 (2 cards × 2 counts each) - Changed '7 resources' test to use findAllByText instead of findByText since the count appears in multiple cards 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Remove bubbletee deps * Fix workspace * provisioning: update error message to reference enableMigration config Update the error message when provisioning cannot be used due to incompatible data format to instruct users to enable data migration for folders and dashboards using the enableMigration configuration introduced in PR #114857. Also update the test helper to include EnableMigration: true for both dashboards and folders to match the new configuration pattern. * provisioning: add comment explaining Mode5 and EnableMigration requirement Add a comment in the integration test helper explaining that Provisioning requires Mode5 (unified storage) and EnableMigration (data migration) as it expects resources to be fully migrated to unified storage. * Remove migrate resources checkbox from folder type provisioning wizard - Remove checkbox UI for migrating existing resources in folder type - Remove migrateExistingResources from migration logic - Simplify migration to only use requiresMigration flag - Remove unused translation keys - Update i18n strings * Fix linting * Remove unnecessary React Fragment wrapper in BootstrapStep * Address comments --------- Co-authored-by: Rafael Paulovic <rafael.paulovic@grafana.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
616 lines
20 KiB
Go
616 lines
20 KiB
Go
package controller
|
|
|
|
import (
|
|
"context"
|
|
"errors"
|
|
"fmt"
|
|
"time"
|
|
|
|
"go.opentelemetry.io/otel/attribute"
|
|
apierrors "k8s.io/apimachinery/pkg/api/errors"
|
|
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
|
|
"k8s.io/apimachinery/pkg/types"
|
|
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
|
|
"k8s.io/apimachinery/pkg/util/wait"
|
|
"k8s.io/apiserver/pkg/endpoints/request"
|
|
"k8s.io/client-go/tools/cache"
|
|
"k8s.io/client-go/util/workqueue"
|
|
|
|
"github.com/grafana/grafana-app-sdk/logging"
|
|
provisioning "github.com/grafana/grafana/apps/provisioning/pkg/apis/provisioning/v0alpha1"
|
|
client "github.com/grafana/grafana/apps/provisioning/pkg/generated/clientset/versioned/typed/provisioning/v0alpha1"
|
|
informer "github.com/grafana/grafana/apps/provisioning/pkg/generated/informers/externalversions/provisioning/v0alpha1"
|
|
listers "github.com/grafana/grafana/apps/provisioning/pkg/generated/listers/provisioning/v0alpha1"
|
|
"github.com/grafana/grafana/apps/provisioning/pkg/repository"
|
|
"github.com/grafana/grafana/pkg/apimachinery/identity"
|
|
"github.com/grafana/grafana/pkg/infra/tracing"
|
|
"github.com/grafana/grafana/pkg/registry/apis/provisioning/jobs"
|
|
"github.com/grafana/grafana/pkg/registry/apis/provisioning/resources"
|
|
"github.com/prometheus/client_golang/prometheus"
|
|
)
|
|
|
|
const loggerName = "provisioning-repository-controller"
|
|
|
|
const (
|
|
maxAttempts = 3
|
|
)
|
|
|
|
type queueItem struct {
|
|
key string
|
|
obj interface{}
|
|
attempts int
|
|
}
|
|
|
|
//go:generate mockery --name finalizerProcessor --structname MockFinalizerProcessor --inpackage --filename finalizer_mock.go --with-expecter
|
|
type finalizerProcessor interface {
|
|
process(ctx context.Context, repo repository.Repository, finalizers []string) error
|
|
}
|
|
|
|
// RepositoryController controls how and when CRD is established.
|
|
type RepositoryController struct {
|
|
client client.ProvisioningV0alpha1Interface
|
|
repoLister listers.RepositoryLister
|
|
repoSynced cache.InformerSynced
|
|
logger logging.Logger
|
|
|
|
jobs interface {
|
|
jobs.Queue
|
|
jobs.Store
|
|
}
|
|
finalizer finalizerProcessor
|
|
statusPatcher StatusPatcher
|
|
|
|
repoFactory repository.Factory
|
|
healthChecker *HealthChecker
|
|
// To allow injection for testing.
|
|
processFn func(item *queueItem) error
|
|
enqueueRepository func(obj any)
|
|
keyFunc func(obj any) (string, error)
|
|
|
|
queue workqueue.TypedRateLimitingInterface[*queueItem]
|
|
|
|
registry prometheus.Registerer
|
|
tracer tracing.Tracer
|
|
}
|
|
|
|
// NewRepositoryController creates new RepositoryController.
|
|
func NewRepositoryController(
|
|
provisioningClient client.ProvisioningV0alpha1Interface,
|
|
repoInformer informer.RepositoryInformer,
|
|
repoFactory repository.Factory,
|
|
resourceLister resources.ResourceLister,
|
|
clients resources.ClientFactory,
|
|
jobs interface {
|
|
jobs.Queue
|
|
jobs.Store
|
|
},
|
|
healthChecker *HealthChecker,
|
|
statusPatcher StatusPatcher,
|
|
registry prometheus.Registerer,
|
|
tracer tracing.Tracer,
|
|
parallelOperations int,
|
|
) (*RepositoryController, error) {
|
|
finalizerMetrics := registerFinalizerMetrics(registry)
|
|
|
|
rc := &RepositoryController{
|
|
client: provisioningClient,
|
|
repoLister: repoInformer.Lister(),
|
|
repoSynced: repoInformer.Informer().HasSynced,
|
|
queue: workqueue.NewTypedRateLimitingQueueWithConfig(
|
|
workqueue.DefaultTypedControllerRateLimiter[*queueItem](),
|
|
workqueue.TypedRateLimitingQueueConfig[*queueItem]{
|
|
Name: "provisioningRepositoryController",
|
|
},
|
|
),
|
|
repoFactory: repoFactory,
|
|
healthChecker: healthChecker,
|
|
statusPatcher: statusPatcher,
|
|
finalizer: &finalizer{
|
|
lister: resourceLister,
|
|
clientFactory: clients,
|
|
metrics: &finalizerMetrics,
|
|
maxWorkers: parallelOperations,
|
|
},
|
|
jobs: jobs,
|
|
logger: logging.DefaultLogger.With("logger", loggerName),
|
|
registry: registry,
|
|
tracer: tracer,
|
|
}
|
|
|
|
_, err := repoInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
|
|
AddFunc: rc.enqueue,
|
|
UpdateFunc: func(oldObj, newObj interface{}) {
|
|
rc.enqueue(newObj)
|
|
},
|
|
})
|
|
if err != nil {
|
|
return nil, err
|
|
}
|
|
|
|
rc.processFn = rc.process
|
|
rc.enqueueRepository = rc.enqueue
|
|
rc.keyFunc = repoKeyFunc
|
|
|
|
return rc, nil
|
|
}
|
|
|
|
func repoKeyFunc(obj any) (string, error) {
|
|
repo, ok := obj.(*provisioning.Repository)
|
|
if !ok {
|
|
return "", fmt.Errorf("expected a Repository but got %T", obj)
|
|
}
|
|
return cache.DeletionHandlingMetaNamespaceKeyFunc(repo)
|
|
}
|
|
|
|
// Run starts the RepositoryController.
|
|
//
|
|
// Note: This function intentionally does NOT create a tracing span because it runs indefinitely
|
|
// until shutdown. Individual processing operations already have their own spans.
|
|
func (rc *RepositoryController) Run(ctx context.Context, workerCount int) {
|
|
defer utilruntime.HandleCrash()
|
|
defer rc.queue.ShutDown()
|
|
|
|
logger := rc.logger
|
|
ctx = logging.Context(ctx, logger)
|
|
logger.Info("Starting RepositoryController")
|
|
defer logger.Info("Shutting down RepositoryController")
|
|
|
|
if !cache.WaitForCacheSync(ctx.Done(), rc.repoSynced) {
|
|
return
|
|
}
|
|
|
|
logger.Info("Starting workers", "count", workerCount)
|
|
for i := 0; i < workerCount; i++ {
|
|
go wait.UntilWithContext(ctx, rc.runWorker, time.Second)
|
|
}
|
|
|
|
logger.Info("Started workers")
|
|
<-ctx.Done()
|
|
logger.Info("Shutting down workers")
|
|
}
|
|
|
|
func (rc *RepositoryController) runWorker(ctx context.Context) {
|
|
for rc.processNextWorkItem(ctx) {
|
|
}
|
|
}
|
|
|
|
func (rc *RepositoryController) enqueue(obj interface{}) {
|
|
key, err := rc.keyFunc(obj)
|
|
if err != nil {
|
|
utilruntime.HandleError(fmt.Errorf("couldn't get key for object: %v", err))
|
|
return
|
|
}
|
|
|
|
item := queueItem{key: key, obj: obj}
|
|
rc.queue.Add(&item)
|
|
}
|
|
|
|
// processNextWorkItem deals with one key off the queue.
|
|
// It returns false when it's time to quit.
|
|
func (rc *RepositoryController) processNextWorkItem(ctx context.Context) bool {
|
|
item, quit := rc.queue.Get()
|
|
if quit {
|
|
return false
|
|
}
|
|
defer rc.queue.Done(item)
|
|
|
|
// TODO: should we move tracking work to trace ids instead?
|
|
logger := logging.FromContext(ctx).With("work_key", item.key)
|
|
logger.Info("RepositoryController processing key")
|
|
|
|
err := rc.processFn(item)
|
|
if err == nil {
|
|
rc.queue.Forget(item)
|
|
return true
|
|
}
|
|
|
|
item.attempts++
|
|
logger = logger.With("error", err, "attempts", item.attempts)
|
|
logger.Error("RepositoryController failed to process key")
|
|
|
|
if item.attempts >= maxAttempts {
|
|
logger.Error("RepositoryController failed too many times")
|
|
rc.queue.Forget(item)
|
|
return true
|
|
}
|
|
|
|
if !apierrors.IsServiceUnavailable(err) {
|
|
logger.Info("RepositoryController will not retry")
|
|
rc.queue.Forget(item)
|
|
return true
|
|
} else {
|
|
logger.Info("RepositoryController will retry as service is unavailable")
|
|
}
|
|
|
|
utilruntime.HandleError(fmt.Errorf("%v failed with: %v", item, err))
|
|
rc.queue.AddRateLimited(item)
|
|
|
|
return true
|
|
}
|
|
|
|
func (rc *RepositoryController) handleDelete(ctx context.Context, obj *provisioning.Repository) error {
|
|
logger := logging.FromContext(ctx)
|
|
logger.Info("handle repository delete")
|
|
|
|
// Process any finalizers
|
|
if len(obj.Finalizers) > 0 {
|
|
repo, err := rc.repoFactory.Build(ctx, obj)
|
|
if err != nil {
|
|
return fmt.Errorf("create repository from configuration: %w", err)
|
|
}
|
|
|
|
err = rc.finalizer.process(ctx, repo, obj.Finalizers)
|
|
if err != nil {
|
|
if statusErr := rc.updateDeleteStatus(ctx, obj, fmt.Errorf("remove finalizers: %w", err)); statusErr != nil {
|
|
logger.Error("failed to update repository status after finalizer removal error", "error", statusErr)
|
|
}
|
|
return fmt.Errorf("process finalizers: %w", err)
|
|
}
|
|
|
|
// remove the finalizers
|
|
_, err = rc.client.Repositories(obj.GetNamespace()).
|
|
Patch(ctx, obj.Name, types.JSONPatchType, []byte(`[
|
|
{ "op": "remove", "path": "/metadata/finalizers" }
|
|
]`), v1.PatchOptions{
|
|
FieldManager: "provisioning-controller",
|
|
})
|
|
if err != nil {
|
|
return fmt.Errorf("remove finalizers: %w", err)
|
|
}
|
|
return nil
|
|
} else {
|
|
logger.Info("no finalizers to process")
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
func (rc *RepositoryController) updateDeleteStatus(ctx context.Context, obj *provisioning.Repository, err error) error {
|
|
logger := logging.FromContext(ctx)
|
|
logger.Info("updating repository status with deletion error", "error", err.Error())
|
|
return rc.statusPatcher.Patch(ctx, obj, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/deleteError",
|
|
"value": err.Error(),
|
|
})
|
|
}
|
|
|
|
func (rc *RepositoryController) shouldResync(ctx context.Context, obj *provisioning.Repository) bool {
|
|
// don't trigger resync if a sync was never started
|
|
if obj.Status.Sync.Finished == 0 && obj.Status.Sync.State == "" {
|
|
return false
|
|
}
|
|
|
|
syncAge := time.Since(time.UnixMilli(obj.Status.Sync.Finished))
|
|
syncInterval := time.Duration(obj.Spec.Sync.IntervalSeconds) * time.Second
|
|
tolerance := time.Second
|
|
|
|
// Check for stale sync status - if sync status indicates a job is running but the job no longer exists
|
|
// Only check if Finished is set (meaning a sync has completed before) to avoid interfering with initial syncs
|
|
// Only trigger resync if sync is enabled and sync interval has elapsed (to avoid unnecessary operations)
|
|
if obj.Status.Sync.Finished > 0 &&
|
|
obj.Spec.Sync.Enabled &&
|
|
(obj.Status.Sync.State == provisioning.JobStatePending || obj.Status.Sync.State == provisioning.JobStateWorking) &&
|
|
obj.Status.Sync.JobID != "" {
|
|
_, err := rc.jobs.Get(ctx, obj.Namespace, obj.Status.Sync.JobID)
|
|
if apierrors.IsNotFound(err) {
|
|
// Job was cleaned up but sync status wasn't updated - trigger resync to reconcile
|
|
// Only trigger if sync interval has elapsed to avoid unnecessary operations
|
|
if syncAge >= (syncInterval - tolerance) {
|
|
logger := logging.FromContext(ctx)
|
|
logger.Info("detected stale sync status", "job_id", obj.Status.Sync.JobID)
|
|
return true
|
|
}
|
|
}
|
|
// For other errors, log but continue with normal logic
|
|
if err != nil {
|
|
logger := logging.FromContext(ctx)
|
|
logger.Warn("failed to check job existence for stale sync status", "error", err, "job_id", obj.Status.Sync.JobID)
|
|
}
|
|
}
|
|
|
|
// HACK: how would this work in a multi-tenant world or under heavy load?
|
|
// It will start queueing up jobs and we will have to deal with that
|
|
pendingForTooLong := syncAge >= syncInterval/2 && obj.Status.Sync.State == provisioning.JobStatePending
|
|
isRunning := obj.Status.Sync.State == provisioning.JobStateWorking
|
|
|
|
return obj.Spec.Sync.Enabled && syncAge >= (syncInterval-tolerance) && !pendingForTooLong && !isRunning
|
|
}
|
|
|
|
func (rc *RepositoryController) runHooks(ctx context.Context, repo repository.Repository, obj *provisioning.Repository) ([]map[string]interface{}, error) {
|
|
logger := logging.FromContext(ctx)
|
|
hooks, _ := repo.(repository.Hooks)
|
|
if hooks == nil {
|
|
return nil, nil
|
|
}
|
|
|
|
if obj.Status.ObservedGeneration < 1 {
|
|
logger.Info("handle repository create")
|
|
patchOperations, err := hooks.OnCreate(ctx)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("error running OnCreate: %w", err)
|
|
}
|
|
return patchOperations, nil
|
|
}
|
|
|
|
logger.Info("handle repository spec update", "Generation", obj.Generation, "ObservedGeneration", obj.Status.ObservedGeneration)
|
|
patchOperations, err := hooks.OnUpdate(ctx)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("error running OnUpdate: %w", err)
|
|
}
|
|
|
|
return patchOperations, nil
|
|
}
|
|
|
|
func (rc *RepositoryController) determineSyncStrategy(ctx context.Context, obj *provisioning.Repository, repo repository.Repository, shouldResync bool, healthStatus provisioning.HealthStatus) *provisioning.SyncJobOptions {
|
|
logger := logging.FromContext(ctx)
|
|
|
|
switch {
|
|
case !obj.Spec.Sync.Enabled:
|
|
logger.Info("skip sync as it's disabled")
|
|
return nil
|
|
case !healthStatus.Healthy:
|
|
logger.Info("skip sync for unhealthy repository")
|
|
return nil
|
|
case healthStatus.Healthy != obj.Status.Health.Healthy:
|
|
logger.Info("repository became healthy, full resync")
|
|
return &provisioning.SyncJobOptions{}
|
|
case obj.Status.ObservedGeneration < 1:
|
|
logger.Info("full sync for new repository")
|
|
return &provisioning.SyncJobOptions{}
|
|
case obj.Generation != obj.Status.ObservedGeneration:
|
|
logger.Info("full sync for spec change")
|
|
return &provisioning.SyncJobOptions{}
|
|
case shouldResync:
|
|
// Continue to see if we could skip for other reasons
|
|
versioned, ok := repo.(repository.Versioned)
|
|
// If the repository is not versioned, we don't have a way to check for incremental updates
|
|
if !ok {
|
|
logger.Info("full sync on interval for non-versioned repository")
|
|
return &provisioning.SyncJobOptions{}
|
|
}
|
|
latestRef, err := versioned.LatestRef(ctx)
|
|
if err != nil {
|
|
logger.Warn("incremental sync on interval without knowing if ref has actually changed", "error", err)
|
|
return &provisioning.SyncJobOptions{Incremental: true}
|
|
}
|
|
|
|
// Only resync if the latest ref is different from the last synced ref
|
|
if latestRef == obj.Status.Sync.LastRef {
|
|
logger.Info("skip incremental sync as reference is the same")
|
|
return nil
|
|
}
|
|
|
|
// Whenever possible, we try to keep it as an incremental sync to keep things performant.
|
|
// However, if there are any .keep file deletions inside a folder with no other deletions, we need
|
|
// to do a full sync to see if the folder was deleted as well in git.
|
|
incremental, err := shouldUseIncrementalSync(ctx, versioned, obj, latestRef)
|
|
if err != nil {
|
|
logger.Warn("unable to compare files for incremental sync, doing full sync", "error", err)
|
|
return &provisioning.SyncJobOptions{}
|
|
}
|
|
|
|
logger.Info("sync on interval", "incremental", incremental)
|
|
return &provisioning.SyncJobOptions{Incremental: incremental}
|
|
default:
|
|
return nil
|
|
}
|
|
}
|
|
|
|
func shouldUseIncrementalSync(ctx context.Context, versioned repository.Versioned, obj *provisioning.Repository, latestRef string) (bool, error) {
|
|
changes, err := versioned.CompareFiles(ctx, obj.Status.Sync.LastRef, latestRef)
|
|
if err != nil {
|
|
return false, err
|
|
}
|
|
var deletedPaths []string
|
|
for _, change := range changes {
|
|
if change.Action == repository.FileActionDeleted {
|
|
deletedPaths = append(deletedPaths, change.Path)
|
|
}
|
|
}
|
|
|
|
return repository.CanUseIncrementalSync(deletedPaths), nil
|
|
}
|
|
|
|
func (rc *RepositoryController) addSyncJob(ctx context.Context, obj *provisioning.Repository, syncOptions *provisioning.SyncJobOptions) error {
|
|
ctx, span := rc.tracer.Start(ctx, "provisioning.controller.add_sync_job")
|
|
defer span.End()
|
|
|
|
span.SetAttributes(
|
|
attribute.String("repository", obj.GetName()),
|
|
attribute.String("namespace", obj.Namespace),
|
|
attribute.Bool("incremental", syncOptions != nil && syncOptions.Incremental),
|
|
)
|
|
|
|
job, err := rc.jobs.Insert(ctx, obj.Namespace, provisioning.JobSpec{
|
|
Repository: obj.GetName(),
|
|
Action: provisioning.JobActionPull,
|
|
Pull: syncOptions,
|
|
})
|
|
if apierrors.IsAlreadyExists(err) {
|
|
logging.FromContext(ctx).Info("sync job already exists")
|
|
return nil
|
|
}
|
|
if err != nil {
|
|
span.RecordError(err)
|
|
// FIXME: should we update the status of the repository if we fail to add the job?
|
|
return fmt.Errorf("error adding sync job: %w", err)
|
|
}
|
|
|
|
span.SetAttributes(attribute.String("job.name", job.Name))
|
|
return nil
|
|
}
|
|
|
|
func (rc *RepositoryController) determineSyncStatusOps(obj *provisioning.Repository, syncOptions *provisioning.SyncJobOptions, healthStatus provisioning.HealthStatus) []map[string]interface{} {
|
|
const unhealthyMessage = "Repository is unhealthy"
|
|
|
|
hasUnhealthyMessage := len(obj.Status.Sync.Message) > 0 && obj.Status.Sync.Message[0] == unhealthyMessage
|
|
var patchOperations []map[string]interface{}
|
|
|
|
switch {
|
|
case syncOptions != nil:
|
|
// We will try to trigger a new sync job if we have sync options
|
|
patchOperations = append(patchOperations, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/sync/state",
|
|
"value": provisioning.JobStatePending,
|
|
})
|
|
patchOperations = append(patchOperations, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/sync/started",
|
|
"value": int64(0),
|
|
})
|
|
case healthStatus.Healthy && hasUnhealthyMessage: // if the repository is healthy and the message is set, clear it
|
|
// FIXME: is this the clearest way to do this? Should we introduce another status or way of way of handling more
|
|
// specific errors?
|
|
patchOperations = append(patchOperations, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/sync/message",
|
|
"value": []string{},
|
|
})
|
|
case !healthStatus.Healthy && !hasUnhealthyMessage: // if the repository is unhealthy and the message is not already set, set it
|
|
patchOperations = append(patchOperations, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/sync/state",
|
|
"value": provisioning.JobStateError,
|
|
})
|
|
patchOperations = append(patchOperations, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/sync/message",
|
|
"value": []string{unhealthyMessage},
|
|
})
|
|
}
|
|
|
|
return patchOperations
|
|
}
|
|
|
|
//nolint:gocyclo
|
|
func (rc *RepositoryController) process(item *queueItem) error {
|
|
logger := rc.logger.With("key", item.key)
|
|
ctx := logging.Context(context.Background(), logger)
|
|
|
|
namespace, name, err := cache.SplitMetaNamespaceKey(item.key)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
|
|
obj, err := rc.repoLister.Repositories(namespace).Get(name)
|
|
switch {
|
|
case apierrors.IsNotFound(err):
|
|
return errors.New("repository not found in cache")
|
|
case err != nil:
|
|
return err
|
|
}
|
|
|
|
ctx, _, err = identity.WithProvisioningIdentity(ctx, namespace)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
ctx = request.WithNamespace(ctx, namespace)
|
|
logger = logger.WithContext(ctx)
|
|
|
|
if obj.DeletionTimestamp != nil {
|
|
return rc.handleDelete(ctx, obj)
|
|
}
|
|
|
|
shouldResync := rc.shouldResync(ctx, obj)
|
|
shouldCheckHealth := rc.healthChecker.ShouldCheckHealth(obj)
|
|
hasSpecChanged := obj.Generation != obj.Status.ObservedGeneration
|
|
patchOperations := []map[string]interface{}{}
|
|
|
|
// Determine the main triggering condition
|
|
switch {
|
|
case hasSpecChanged:
|
|
logger.Info("spec changed", "Generation", obj.Generation, "ObservedGeneration", obj.Status.ObservedGeneration)
|
|
patchOperations = append(patchOperations, map[string]interface{}{
|
|
"op": "replace",
|
|
"path": "/status/observedGeneration",
|
|
"value": obj.Generation,
|
|
})
|
|
case shouldResync:
|
|
logger.Info("sync interval triggered", "sync_interval", time.Duration(obj.Spec.Sync.IntervalSeconds)*time.Second, "sync_status", obj.Status.Sync)
|
|
case shouldCheckHealth:
|
|
logger.Info("health is stale", "health_status", obj.Status.Health.Healthy)
|
|
default:
|
|
logger.Info("skipping as conditions are not met", "status", obj.Status, "generation", obj.Generation, "sync_spec", obj.Spec.Sync)
|
|
return nil
|
|
}
|
|
|
|
repo, err := rc.repoFactory.Build(ctx, obj)
|
|
if err != nil {
|
|
return fmt.Errorf("unable to create repository from configuration: %w", err)
|
|
}
|
|
|
|
// Handle hooks - may return early if hooks fail
|
|
hookOps, shouldContinue, err := rc.processHooks(ctx, repo, obj)
|
|
if err != nil {
|
|
return fmt.Errorf("process hooks: %w", err)
|
|
}
|
|
if !shouldContinue {
|
|
return nil // Hook handling already updated status and returned early
|
|
}
|
|
if len(hookOps) > 0 {
|
|
patchOperations = append(patchOperations, hookOps...)
|
|
}
|
|
|
|
// Handle health checks using the health checker
|
|
_, healthStatus, healthPatchOps, err := rc.healthChecker.RefreshHealthWithPatchOps(ctx, repo)
|
|
if err != nil {
|
|
return fmt.Errorf("update health status: %w", err)
|
|
}
|
|
|
|
// Add health patch operations first
|
|
if len(healthPatchOps) > 0 {
|
|
patchOperations = append(patchOperations, healthPatchOps...)
|
|
}
|
|
|
|
// determine the sync strategy and sync status to apply
|
|
syncOptions := rc.determineSyncStrategy(ctx, obj, repo, shouldResync, healthStatus)
|
|
patchOperations = append(patchOperations, rc.determineSyncStatusOps(obj, syncOptions, healthStatus)...)
|
|
|
|
// Apply all patch operations
|
|
if len(patchOperations) > 0 {
|
|
err := rc.statusPatcher.Patch(ctx, obj, patchOperations...)
|
|
if err != nil {
|
|
return fmt.Errorf("status patch operations failed: %w", err)
|
|
}
|
|
}
|
|
|
|
// QUESTION: should we trigger the sync job after we have applied all patch operations or before?
|
|
// Is there are risk of race condition here?
|
|
// Trigger sync job after we have applied all patch operations
|
|
if syncOptions != nil {
|
|
if err := rc.addSyncJob(ctx, obj, syncOptions); err != nil {
|
|
return err
|
|
}
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
// processHooks handles hook execution with intelligent retry logic
|
|
// Returns hook operations, whether processing should continue, and any error
|
|
func (rc *RepositoryController) processHooks(ctx context.Context, repo repository.Repository, obj *provisioning.Repository) ([]map[string]interface{}, bool, error) {
|
|
shouldRunHooks := obj.Generation != obj.Status.ObservedGeneration
|
|
|
|
// Skip hooks if status already indicates recent hook failure to avoid infinite retry
|
|
if shouldRunHooks && rc.healthChecker.HasRecentFailure(obj.Status.Health, provisioning.HealthFailureHook) {
|
|
shouldRunHooks = false
|
|
}
|
|
|
|
if !shouldRunHooks {
|
|
return nil, true, nil
|
|
}
|
|
|
|
hookOps, err := rc.runHooks(ctx, repo, obj)
|
|
if err != nil {
|
|
if err := rc.healthChecker.RecordFailure(ctx, provisioning.HealthFailureHook, err, obj); err != nil {
|
|
return nil, false, fmt.Errorf("update status after hook failure: %w", err)
|
|
}
|
|
|
|
return nil, false, err
|
|
}
|
|
|
|
return hookOps, true, nil
|
|
}
|