syncContext.runTasks doesn't wait for pruning to complete and creates the same resource again #536

Open
ashi009 opened this issue Aug 16, 2023 · 0 comments

We recently hit an issue while using tigera-operator that ends in an infinite sync loop:

time log
2023-08-13T20:00:30Z Adding resource result, status: 'SyncFailed', phase: '', message: 'the server is currently unable to handle the request'
2023-08-13T20:00:30Z Adding resource result, status: 'SyncFailed', phase: '', message: 'the server is currently unable to handle the request'
2023-08-13T20:00:40Z Adding resource result, status: 'SyncFailed', phase: '', message: 'the server is currently unable to handle the request'  
2023-08-13T20:00:40Z Adding resource result, status: 'SyncFailed', phase: '', message: 'the server is currently unable to handle the request'
2023-08-13T20:01:02Z Adding resource result, status: 'Pruned', phase: 'Succeeded', message: 'pruned'
2023-08-13T20:01:02Z Adding resource result, status: 'Synced', phase: 'Running', message: 'installation.operator.tigera.io/default serverside-applied'
2023-08-13T20:01:02Z Adding resource result, status: 'Synced', phase: 'Running', message: 'felixconfiguration.projectcalico.org/default serverside-applied. Warning: Detected changes to resource default which is currently being deleted.'

After pruning the CR felixconfiguration.projectcalico.org/default, the controller immediately tries to sync that same CR again. The CR is still pending deletion at that point, so the API server accepts the apply but returns a warning. The controller then assumes the sync succeeded, even though the CR is actually removed a short while later. It therefore tries to sync over and over again, and keeps repeating this pattern:

time log
2023-08-13T20:02:09Z Adding resource result, status: 'Pruned', phase: 'Succeeded', message: 'pruned'
2023-08-13T20:02:09Z Adding resource result, status: 'Synced', phase: 'Running', message: 'felixconfiguration.projectcalico.org/default serverside-applied. Warning: Detected changes to resource default which is currently being deleted.'
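
To make the failure mode concrete, here is a minimal sketch of how an apply result carrying that warning could be classified as "not really synced" instead of `Synced`. The names `applyOutcome`, `outcomePending` and `classifyApplyMessage` are hypothetical and are not part of gitops-engine. Matching on the warning text is only an illustration; a real fix would more likely check the object's `metadata.deletionTimestamp`.

```go
package main

import (
	"fmt"
	"strings"
)

// deletionWarning is the fragment of the API server warning seen in the logs above.
const deletionWarning = "which is currently being deleted"

// applyOutcome is a hypothetical classification, not a gitops-engine type.
type applyOutcome string

const (
	outcomeSynced  applyOutcome = "Synced"
	outcomePending applyOutcome = "PendingDeletion"
)

// classifyApplyMessage reports PendingDeletion when the apply message carries
// the deletion warning, and Synced otherwise.
func classifyApplyMessage(msg string) applyOutcome {
	if strings.Contains(msg, deletionWarning) {
		return outcomePending
	}
	return outcomeSynced
}

func main() {
	// Both messages are copied from the log excerpts in this issue.
	msgs := []string{
		"installation.operator.tigera.io/default serverside-applied",
		"felixconfiguration.projectcalico.org/default serverside-applied. Warning: Detected changes to resource default which is currently being deleted.",
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", classifyApplyMessage(m), m)
	}
}
```

With a check like this, the second message would no longer be recorded as a plain success, so the operation could retry the apply once the deletion has finished.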

This was resolved by restarting the controller itself:

time log
2023-08-14T04:41:14Z Refreshing app status (controller refresh requested), level (1)
2023-08-14T04:41:14Z Comparing app state (cluster: https://kubernetes.default.svc, namespace: calico-system)
2023-08-14T04:41:14Z getRepoObjs stats
2023-08-14T04:41:14Z Initiated automated sync to '0.1.6'
2023-08-14T04:41:14Z Initialized new operation: {&SyncOperation{Revision:0.1.6,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:projectcalico.org,Kind:FelixConfiguration,Name:default,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[ServerSideApply=true],} { true} [] {5 nil}}
2023-08-14T04:41:14Z Comparing app state (cluster: https://kubernetes.default.svc, namespace: calico-system)
2023-08-14T04:41:14Z Update successful
2023-08-14T04:41:14Z Reconciliation completed
2023-08-14T04:41:14Z getRepoObjs stats
2023-08-14T04:41:14Z Syncing
2023-08-14T04:41:14Z Tasks (dry-run)
2023-08-14T04:41:15Z Refreshing app status (controller refresh requested), level (1)
2023-08-14T04:41:15Z Comparing app state (cluster: https://kubernetes.default.svc, namespace: calico-system)
2023-08-14T04:41:15Z Updating operation state. phase: Running -> Running, message: '' -> 'one or more tasks are running'
2023-08-14T04:41:15Z Adding resource result, status: 'Synced', phase: 'Running', message: 'felixconfiguration.projectcalico.org/default serverside-applied'
2023-08-14T04:41:15Z Updating operation state. phase: Running -> Succeeded, message: 'one or more tasks are running' -> 'successfully synced (all tasks run)'
2023-08-14T04:41:15Z sync/terminate complete

This time it didn't prune first, and went ahead to sync it directly.
