PreSync phase gets confused (and then stuck) when deleting PreSync resources #15292

Open
bcbrockway opened this issue Aug 30, 2023 · 3 comments

@bcbrockway

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Sometimes, when deleting PreSync hook resources that have a BeforeHookCreation deletion policy, the whole sync becomes stuck. The only fix is to manually delete the resource using kubectl, which usually kicks the sync back into gear; sometimes we also have to terminate the operation and start the sync again afterwards.

Possibly related to argoproj/gitops-engine#461, but I'm really not sure.
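
For reference, the hook resources involved are annotated roughly like this (a minimal sketch; the resource names come from the logs below and the real manifests may differ):

# Illustrative only: a PreSync hook ServiceAccount with the
# BeforeHookCreation deletion policy described above.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cds
  namespace: clients
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation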

To Reproduce

I've tried, really I have, to come up with reproducible steps, but I can't figure out what's going on.

Expected behavior

The PreSync resources should be deleted, then recreated, then the sync should continue as normal.

Version

v2.7.1+5e54351.dirty

Logs

You can see in the logs that sync 00008-tkXZv seems to delete the ServiceAccount and ExternalSecret OK:

2023-08-25T09:42:16+01:00       {"application":"argocd/cds","level":"info","msg":"Wet-run","syncId":"00008-tkXZv","tasks":"[PreSync/-1 hook /ServiceAccount:clients/cds obj-\u003eobj (,,), PreSync/-1 hook external-secrets.io/ExternalSecret:clients/cds-app obj-\u003eobj (,,)]","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","dryRun":false,"level":"info","msg":"Running tasks","numTasks":2,"syncId":"00008-tkXZv","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","dryRun":false,"level":"info","msg":"Deleting","syncId":"00008-tkXZv","task":"PreSync/-1 hook /ServiceAccount:clients/cds obj-\u003eobj (,,)","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","level":"info","msg":"Deleting resource","syncId":"00008-tkXZv","task":"PreSync/-1 hook /ServiceAccount:clients/cds obj-\u003eobj (,,)","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","dryRun":false,"level":"info","msg":"Deleting","syncId":"00008-tkXZv","task":"PreSync/-1 hook external-secrets.io/ExternalSecret:clients/cds-app obj-\u003eobj (,,)","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","level":"info","msg":"Deleting resource","syncId":"00008-tkXZv","task":"PreSync/-1 hook external-secrets.io/ExternalSecret:clients/cds-app obj-\u003eobj (,,)","time":"2023-08-25T08:42:16Z"}

Then there's some weirdness... Maybe the selfHeal functionality is interfering here?

2023-08-25T09:42:16+01:00       {"level":"info","msg":"Start Update application operation state","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"level":"debug","msg":"Refreshing app argocd/cds for change in cluster of object clients/cds of type v1/ServiceAccount","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","level":"info","msg":"Refreshing app status (controller refresh requested), level (1)","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","level":"info","msg":"Comparing app state (cluster: https://kubernetes.default.svc, namespace: clients)","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","build_options_ms":0,"helm_ms":0,"level":"info","msg":"getRepoObjs stats","plugins_ms":0,"repo_ms":0,"time":"2023-08-25T08:42:16Z","time_ms":13,"unmarshal_ms":12,"version_ms":0}
2023-08-25T09:42:16+01:00       {"application":"argocd/cds","level":"debug","msg":"Retrieved live manifests","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"level":"debug","msg":"Refreshing app argocd/cds for change in cluster of object clients/cds-app of type external-secrets.io/v1beta1/ExternalSecret","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"level":"info","msg":"Completed Update application operation state","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"level":"debug","msg":"Refreshing app argocd/cds for change in cluster of object clients/cds-app of type v1/Secret","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:16+01:00       {"level":"debug","msg":"Refreshing app argocd/cds for change in cluster of object clients/cds-app of type external-secrets.io/v1beta1/ExternalSecret","time":"2023-08-25T08:42:16Z"}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","level":"info","msg":"Skipping auto-sync: another operation is in progress","time":"2023-08-25T08:42:17Z"}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","level":"info","msg":"No status changes. Skipping patch","time":"2023-08-25T08:42:17Z"}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","dedup_ms":0,"dest-name":"","dest-namespace":"clients","dest-server":"https://kubernetes.default.svc","diff_ms":564,"fields.level":1,"git_ms":13,"health_ms":4,"level":"info","live_ms":7,"msg":"Reconciliation completed","settings_ms":0,"sync_ms":0,"time":"2023-08-25T08:42:17Z","time_ms":1142}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","level":"info","msg":"Refreshing app status (controller refresh requested), level (1)","time":"2023-08-25T08:42:17Z"}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","level":"info","msg":"Comparing app state (cluster: https://kubernetes.default.svc, namespace: clients)","time":"2023-08-25T08:42:17Z"}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","build_options_ms":0,"helm_ms":2,"level":"info","msg":"getRepoObjs stats","plugins_ms":0,"repo_ms":0,"time":"2023-08-25T08:42:17Z","time_ms":120,"unmarshal_ms":118,"version_ms":0}
2023-08-25T09:42:17+01:00       {"application":"argocd/cds","level":"debug","msg":"Retrieved live manifests","time":"2023-08-25T08:42:17Z"}

Then finally the sync reports it's complete, but it hasn't even started the sync phase yet:

2023-08-25T09:42:18+01:00       {"application":"argocd/cds","duration":5060298213,"level":"info","msg":"sync/terminate complete","syncId":"00008-tkXZv","time":"2023-08-25T08:42:18Z"}

After this another sync starts:

2023-08-25T09:42:18+01:00       {"level":"info","msg":"Start Update application operation state","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"level":"info","msg":"updated 'argocd/cds' operation (phase: Running)","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"level":"info","msg":"Completed Update application operation state","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","level":"info","msg":"Resuming in-progress operation. phase: Running, message: one or more tasks are running","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","level":"info","msg":"Comparing app state (cluster: https://kubernetes.default.svc, namespace: clients)","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","level":"info","msg":"Skipping auto-sync: another operation is in progress","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","level":"info","msg":"Update successful","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","dedup_ms":0,"dest-name":"","dest-namespace":"clients","dest-server":"https://kubernetes.default.svc","diff_ms":714,"fields.level":1,"git_ms":120,"health_ms":2,"level":"info","live_ms":6,"msg":"Reconciliation completed","settings_ms":0,"sync_ms":0,"time":"2023-08-25T08:42:18Z","time_ms":1226}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","build_options_ms":0,"helm_ms":0,"level":"info","msg":"getRepoObjs stats","plugins_ms":0,"repo_ms":0,"time":"2023-08-25T08:42:18Z","time_ms":135,"unmarshal_ms":135,"version_ms":0}
2023-08-25T09:42:18+01:00       {"application":"argocd/cds","level":"debug","msg":"Retrieved live manifests","time":"2023-08-25T08:42:18Z"}
2023-08-25T09:42:19+01:00       {"level":"info","msg":"Applying resource Application/cds in cluster: https://172.20.0.1:443, namespace: argocd","time":"2023-08-25T08:42:19Z"}
2023-08-25T09:42:19+01:00       {"application":"argocd/cds","level":"info","msg":"Syncing","skipHooks":false,"started":false,"syncId":"00010-hFJZD","time":"2023-08-25T08:42:19Z"}

But this time it looks like it fails completely when it tries to delete and recreate the ServiceAccount:

2023-08-25T09:42:21+01:00       {"application":"argocd/cds","level":"info","msg":"Wet-run","syncId":"00010-hFJZD","tasks":"[PreSync/-1 hook /ServiceAccount:clients/cds obj-\u003eobj (,,), PreSync/-1 hook external-secrets.io/ExternalSecret:clients/cds-app nil-\u003eobj (,,)]","time":"2023-08-25T08:42:21Z"}
2023-08-25T09:42:21+01:00       {"application":"argocd/cds","dryRun":false,"level":"info","msg":"Running tasks","numTasks":2,"syncId":"00010-hFJZD","time":"2023-08-25T08:42:21Z"}
2023-08-25T09:42:21+01:00       {"application":"argocd/cds","dryRun":false,"level":"info","msg":"Deleting","syncId":"00010-hFJZD","task":"PreSync/-1 hook /ServiceAccount:clients/cds obj-\u003eobj (,,)","time":"2023-08-25T08:42:21Z"}
2023-08-25T09:42:21+01:00       {"application":"argocd/cds","level":"info","msg":"Deleting resource","syncId":"00010-hFJZD","task":"PreSync/-1 hook /ServiceAccount:clients/cds obj-\u003eobj (,,)","time":"2023-08-25T08:42:21Z"}
2023-08-25T09:42:24+01:00       {"application":"argocd/cds","duration":4566322926,"level":"info","msg":"sync/terminate complete","syncId":"00010-hFJZD","time":"2023-08-25T08:42:24Z"}

This cycle just continues until I manually delete the ServiceAccount from the cluster, at which point the sync suddenly kicks back into life.

bcbrockway added the bug (Something isn't working) label on Aug 30, 2023
bcbrockway changed the title from "PreSync phase gets confused (and then stuck) when PreSync resources" to "PreSync phase gets confused (and then stuck) when deleting PreSync resources" on Aug 30, 2023
@praveenadini

I'm encountering the same issue and wonder if there is a fix available.

@imranismail
Contributor

Bumping this as we're facing the same issue, also with a ServiceAccount. Removing the foregroundDeletion finalizer from the SA fixes it.
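
While the sync is stuck, the SA looks something like this (hypothetical excerpt of the live object, names illustrative): deletion has been requested, but the foregroundDeletion finalizer is still present, so the hook deletion never completes until the finalizer is cleared.

# Hypothetical live state of the stuck hook ServiceAccount: deletionTimestamp
# is set, but the foregroundDeletion finalizer keeps the object (and therefore
# the PreSync phase) from completing until it is removed.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cds
  namespace: clients
  deletionTimestamp: "2023-08-25T08:42:21Z"
  finalizers:
    - foregroundDeletion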

@sstarcher

We also hit this issue. We have background deletion configured, but it looks like these hook resources are still using foreground deletion and sometimes get stuck.
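
For context, by "background deletion" I mean something along the lines of the sketch below (app name illustrative; I'm assuming the PrunePropagationPolicy sync option is the relevant knob). The hook deletions don't appear to honour it:

# Assumed configuration: prune using background propagation.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  syncPolicy:
    syncOptions:
      - PrunePropagationPolicy=background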
