-
Notifications
You must be signed in to change notification settings - Fork 51
feat(cli): agent create can now use existing TLS certificates #676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes add TLS secret handling to agent creation CLI, enabling users to source TLS certificates, keys, and CA data from existing Kubernetes secrets rather than generating them via PKI. New flags Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Areas needing attention:
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cmd/ctl/agent.go(4 hunks)cmd/ctl/agent_test.go(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
cmd/ctl/agent_test.go (2)
test/testutil/file.go (1)
MustReadFile(8-19)internal/kube/client.go (1)
KubernetesClient(37-44)
cmd/ctl/agent.go (1)
internal/kube/client.go (1)
KubernetesClient(37-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
cmd/ctl/agent.go (4)
111-163: Well-structured TLS secret reading logic with comprehensive error handling.The function correctly validates that
tls.crtandtls.keyform a valid pair. However, it doesn't verify that the client certificate was actually signed by the provided CA certificate. While this validation will eventually occur at runtime when the agent attempts to connect, catching the mismatch earlier would improve the user experience.Consider adding CA validation after line 143:
// Optionally validate that the client cert was signed by the CA certPool := x509.NewCertPool() if !certPool.AppendCertsFromPEM(caCertData) { err = fmt.Errorf("invalid CA certificate in secret %s/%s", caNamespace, caSecretName) return } cert, err := x509.ParseCertificate(certData) if err != nil { err = fmt.Errorf("could not parse client certificate: %w", err) return } _, err = cert.Verify(x509.VerifyOptions{Roots: certPool}) if err != nil { err = fmt.Errorf("client certificate in %s/%s was not signed by CA in %s/%s: %w", tlsNamespace, tlsSecretName, caNamespace, caSecretName, err) return }This verification is optional and you may choose to defer it to runtime.
183-186: Correct validation logic for mutually required flags.The XOR logic correctly ensures that both
--tls-from-secretand--ca-from-secretmust be provided together or not at all.
219-233: Clean conditional logic that maintains backward compatibility.The implementation correctly switches between reading from existing secrets and generating certificates from PKI based on flag presence, while preserving the existing behavior when flags are not provided.
269-270: Well-documented flag definitions.The flag descriptions clearly specify the expected secret format and required keys, which will help users understand how to use this feature.
cmd/ctl/agent_test.go (2)
251-296: Comprehensive test coverage for secret reference parsing.The test cases cover the main scenarios including edge cases. The test on line 281-286 documents the behavior when the secret reference starts with
/, which results in an empty namespace—this aligns with the concern raised in theparseSecretRefimplementation review.
298-555: Excellent test coverage with comprehensive error path validation.The test suite thoroughly validates both success and failure scenarios, including:
- Default and explicit namespace handling
- Missing secrets and keys
- Invalid certificate/key pairs
- Cross-namespace references
The use of a fake Kubernetes client and test data files from
testdata/directory provides robust unit testing without external dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cmd/ctl/agent.go (2)
101-112: The empty namespace fix looks good; consider trimming whitespace.The fix for the past review comment is properly implemented—inputs like
/my-secretnow resolve to the default namespace instead of returning an empty string. However, the function doesn't trim whitespace from the parsed namespace or name, which could cause Kubernetes API errors if users accidentally include spaces (e.g.,namespace / nameornamespace/name).Consider adding whitespace trimming:
func parseSecretRef(secretRef, defaultNamespace string) (namespace, name string) { + secretRef = strings.TrimSpace(secretRef) parts := strings.SplitN(secretRef, "/", 2) if len(parts) == 2 { + parts[0] = strings.TrimSpace(parts[0]) + parts[1] = strings.TrimSpace(parts[1]) if parts[0] == "" { return defaultNamespace, parts[1] } return parts[0], parts[1] } - return defaultNamespace, secretRef + return defaultNamespace, strings.TrimSpace(secretRef) }
114-166: Solid implementation; consider validating CA certificate format.The function correctly reads and validates the TLS certificate/key pair (line 142), which will catch configuration errors early. Error messages are clear and include the namespace/name context.
For a better user experience, consider validating that the CA data is actually a valid PEM-encoded certificate before returning:
caCertData, ok := caSecret.Data["ca.crt"] if !ok { err = fmt.Errorf("CA secret %s/%s does not contain key 'ca.crt'", caNamespace, caSecretName) return } + + // Validate CA certificate format + certPool := x509.NewCertPool() + if !certPool.AppendCertsFromPEM(caCertData) { + err = fmt.Errorf("CA secret %s/%s contains invalid PEM-encoded certificate", caNamespace, caSecretName) + return + } clientCert = string(certData)This would provide clearer error messages at configuration time rather than cryptic runtime errors when the agent attempts to use an invalid CA certificate.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
cmd/ctl/testdata/001_test_cert.pemis excluded by!**/*.pemcmd/ctl/testdata/001_test_key.pemis excluded by!**/*.pem
📒 Files selected for processing (1)
cmd/ctl/agent.go(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cmd/ctl/agent.go (1)
internal/kube/client.go (1)
KubernetesClient(37-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and push image
- GitHub Check: Run unit tests
- GitHub Check: Run end-to-end tests
- GitHub Check: Lint Go code
- GitHub Check: Build & cache Go code
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
cmd/ctl/agent.go (3)
186-189: LGTM!The validation correctly ensures that both
--tls-from-secretand--ca-from-secretare provided together, preventing a half-configured state. The error message is clear.
222-236: LGTM!The implementation cleanly separates the two paths (external secrets vs. PKI generation) while maintaining backwards compatibility. When the new flags aren't provided, the original PKI generation flow is preserved.
272-273: LGTM!The flag definitions clearly document the expected format and secret keys, matching standard Kubernetes TLS secret conventions.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #676 +/- ##
==========================================
+ Coverage 46.15% 46.29% +0.14%
==========================================
Files 92 92
Lines 10689 10739 +50
==========================================
+ Hits 4933 4972 +39
- Misses 5259 5270 +11
Partials 497 497 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I'm testing this out, a couple of things I'm running into:
Prefixing the secret name with the namespace, i.e.
The docs create a secret for the CA which does not include the field Should the command be able to work with a tls.crt or should I update the docs to create a secret with ca.crt and move the signing CA into a standalone secret? Updating the docs makes sense because for the agent I have a step to strip the key out of the CA secret so doing this as a standalone secret makes sense to me. |
|
I think the field name But I agree that in our case, it's a bit more complicated, since we refer the same secret that the CA uses to issue certificates and we're using a secret of type As a compromise, I could change the logic to first look for a key |
|
Like I said I don't mind updating the docs to have a separate secret with ca.crt and if the underlying code expects that then even more reason to do it. I'll work on an update to the cert-manager docs over the weekend. |
|
So using the Could this be because I have Also the command IMHO this should be consistent as with cert-manager, and presumably manual, as well it makes things complicated. So IMHO we need to do one of the following:
I think there is some benefit to keeping the CA key in a different secret then what is mounted in the Principal for security purposes so using a |
|
The |
Yes, because this is the CA's private secret.
It should, just like From the terminology, A certificate stored in So, from my point of view, it makes sense to have:
|
|
Thanks, what you laid out seems very reasonable to me, thanks for the clear and succint explanation. One question about this just to confirm:
By |
|
None of the secrets should actually be mounted to any of the pods. The processes use the Kubernetes client to access the secrets via the Kubernetes API. |
You lost me here, is this in the context of the process not needing the certs to be mounted? The principal and agent pods need the CA plus their specific TLS cert to be mounted in order for mTLS to work. What I was getting at is that some users will not want to have the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/ci.yaml(2 hunks)hack/test.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
hack/test.sh.github/workflows/ci.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run unit tests
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
hack/test.sh (1)
19-19: Good addition:-failfastimproves test feedback loop.Stopping test execution at the first failure accelerates CI feedback without affecting coverage or determinism. This is a solid practice for development and CI workflows.
.github/workflows/ci.yaml (1)
206-211: Debug step enhances e2e observability.Capturing service and pod state across both vcluster contexts helps diagnose transient failures during e2e testing. This aligns well with the TLS secret feature testing.
Consider: Should this step run conditionally on failure (
if: ${{ failure() }}) to avoid noise in successful runs, or is the unconditional output intentional for post-test snapshots?
| with: | ||
| fail-on-diff: true | ||
| version: v1.13.1 | ||
| version: v1.13.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
helm-docs v1.13.1 release date latest stable version breaking changes
💡 Result:
-
Release date for helm-docs v1.13.1: March 2, 2024 (GitHub release). [1]
-
Latest stable release (as of now): v1.14.2 (marked "Latest" on the project releases; published July 8, 2024). [1][2]
-
Breaking changes in v1.13.1: none listed — the v1.13.1 release is primarily dependency fixes and documentation updates (chore: update golang.org/x/crypto, update default template/docs). No breaking/API changes recorded for that release. [2]
-
Security note: a CVE was later reported affecting older versions; maintainers / distributors recommend upgrading to ≥ v1.14.2. [3]
Sources:
- norwoodj/helm-docs releases (GitHub).
- Package changelog / update log (openSUSE Build Service) — v1.13.1 changelog.
- Snyk / vulnerability advisory recommending upgrade to 1.14.2.
🏁 Script executed:
# Check the CI workflow context around the helm-docs version usage
cd .github/workflows && head -300 ci.yaml | tail -50Repository: argoproj-labs/argocd-agent
Length of output: 2096
Upgrade helm-docs to v1.14.2 for security.
The version is pinned to v1.13.1 (released March 2024), but v1.14.2 is the current latest stable (July 2024). While v1.13.1 has no breaking changes, a security vulnerability was reported in versions before v1.14.2; upgrading to the latest version is recommended to mitigate this risk.
🤖 Prompt for AI Agents
.github/workflows/ci.yaml around line 274: the workflow pins helm-docs to
version v1.13.1 which has a reported security issue; update the pinned version
to v1.14.2. Modify the workflow step that sets the helm-docs version to use
v1.14.2 (replace v1.13.1 with v1.14.2) and verify the action runs successfully
in CI; if the version is referenced elsewhere in the file update those
occurrences as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/ci.yaml (1)
275-275: Helm-docs version security update (pre-existing issue).This version pin was already flagged in a previous review. Version v1.13.1 has a security vulnerability; upgrading to v1.14.2 is recommended.
If not already addressed elsewhere, please upgrade helm-docs to v1.14.2.
🧹 Nitpick comments (1)
.github/workflows/ci.yaml (1)
206-212: Consider a more descriptive step name.The step "Display something" is vague. Given that it displays cluster resources for debugging on failure, consider renaming it to something like "Display cluster state on failure" or "Display Kubernetes resources for debugging."
- - name: Display something + - name: Display cluster state on failure
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/ci.yaml(2 hunks)test/run-e2e.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
.github/workflows/ci.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and push image
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
test/run-e2e.sh (1)
23-23: Good practice: faster feedback with -failfast.Adding
-failfastto the test invocation improves CI feedback by exiting on the first failure, which aligns well with the e2e test enhancements in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/ci.yaml (1)
285-285: Helm-docs version remains unpinned against prior security recommendation.A previous review flagged helm-docs v1.13.1 as having a reported security vulnerability and recommended upgrading to v1.14.2 (the current stable, released July 2024). The version pinning remains unchanged in this PR.
If the security upgrade was deferred, please confirm this is intentional. Otherwise, apply the recommended upgrade to v1.14.2.
🧹 Nitpick comments (1)
.github/workflows/ci.yaml (1)
206-222: Good addition: enhanced e2e failure diagnostics.The new "Display something" step provides valuable debugging output when e2e tests fail, showing services, pods, deployments, and nodes across both vcluster contexts. The conditional
if: ${{ failure() }}ensures it runs only on test failures, and the placement before log uploads is logical. This improves observability for diagnosing e2e test issues.These kubectl output captures are helpful for troubleshooting. Consider also capturing events (
kubectl get events) in the future if pod startup or networking issues emerge during e2e debugging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci.yaml(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
.github/workflows/ci.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Lint Go code
- GitHub Check: Build & cache Go code
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
Technically, they need access to the data in the secret. The pods do not mount the secret, though, see
Both perform a direct Kube API request to read the data from the secret. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/ci.yaml (1)
321-321: Unresolved: Upgrade helm-docs to v1.14.2 for security.The version is still pinned to v1.13.1, which has a reported security vulnerability. A previous review flagged this concern; upgrading to v1.14.2 (the current stable release from July 2024) mitigates the risk.
- version: v1.13.1 + version: v1.14.2
🧹 Nitpick comments (1)
.github/workflows/ci.yaml (1)
209-258: Step name "Display something" is too vague; consider a more descriptive name.The debugging step provides comprehensive failure diagnostics across control plane and agent vcluster contexts. However, the step name does not convey its purpose. Rename to something like "Display diagnostic information on failure" or "Collect debug details" to improve clarity.
- - name: Display something + - name: Display diagnostic information on failure
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci.yaml(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
.github/workflows/ci.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
.github/workflows/ci.yaml (1)
166-166: Disk diagnostics additions improve operational visibility.The df -h insertions provide valuable context for debugging resource constraints during CI setup. These are non-invasive improvements.
Also applies to: 175-175, 191-191
Signed-off-by: jannfis <[email protected]> Assisted-by: Cursor
Signed-off-by: jannfis <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Jann Fischer <[email protected]>
Signed-off-by: jannfis <[email protected]>
80babe1 to
5f4e8ac
Compare
Assisted-by: Cursor
What does this PR do / why we need it:
The
argocd-agentctl agent createcommand generates its own set of certificates for authenticating Argo CD to the resource proxy, using the CLI maintained PKI.However, the
argocd-agentctl pkicommand is not intended for production use, so users would have to roll their own cluster secrets for the use with the argocd-agent. This is error prone and not a good user experience.This change adds the
--tls-from-secretand--ca-from-secretswitches to theargocd-agentctl agent createcommand so users will be able to use the CLI to create required configuration, while using certificates managed externally (e.g. using cert-manager or other tooling).Which issue(s) this PR fixes:
Fixes #622
How to test changes / Special notes to the reviewer:
Checklist
Summary by CodeRabbit
Release Notes
--tls-from-secretand--ca-from-secretCLI flags for agent creation, enabling TLS certificates and keys to be loaded from existing Kubernetes secrets.namespace/nameorname).✏️ Tip: You can customize this high-level summary in your review settings.