feat: redis TLS encryption enabled by default for all connections #664
Conversation
> Caution: Review failed. An error occurred during the review process. Please try again later.

Walkthrough

Adds end-to-end Redis TLS support: CLI flags, option wiring, TLS config propagation to Redis clients/proxy/cluster cache, Kubernetes manifests and Helm values, dev scripts for cert generation and TLS setup, and E2E/test updates to use and validate Redis TLS.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Argo as ArgoCD (server/repo)
    participant Proxy as Redis Proxy (principal)
    participant RedisP as Redis (control-plane)
    participant Agent as Agent-side Redis (workload)
    Argo->>Proxy: Connect (TLS) to proxy endpoint
    Note over Proxy: createServerTLSConfig() -> load cert/key
    Proxy->>Argo: TLS Handshake (server cert)
    Argo->>Proxy: Redis protocol over TLS
    Proxy->>RedisP: Dial TCP -> wrap with TLS (upstream)
    Note over Proxy: load CA pool or CA path or set InsecureSkipVerify
    Proxy->>RedisP: TLS Handshake (SNI set)
    RedisP->>Proxy: Handshake OK
    Argo->>Proxy: AUTH / GET/SET (encrypted)
    Proxy->>RedisP: Forward command (encrypted)
    RedisP->>Proxy: Response
    Proxy->>Argo: Response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–60 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Force-pushed 5743de3 to 40254ae
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##              main     #664     +/-  ##
==========================================
- Coverage    46.15%   45.04%   -1.11%
==========================================
  Files           92       92
  Lines        10689    10973     +284
==========================================
+ Hits          4933     4943      +10
- Misses        5259     5529     +270
- Partials       497      501       +4
==========================================
```
Force-pushed 40254ae to 3df4a33
Actionable comments posted: 5
🧹 Nitpick comments (11)
hack/dev-env/start-principal.sh (1)
23-43: Port-forward setup looks good; consider addressing the shellcheck hint.

The port-forward logic is sound. The shellcheck warning (SC2064) about using double quotes in the trap is a false positive here since `$PORT_FORWARD_PID` is set once and won't change. However, you could use single quotes for consistency with shellcheck best practices:

```diff
- trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+ trap 'kill $PORT_FORWARD_PID 2>/dev/null || true' EXIT
```

Note: With single quotes, the variable will expand when the trap is triggered rather than when it's set, but in this case both work correctly since the PID doesn't change.
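The timing difference the note describes can be seen in a small self-contained demo (not from the PR) that contrasts the two quoting styles in a subshell trap:

```shell
#!/usr/bin/env bash
# Double quotes expand the variable when the trap is *defined*;
# single quotes defer expansion until the trap *fires*.
demo() (
  pid=111
  if [ "$1" = double ]; then
    trap "echo $pid" EXIT    # frozen at definition time
  else
    trap 'echo $pid' EXIT    # evaluated when the subshell exits
  fi
  pid=222
)
```

`demo double` prints 111 while `demo single` prints 222, which is why the PR's trap works either way when the PID never changes after being captured.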
internal/argocd/cluster/cluster_test.go (1)
31-37: Cluster manager tests correctly updated for new constructor signature

All `NewManager` invocations now provide the Redis compression type and a trailing `nil` TLS config, matching the new constructor; existing test behavior is preserved. If you later want more coverage for the Redis TLS path introduced in this PR, adding dedicated tests around a non-nil TLS config in another test file would be a good follow-up.

Also applies to: 223-226, 303-305
agent/inbound_redis.go (1)
345-372: Consider using TLS 1.3 as minimum version.

The TLS configuration sets `MinVersion: tls.VersionTLS12`, but for new implementations in 2025, TLS 1.3 should be preferred as the minimum version for better security. TLS 1.2 has known vulnerabilities in certain configurations.

Apply this diff:

```diff
 if a.redisProxyMsgHandler.redisTLSEnabled {
 	tlsConfig = &tls.Config{
-		MinVersion: tls.VersionTLS12,
+		MinVersion: tls.VersionTLS13,
 	}
```

That said, the CA loading logic and error handling are well-implemented with appropriate warnings for insecure mode and system CA fallback.
test/run-e2e.sh (1)
61-66: Use `jq` for structured JSON parsing instead of `grep`.

Line 62 uses `grep "tls-port"` on JSON output, which is fragile and could produce false positives (e.g., matching in comments, annotations, or labels).

Replace with structured JSON querying using `jq`:

```diff
-  # Check if Redis is configured with TLS (it's a Deployment, not StatefulSet)
-  if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+  # Check if Redis is configured with TLS
+  if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | \
+    jq -e '.spec.template.spec.containers[].args[] | select(contains("--tls-port"))' >/dev/null; then
     echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
     echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
     exit 1
   fi
```

This approach reliably checks for the `--tls-port` argument in the container args array.

hack/dev-env/gen-redis-tls-certs.sh (1)
17-17: Consider ECDSA keys for better performance.

The script generates 4096-bit RSA keys, which are secure but relatively slow. For development and testing, consider using ECDSA P-256 keys instead, which provide equivalent security with better performance and smaller certificate sizes.

Example:

```diff
- openssl genrsa -out "${CREDS_DIR}/ca.key" 4096
+ openssl ecparam -genkey -name prime256v1 -out "${CREDS_DIR}/ca.key"
```

This is optional for a dev/test certificate generation script, but ECDSA is increasingly preferred in modern TLS implementations.
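A runnable sketch of the ECDSA variant (paths are illustrative, not the script's `${CREDS_DIR}`), assuming a stock `openssl` binary is available:

```shell
#!/usr/bin/env bash
set -euo pipefail
dir=$(mktemp -d)

# P-256 CA key instead of 4096-bit RSA, then a self-signed CA cert
openssl ecparam -genkey -name prime256v1 -out "${dir}/ca.key"
openssl req -new -x509 -days 3650 -key "${dir}/ca.key" \
  -out "${dir}/ca.crt" -subj "/CN=Redis CA"

# Sanity check: the generated certificate parses
openssl x509 -in "${dir}/ca.crt" -noout -subject
```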
test/e2e/fixture/cluster.go (1)
40-54: E2E Redis TLS wiring is correct; consider a small helper for TLSConfig

The new `*RedisTLSEnabled` fields and the `getCacheInstance` TLSConfig setup give tests a clear, deterministic TLS path (TLS 1.2, `InsecureSkipVerify` only in e2e). Defaulting both TLSEnabled flags to true in the config helpers matches the "TLS-only e2e" objective. You might later factor the repeated TLSConfig construction for principal/managed into a tiny helper, but it's not required.

Also applies to: 165-204, 251-268, 273-315
internal/argocd/cluster/cluster.go (1)
17-32: TLS parameterization of cluster cache is clean and backwards-compatible

Extending `NewClusterCacheInstance` with a `*tls.Config` and wiring it directly into `redis.Options.TLSConfig` cleanly enables TLS while keeping nil as the "no TLS" path. Callers now own policy, which is appropriate. Consider updating any GoDoc on this function to mention the new TLS behavior, but the implementation itself looks solid.

Also applies to: 168-178
hack/dev-env/configure-argocd-redis-tls.sh (1)
1-201: Dev script works; consider restoring context and clarifying the banner

The script does what it needs for dev/e2e, but two improvements would help:

- Context restoration – `kubectl config use-context ${CONTEXT}` permanently switches the user's context. Mirroring `hack/dev-env/configure-redis-tls.sh` by capturing the original context and restoring it in a `trap` would make this safer to run manually.
- Clarify the "proper TLS certificate validation" note – Redis connections are indeed validated via `--redis-use-tls` and `--redis-ca-certificate`, but `argocd-server` is started with `--insecure`, which weakens client→server TLS. Rewording the banner to "Using proper Redis TLS certificate validation (server is insecure for dev only)" would avoid confusion.

These are UX/docs-level tweaks; the functional Redis TLS wiring looks fine.
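The save-and-restore-on-exit pattern being suggested can be demonstrated without kubectl; here a scratch file stands in for kubectl's context store (a sketch of the pattern, not the script itself):

```shell
#!/usr/bin/env bash
set -euo pipefail

state_file=$(mktemp)
echo "original-context" > "$state_file"

(
  saved=$(cat "$state_file")
  # Single quotes: $saved and $state_file expand when the trap fires.
  trap 'echo "$saved" > "$state_file"' EXIT
  echo "vcluster-agent-managed" > "$state_file"  # stand-in for use-context
  # ... work against the switched context ...
)

cat "$state_file"  # the subshell's EXIT trap restored the original value
```

After the subshell exits, the file is back to `original-context`, just as the user's kubectl context would be.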
principal/server.go (1)
349-372: Redis proxy and cluster-manager TLS wiring is coherent and option-driven

The server now cleanly drives Redis TLS from `ServerOptions`: redisProxy is toggled via `redisTLSEnabled`, with clear precedence for server cert sources (path vs in-memory) and upstream verification (insecure vs CA path vs CA pool). The cluster manager reuses the same upstream TLS knobs to build `clusterMgrRedisTLSConfig` and passes it down to `cluster.NewManager`, so its Redis cache observes the same trust policy. Error paths on CA file read/parse are explicit and early, which is good.

If you want extra transparency, you could log a brief message when TLS is enabled but neither `redisUpstreamTLSInsecure`, `redisUpstreamTLSCA`, nor `redisUpstreamTLSCAPath` are set (i.e., relying on system CAs), but that's optional.

Also applies to: 400-427
agent/agent.go (1)
17-24: Cluster cache Redis TLS follows agent Redis TLS options; consider minor reuse/logging tweaks

The new `clusterCacheTLSConfig` correctly mirrors the agent's Redis TLS options (enabled flag, insecure mode, CA path) and feeds them into `NewClusterCacheInstance`, so the cluster cache honors the same security posture as the main Redis client. Error handling on CA read/parse is clear and fails fast.

Two optional refinements to consider later:

- Factor the TLSConfig construction shared between this file and `agent/inbound_redis.go` into a small helper to keep behavior perfectly in sync.
- When TLS is enabled but no CA path is set and `redisTLSInsecure` is false, add a log line indicating that system CAs are being used for the cluster cache as well (to match the visibility you already give the main Redis client).

Also applies to: 323-345
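A sketch of what such a shared helper could look like (name, signature, and the TLS 1.2 floor are assumptions for illustration, not the PR's code):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// buildRedisTLSConfig is a hypothetical helper that both agent/agent.go and
// agent/inbound_redis.go could share. A nil result means "no TLS".
func buildRedisTLSConfig(enabled, insecure bool, caPath string) (*tls.Config, error) {
	if !enabled {
		return nil, nil
	}
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if insecure {
		cfg.InsecureSkipVerify = true // test/dev only
		return cfg, nil
	}
	if caPath != "" {
		pem, err := os.ReadFile(caPath)
		if err != nil {
			return nil, fmt.Errorf("failed to read CA certificate from %s: %w", caPath, err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("failed to parse CA certificate from %s", caPath)
		}
		cfg.RootCAs = pool
	}
	// caPath == "": fall through to system CAs; this branch is where the
	// suggested log line would go.
	return cfg, nil
}

func main() {
	off, _ := buildRedisTLSConfig(false, false, "")
	on, _ := buildRedisTLSConfig(true, true, "")
	fmt.Println(off == nil, on.InsecureSkipVerify)
}
```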
principal/redisproxy/redisproxy.go (1)
131-165: Remove unused key parsing.

Line 154 parses the private key but never uses the result. This validation is redundant since the key has already been marshaled from a valid crypto.PrivateKey interface at line 145.

Apply this diff to remove the dead code:

```diff
 	cert.Certificate = [][]byte{certDER}
 	cert.PrivateKey = rp.tlsServerKey
 	cert.Leaf = rp.tlsServerCert
-
-	// Try to parse the key
-	if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-		return nil, fmt.Errorf("failed to parse private key: %w", err)
-	}
 } else {
 	return nil, fmt.Errorf("no TLS certificate configured")
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
- Makefile (1 hunks)
- agent/agent.go (2 hunks)
- agent/inbound_redis.go (3 hunks)
- agent/options.go (1 hunks)
- agent/outbound_test.go (1 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (3 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (2 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/README.md (3 hunks)
- install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (3 hunks)
- install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
- install/kubernetes/agent/agent-deployment.yaml (3 hunks)
- install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
- install/kubernetes/principal/principal-deployment.yaml (3 hunks)
- install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
- internal/argocd/cluster/cluster.go (2 hunks)
- internal/argocd/cluster/cluster_test.go (3 hunks)
- internal/argocd/cluster/informer_test.go (6 hunks)
- internal/argocd/cluster/manager.go (3 hunks)
- internal/argocd/cluster/manager_test.go (3 hunks)
- principal/options.go (2 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/server.go (3 hunks)
- test/e2e/README.md (2 hunks)
- test/e2e/fixture/cluster.go (5 hunks)
- test/run-e2e.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/run-e2e.sh
- Makefile
- hack/dev-env/start-e2e.sh
- install/helm-repo/argocd-agent-agent/values.yaml
- test/e2e/README.md
- hack/dev-env/Procfile.e2e
- install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
- install/kubernetes/agent/agent-params-cm.yaml
- install/kubernetes/agent/agent-deployment.yaml
🧬 Code graph analysis (12)
cmd/argocd-agent/agent.go (2)
  agent/options.go (3)
    - WithRedisTLSEnabled (112-117)
    - WithRedisTLSInsecure (128-133)
    - WithRedisTLSCAPath (120-125)
  internal/env/env.go (2)
    - BoolWithDefault (30-39)
    - StringWithDefault (46-55)
agent/inbound_redis.go (2)
  internal/logging/logfields/logfields.go (1)
    - Config (127-127)
  internal/logging/logging.go (1)
    - Warn (300-302)
internal/argocd/cluster/manager_test.go (1)
  internal/argocd/cluster/manager.go (1)
    - NewManager (71-119)
internal/argocd/cluster/informer_test.go (2)
  internal/argocd/cluster/manager.go (1)
    - NewManager (71-119)
  test/fake/kube/kubernetes.go (1)
    - NewFakeKubeClient (31-44)
principal/server.go (1)
  internal/argocd/cluster/manager.go (1)
    - NewManager (71-119)
hack/dev-env/configure-redis-tls.sh (1)
  hack/dev-env/setup-vcluster-env.sh (1)
    - apply (94-247)
cmd/argocd-agent/principal.go (4)
  agent/options.go (1)
    - WithRedisTLSEnabled (112-117)
  principal/options.go (6)
    - WithRedisTLSEnabled (493-498)
    - WithRedisServerTLSFromPath (501-507)
    - WithRedisServerTLSFromSecret (510-520)
    - WithRedisUpstreamTLSInsecure (543-548)
    - WithRedisUpstreamTLSCAFromFile (523-528)
    - WithRedisUpstreamTLSCAFromSecret (531-540)
  internal/logging/logging.go (1)
    - Warn (300-302)
  internal/env/env.go (2)
    - BoolWithDefault (30-39)
    - StringWithDefault (46-55)
agent/agent.go (1)
  internal/argocd/cluster/cluster.go (1)
    - NewClusterCacheInstance (169-185)
internal/argocd/cluster/cluster_test.go (1)
  test/fake/kube/kubernetes.go (1)
    - NewFakeKubeClient (31-44)
internal/argocd/cluster/manager.go (1)
  internal/argocd/cluster/cluster.go (1)
    - NewClusterCacheInstance (169-185)
principal/options.go (3)
  agent/options.go (1)
    - WithRedisTLSEnabled (112-117)
  principal/server.go (1)
    - Server (72-164)
  internal/tlsutil/kubernetes.go (2)
    - TLSCertFromSecret (44-66)
    - X509CertPoolFromSecret (106-128)
agent/outbound_test.go (1)
  internal/argocd/cluster/manager.go (1)
    - NewManager (71-119)
🪛 Shellcheck (0.11.0)
hack/dev-env/start-e2e.sh
[warning] 58-58: Declare and assign separately to avoid masking return values.
(SC2155)
hack/dev-env/start-principal.sh
[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and push image
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Analyze (go)
🔇 Additional comments (43)
hack/dev-env/start-agent-managed.sh (3)
37-46: LGTM!

The TLS certificate detection and conditional enablement logic is clear and well-structured. The user guidance for missing certificates is helpful.

48-61: LGTM!

The Redis address configuration is well-documented and provides clear guidance for local development with TLS. The default localhost address appropriately aligns with certificate SANs.

66-67: LGTM!

The TLS and address arguments are correctly injected into the agent startup command.
hack/dev-env/start-principal.sh (2)
58-74: LGTM!

The TLS certificate detection correctly handles both server-side certificates (for the Redis proxy) and upstream CA validation. The logic is sound and well-documented.

82-82: LGTM!

The TLS arguments are correctly injected into the principal startup command.
Makefile (1)
59-70: LGTM!

The Redis TLS setup sequence is well-structured and correctly configures TLS for all three vclusters. The messaging clearly indicates that TLS is required for E2E tests, aligning with the PR objectives.
install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)
93-101: LGTM!

The Redis TLS configuration keys are well-documented and follow the existing naming conventions. The "INSECURE" warning on the insecure flag is appropriate.
hack/dev-env/start-agent-autonomous.sh (3)
37-46: LGTM!

The TLS certificate detection logic is consistent with the managed agent script and works correctly.

48-61: LGTM!

The Redis address configuration correctly uses localhost:6382, allowing the autonomous agent to run alongside the managed agent without port conflicts.

66-67: LGTM!

The TLS and address arguments are correctly injected into the agent startup command.
agent/outbound_test.go (1)
464-464: LGTM!

The test correctly adapts to the extended `NewManager` signature by passing `nil` for the new `tlsConfig` parameter. This is appropriate for a test that doesn't require TLS configuration.

install/helm-repo/argocd-agent-agent/values.yaml (3)
136-136: LGTM!

The default TLS root CA path provides a sensible default for users and aligns with conventional mount paths.

138-151: LGTM!

The Redis TLS configuration is comprehensive and well-documented. TLS is appropriately enabled by default with secure settings, aligning with the PR objectives. The string values ("true"/"false") are appropriate for ConfigMap usage.

153-163: LGTM!

The network policy configuration is a good security enhancement that allows users to restrict Redis traffic. The default selectors and structure are appropriate.
test/e2e/README.md (1)
41-65: Unable to verify the InsecureSkipVerify claim due to repository access limitations.

The repository is currently inaccessible for automated verification. However, the review comment raises a valid concern: the documentation states that principal and agents use `InsecureSkipVerify: true` when connecting to Redis via LoadBalancer addresses, but this conflicts with the described behavior of startup scripts that use localhost port-forwards (where localhost should be in the certificate SANs and proper certificate validation should work).

This discrepancy needs manual verification by examining:
- The actual Redis client configuration in the principal and agent code
- Whether E2E tests genuinely use InsecureSkipVerify or perform proper certificate validation
- The difference between LoadBalancer connections (mentioned in docs) vs. localhost port-forward connections (described in startup scripts)
internal/argocd/cluster/manager_test.go (1)
11-11: NewManager signature update is wired correctly in tests

Importing `cacheutil` and passing `cacheutil.RedisCompressionGZip` plus a trailing `nil` for the TLS config matches the updated `NewManager` signature; test behavior remains the same and looks correct.

Also applies to: 57-58, 78-79
docs/configuration/redis-tls.md (1)
1-411: Redis TLS doc is consistent with implementation and manifests

The new documentation cleanly matches the flags, env vars, ConfigMap keys, volume mount paths, and defaults introduced in the code/manifests; it provides enough guidance, and the security caveats around `*_INSECURE` are clear.

cmd/argocd-agent/agent.go (1)
74-77: Agent Redis TLS flags and wiring look correct

The new Redis TLS flags/envs are correctly bound, and the options passed into the agent (`WithRedisTLSEnabled`, `WithRedisTLSInsecure`, `WithRedisTLSCAPath`) implement the documented behavior (CLI default off, K8s/Helm on, optional CA or insecure skip) without obvious edge-case issues.

Also applies to: 184-194, 236-245
install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (1)
136-153: Helm agent deployment TLS wiring is consistent and non-breaking

The new Redis TLS env vars and the conditional `redis-tls-ca` volume/volumeMount are consistent with the agent ConfigMap keys and agent CLI expectations; the `optional: true` and `.Values.redisTLS.secretName` guard make this backwards-compatible.

Also applies to: 232-236, 258-266
install/kubernetes/agent/agent-params-cm.yaml (1)
88-99: Agent Redis TLS ConfigMap defaults align with deployment and docs

The new `agent.redis.tls.*` entries are consistent with the agent deployment envs, default to TLS-on with a CA path that matches the mounted secret, and keep insecure mode explicitly off by default.

install/kubernetes/principal/principal-deployment.yaml (1)
233-274: Principal deployment Redis TLS configuration is consistent and safe by default

The new Redis TLS env vars, `redis-proxy` port, and `redis-server-tls`/`redis-upstream-tls-ca` volumes are wired consistently with the documented paths and `argocd-redis-tls` secret; marking the secret volumes as `optional: true` keeps the manifest robust while still enabling TLS by default when the secret is present.

Also applies to: 280-287, 302-307, 324-339
hack/dev-env/Procfile.e2e (1)
1-6: Confirm that `ARGOCD_AGENT_REDIS_ADDRESS` is properly consumed by agent startup scripts

The Procfile sets `ARGOCD_AGENT_REDIS_ADDRESS` when invoking `start-agent-managed.sh` and `start-agent-autonomous.sh`, but the agent code may expect a different environment variable or command-line flag. Verify that:

- `start-agent-managed.sh` and `start-agent-autonomous.sh` translate `ARGOCD_AGENT_REDIS_ADDRESS` into a `--redis-addr` CLI flag or pass it through correctly to the agent process
- The agent executable does not default to a hardcoded Redis address if the expected variable is absent
- The e2e setup actually uses the forwarded Redis ports (6380/6381/6382) rather than falling back to defaults

If the scripts or agent code expect `REDIS_ADDR` instead of `ARGOCD_AGENT_REDIS_ADDRESS`, either rename the variable here or update the scripts accordingly.

agent/inbound_redis.go (1)
51-54: LGTM - Clean TLS configuration fields.

The addition of these three fields provides a clear and straightforward mechanism to control Redis TLS behavior.
install/helm-repo/argocd-agent-agent/README.md (1)
68-72: LGTM - Clear Redis TLS configuration documentation.

The Redis TLS configuration is well-documented with sensible defaults (`enabled: "true"`, CA path, and secret name). The inline documentation clearly indicates that `insecure` mode is for development only.

agent/options.go (1)

111-133: LGTM - Redis TLS option setters follow established patterns.

The three new option setters (`WithRedisTLSEnabled`, `WithRedisTLSCAPath`, `WithRedisTLSInsecure`) are implemented consistently with existing option setters in the file. The pattern of setting the field and returning nil is appropriate.

test/run-e2e.sh (1)
24-76: Good enforcement of Redis TLS as a hard requirement for E2E tests.

The comprehensive verification checks (certificates, secrets, and deployment configuration) across all vclusters ensure that E2E tests run only in a properly secured environment. The clear error messages with remediation steps are helpful for developers.
hack/dev-env/gen-redis-tls-certs.sh (1)
28-123: Comprehensive certificate generation with proper SANs.

The script generates certificates for all necessary components (control-plane, proxy, and agent vclusters) with appropriate Subject Alternative Names covering localhost, IP addresses, and cluster DNS. The idempotency checks ensure the script can be re-run safely.
docs/getting-started/kubernetes/index.md (2)
159-282: Redis TLS setup steps are consistent with manifests and defaults

Generation of CA/server certs, the shared `argocd-redis-tls` secret, Redis TLS args, and verification flow all line up with the principal/agent manifests and default paths. No issues from a functional perspective.

389-478: Workload-cluster Redis TLS mirrors control-plane flow correctly

The workload-cluster TLS instructions reuse the same CA, secret structure, args, and verification pattern, which keeps the principal/workload Redis configuration aligned and predictable. Looks good.
internal/argocd/cluster/informer_test.go (1)
3-15: Tests correctly adapted to extended NewManager signature

Using `cacheutil.RedisCompressionGZip` and a trailing `nil` TLS argument keeps the tests aligned with the new constructor while preserving previous behavior. No further changes needed here.

Also applies to: 17-51, 67-88, 96-116
install/kubernetes/principal/principal-params-cm.yaml (1)
140-166: Principal Redis TLS ConfigMap defaults align with deployment wiring

The new `principal.redis.tls.*` keys (enable flag, server cert/key paths, server/CA secret names, upstream CA path, and insecure switch) match the principal Deployment's volume mounts and the ServerOptions fields. Enabling TLS by default here is consistent with the PR's objective, and the "INSECURE" comment on the upstream flag is clear.

install/kubernetes/agent/agent-deployment.yaml (3)
149-166: LGTM! Redis TLS environment variables properly configured.

The three Redis TLS environment variables follow the established pattern and are appropriately marked as optional, ensuring backward compatibility with existing deployments.

193-195: LGTM! Volume mount configured securely.

The redis-tls-ca volume is correctly mounted as read-only, following security best practices.

205-211: LGTM! Secret-backed volume configured correctly.

The redis-tls-ca volume is properly configured with optional: true, preventing deployment failures when TLS is not enabled while maintaining compatibility with TLS-enabled configurations.
internal/argocd/cluster/manager.go (1)
26-26: LGTM! TLS configuration properly integrated.

The TLS config parameter is correctly threaded through NewManager to NewClusterCacheInstance, enabling TLS-protected Redis connections for cluster caching. The nil-able `*tls.Config` type allows optional TLS configuration while maintaining backward compatibility at the implementation level.

Also applies to: 71-71, 81-81
hack/dev-env/configure-redis-tls.sh (2)
23-42: LGTM! Context validation is clear and robust.

The case statement properly validates the context parameter and maps it to the appropriate certificate prefix with helpful error messages for invalid inputs.

47-54: LGTM! Cleanup trap follows best practices.

The trap ensures the original kubectl context is restored on exit, preventing side effects from the script execution.
principal/redisproxy/redisproxy.go (4)
65-76: LGTM! TLS configuration fields well-organized.

The TLS configuration fields clearly separate server-side and upstream concerns, and support both in-memory and file-based certificate loading for flexibility.

98-128: LGTM! TLS configuration API is clean and straightforward.

The setter methods provide a clear API for configuring both server-side and upstream TLS, supporting multiple configuration sources.

167-211: LGTM! TLS listener creation properly implemented.

The Start() method cleanly handles both TLS and non-TLS modes with appropriate logging and error handling.

847-908: LGTM! Upstream TLS connection properly implemented.

The TLS upgrade logic for upstream Redis connections correctly handles CA certificate validation from both in-memory and file sources, SNI configuration, and test-mode insecure skip verify. The conditional TLS enablement (line 864) allows for flexible deployment modes including TLS-terminating proxy scenarios.
principal/options.go (2)
80-88: LGTM! Redis TLS configuration fields well-structured.

The new Redis TLS fields follow the established pattern for TLS configuration in ServerOptions, maintaining consistency with existing TLS fields and supporting flexible configuration sources.

492-548: LGTM! Redis TLS option functions follow established patterns.

The six new Redis TLS configuration functions are well-implemented, following the existing ServerOption pattern consistently. They provide flexible configuration through files, secrets, and direct values, with appropriate error handling and integration with the tlsutil package.
Force-pushed 3df4a33 to 211af17
Actionable comments posted: 2
♻️ Duplicate comments (3)
hack/dev-env/gen-redis-tls-certs.sh (1)
14-26: Don't hide OpenSSL errors; let failures surface and stop the script.

All the `openssl` calls currently send stderr to `/dev/null`, which makes certificate generation failures very hard to diagnose, even with `set -e` in place. It'd be better to let stderr through (and optionally add explicit exit checks) so a broken OpenSSL invocation clearly reports why it failed and the script aborts.

For example:

```diff
-openssl genrsa -out "${CREDS_DIR}/ca.key" 4096 2>/dev/null
+openssl genrsa -out "${CREDS_DIR}/ca.key" 4096
+if [[ $? -ne 0 ]]; then
+  echo "Failed to generate CA private key" >&2
+  exit 1
+fi

-openssl req -new -x509 -days 3650 -key "${CREDS_DIR}/ca.key" \
-  -out "${CREDS_DIR}/ca.crt" \
-  -subj "/C=US/ST=State/L=City/O=Organization/OU=Unit/CN=Redis CA" 2>/dev/null
+openssl req -new -x509 -days 3650 -key "${CREDS_DIR}/ca.key" \
+  -out "${CREDS_DIR}/ca.crt" \
+  -subj "/C=US/ST=State/L=City/O=Organization/OU=Unit/CN=Redis CA"
+if [[ $? -ne 0 ]]; then
+  echo "Failed to generate CA certificate" >&2
+  exit 1
+fi
```

and apply the same pattern (remove `2>/dev/null`, add clear error messages if desired) to the other `openssl genrsa`/`req`/`x509` calls in this script.

Also applies to: 31-32, 47-58, 61-64, 79-90, 93-96, 111-121

hack/dev-env/start-e2e.sh (1)
hack/dev-env/start-e2e.sh (1)
58-58: Avoid maskingkubectlerrors when exportingREDIS_PASSWORDCombining
exportwith command substitution can hide failures fromkubectland is what ShellCheck SC2155 is warning about (also raised in an earlier review).Splitting assignment and export (and checking the exit code) makes failures explicit:
-export REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode) +REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode) || { + echo "Failed to read Redis password from argocd-redis secret in vcluster-agent-managed" >&2 + exit 1 +} +export REDIS_PASSWORDhack/dev-env/configure-redis-tls.sh (1)
61-66: Validate all required certificate/key files before creating the secret

Right now the script only checks for `creds/redis-tls/ca.crt`. If the per-context `*.crt` or `*.key` is missing, `kubectl create secret` will fail with a less obvious error. This was already raised in an earlier review and is still applicable.

You can make the error clearer by validating all three files up front:

```diff
 # Check certificates exist
 if [ ! -f "creds/redis-tls/ca.crt" ]; then
     echo "Error: Redis TLS certificates not found"
     echo "Please run: ./gen-redis-tls-certs.sh"
     exit 1
 fi
+
+if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ]; then
+    echo "Error: Redis TLS certificate or key not found for ${REDIS_CERT_PREFIX}"
+    echo "Please run: ./gen-redis-tls-certs.sh"
+    exit 1
+fi
```

Also applies to: 81-88
🧹 Nitpick comments (10)
cmd/argocd-agent/principal.go (2)
258-288: Redis TLS wiring is sound; consider clarifying precedence and adding tests.The overall flow (
redisTLSEnabledgate, server TLS from path vs secret, upstream TLS with insecure/CA path/CA secret) mirrors existing TLS patterns and looks correct. Two refinements to consider:
Silent precedence between upstream CA path and secret
When both a CA path and a (possibly customized) CA secret are configured, the CA path branch wins and the secret is ignored, with no warning. That’s safe but can surprise operators troubleshooting TLS. A lightweight improvement would be to log (or optionally fatal on) the case where a non-default CA secret name is set alongside a CA path, e.g. log that the secret is being ignored in favor of the file-based CA. This keeps behavior but makes it explicit.Tests for flag/env combinations and TLS behavior
Given the number of new flags and the Codecov report pointing out missing coverage in this file, it would be valuable to add unit tests around:
- `redisTLSEnabled` true/false.
- Server TLS: (cert+key), partial, and secret-based.
- Upstream TLS: insecure vs CA path vs default secret, including precedence behavior.
Even a small table-driven test on `NewPrincipalRunCommand` option wiring or a constructor helper would help lock in these semantics.

Overall, the wiring itself looks correct; this is mainly about making edge-case behavior explicit and test-backed.
419-441: Redis TLS flags and env bindings look good; minor help-text tweak optional.

The new flags and env variable bindings are consistent with existing patterns (`ARGOCD_PRINCIPAL_*`), and the separation between server TLS and upstream TLS is clear. As a minor polish, you might clarify in the `--redis-tls-enabled` description that it controls both the proxy's listening TLS and the upstream TLS to argocd-redis (since the code configures both) to avoid ambiguity in CLI help output.

agent/options.go (1)
127-133: Consider adding runtime warning for insecure mode.

The comment indicates this option is "for testing only," but there's no runtime warning when this insecure mode is enabled. Consider adding a warning log message when TLS verification is disabled to alert operators of the security implications in production environments.

Example:

```go
func WithRedisTLSInsecure(insecure bool) AgentOption {
	return func(o *Agent) error {
		o.redisProxyMsgHandler.redisTLSInsecure = insecure
		if insecure {
			log().Warn("INSECURE: Redis TLS certificate verification disabled. This should only be used for testing.")
		}
		return nil
	}
}
```

principal/server.go (1)
400-427: Consider extracting CA loading logic into a helper function.

The CA certificate loading logic (lines 413-424) is duplicated in multiple files (agent/agent.go, principal/redisproxy/redisproxy.go). Consider extracting this into a shared helper function to improve maintainability.

Example helper:

```go
// In internal/tlsutil or similar package
func LoadCACertPool(caPath string) (*x509.CertPool, error) {
	caCert, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read CA certificate from %s: %w", caPath, err)
	}
	caCertPool := x509.NewCertPool()
	if !caCertPool.AppendCertsFromPEM(caCert) {
		return nil, fmt.Errorf("failed to parse CA certificate from %s", caPath)
	}
	return caCertPool, nil
}
```

principal/redisproxy/redisproxy.go (1)
141-156: Simplify in-memory certificate handling.

The PKCS8 marshaling (line 145) and parsing (line 154) appear to be validation only, as the parsed result is discarded. This roundtrip is unnecessary. The private key can be assigned directly to `cert.PrivateKey`.

Apply this diff:

```diff
 } else if rp.tlsServerCert != nil && rp.tlsServerKey != nil {
 	// Convert cert and key to tls.Certificate
 	certDER := rp.tlsServerCert.Raw
-	// For private key, we need to marshal it
-	keyDER, err := x509.MarshalPKCS8PrivateKey(rp.tlsServerKey)
-	if err != nil {
-		return nil, fmt.Errorf("failed to marshal private key: %w", err)
-	}
 	cert.Certificate = [][]byte{certDER}
 	cert.PrivateKey = rp.tlsServerKey
 	cert.Leaf = rp.tlsServerCert
-
-	// Try to parse the key
-	if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-		return nil, fmt.Errorf("failed to parse private key: %w", err)
-	}
 } else {
```

install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (1)
136-153: Redis TLS env and volume wiring look correct (consider non-optional CA secret).

The new `ARGOCD_AGENT_REDIS_TLS_*` env vars and the `redis-tls-ca` mount/volume are consistent with the ConfigMap keys and documented CA path (`/app/config/redis-tls/ca.crt`), so the wiring itself looks good.

One behavioral nuance: the `redis-tls-ca` secret is marked `optional: true`, so the pod will still start if the TLS secret is missing and the agent will only fail later at runtime. If you'd prefer a fail-fast configuration error when TLS is enabled but the CA secret is absent, you could drop `optional: true` on that secret.

Also applies to: 232-236, 257-266
hack/dev-env/start-agent-managed.sh (1)
37-62: Managed-agent Redis TLS handling is correct; consider de-duping with autonomous script.

The managed-agent script's Redis TLS and address handling matches the autonomous script and the Procfile port-forwards (`localhost:6381`), so behavior looks correct.

If these scripts evolve further, you might consider factoring the shared Redis TLS/address logic into a small helper (or sourcing a common `start-agent-common.sh`) to avoid future drift between managed and autonomous modes.

Also applies to: 66-67
test/run-e2e.sh (1)
24-70: Redis TLS precheck is solid; consider stricter context detection

The TLS gating logic (cert presence + per-context secret and `tls-port` check) looks good and aligns with the "TLS-only E2E" objective.

Minor robustness improvement: in the loop, `kubectl config get-contexts | grep -q "${CONTEXT}"` will succeed on substring matches and silently skip missing contexts. If a vcluster context is missing, it might be clearer to fail early.

You could tighten this and fail when a context is absent:

```diff
-for CONTEXT in vcluster-control-plane vcluster-agent-autonomous vcluster-agent-managed; do
-  if kubectl config get-contexts | grep -q "${CONTEXT}"; then
+for CONTEXT in vcluster-control-plane vcluster-agent-autonomous vcluster-agent-managed; do
+  if kubectl config get-contexts | awk 'NR>1 { print $2 }' | grep -qx "${CONTEXT}"; then
     echo "Checking Redis TLS in ${CONTEXT}..."
     # ...
     echo "✓ Redis TLS configured in ${CONTEXT}"
-  fi
+  else
+    echo "ERROR: kube context ${CONTEXT} is not configured; missing setup?" >&2
+    exit 1
+  fi
 done
```

hack/dev-env/start-principal.sh (1)
23-43: Fix trap quoting to avoid ShellCheck SC2064 warning

The port-forward logic looks good, but ShellCheck is right that the trap should avoid expanding `$PORT_FORWARD_PID` at definition time.

You can keep behavior and silence SC2064 by using single quotes and quoting the variable inside:

```diff
-  # Cleanup function to kill port-forward on exit
-  trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+  # Cleanup function to kill port-forward on exit
+  trap 'kill "$PORT_FORWARD_PID" 2>/dev/null || true' EXIT
```

This expands `PORT_FORWARD_PID` when the trap runs, not when it's set, and follows common shell best practices.

test/e2e/fixture/cluster.go (1)
40-50: Redis TLS wiring for E2E cache clients is consistent with the TLS-only test requirement

The additions to `ClusterDetails` and `getCacheInstance` correctly gate TLS usage on the new `*RedisTLSEnabled` flags and build a `tls.Config` with `MinVersion: tls.VersionTLS12`. Given this file is strictly under `test/e2e`, using `InsecureSkipVerify: true` here is an acceptable trade-off to keep tests working against dynamically addressed Redis endpoints while still enforcing encrypted transport.

The updated `getManagedAgentRedisConfig`/`getPrincipalRedisConfig` logic to:

- Prefer LoadBalancer ingress IP/hostname,
- Fall back to `spec.loadBalancerIP`, then `ClusterIP`,
- And unconditionally set the `*RedisTLSEnabled` flags to true,

matches the PR goal that Redis-with-TLS is now the "happy path" for tests and will loudly fail if TLS isn't actually configured.

If you later stabilize the Redis hostnames to always match certificate SANs, you might consider tightening this further by dropping `InsecureSkipVerify` and wiring in a `RootCAs` pool from the test CA, but that's an optional hardening step and not required for this PR.

Also applies to: 165-195, 225-333
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
- Makefile (1 hunks)
- agent/agent.go (2 hunks)
- agent/inbound_redis.go (3 hunks)
- agent/options.go (1 hunks)
- agent/outbound_test.go (1 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (3 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (2 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/README.md (3 hunks)
- install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (3 hunks)
- install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
- install/kubernetes/agent/agent-deployment.yaml (3 hunks)
- install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
- install/kubernetes/principal/principal-deployment.yaml (3 hunks)
- install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
- internal/argocd/cluster/cluster.go (2 hunks)
- internal/argocd/cluster/cluster_test.go (3 hunks)
- internal/argocd/cluster/informer_test.go (6 hunks)
- internal/argocd/cluster/manager.go (3 hunks)
- internal/argocd/cluster/manager_test.go (3 hunks)
- principal/options.go (2 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/server.go (3 hunks)
- test/e2e/README.md (2 hunks)
- test/e2e/fixture/cluster.go (7 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (16)
- install/kubernetes/agent/agent-params-cm.yaml
- internal/argocd/cluster/cluster_test.go
- hack/dev-env/configure-argocd-redis-tls.sh
- install/helm-repo/argocd-agent-agent/README.md
- install/helm-repo/argocd-agent-agent/values.schema.json
- agent/inbound_redis.go
- internal/argocd/cluster/cluster.go
- cmd/argocd-agent/agent.go
- principal/options.go
- install/kubernetes/principal/principal-params-cm.yaml
- install/helm-repo/argocd-agent-agent/values.yaml
- install/kubernetes/agent/agent-deployment.yaml
- agent/outbound_test.go
- install/kubernetes/principal/principal-deployment.yaml
- docs/getting-started/kubernetes/index.md
- Makefile
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/run-e2e.sh
- install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
- hack/dev-env/Procfile.e2e
- test/e2e/README.md
- hack/dev-env/start-e2e.sh
🧬 Code graph analysis (7)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)
NewManager(71-119)
principal/server.go (2)
internal/logging/logging.go (3)
Warn (300-302), Debug (290-292), Info (295-297)
internal/argocd/cluster/manager.go (1)
NewManager(71-119)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
NewManager (71-119)
test/fake/kube/kubernetes.go (1)
NewFakeKubeClient(31-44)
agent/options.go (1)
agent/agent.go (2)
AgentOption (136-136), Agent (65-117)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(169-185)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(169-185)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh
[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
hack/dev-env/start-e2e.sh
[warning] 58-58: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Lint Go code
- GitHub Check: Run unit tests
- GitHub Check: Build & cache Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (18)
cmd/argocd-agent/principal.go (1)
90-97: Redis TLS option variables are well-scoped and consistent.

Names and grouping align with existing Redis options and distinguish clearly between server and upstream TLS concerns; no issues from a readability or correctness standpoint.
agent/options.go (2)
111-117: LGTM!

The `WithRedisTLSEnabled` option follows the established pattern for agent options and correctly enables TLS for Redis connections.
119-125: LGTM!

The CA path setter is correctly implemented. File validation occurs later during runtime initialization, which is an appropriate design choice.
principal/server.go (1)
352-371: LGTM!

The Redis proxy TLS configuration correctly handles both server-side TLS (for incoming Argo CD connections) and upstream TLS (for connections to principal's Redis), with appropriate support for path-based and in-memory certificates.
internal/argocd/cluster/manager.go (1)
71-81: LGTM!

The TLS configuration parameter is correctly added to `NewManager` and propagated to `NewClusterCacheInstance`. The API design allows nil to be passed for non-TLS scenarios, maintaining backward compatibility.

principal/redisproxy/redisproxy.go (4)
65-76: LGTM!

The TLS configuration fields are well-structured, with clear separation between server-side TLS (for incoming Argo CD connections) and upstream TLS (for connections to principal's Redis). Supporting both path-based and in-memory certificates provides good flexibility.
98-128: LGTM!

The TLS setter methods follow a clean, straightforward pattern for configuration. Validation occurs later during TLS config creation or connection establishment, which is an appropriate design choice.
173-194: LGTM!

The TLS listener setup is correctly implemented with appropriate error handling and clear logging to distinguish between TLS and non-TLS modes.
863-905: Verify the TLS wrapping condition.

Line 864 checks `if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure)`. This means TLS wrapping only occurs if both `tlsEnabled` is true and at least one upstream TLS option is set.

Is this the intended behavior? Should upstream TLS be enabled whenever `tlsEnabled` is true, even without CA configuration? The current logic might skip TLS wrapping if `tlsEnabled` is true but none of the CA/insecure options are set.

Please verify the intended behavior:

- Should TLS be used for upstream Redis whenever `tlsEnabled` is true?
- Or should it only use TLS when CA/insecure options are explicitly configured?

If upstream TLS should always be enabled when `tlsEnabled` is true, consider:

```diff
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled {
 		tlsConfig := &tls.Config{
 			MinVersion: tls.VersionTLS12,
 		}
```

internal/argocd/cluster/manager_test.go (1)
57-57: LGTM!

The test updates correctly pass
`nil` for the TLS configuration parameter, maintaining test coverage for non-TLS scenarios. The API change is properly reflected in all test cases.

Also applies to: 78-78
internal/argocd/cluster/informer_test.go (1)
19-19: LGTM!

All test cases are consistently updated to pass the new TLS configuration parameter. The nil values are appropriate for testing non-TLS scenarios, and the tests continue to validate the core cluster informer functionality.
Also applies to: 33-33, 50-50, 87-87, 115-115
install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)
93-101: Redis TLS agent params wiring looks consistent.

The new `agent.redis.tls.*` keys and defaults line up with the Helm values and deployment env wiring; I don't see issues here.
37-62: Dev Redis TLS/address handling is coherent with the new TLS setup.

Detecting the local CA, constructing `--redis-tls-*` flags, and defaulting to `localhost:6382` (with matching port-forward guidance) all look correct and align with the cert generation script and Procfile wiring.

Also applies to: 66-67
test/e2e/README.md (1)
27-66: E2E Redis TLS documentation aligns well with the new tooling.

The new note and "Redis TLS" section clearly describe how TLS is auto-configured for `make setup-e2e`, how to regenerate/reconfigure certs, and why InsecureSkipVerify is used in tests. Wording and commands match the added dev-env scripts.
1-6: Procfile wiring for Redis port-forwards and agents looks consistent.

Mapping Redis to 6380/6381/6382 and then starting principal/agents (with the agents honoring `ARGOCD_AGENT_REDIS_ADDRESS`) lines up with the start-agent scripts and the TLS/localhost assumptions; this looks good.
56-74: TLS enablement via `REDIS_TLS_ARGS` is consistent with the new Redis TLS flow

The detection of Redis TLS certs under `${SCRIPTPATH}/creds/redis-tls` and conditional population of `REDIS_TLS_ARGS` is clear and matches the new `--redis-*` flags exposed by the principal. Passing `$REDIS_TLS_ARGS` into the `go run ... principal` invocation is a straightforward way to keep TLS optional but default-on when certs are present.

No functional issues spotted here.

hack/dev-env/start-e2e.sh (1)
hack/dev-env/start-e2e.sh (1)
50-55: Localhost Redis addresses fit the TLS/port-forward designHard-coding the Redis addresses to
localhost:6380/6381/6382and delegating port-forwards togoreman/Procfile.e2ematches the certificate SAN strategy and simplifies TLS validation for E2E.This looks consistent with the new principal dev script and the Redis TLS configuration flow.
hack/dev-env/configure-redis-tls.sh (1)
47-55: Redis TLS configuration script is thorough and idempotent

The script's overall flow (mapping context to cert prefix, preserving/restoring the initial kube context via a trap, creating/updating the `argocd-redis-tls` secret with `--dry-run=client | kubectl apply`, and patching volumes, volumeMounts, and args only when needed) looks well thought out and idempotent for the dev/e2e environment.

The JSON patches for both initial and append cases on `volumes`/`volumeMounts` and the rollout wait logic are appropriate for ensuring Redis ends up in TLS-only mode before proceeding.

Also applies to: 93-181, 188-204
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)
150-156: Inconsistent timeout: this assertion still uses 30s/1s.

The final re-connected check for the autonomous agent uses `30*time.Second, 1*time.Second`, while all other similar assertions in this file were updated to `60*time.Second, 2*time.Second`. This inconsistency could lead to flaky tests under the same port-forward latency conditions mentioned in the comments.

Apply this diff for consistency:

```diff
 requires.Eventually(func() bool {
 	return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 		Status:     appv1.ConnectionStatusSuccessful,
 		Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 		ModifiedAt: &metav1.Time{Time: time.Now()},
 	}, clusterDetail)
-}, 30*time.Second, 1*time.Second)
+}, 60*time.Second, 2*time.Second)
```
🧹 Nitpick comments (8)
principal/resource.go (1)
39-39: Provide justification for the 3x timeout increase.

The timeout has been increased from 10 to 30 seconds without explanation. While this may be necessary to accommodate TLS handshake and encryption overhead introduced by this PR, the lack of documentation makes it unclear whether this change masks underlying performance issues or is genuinely required.
Please clarify why this increase is needed and consider documenting it in a comment. Additionally, as noted in the TODO above, making this timeout configurable would allow better tuning for different deployment scenarios, especially given the variance in TLS overhead across environments.
principal/listen.go (1)
174-196: Helpful logging additions for debugging the startup flow.

The logging statements provide useful visibility into the WebSocket enablement path and server startup sequence, which will help with troubleshooting.
Minor formatting suggestion: consider removing the emoji (line 174) and leading spaces in log messages (lines 176, 196) for consistency with standard structured logging conventions.
Apply this diff for consistent log message formatting:
```diff
-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Info("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info("   WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
 		opts := []grpchttp1server.Option{grpchttp1server.PreferGRPCWeb(true)}
 		downgradingHandler := grpchttp1server.CreateDowngradingHandler(s.grpcServer, http.NotFoundHandler(), opts...)

 go func() {
 	log().Info("Starting gRPC server.Serve() - server is now accepting connections")
 	err = s.grpcServer.Serve(s.listener.l)
-	log().WithError(err).Warn("   gRPC server.Serve() exited")
+	log().WithError(err).Warn("gRPC server.Serve() exited")
 	errch <- err
 }()
```

hack/dev-env/start-agent-managed.sh (1)
63-74: Consider restricting permissions on extracted TLS credentials.

The extracted TLS private key is written to `/tmp/agent-managed-tls.key` with default permissions, potentially making it readable by other users on shared systems.

Apply restrictive permissions before writing sensitive files:

```diff
 # Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
 TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
 TLS_KEY_PATH="/tmp/agent-managed-tls.key"
 ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+
+# Set restrictive permissions for private key
+umask 077
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-ca \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+# Restore default umask
+umask 022
 echo "   mTLS client certificates and CA extracted"
```

hack/dev-env/start-principal.sh (1)
41-42: Use single quotes in trap to defer variable expansion (shellcheck SC2064).

While the current code works because `$PORT_FORWARD_PID` is set before the trap, using single quotes is the conventional and safer pattern.

```diff
-  trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+  trap 'kill $PORT_FORWARD_PID 2>/dev/null || true' EXIT
```

docs/configuration/redis-tls.md (2)
487-494: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. Since this is script output, use `text` or `console`.

````diff
-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...
````
498-513: Add language specifiers to remaining script output blocks.

Same issue as above - these console output examples should have a language specifier for markdownlint compliance.

````diff
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║ Configure Redis Deployment for TLS ║
````

````diff
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║ Configure Argo CD Components for Redis TLS ║
````

Also applies to: 516-532
test/e2e/redis_proxy_test.go (1)
120-123: Hard-coded sleep for SSE stream stabilization.

While the 5-second sleep addresses the race condition mentioned in the comment, it's a fixed delay that may be insufficient under heavy load or excessive in fast environments. Consider using a more deterministic approach if flakiness persists.
An alternative would be to wait for an initial SSE message (e.g., the current resource tree state) before proceeding, though the current approach is pragmatic for E2E tests.
test/e2e/fixture/cluster.go (1)
259-267: Cleanup doesn't explicitly close Redis connections.

`CleanupRedisCachedClients` clears the cache map but doesn't explicitly close the underlying Redis connections. While Go's garbage collector will eventually clean them up, explicit closure ensures immediate resource release and avoids connection pool exhaustion in long test runs.

Consider closing the Redis clients explicitly. The `appstatecache.Cache` wraps a `cacheutil.Cache` which has a `redisClient`. You may need to expose or track the underlying `redis.Client` to call `Close()`:

```go
// If the underlying redis.Client is accessible, close it explicitly
// For now, this may require refactoring getCacheInstance to return both
// the cache and the client, or using a wrapper struct
```

If the current approach works reliably in tests without connection issues, this can be deferred.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/auth.go (1 hunks)
- principal/listen.go (3 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- Makefile
- hack/dev-env/start-e2e.sh
- hack/dev-env/configure-argocd-redis-tls.sh
- hack/dev-env/start-agent-autonomous.sh
- install/helm-repo/argocd-agent-agent/values.schema.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- hack/dev-env/start-agent-managed.sh
- test/e2e/rp_test.go
- test/run-e2e.sh
- hack/dev-env/Procfile.e2e
- test/e2e/README.md
🧬 Code graph analysis (6)
principal/auth.go (1)
internal/logging/logging.go (2)
Trace (285-287), Warn (300-302)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
GetArgoCDServerEndpoint (315-337), GetInitialAdminSecret (302-313), NewArgoClient (52-66)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
Event (112-115)
internal/logging/logfields/logfields.go (1)
Event(34-34)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
ClusterDetails (42-56), AgentManagedName (37-37), AgentClusterServerURL (39-39)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
487-487: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
498-498: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
516-516: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh
[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
🔇 Additional comments (35)
test/e2e/README.md (1)
21-107: Unable to verify referenced helper scripts and documentation due to repository access limitations.

The documentation structure and Redis TLS guidance are well-presented, but the verification of referenced files could not be completed in this environment. The following items require manual verification by the developer or CI system:
- hack/dev-env/gen-redis-tls-certs.sh (executable)
- hack/dev-env/configure-redis-tls.sh (executable)
- hack/dev-env/configure-argocd-redis-tls.sh (executable)
- hack/dev-env/reverse-tunnel/setup.sh (executable)
- hack/dev-env/reverse-tunnel/README.md (exists)

Confirm these files exist and are properly executable before merging.
principal/listen.go (1)
224-230: LGTM! Clear service registration logging.

The logging statements provide good visibility into the gRPC service registration flow, making it easier to debug startup issues and verify that all services are registered successfully.
principal/tracker/tracking.go (1)
75-78: Unable to verify review comment due to repository access limitation.

The review comment cannot be verified at this time because the repository is inaccessible. To properly assess the concerns about the buffered channel change at lines 75-78, the following would need to be verified:

- Whether a real deadlock scenario exists between `processRedisEventResponse` and `sendSynchronousRedisMessageToAgent`
- Whether a buffer capacity of 1 is sufficient for the actual send/receive patterns in the code
- Whether the buffering addresses the root cause or masks a deeper synchronization issue
Without access to the codebase to examine the sender/receiver implementations and their usage patterns, the original review comment's concerns remain unresolved.
principal/auth.go (1)
154-164: LGTM - Debug logging additions for auth interceptor.

The trace-level logging provides useful debugging information for the authentication flow. The emoji prefixes add visual distinction in logs, which can be helpful during debugging sessions.
Consider documenting the emoji convention if used elsewhere, or removing them to maintain consistent log formatting across the codebase.
test/e2e/fixture/argoclient.go (1)
316-334: LGTM - Environment variable override for ArgoCD server endpoint.

The early return when `ARGOCD_SERVER_ADDRESS` is set provides useful flexibility for E2E tests, particularly when dynamic LoadBalancer addresses don't match certificate SANs. The fallback to K8s service lookup is preserved correctly.

test/e2e/fixture/fixture.go (3)
109-154: Increased polling timeouts for deletion operations.

The timeout increase from 60 to 120 seconds accommodates potential TLS handshake overhead and certificate validation delays during E2E tests.
229-241: Improved cleanup resilience with non-fatal warnings.

Using warnings instead of returning errors prevents cleanup failures from cascading and failing entire test suites. The `DeepCopy()` usage correctly avoids mutating loop variables when adjusting namespace/name.
457-470: Graceful degradation when Redis is unavailable.

The cleanup now logs a warning and continues if Redis is unavailable (e.g., port-forward died), rather than failing the cleanup. This improves test reliability.
hack/dev-env/start-agent-managed.sh (1)
37-61: LGTM - Redis TLS detection and address configuration.

The conditional TLS enablement based on certificate presence is a clean pattern. The port-forward guidance message helps developers understand the setup requirements.
hack/dev-env/start-principal.sh (3)
23-43: LGTM - Port-forward setup with cleanup trap.

The port-forward establishment with PID tracking, validation, and cleanup trap is a robust pattern for local development. The 2-second wait allows time for the connection to stabilize.
58-76: LGTM - Redis TLS certificate detection. Properly checks for all three required files (cert, key, CA) before enabling TLS. The descriptive comments about certificate SANs help future maintainers understand the setup.
84-86: Undefined variable `MTLS_ARGS` referenced.
`$MTLS_ARGS` is used but not defined in this script. If it's intentionally optional (set externally), this is fine; otherwise, it may cause unintended behavior.
internal/argocd/cluster/cluster.go (2)
135-142: LGTM - Defensive initialization of ConnectionState. Good defensive programming: initializing `ConnectionState` when the agent first connects prevents potential nil pointer issues and provides meaningful status information.
176-191: TLS configuration support added to Redis cache initialization. The signature change to accept `*tls.Config` enables TLS for Redis connections. Passing `nil` for `tlsConfig` maintains backward compatibility (no TLS).
Note: Verification of caller updates could not be completed due to repository access limitations. Manual verification is required to ensure all calls to `NewClusterCacheInstance` have been updated to pass the `tlsConfig` parameter.
hack/dev-env/Procfile.e2e (1)
1-7: Port-forward configuration for TLS-enabled Redis connections. The port-forward setup allows TLS certificate validation to work correctly since `localhost` is included in the certificate SANs. The staggered sleep delays (3s for principal, 5s for agents) ensure port-forwards are established before components start.
Verify that the `MANAGED_AGENT_REDIS_ADDR` and `AUTONOMOUS_AGENT_REDIS_ADDR` environment variables are defined in your development environment setup or sourced before running this Procfile, as they are required by lines 6-7 but not defined locally in this file.
agent/agent.go (2)
323-345: TLS configuration implementation looks correct. The TLS configuration for the cluster cache Redis client is well-structured:
- Uses minimum TLS 1.2 (appropriate security baseline)
- Properly handles insecure mode with warning log (line 330) - this addresses the previous review feedback
- Correctly loads and parses CA certificate with appropriate error handling
One minor observation: the error message at line 339 could carry more context, for example:

```diff
- return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s", a.redisProxyMsgHandler.redisTLSCAPath)
+ return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s: no valid certificates found", a.redisProxyMsgHandler.redisTLSCAPath)
```
445-460: Good improvement: immediate cluster cache info update on startup. The refactored goroutine correctly:
- Sends an initial update immediately on startup (line 448) rather than waiting for the first ticker interval
- Properly defers `ticker.Stop()` for cleanup
- Handles context cancellation appropriately
This ensures the principal receives cluster cache info promptly after agent startup.
cmd/argocd-agent/principal.go (3)
258-288: Redis TLS configuration logic is well-structured. The implementation correctly:
- Validates mutual exclusivity between `--redis-upstream-tls-insecure` and `--redis-upstream-ca-path` (lines 273-275)
- Uses the secret as the default fallback when neither insecure nor CA path is specified (lines 285-286)
- Logs appropriate messages for each configuration path
The previous review comment suggested validating all three modes (insecure, CA path, CA secret) as mutually exclusive. However, the current behavior is actually reasonable: the secret serves as a default when no explicit configuration is provided, which is a common pattern. If you prefer explicit mutual exclusivity for all three, let me know.
419-441: CLI flags are well-defined with consistent naming. The Redis TLS flags follow the established patterns:
- Consistent `ARGOCD_PRINCIPAL_REDIS_*` environment variable naming
- Reasonable defaults (`false` for CLI flags, `argocd-redis-tls` for secret names)
- Clear descriptions for each flag
471-471: Timeout increase is reasonable for production reliability. The 30-second timeout for secret retrieval allows for network latency and Kubernetes API server load, which is appropriate for production environments.
test/e2e/clusterinfo_test.go (1)
108-115: Timeout increases are appropriate and well-documented. The increased timeouts (60s/2s) with explanatory comments appropriately account for port-forward latency in long test runs. This should improve test stability in CI environments.
Also applies to: 123-129
test/e2e/rp_test.go (3)
162-169: Good refactoring: centralized endpoint and secret retrieval. Using the `fixture.GetArgoCDServerEndpoint` and `fixture.GetInitialAdminSecret` helpers:
- Reduces code duplication across tests
- Centralizes the logic for environment variable checks and Kubernetes fallback
- Makes tests more maintainable when endpoint retrieval logic changes
The helper at `fixture/argoclient.go:314-336` properly checks the `ARGOCD_SERVER_ADDRESS` environment variable first, then falls back to the Kubernetes service lookup.
295-306: Consistent refactoring across test functions. The same fixture helper pattern is applied correctly here, maintaining consistency with `Test_ResourceProxy_Argo`.
509-510: Minor formatting change, no functional impact.
docs/configuration/redis-tls.md (2)
1-17: Comprehensive and well-structured documentation. This documentation thoroughly covers:
- Architecture overview with clear diagrams
- Quick start guides for different environments (dev, E2E, production)
- Detailed configuration options for both principal and agent
- Troubleshooting section with practical solutions
- Security best practices
The table of contents and section organization make it easy to navigate.
736-755: Security best practices are appropriately scoped. The security recommendations cover essential practices:
- Strong key sizes (4096-bit RSA)
- Appropriate certificate validity (1 year)
- Private key protection with RBAC
- Certificate rotation planning
- Clear warning against insecure options in production
hack/dev-env/gen-redis-tls-certs.sh (1)
1-10: Well-structured certificate generation script. The script properly handles idempotency with file existence checks, uses `set -e` for error handling, and generates appropriate SANs for each component. The cleanup of temporary files (CSR, EXT, SRL) is good practice.
test/run-e2e.sh (2)
24-45: Good enforcement of TLS prerequisites. The script properly validates that TLS certificates exist before running tests and provides clear, actionable error messages guiding users to run the setup scripts. This aligns with the PR objective of making Redis TLS mandatory for E2E tests.
82-115: Environment variable exports are macOS-only; verify the Linux CI Redis connectivity strategy. The Redis address environment variables (`ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS`, `MANAGED_AGENT_REDIS_ADDR`, `AUTONOMOUS_AGENT_REDIS_ADDR`, `ARGOCD_SERVER_ADDRESS`) are only set when running on macOS. Confirm that Linux CI environments have a strategy for accessing Redis services, either through MetalLB LoadBalancer service IPs or by setting these environment variables explicitly for Linux as well.
test/e2e/redis_proxy_test.go (2)
210-237: Good retry handling for transient Redis connection issues. Wrapping the `ResourceTree` call in an `Eventually` block with explicit error logging handles the EOF errors mentioned in the comment. This is a robust pattern for E2E tests dealing with TLS-enabled Redis connections that may experience transient failures.
642-653: Appropriate SSE transport configuration. The `Timeout: 0` settings are correct for SSE streams, which are long-lived connections. The `IdleConnTimeout: 300s` helps maintain connections during test execution. The `InsecureSkipVerify: true` is documented in the PR as intentional for E2E tests with dynamic LoadBalancer addresses.
81-121: Good practice: scaling down components before TLS transition. Scaling down ArgoCD components before enabling Redis TLS prevents SSL handshake errors from pods attempting non-TLS connections. Storing replica counts in a ConfigMap enables proper restoration by the companion script.
199-215: Correct Redis TLS configuration. The Redis args properly configure TLS-only mode (`--port 0` disables plain TCP, `--tls-port 6379` enables TLS). The `--tls-auth-clients no` setting means clients authenticate via password only, not mutual TLS, which is appropriate for this use case.
182-201: TLS configuration with InsecureSkipVerify is appropriate for E2E tests. Using `InsecureSkipVerify: true` for E2E tests is explicitly documented in the PR description as a workaround for dynamic LoadBalancer addresses that may not match certificate SANs. The `MinVersion: tls.VersionTLS12` ensures a reasonable security baseline.
319-327: Good: TLS enabled by default with environment override support. Enabling TLS by default (`ManagedAgentRedisTLSEnabled = true`) aligns with the PR objective. The environment variable override (`MANAGED_AGENT_REDIS_ADDR`) supports local development with port-forwards, which is consistent with the macOS handling in `run-e2e.sh`.
Force-pushed from 40d7b3c to 8b47b98.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/rp_test.go (1)
162-169: Good refactor using fixture helpers. Replacing direct Kubernetes lookups with `fixture.GetArgoCDServerEndpoint` and `fixture.GetInitialAdminSecret` centralizes configuration retrieval and supports environment variable overrides (e.g., `ARGOCD_SERVER_ADDRESS`), improving test flexibility.
♻️ Duplicate comments (6)
docs/configuration/redis-tls.md (3)
487-495: Add language identifier to fenced code block. This output example block should have a language identifier. Based on markdownlint feedback and past review comments, apply this fix:
````diff
-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...
 Generating CA key and certificate...
 ...
-```
+```
````

This resolves the MD040 linting error.
Based on learnings, past review flagged this same issue which was marked as addressed in commit 6247404, but the static analysis tool still reports it.
498-514: Add language identifier to fenced code block. This output example block should have a language identifier:
````diff
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║           Configure Redis Deployment for TLS             ║
 ...
-```
+```
````
516-533: Add language identifier to fenced code block. This output example block should have a language identifier:
````diff
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║        Configure Argo CD Components for Redis TLS        ║
 ...
-```
+```
````
hack/dev-env/gen-redis-tls-certs.sh (1)
68-86: Guard against empty `LOCAL_IP` when emitting the SAN IP entry. If `LOCAL_IP` ends up empty, the extension file will contain `IP.3 =`, which can cause OpenSSL failures or produce an invalid certificate; only add `IP.3` when a non-empty IP was detected.

```diff
-  cat > "${CREDS_DIR}/redis-proxy.ext" <<EOF
-subjectAltName = @alt_names
-[alt_names]
-DNS.1 = argocd-redis-proxy
-DNS.2 = argocd-redis-proxy.argocd
-DNS.3 = argocd-redis-proxy.argocd.svc
-DNS.4 = argocd-redis-proxy.argocd.svc.cluster.local
-DNS.5 = localhost
-DNS.6 = rathole-container-internal
-IP.1 = 127.0.0.1
-IP.2 = 127.0.0.2
-IP.3 = ${LOCAL_IP}
-EOF
+  cat > "${CREDS_DIR}/redis-proxy.ext" <<EOF
+subjectAltName = @alt_names
+[alt_names]
+DNS.1 = argocd-redis-proxy
+DNS.2 = argocd-redis-proxy.argocd
+DNS.3 = argocd-redis-proxy.argocd.svc
+DNS.4 = argocd-redis-proxy.argocd.svc.cluster.local
+DNS.5 = localhost
+DNS.6 = rathole-container-internal
+IP.1 = 127.0.0.1
+IP.2 = 127.0.0.2
+EOF
+
+  if [ -n "${LOCAL_IP}" ]; then
+    echo "IP.3 = ${LOCAL_IP}" >> "${CREDS_DIR}/redis-proxy.ext"
+  fi
```
61-66: CA certificate validation still missing. The validation checks for the server certificate and key but not for `ca.crt`, which is used at line 128 when creating the secret. This is the same issue flagged in previous reviews.

Apply this diff to add CA validation:

```diff
 # Check certificates exist
-if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ]; then
-    echo "Error: Redis TLS certificate or key not found for ${REDIS_CERT_PREFIX}"
+if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ] || [ ! -f "creds/redis-tls/ca.crt" ]; then
+    echo "Error: Redis TLS certificates not found (${REDIS_CERT_PREFIX}.crt, ${REDIS_CERT_PREFIX}.key, or ca.crt)"
     echo "Please run: ./gen-redis-tls-certs.sh"
     exit 1
 fi
```

cmd/argocd-agent/principal.go (1)
272-288: Incomplete mutual exclusivity validation for upstream TLS modes. Lines 273-275 validate that `--redis-upstream-tls-insecure` and `--redis-upstream-ca-path` are mutually exclusive, but don't check whether `--redis-upstream-ca-path` and `--redis-upstream-ca-secret-name` are both specified. If both are provided, line 281 silently takes precedence, ignoring the secret configuration without warning.

Consider validating all three modes for mutual exclusivity:

```diff
+	// Validate upstream TLS configuration - only one mode allowed
+	modesSet := 0
+	if redisUpstreamTLSInsecure {
+		modesSet++
+	}
+	if redisUpstreamTLSCAPath != "" {
+		modesSet++
+	}
+	if redisUpstreamTLSCASecretName != "" {
+		modesSet++
+	}
+	if modesSet > 1 {
+		cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
+	}
-	// Validate upstream TLS configuration - insecure and CA path are mutually exclusive
-	if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
-		cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
-	}
 	// Redis upstream TLS (for connections to principal's argocd-redis)
 	if redisUpstreamTLSInsecure {
```
🧹 Nitpick comments (4)
principal/resource.go (1)
39-39: Timeout extension appropriate for TLS operations. The increase from 10 to 30 seconds accommodates longer TLS handshake and certificate loading operations introduced by Redis TLS support across the system.
Consider the existing TODO at line 38—making this timeout configurable would provide better flexibility for different deployment scenarios.
principal/redisproxy/redisproxy.go (1)
131-165: Unused parse result at line 154. Line 154 parses the private key but discards the result. If this is validation-only, the error check suffices. Otherwise, consider removing the parse call.

Apply this diff to remove the unused parse:

```diff
 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
 		return nil, fmt.Errorf("no TLS certificate configured")
 	}
```

hack/dev-env/start-principal.sh (1)
23-43: Minor: adjust `trap` quoting to satisfy ShellCheck SC2064. The `trap` currently interpolates `PORT_FORWARD_PID` at definition time; switching to a single-quoted trap and quoting the variable inside avoids SC2064 and is the idiomatic form while preserving behavior.

```diff
-  # Cleanup function to kill port-forward on exit
-  trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+  # Cleanup function to kill port-forward on exit
+  trap 'kill "$PORT_FORWARD_PID" 2>/dev/null || true' EXIT
```

Also applies to: 58-76, 84-85
test/e2e/fixture/fixture.go (1)
109-155: Best-effort cleanup behavior change looks intentional; consider whether other paths should match. Doubling the deletion wait loops, using DeepCopy for cross-namespace waits, and downgrading many application/AppProject cleanup failures to warnings will reduce e2e flakiness, but may leave residual resources when deletions keep failing. If you want fully consistent best-effort semantics, you might also convert the remaining repo/namespace cleanup errors to warnings; otherwise this mixed strategy seems reasonable for tests.
Also applies to: 159-171, 218-292, 294-357, 457-470
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
- `Makefile` (1 hunks)
- `agent/agent.go` (3 hunks)
- `agent/inbound_redis.go` (3 hunks)
- `agent/options.go` (1 hunks)
- `agent/outbound_test.go` (1 hunks)
- `cmd/argocd-agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (2 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/README.md` (3 hunks)
- `install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml` (2 hunks)
- `install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml` (1 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `install/helm-repo/argocd-agent-agent/values.yaml` (1 hunks)
- `install/kubernetes/agent/agent-deployment.yaml` (3 hunks)
- `install/kubernetes/agent/agent-params-cm.yaml` (1 hunks)
- `install/kubernetes/principal/principal-deployment.yaml` (3 hunks)
- `install/kubernetes/principal/principal-params-cm.yaml` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `internal/argocd/cluster/cluster_test.go` (3 hunks)
- `internal/argocd/cluster/informer_test.go` (6 hunks)
- `internal/argocd/cluster/manager.go` (3 hunks)
- `internal/argocd/cluster/manager_test.go` (3 hunks)
- `principal/auth.go` (1 hunks)
- `principal/listen.go` (3 hunks)
- `principal/options.go` (2 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/server.go` (3 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (2 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (11 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/e2e/sync_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
- Makefile
- principal/listen.go
- install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
- internal/argocd/cluster/manager_test.go
- install/kubernetes/principal/principal-deployment.yaml
- principal/tracker/tracking.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- test/run-e2e.sh
- install/kubernetes/principal/principal-params-cm.yaml
- internal/argocd/cluster/informer_test.go
- docs/getting-started/kubernetes/index.md
- agent/outbound_test.go
- test/e2e/fixture/argoclient.go
- hack/dev-env/start-agent-autonomous.sh
- cmd/argocd-agent/agent.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/e2e/rp_test.go
- hack/dev-env/start-agent-managed.sh
- install/kubernetes/agent/agent-deployment.yaml
- install/kubernetes/agent/agent-params-cm.yaml
- hack/dev-env/configure-argocd-redis-tls.sh
- test/e2e/README.md
- hack/dev-env/Procfile.e2e
- install/helm-repo/argocd-agent-agent/values.yaml
- hack/dev-env/start-e2e.sh
🧬 Code graph analysis (12)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
GetArgoCDServerEndpoint (315-337), GetInitialAdminSecret (302-313), NewArgoClient (52-66)
principal/server.go (1)
internal/argocd/cluster/manager.go (1)
NewManager(71-119)
agent/inbound_redis.go (2)
internal/logging/logfields/logfields.go (1)
Config (127-127)
internal/logging/logging.go (1)
Warn(300-302)
agent/options.go (2)
principal/options.go (1)
WithRedisTLSEnabled (493-498)
agent/agent.go (2)
AgentOption (136-136), Agent (65-117)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
HasConnectionStatus (60-74), AgentManagedName (37-37), ClusterDetails (42-56), AgentAutonomousName (38-38)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
NewFakeKubeClient(31-44)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
ClusterDetails (42-56), AgentManagedName (37-37), AgentClusterServerURL (39-39)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
Config (127-127)
internal/logging/logging.go (1)
Warn (300-302)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
WithRedisTLSEnabled (112-117)
principal/options.go (6)
WithRedisTLSEnabled (493-498), WithRedisServerTLSFromPath (501-507), WithRedisServerTLSFromSecret (510-520), WithRedisUpstreamTLSInsecure (543-548), WithRedisUpstreamTLSCAFromFile (523-528), WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
BoolWithDefault (30-39), StringWithDefault (46-55)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
487-487: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
498-498: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
516-516: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh
[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (29)
principal/auth.go (1)
154-165: Trace logging enhances auth flow observability. The added trace logs with visual markers help debug the authentication flow during development. Trace level is appropriate for detailed debugging without impacting production logs.
test/e2e/rp_test.go (1)
295-305: Consistent fixture helper usage. The same fixture helper pattern applied here maintains consistency across test functions and eliminates code duplication.
principal/server.go (2)
352-371: Redis proxy TLS configuration properly structured. The TLS configuration logic correctly prioritizes configuration sources (path-based vs. in-memory) for both server TLS (incoming connections) and upstream TLS (connections to Redis). The conditional structure ensures TLS is only configured when explicitly enabled.
400-426: Cluster manager TLS configuration with robust error handling. The TLS configuration for the cluster manager includes:
- Proper error handling for CA certificate file operations (lines 414-421)
- Certificate parsing validation with clear error messages
- Security warning for insecure mode (line 408)
- Appropriate minimum TLS version (1.2)
The CA loading from file has good error propagation that will fail server startup if certificates are misconfigured.
internal/argocd/cluster/cluster.go (2)
176-185: TLS configuration properly integrated into Redis cache. The updated signature accepts an optional TLS config and correctly wires it into the Redis client options. The nil-safe design allows TLS to be disabled when not needed.
135-142: Good defensive initialization of connection state. When `ConnectionState` doesn't exist (agent just connected), this code properly initializes it with a Successful status and timestamp. This prevents gaps in connection tracking during cluster cache stats updates.
hack/dev-env/start-e2e.sh (1)
50-59: Static localhost addresses simplify TLS-enabled E2E setup. Replacing dynamic LoadBalancer IP lookups with static localhost addresses (backed by port-forwards managed by Goreman) ensures TLS certificate validation works correctly, as `localhost` is included in the certificate SANs.
The separation of the `REDIS_PASSWORD` assignment and export (lines 58-59) correctly addresses the previous shellcheck SC2155 warning, allowing proper error handling if the `kubectl` command fails.
test/e2e/README.md (1)
21-108: Comprehensive E2E workflow documentation with TLS guidance. The expanded documentation clearly describes:
- The multi-step setup process with Redis TLS configured automatically
- Remote cluster considerations with reverse tunnel setup
- Multi-terminal workflow requirements
- Manual TLS reconfiguration procedures
- Environment detection for local vs CI testing
This significantly improves the developer experience for running TLS-enabled E2E tests.
principal/redisproxy/redisproxy.go (2)
168-211: LGTM: TLS listener implementation is sound. The TLS-enabled listener is correctly configured with MinVersion set to TLS12, and the branching logic cleanly separates the TLS and non-TLS paths with appropriate logging.
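The listener branching described above can be sketched as follows. This is an illustrative stdlib-only sketch of the pattern, not the proxy's actual code; `newListener` and its parameters are assumed names.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
)

// newListener wraps the TCP listener with TLS when a server certificate is
// configured, otherwise it serves plaintext (the non-TLS path).
func newListener(addr string, cert *tls.Certificate) (net.Listener, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	if cert == nil {
		return l, nil // non-TLS path
	}
	cfg := &tls.Config{
		Certificates: []tls.Certificate{*cert},
		MinVersion:   tls.VersionTLS12, // same floor the review notes
	}
	return tls.NewListener(l, cfg), nil
}

func main() {
	l, err := newListener("127.0.0.1:0", nil)
	if err != nil {
		panic(err)
	}
	defer l.Close()
	fmt.Println(l.Addr().Network())
}
```

Because `tls.NewListener` returns a plain `net.Listener`, the accept loop downstream does not need to know which branch was taken.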
847-908: Verify upstream TLS condition covers all intended scenarios. The condition at line 864 requires `tlsEnabled` AND at least one of (CA, CAPath, Insecure) to wrap the upstream connection with TLS. Confirm this aligns with the intended behavior: specifically, whether `tlsEnabled` alone (without CA/CAPath/Insecure) should skip upstream TLS or raise an error.
install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)
93-101: LGTM: Redis TLS configuration added correctly. The three new TLS-related keys (enabled, ca-path, and insecure) are properly documented and bound to Helm values, consistent with the TLS implementation across the codebase.
test/e2e/clusterinfo_test.go (1)
108-115: LGTM: Timeout increases accommodate TLS latency. The timeout increases from 30s to 60s with adjusted polling intervals are appropriate for handling potential port-forward latency in long test runs, and the explanatory comments clarify the rationale.
Also applies to: 123-129, 141-142
internal/argocd/cluster/cluster_test.go (1)
36-36: LGTM: Test updated for new TLS parameter. The nil TLS config parameter correctly aligns test call sites with the updated `NewManager` signature, appropriately passing nil for tests that don't exercise TLS functionality.
Also applies to: 225-225, 304-304
install/helm-repo/argocd-agent-agent/values.yaml (1)
136-162: LGTM: TLS and network policy configuration added. The new `redisTLS` configuration block and `networkPolicy` settings are well-documented and use secure defaults (TLS enabled by default, insecure mode disabled), consistent with the broader TLS implementation.
install/helm-repo/argocd-agent-agent/README.md (1)
45-50: LGTM: Documentation updated for TLS configuration. The documentation entries for `redisTLS`, `networkPolicy`, and `tlsRootCAPath` accurately reflect the corresponding values.yaml changes.
Also applies to: 68-72, 96-96
install/kubernetes/agent/agent-params-cm.yaml (1)
88-99: LGTM: Kubernetes manifest updated with Redis TLS configuration. The three new Redis TLS configuration keys are properly documented with secure defaults and mount paths that align with the deployment configuration and Helm templates.
test/e2e/sync_test.go (1)
371-371: Verify hook name "before" matches the test fixture in test/data/pre-sync. The hook Job name was changed from "pre-post-sync-before" to "before" at lines 371 and 466. Ensure the Job name in the test fixture corresponds to this value to maintain test data consistency.
Also applies to: 466-466
hack/dev-env/Procfile.e2e (1)
1-7: Procfile e2e Redis/server port-forwards and startup ordering look consistent. Port mappings for Redis (6380/6381/6382) match the defaults used in the dev start scripts, and placing the port-forwards before principal/agent startup should ensure TLS-capable Redis endpoints are reachable when the processes start.
agent/options.go (1)
111-133: Redis TLS AgentOptions cleanly mirror the existing option pattern. The new `WithRedisTLSEnabled` / `WithRedisTLSCAPath` / `WithRedisTLSInsecure` helpers follow the same style as the existing Redis options and provide straightforward wiring into `redisProxyMsgHandler` without changing other Agent behavior.
hack/dev-env/start-agent-managed.sh (1)
37-83: Managed agent Redis TLS and mTLS wiring look sound. Enabling Redis TLS based on `creds/redis-tls/ca.crt`, defaulting the Redis address to `localhost:6381` to match the port-forward, and extracting the mTLS client cert/key/CA into `/tmp` for `--tls-client-cert` / `--tls-client-key` / `--root-ca-path` gives a coherent, reproducible dev/e2e setup.
internal/argocd/cluster/manager.go (1)
24-45: TLS config propagation into cluster cache is consistent with the cache constructor. Adding the `*tls.Config` parameter to `NewManager` and forwarding it to `NewClusterCacheInstance` cleanly wires Redis TLS into the cluster cache without altering other manager responsibilities.
Also applies to: 70-82
agent/inbound_redis.go (1)
20-24: Redis TLS client configuration in `getRedisClientAndCache` is robust and well-scoped. Conditionally creating the `tls.Config` (TLS 1.2+), warning and setting `InsecureSkipVerify` only when explicitly requested, and otherwise loading a CA from `redisTLSCAPath` (or falling back to system CAs with a warning) is a solid pattern for securing the Redis connection while keeping dev/e2e knobs available.
Also applies to: 51-55, 345-372
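The precedence described here (explicit insecure wins, then a configured CA file, else the system pool) can be sketched as follows. This is an illustrative stdlib-only sketch; `redisClientTLS` and its parameters are assumed names, not the agent's actual identifiers.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// redisClientTLS builds the client-side tls.Config for the Redis connection:
// insecure mode disables verification, a CA path loads a private CA, and the
// default leaves RootCAs nil so crypto/tls falls back to the system pool.
func redisClientTLS(insecure bool, caPath string) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	switch {
	case insecure:
		cfg.InsecureSkipVerify = true // should be accompanied by a warning log
	case caPath != "":
		pemBytes, err := os.ReadFile(caPath)
		if err != nil {
			return nil, fmt.Errorf("failed to read CA certificate from %s: %w", caPath, err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pemBytes) {
			return nil, fmt.Errorf("no valid certificates found in %s", caPath)
		}
		cfg.RootCAs = pool
	default:
		// RootCAs == nil means crypto/tls uses the host's system cert pool.
	}
	return cfg, nil
}

func main() {
	cfg, err := redisClientTLS(false, "")
	fmt.Println(err == nil, cfg.RootCAs == nil, cfg.MinVersion == tls.VersionTLS12)
}
```

Returning an error for an unreadable or unparsable CA (rather than silently falling back) is what makes misconfiguration fail fast at startup.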
agent/agent.go (1)
323-349: LGTM! TLS configuration properly implemented. The cluster cache TLS configuration correctly:
- Creates TLS config with minimum TLS 1.2
- Logs a warning for insecure mode (Line 330)
- Loads and validates CA certificates with clear error messages
- Passes the config to cluster cache initialization
hack/dev-env/configure-argocd-redis-tls.sh (1)
1-261: Well-structured TLS configuration script. The script demonstrates good practices:
- Idempotent operations with existence checks before patching
- Clear user-facing messages and error handling
- Proper scaling sequence (scale down/configure/scale up) to prevent connection errors during TLS transition
- Context switching with cleanup on exit
Based on learnings, this is appropriate for E2E test environments.
test/e2e/redis_proxy_test.go (1)
120-123: Good resilience improvements for E2E tests. The changes address race conditions and improve test reliability:
- 5-second delay prevents pod deletion before Redis SUBSCRIBE is active (Lines 120-123)
- Message draining logic ensures all available SSE messages are processed (Lines 188-208)
- Retry logic for ResourceTree calls handles transient Redis EOF errors (Lines 211-237)
- Buffered channel prevents message loss (Line 588)
These are appropriate enhancements for test stability.
Also applies to: 188-208, 211-237
install/kubernetes/agent/agent-deployment.yaml (1)
149-166: TLS configuration correctly integrated into agent deployment. The manifest changes properly:
- Add TLS environment variables from ConfigMap with `optional: true`
- Mount the redis-tls-ca secret at `/app/config/redis-tls`
- Mark both the volume and secret as optional to support non-TLS deployments
Also applies to: 193-211
test/e2e/fixture/cluster.go (2)
227-267: Excellent addition of Redis client caching. The new `getCachedCacheInstance` and `CleanupRedisCachedClients` functions prevent connection leaks in E2E tests by reusing Redis clients across test operations. The mutex protection ensures thread-safety.
206-217: Appropriate timeout and pool configuration for E2E tests. The Redis client configuration is well-tuned for port-forward latency:
- Generous timeouts (DialTimeout: 10s, ReadTimeout: 30s)
- Connection pooling (PoolSize: 10, retries: 3)
- Idle connection management
These settings are suitable for test environments where network latency is higher.
principal/options.go (1)
492-548: Redis TLS options follow established patterns. The new `ServerOption` functions are well-structured and consistent with existing options:
- `WithRedisTLSEnabled` for feature toggle
- `WithRedisServerTLSFromPath` and `WithRedisServerTLSFromSecret` for server TLS
- `WithRedisUpstreamTLSCAFromFile` and `WithRedisUpstreamTLSCAFromSecret` for upstream CA
- `WithRedisUpstreamTLSInsecure` for test scenarios

All use the standard option pattern and leverage existing `tlsutil` helpers.
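A minimal sketch of the functional-option pattern these helpers follow, with illustrative fields (this is not the actual `principal.Server` definition, and the validation shown is an assumption):

```go
package main

import "fmt"

// Server stands in for principal.Server; fields are illustrative.
type Server struct {
	redisTLSEnabled bool
	redisCAPath     string
}

// ServerOption is the functional-option signature the review describes.
type ServerOption func(*Server) error

// WithRedisTLSEnabled toggles Redis TLS support.
func WithRedisTLSEnabled(enabled bool) ServerOption {
	return func(s *Server) error {
		s.redisTLSEnabled = enabled
		return nil
	}
}

// WithRedisUpstreamTLSCAFromFile records the upstream CA bundle path,
// rejecting an empty path so misconfiguration fails at construction time.
func WithRedisUpstreamTLSCAFromFile(path string) ServerOption {
	return func(s *Server) error {
		if path == "" {
			return fmt.Errorf("CA path must not be empty")
		}
		s.redisCAPath = path
		return nil
	}
}

// NewServer applies options in order and fails on the first error.
func NewServer(opts ...ServerOption) (*Server, error) {
	s := &Server{}
	for _, o := range opts {
		if err := o(s); err != nil {
			return nil, err
		}
	}
	return s, nil
}

func main() {
	s, err := NewServer(
		WithRedisTLSEnabled(true),
		WithRedisUpstreamTLSCAFromFile("/app/config/redis-tls/ca.crt"),
	)
	fmt.Println(s.redisTLSEnabled, s.redisCAPath, err)
}
```

Returning an error from each option is what allows CA-loading failures to surface at construction time rather than on first Redis connection.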
Force-pushed from 8b47b98 to 81c3402
Actionable comments posted: 3
♻️ Duplicate comments (1)
docs/configuration/redis-tls.md (1)
149-183: Tag remaining bare fenced code blocks with a language (markdownlint MD040). There are still a few fenced blocks without a language, which markdownlint flags:
- “How the tunnel works” ASCII diagram around line 150.
- Script output examples for:
  - `gen-redis-tls-certs.sh`
  - `configure-redis-tls.sh`
  - `configure-argocd-redis-tls.sh`

To keep linters happy and clarify formatting, consider tagging them as plain text, e.g.:
````diff
-```
+```text
 Argo CD Server (remote vcluster)
 ...
-```
+```
````

…and similarly for each of the script-output blocks.
Also applies to: 475-521
🧹 Nitpick comments (9)
principal/auth.go (1)
154-163: Trace logging in unaryAuthInterceptor is fine but consider toning down style. The added trace/Warn logs are helpful for debugging, but:
- They partially duplicate information already logged inside `authenticate`, and
- The emojis in log messages may not play well with log parsers or existing logging conventions.
Consider reusing a structured logger context (e.g., add module/client fields as elsewhere in this file) and dropping emojis to keep logs machine-friendly and consistent.
test/e2e/fixture/argoclient.go (1)
27-33: Env override for Argo CD server endpoint is useful; clarify expected format. Using `ARGOCD_SERVER_ADDRESS` as a fast path is handy for TLS-aware E2E runs and avoids an extra K8s call. Since `NewArgoClient` builds `https://` URLs by treating `endpoint` as the `Host`, this env var should be `host:port` (no scheme).

Consider:
- Documenting that expectation where this env var is set, and/or
- Adding a lightweight sanity check (e.g., rejecting values starting with `http://` or `https://`) to fail fast on misconfiguration.

Also applies to: 315-337
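A sketch of such a check, assuming the env var must be `host:port` (the helper name is hypothetical, not part of the fixture today):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateEndpoint enforces the host:port, no-scheme expectation for
// ARGOCD_SERVER_ADDRESS described above.
func validateEndpoint(v string) error {
	if strings.HasPrefix(v, "http://") || strings.HasPrefix(v, "https://") {
		return fmt.Errorf("ARGOCD_SERVER_ADDRESS must be host:port, got URL %q", v)
	}
	if _, _, err := net.SplitHostPort(v); err != nil {
		return fmt.Errorf("ARGOCD_SERVER_ADDRESS must be host:port: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(validateEndpoint("localhost:8080")) // <nil>
	fmt.Println(validateEndpoint("https://argocd.example.com:443"))
}
```

`net.SplitHostPort` also catches a missing port, so a bare hostname fails fast instead of producing a malformed URL later.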
principal/listen.go (1)
172-199: New gRPC/WebSocket and service-registration logging is functionally safe. The added Info-level logs around WebSocket enablement, server startup, and service registration improve startup visibility without changing behavior.
One nit: as in `auth.go`, the emoji-heavy messages (🔧, etc.) might not fit all log ingestion/alerting setups. If you want to keep logs easily greppable and machine-friendly, consider switching to plain-text messages while retaining the same structure and fields.

Also applies to: 224-230
hack/dev-env/start-agent-autonomous.sh (1)
37-83: Autonomous agent Redis TLS wiring looks correct; consider temp-file cleanup. The script correctly:
- Enables Redis TLS when `creds/redis-tls/ca.crt` exists and passes `--redis-tls-enabled`/`--redis-tls-ca-path`.
- Defaults `ARGOCD_AGENT_REDIS_ADDRESS` to `localhost:6382` for local E2E with a clear port-forward hint.
- Extracts client cert/key and CA from Kubernetes secrets and passes them via `--tls-client-cert`/`--tls-client-key`/`--root-ca-path`.

For local dev/E2E this is fine as-is. If you want to tighten things slightly, you could switch the `/tmp/...` paths to `mktemp` files and register a `trap` to remove them on exit, so TLS materials don't linger longer than necessary.

hack/dev-env/configure-argocd-redis-tls.sh (1)
52-216: Patching logic is pragmatic for dev/E2E; consider surfacing failures. The pattern of:
- Checking for existing `redis-tls-ca` volumes/volumeMounts and `--redis-use-tls` args, and
- Applying JSON patches with `... || true`

gives you an idempotent script that won't die if the manifests drift slightly, which is good for local/E2E usage.
One trade-off is that if a future manifest change causes a patch to fail (e.g., the `args` or `volumes` arrays are removed/renamed), the script will silently skip adding TLS CA mounts/flags and you'll only see failures later when components can't talk to Redis.

Not urgent, but for easier debugging you might consider:
- Logging a warning when a patch fails (e.g., capture stderr/stdout and echo a “could not patch X for Redis TLS” line), or
- Tightening the presence checks (e.g., verifying the `args`/`volumes` arrays exist) so failures are more explicit.

This would keep the script resilient while making TLS misconfigurations easier to diagnose.
hack/dev-env/start-agent-managed.sh (1)
37-83: Managed agent TLS and Redis address wiring look correct. The script correctly:
- Enables Redis TLS when the dev CA is present and passes the CA path via `--redis-tls-*` flags.
- Defaults the Redis address to `localhost:6381` (aligned with the Procfile port-forward) while allowing override via `ARGOCD_AGENT_REDIS_ADDRESS`.
- Extracts the agent mTLS cert/key and CA from Kubernetes secrets and injects them into the agent flags.
This matches the documented E2E flow and ensures proper certificate validation over the localhost port‑forward, while still allowing non‑TLS operation in ad‑hoc dev setups.
test/e2e/fixture/fixture.go (1)
107-155: Fixture cleanup and Redis-backed cluster info reset are safer and more robust
- The bounded deletion loops and the `WaitForDeletion` polling remain clear and avoid unbounded waits.
- Switching to `DeepCopy()` for applications and AppProjects before mutating namespace/name prevents subtle bugs caused by reusing the range loop variable.
- The new `resetManagedAgentClusterInfo` helper, invoked at the end of `CleanUp`, ensures the managed agent's cluster info in Redis is reset between tests, and the choice to log (rather than fail) when Redis is unavailable is appropriate for E2E teardown.

Also applies to: 218-266, 294-357, 457-471
agent/agent.go (1)
443-460: EnsurecacheRefreshIntervalis always positive before starting the cluster cache info tickerThe new goroutine that sends initial and periodic cluster cache info updates for both managed and autonomous agents is a good consolidation of behaviour. However,
time.NewTicker(a.cacheRefreshInterval)will panic ifcacheRefreshIntervalis zero or negative, so it’s important that:
a.cacheRefreshIntervalis always initialized to a positive duration via options or defaults beforeStartis called, or- a defensive check is added here to guard against an uninitialized value.
hack/dev-env/start-e2e.sh (1)
19-48: Consider removing unused helper function. The `getExternalLoadBalancerIP` function is no longer called after switching to localhost-based addresses. While it may have future utility, removing unused code improves maintainability.

Apply this diff to remove the unused function:
```diff
-# getExternalLoadBalancerIP will set EXTERNAL_IP with the load balancer hostname from the specified Service
-getExternalLoadBalancerIP() {
-  SERVICE_NAME=$1
-
-  MAX_ATTEMPTS=120
-
-  for ((i=1; i<=MAX_ATTEMPTS; i++)); do
-
-    echo ""
-    EXTERNAL_IP=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-    EXTERNAL_HOST=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
-
-    if [ -n "$EXTERNAL_IP" ]; then
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_IP"
-      break
-    elif [ -n "$EXTERNAL_HOST" ]; then
-      echo "External host for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_HOST"
-      EXTERNAL_IP=$EXTERNAL_HOST
-      break
-    else
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT not yet available, attempting again in 5 seconds..."
-      sleep 5
-    fi
-  done
-
-  if [ $i -gt $MAX_ATTEMPTS ]; then
-    echo "Failed to obtain external IP after $MAX_ATTEMPTS attempts."
-    exit 1
-  fi
-
-}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/auth.go (1 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/e2e/sync_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- principal/tracker/tracking.go
- test/e2e/sync_test.go
- test/run-e2e.sh
- test/e2e/redis_proxy_test.go
- principal/resource.go
- test/e2e/clusterinfo_test.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- cmd/argocd-agent/principal.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- Makefile
- test/e2e/rp_test.go
- hack/dev-env/Procfile.e2e
- hack/dev-env/start-agent-managed.sh
- hack/dev-env/start-e2e.sh
- hack/dev-env/configure-argocd-redis-tls.sh
- test/e2e/README.md
🧬 Code graph analysis (7)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
- GetArgoCDServerEndpoint (315-337), GetInitialAdminSecret (302-313), NewArgoClient (52-66)
principal/listen.go (4)
internal/logging/logging.go (2)
- Info (295-297), Warn (300-302)

pkg/api/grpc/authapi/auth_grpc.pb.go (1)
- RegisterAuthenticationServer (83-85)

pkg/api/grpc/versionapi/version_grpc.pb.go (1)
- RegisterVersionServer (69-71)

pkg/api/grpc/eventstreamapi/eventstream_grpc.pb.go (1)
- RegisterEventStreamServer (144-146)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
- apply (94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
- apply (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
- Warn (300-302)

internal/argocd/cluster/cluster.go (1)
- NewClusterCacheInstance (176-192)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
- ClusterDetails (42-56), AgentManagedName (37-37), AgentClusterServerURL (39-39)
principal/auth.go (1)
internal/logging/logging.go (2)
- Trace (285-287), Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
test/e2e/README.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (22)
Makefile (1)
59-79: Redis TLS setup in `setup-e2e` looks consistent with the E2E story. The new block cleanly wires the TLS cert generation and per-cluster Redis/Argo CD TLS configuration into `make setup-e2e`, matching the "TLS required for E2E" design. No functional issues from the Makefile side.

docs/getting-started/kubernetes/index.md (1)
159-230: Redis TLS steps in the Kubernetes getting-started guide are consistent and clear. The new Sections 2.4 and 4.4 cleanly walk through:
- Generating CA/server certs,
- Creating the shared `argocd-redis-tls` secret,
- Patching `argocd-redis` to enable TLS-only,
- Verifying with `redis-cli --tls`,
No issues from a correctness or usability standpoint.
Also applies to: 337-381, 646-646
test/e2e/README.md (1)
21-107: E2E flow and Redis TLS documentation are coherent and aligned with the scripts. The new multi-step flow (setup, optional reverse tunnel, start principal/agents, run tests) together with the Redis TLS section matches the dev scripts and TLS wiring (make setup-e2e, reverse-tunnel, Redis TLS cert/config scripts, and port-forwards). The note about `InsecureSkipVerify` being limited to the test fixture while agents/principal do full TLS validation is clear and appropriate for E2E usage.

hack/dev-env/Procfile.e2e (1)
1-7: Procfile-based Redis and Argo CD port-forwards look consistent with the E2E flow. The added pf-* entries correctly establish Redis port-forwards for the three vclusters and an Argo CD server port-forward, and the staggered startup of principal and agents fits with the new TLS/localhost-based Redis configuration. Note that the principal script also starts a Redis port-forward by default; see the comment on `hack/dev-env/start-principal.sh` to avoid double port-forwarding on `localhost:6380`.

test/e2e/rp_test.go (1)
161-245: Fixture helpers for Argo endpoint and admin secret are a solid cleanup. Switching to `fixture.GetArgoCDServerEndpoint` and `fixture.GetInitialAdminSecret`, and then building the client via `fixture.NewArgoClient`, removes duplicate K8s plumbing in the tests and aligns them with the TLS-aware endpoint discovery used elsewhere. The resulting Argo login and application flows are unchanged and easier to maintain.

Also applies to: 294-307
hack/dev-env/gen-redis-tls-certs.sh (1)
1-150: Redis TLS cert generation is robust and idempotent. The script cleanly generates a CA plus per-role Redis certificates with appropriate SANs (including localhost/loopback), skips regeneration when artifacts exist, conditionally adds the local IP, and cleans up temporary files. With `set -e` and no stderr suppression on OpenSSL commands, failures will be surfaced instead of silently ignored.
58-76: Redis TLS argument construction for principal is consistent with the dev CA layout. The TLS detection block correctly checks for the proxy cert/key and CA in `creds/redis-tls` and, when present, passes them via `--redis-tls-enabled`, `--redis-server-tls-cert`/`--redis-server-tls-key`, and `--redis-upstream-ca-path`. This lines up with the certs generated by `gen-redis-tls-certs.sh` and ensures proper validation for both localhost port-forward and reverse-tunnel scenarios.
17-24: Cluster cache Redis client now correctly honors Redis TLS settings. The new `clusterCacheTLSConfig` construction mirrors the Redis proxy's TLS behaviour:
- When Redis TLS is enabled, the cluster cache client enforces TLS 1.2+.
- If `redisTLSInsecure` is set, a clear warning is logged and certificate verification is disabled.
- Otherwise, when `redisTLSCAPath` is provided, the CA bundle is loaded and set as `RootCAs`, and any read/parse errors fail agent construction with an explicit error.

Passing this TLS config into `cluster.NewClusterCacheInstance` ensures the cluster cache uses the same secure Redis connection settings as the proxy.

Also applies to: 323-346
internal/argocd/cluster/cluster.go (2)
176-192: LGTM! TLS configuration properly integrated. The TLS configuration is correctly passed through to the Redis client options. The `tlsConfig` parameter allows for optional TLS (nil is acceptable for non-TLS connections), and the Redis client will handle nil TLSConfig appropriately.
135-142: Connection state initialization is appropriate. The fallback initialization when no existing ConnectionState is present correctly sets a Successful status with a timestamp. This ensures the cluster appears connected when cache stats are first received from an agent, which is the expected behavior.
hack/dev-env/start-e2e.sh (1)
50-59: Well-structured E2E test configuration. The localhost-based Redis addresses are appropriate for TLS certificate validation in E2E tests, and the REDIS_PASSWORD retrieval is now correctly split into separate assignment and export to avoid masking kubectl errors (addressing the previous shellcheck warning).
test/e2e/fixture/cluster.go (4)
182-201: TLS configuration appropriate for E2E tests. The use of `InsecureSkipVerify: true` is intentional for E2E tests to accommodate dynamic LoadBalancer addresses, as noted in the PR description. The TLS encryption is still enabled, which is the primary security goal.
206-217: Well-tuned Redis client configuration for E2E. The generous timeouts, connection pool sizing, and retry configuration are appropriate for E2E test scenarios with port-forwarding latency. The settings balance test reliability with resource usage.
227-267: Effective connection leak prevention. The caching mechanism with mutex protection prevents connection leaks across test runs. The cache key design (source + address) correctly handles multiple Redis instances, and the `CleanupRedisCachedClients` function enables proper test teardown.
154-165: Helpful debugging additions. The added log statements provide useful context for troubleshooting E2E test failures, especially when investigating TLS-enabled Redis connectivity issues.
hack/dev-env/configure-redis-tls.sh (4)
61-66: Certificate validation is complete. All required TLS certificate files are now validated (server certificate, key, and CA), addressing previous review feedback. The error message clearly guides users to run the certificate generation script.
81-122: Excellent transition strategy to prevent SSL errors. Scaling down Argo CD components before enabling Redis TLS prevents connection errors during the transition. Saving replica counts to a ConfigMap for restoration by the `configure-argocd-redis-tls.sh` script is a clean approach to maintaining the desired deployment state.
138-196: Robust idempotent patching logic. The script correctly checks for existing volumes and volumeMounts before patching, and handles both empty/null arrays and existing arrays. The JSON patch operations are well-structured with proper error handling.
199-215: Verify TLS configuration flags for Redis. The Redis TLS configuration uses `--tls-auth-clients no` (line 212), which disables client certificate authentication. Confirm this is intentional for the E2E environment and aligns with how Redis clients in the codebase are configured.

principal/redisproxy/redisproxy.go (3)
130-154: TLS configuration builder handles both cert sources correctly. The function properly supports both file-based and in-memory certificates with appropriate error handling. Setting MinVersion to TLS 1.2 is a reasonable baseline for compatibility while maintaining security.
157-183: TLS listener setup is well-structured. The TLS-enabled listener creation with fallback to plain TCP is correctly implemented. The logging clearly indicates whether TLS is enabled, which aids troubleshooting.
836-897: Upstream TLS connection handling is thorough. The method properly:
- Establishes the TCP connection first
- Conditionally wraps with TLS based on configuration
- Handles CA from pool, CA from file, and insecure mode
- Extracts SNI correctly from the address
- Performs the TLS handshake
- Cleans up the connection on errors
The warning log for InsecureSkipVerify (Line 859) is good practice.
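The CA-pool / CA-file / insecure precedence plus SNI extraction described above can be sketched as follows (function and parameter names are illustrative, not the proxy's actual API):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"os"
)

// buildUpstreamTLSConfig sketches the precedence the review describes:
// insecure mode wins, then an in-memory CA pool, then a CA file on disk.
func buildUpstreamTLSConfig(caPool *x509.CertPool, caPath string, insecure bool, addr string) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	switch {
	case insecure:
		// Test-only: disables certificate verification entirely.
		cfg.InsecureSkipVerify = true
	case caPool != nil:
		cfg.RootCAs = caPool
	case caPath != "":
		pem, err := os.ReadFile(caPath)
		if err != nil {
			return nil, fmt.Errorf("reading upstream CA: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no certificates parsed from %s", caPath)
		}
		cfg.RootCAs = pool
	}
	// SNI: use the bare host from a host:port address.
	if host, _, err := net.SplitHostPort(addr); err == nil {
		cfg.ServerName = host
	} else {
		cfg.ServerName = addr // no port present; use the address as-is
	}
	return cfg, nil
}

func main() {
	cfg, err := buildUpstreamTLSConfig(nil, "", true, "argocd-redis:6379")
	fmt.Println(cfg.ServerName, cfg.InsecureSkipVerify, err)
}
```

In the real proxy the resulting config would wrap the already-dialed TCP connection via `tls.Client` before the handshake; the sketch only shows how the config itself is assembled.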
Force-pushed from 81c3402 to 4a2ee37
Actionable comments posted: 1
♻️ Duplicate comments (5)
install/helm-repo/argocd-agent-agent/values.schema.json (1)
302-330: Type inconsistency persists: `redisTLS.enabled`/`insecure` vs `networkPolicy.enabled` use different schema patterns. The `redisTLS.enabled` and `redisTLS.insecure` fields use `anyOf` with string enum `["true", "false"]` plus boolean type, while `networkPolicy.enabled` uses a simple boolean type. This creates inconsistency despite a prior review comment marking this as addressed.

Recommendation: Either standardize all boolean flags to use `type: "boolean"` for clarity, or document why string enums are needed (likely for environment variable compatibility in templates). The pattern already exists for `tlsClientInSecure` (lines 152-158), so the choice should be applied consistently across all similar fields.

Also applies to: 337-340
cmd/argocd-agent/principal.go (1)
272-287: Incomplete mutual exclusivity validation for upstream TLS modes. The validation at lines 273-275 only checks `--redis-upstream-tls-insecure` against `--redis-upstream-ca-path`, but there are three mutually exclusive upstream TLS modes:
- `--redis-upstream-tls-insecure` (skip verification)
- `--redis-upstream-ca-path` (CA from file)
- `--redis-upstream-ca-secret-name` (CA from secret, has a default value)

If a user specifies `--redis-upstream-tls-insecure=true` without also explicitly setting the CA secret name to empty, the insecure mode silently wins over the default secret. This behavior may be intentional, but it differs from the explicit validation done for insecure vs CA path.

Consider whether the current validation is sufficient for your use case, or if you need to also check for explicit user-provided `--redis-upstream-ca-secret-name` values that conflict with insecure mode.

hack/dev-env/configure-argocd-redis-tls.sh (1)
228-231: Replica guard logic is fragile. In shell, `&&` and `||` have equal precedence and associate left to right, so `[ "$X" = "0" ] || [ -z "$X" ] && X="1"` parses as `(cond1 || cond2) && assign`. When `$X` holds a non-empty, non-zero value, both tests fail and the whole list exits non-zero, which aborts the script if it runs under `set -e` and is easy to misread either way. An explicit `if` is clearer and safe. Apply this fix:
```diff
 # Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
```

hack/dev-env/start-principal.sh (1)
23-29: Defaulting Redis address and delegating port-forward to Procfile is correct. Using `localhost:6380` as the default `ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS` and leaving the actual port-forward to Procfile (or manual `kubectl port-forward`) cleanly resolves the earlier conflict and keeps this script focused on principal startup.
836-897: Avoid silently downgrading upstream Redis to plaintext when server TLS is enabled. With the current condition:

```go
if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
	// wrap conn in TLS
}
```

if the proxy server has TLS enabled but no upstream TLS config is provided, the upstream connection stays unencrypted. That's a surprising and weaker posture for a "Redis TLS by default" setup, and can leak data in-cluster while clients believe they're on a fully-TLS path.
Recommend at least logging a clear warning when `rp.tlsEnabled` is true but no upstream TLS config is present, and strongly consider enforcing TLS (e.g., treat that configuration as an error) so misconfiguration is caught early.

For example:
```go
hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
if rp.tlsEnabled && !hasUpstreamTLSConfig {
	logCtx.Warn("Redis proxy TLS is enabled but no upstream Redis TLS configuration is set; upstream traffic will be plaintext")
}
if rp.tlsEnabled && hasUpstreamTLSConfig {
	// current TLS wrapping logic
}
```
🧹 Nitpick comments (11)
principal/auth.go (1)
154-165: LGTM! Observability improvements for auth flow. The trace logging additions improve visibility into the authentication interceptor's decision path. The emoji markers (🔵🟢🟡🔴) provide visual cues for different paths, and the log levels are appropriate (Trace for flow, Warn for failures).
Note: While these changes are orthogonal to the main PR objective (Redis TLS), they're valuable observability improvements.
Optional: Consider whether emoji markers in logs might cause issues with log aggregation or parsing systems in your environment. If so, you could replace them with text prefixes like `[RECV]`, `[NOAUTH]`, `[AUTH_REQ]`, `[AUTH_FAIL]`. However, since these are Trace-level logs (typically disabled in production), the risk is minimal.

principal/listen.go (2)
174-199: Normalize WebSocket / gRPC startup & shutdown logging. The added logs help clarify which mode is used, but there are a couple of polish points:
- Log messages have leading spaces (`" WebSocket is ENABLED..."`, `" gRPC server.Serve() exited"`), which will look odd and make grepping harder.
- The emoji in Line 174 may be inconsistent with the rest of the project's logging style.
- In the WebSocket branch you now log startup (Line 186) but not shutdown, while in the gRPC branch you log both startup and exit (Lines 194-197). For symmetry and debugging, consider adding a `WithError(err)` log after `ServeTLS` returns as well, and possibly downgrading the exit log to `Debug` or only warning on unexpected errors.

These are non-blocking, but tightening them up would keep logs cleaner and more consistent.
224-231: Reassess verbosity and level of new gRPC service registration logs. The per-service Info logs make startup more transparent, but four Info-level lines here may become noisy in larger deployments:
- Consider either collapsing into a single Info message listing all registered services, or moving the detailed per-service logs to Debug.
- The initial “Registering gRPC services on principal” message (Line 224) is useful; the three “... registered successfully” lines could be demoted if log volume is a concern.
No functional issues, just a suggestion to balance observability vs log noise.
hack/dev-env/start-e2e.sh (1)
19-48: Consider removing unused `getExternalLoadBalancerIP` function. This function is no longer called in the script since the switch to static localhost addresses. Dead code increases maintenance burden.
```diff
-# getExternalLoadBalancerIP will set EXTERNAL_IP with the load balancer hostname from the specified Service
-getExternalLoadBalancerIP() {
-  SERVICE_NAME=$1
-
-  MAX_ATTEMPTS=120
-
-  for ((i=1; i<=MAX_ATTEMPTS; i++)); do
-
-    echo ""
-    EXTERNAL_IP=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-    EXTERNAL_HOST=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
-
-    if [ -n "$EXTERNAL_IP" ]; then
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_IP"
-      break
-    elif [ -n "$EXTERNAL_HOST" ]; then
-      echo "External host for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_HOST"
-      EXTERNAL_IP=$EXTERNAL_HOST
-      break
-    else
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT not yet available, attempting again in 5 seconds..."
-      sleep 5
-    fi
-  done
-
-  if [ $i -gt $MAX_ATTEMPTS ]; then
-    echo "Failed to obtain external IP after $MAX_ATTEMPTS attempts."
-    exit 1
-  fi
-
-}
```
29-31: Consider using `--context` flag instead of switching global context. Using `kubectl config use-context` modifies the user's kubeconfig globally, which could cause issues if the script is interrupted or if parallel operations are running. Consider using `kubectl --context=${CONTEXT}` for each command instead.

```diff
-# Switch context
-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+# Use context flag for all kubectl commands instead of switching globally
+KUBECTL="kubectl --context=${CONTEXT}"
```

Then replace all `kubectl` calls with `${KUBECTL}`.

test/e2e/fixture/cluster.go (1)
259-267: `CleanupRedisCachedClients` should explicitly close Redis connections. The cleanup function only clears the map and relies on garbage collection. Redis clients should be explicitly closed to release connections immediately and avoid potential resource leaks during test suite execution.

```diff
 // CleanupRedisCachedClients closes all cached Redis clients (should be called at end of test suite)
 func CleanupRedisCachedClients() {
 	cachedRedisClientMutex.Lock()
 	defer cachedRedisClientMutex.Unlock()
 	fmt.Printf("Cleaning up %d cached Redis clients\n", len(cachedRedisClients))
+	// Note: appstatecache.Cache doesn't expose Close() method, so we rely on GC
+	// If connection leaks become an issue, consider storing the underlying redis.Client
+	// separately to enable explicit Close() calls
 	// Clear the cache map - connections will be garbage collected
 	cachedRedisClients = make(map[string]*appstatecache.Cache)
 }
```

Alternatively, if the underlying `redis.Client` can be stored separately, implement explicit closure:

```go
// Store both cache and client for proper cleanup
type cachedRedisEntry struct {
	cache  *appstatecache.Cache
	client *redis.Client
}
```
65-154: TLS server configuration looks sound; consider preloading CA if needed later. The added TLS fields and `createServerTLSConfig` correctly handle both file-based and in-memory cert+key, and enforce TLS 1.2+. If this proxy ever becomes connection-heavy, you might later consider preloading / reusing cert material (rather than rebuilding `tls.Certificate` from fields on each start), but it's not required for current usage.

test/e2e/fixture/fixture.go (2)
229-291: Treating cleanup failures as warnings is appropriate for E2E tests. The new `fmt.Printf("Warning: ...")` paths during application/AppProject cleanup ensure teardown issues (especially on remote/slow clusters) don't cascade into hard test failures. That's a good trade-off for E2E stability.

Also applies to: 269-291, 295-357, 372-373
457-471: Guard `resetManagedAgentClusterInfo` against nil `clusterDetails`. `resetManagedAgentClusterInfo` assumes `clusterDetails` is non-nil. That's true when called via `BaseSuite`, but `CleanUp` is exported and could be invoked with a nil pointer elsewhere, leading to a panic when `getCachedCacheInstance` dereferences it.
func resetManagedAgentClusterInfo(clusterDetails *ClusterDetails) error { if clusterDetails == nil { return nil } if err := getCachedCacheInstance(AgentManagedName, clusterDetails). SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil { return fmt.Errorf("resetManagedAgentClusterInfo: %w", err) } return nil }Optionally, if you have a
CleanupRedisCachedClientshelper, calling it fromCleanUpafterresetManagedAgentClusterInfowould fully reset Redis client state between tests.hack/dev-env/start-agent-managed.sh (1)
37-62: Consider failing fast when Redis TLS certs are missing in TLS-only setups. The script correctly enables Redis TLS when `creds/redis-tls/ca.crt` exists and wires `--redis-tls-enabled`/`--redis-tls-ca-path` into the agent command, with a sensible default `localhost:6381` address for the port-forward.

Given the rest of the dev/E2E setup now configures Redis as TLS-only by default, the "running without TLS" fallback path is likely to just produce connection errors later. You might consider turning the "certificates not found" case into a hard failure (or at least a stronger warning) in the e2e flow so misconfigured environments are surfaced early.

Also applies to: 48-62, 63-75, 76-83
hack/dev-env/start-principal.sh (1)
44-62: TLS wiring for principal looks good; consider stricter handling when certs are absent

The detection of `redis-proxy.{crt,key}` and `ca.crt` under `creds/redis-tls` and construction of:

```
--redis-tls-enabled=true --redis-server-tls-cert=... --redis-server-tls-key=... --redis-upstream-ca-path=...
```

is consistent with the documented principal Redis TLS options.
Similar to the managed-agent script, now that dev/E2E flows configure Redis as TLS‑only by default, you might want to treat the “certificates not found, running without TLS” branch as a hard failure (or at least a very loud warning) so misconfigured environments don’t just fail later with opaque connection errors.
Also applies to: 64-71
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/auth.go (1 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/e2e/sync_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- hack/dev-env/gen-redis-tls-certs.sh
- principal/resource.go
- test/e2e/rp_test.go
- test/run-e2e.sh
- test/e2e/fixture/argoclient.go
- hack/dev-env/start-agent-autonomous.sh
- test/e2e/clusterinfo_test.go
- docs/getting-started/kubernetes/index.md
- Makefile
- test/e2e/redis_proxy_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/e2e/README.md
- hack/dev-env/configure-argocd-redis-tls.sh
- hack/dev-env/Procfile.e2e
- hack/dev-env/start-e2e.sh
- hack/dev-env/start-agent-managed.sh
🧬 Code graph analysis (4)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
`WithRedisTLSEnabled` (112-117)

principal/options.go (6)

`WithRedisTLSEnabled` (493-498), `WithRedisServerTLSFromPath` (501-507), `WithRedisServerTLSFromSecret` (510-520), `WithRedisUpstreamTLSInsecure` (543-548), `WithRedisUpstreamTLSCAFromFile` (523-528), `WithRedisUpstreamTLSCAFromSecret` (531-540)

internal/logging/logging.go (1)

`Warn` (300-302)

internal/env/env.go (2)

`BoolWithDefault` (30-39), `StringWithDefault` (46-55)
principal/auth.go (1)
internal/logging/logging.go (2)
`Trace` (285-287), `Warn` (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (21)
install/helm-repo/argocd-agent-agent/values.schema.json (1)
332-382: Schema structure for `networkPolicy.redis` is well-formed and permissive for selectors.

The nested object structure correctly allows `additionalProperties: true` on `agentSelector` and `redisSelector`, which is appropriate for Kubernetes label selectors that may include custom labels beyond the documented `app.kubernetes.io/name` key.

cmd/argocd-agent/principal.go (3)
89-98: LGTM! The Redis TLS configuration variables are well-organized with a clear comment header, follow consistent naming conventions, and use appropriate types.
420-422: Verify: Redis TLS default value vs PR objective.

The PR title states "redis TLS encryption enabled by default for all connections", but `--redis-tls-enabled` defaults to `false` here.

If the intent is for Redis TLS to be enabled by default in production, verify that this is handled in the Helm charts or Kubernetes manifests rather than the CLI defaults. If the CLI should also default to TLS enabled, this would need to change to `true`.
471-471: Significant timeout increase noted.

The secret retrieval timeout increased from 2 seconds to 30 seconds. This improves reliability for slow Kubernetes API responses, but extends the startup failure time if secrets are misconfigured or unavailable. The tradeoff seems reasonable for production environments.
test/e2e/sync_test.go (1)
371-373: Pre-sync hook Job name alignment looks good

Updating the hook Job name to `"before"` in both tests keeps the termination assertions intact, assuming the manifest in `test/data/pre-sync` uses the same name. No issues from the test logic perspective.

Also applies to: 465-468
principal/tracker/tracking.go (1)
75-78: Appropriate concurrency fix for request-response pattern.

The buffered channel (capacity 1) correctly prevents deadlock when the sender and receiver operate asynchronously in goroutines. This is the standard pattern for 1:1 request-response scenarios where exactly one response is expected per tracked request.
Verify these assumptions in code review:
- Each tracked event receives at most one response (no multiple sends to the same channel)
- `StopTracking` is always called to close the channel and prevent resource leaks
- The sender handles scenarios where the channel might be closed before sending
agent/agent.go (2)
323-343: TLS configuration for cluster cache looks well-structured.

The TLS configuration properly:
- Sets minimum TLS version to 1.2
- Logs a warning for insecure mode (addressing the previous review comment)
- Loads and validates CA certificates when a path is provided
- Returns clear error messages on failure
445-460: Improved startup logic for cluster cache info updates.

Sending an initial update immediately on startup (before waiting for the first ticker interval) improves the time-to-first-sync. The unified code path for both managed and autonomous modes simplifies maintenance.
hack/dev-env/start-e2e.sh (1)
50-59: Static localhost addresses and fixed REDIS_PASSWORD handling look good.

The switch to localhost-based addresses for TLS certificate validation is appropriate for E2E tests. The `REDIS_PASSWORD` retrieval is now correctly separated into declaration and export (addressing the previous shellcheck warning).

internal/argocd/cluster/cluster.go (2)
135-142: Good defensive initialization of ConnectionState.

Initializing `ConnectionState` when it doesn't exist yet prevents nil-related issues and provides meaningful status for newly connected agents. The timestamp uses `time.Now()`, which is appropriate since this represents the moment the cache stats update was received.
176-184: TLS configuration properly wired to Redis client.

The `tlsConfig` parameter is correctly passed through to the Redis client options. This follows the pattern established in the relevant code snippet and integrates cleanly with the existing cache creation logic.

hack/dev-env/configure-argocd-redis-tls.sh (1)
56-70: Idempotency checks and patching pattern look reasonable for E2E/dev use.

The script properly checks for existing configuration before applying patches, preventing duplicate volumes/mounts/args. The `2>/dev/null || true` pattern handles edge cases gracefully for a development script.

test/e2e/fixture/cluster.go (3)
206-217: Generous timeouts and connection pool settings are appropriate for E2E tests.

The extended timeouts (10s dial, 30s read) and retry configuration help handle port-forward latency and test environment variability. The pool size of 10 with min/max idle settings is reasonable for concurrent test load.
180-201: InsecureSkipVerify is acceptable for E2E tests with appropriate comment.

The comment clearly documents that this is for E2E test simplicity. For production code, CA certificate validation would be required (which is implemented elsewhere in this PR).
320-326: Good use of environment variable overrides for local development.

The `MANAGED_AGENT_REDIS_ADDR` and `ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS` environment variables allow developers to use port-forwarding with localhost addresses while the production code uses service discovery. This aligns with the static localhost addresses exported in `start-e2e.sh`.

Also applies to: 380-386
principal/redisproxy/redisproxy.go (2)
159-183: TLS listener startup branching is correct

The `Start` method cleanly switches between `tls.Listen` and plain `net.Listen` based on `rp.tlsEnabled`, with appropriate error logging and success messages; this matches the new TLS configuration surface.
221-270: Connection handling change to use method receiver is fine

Switching `handleConnection` to call `rp.establishConnectionToPrincipalRedis` (method receiver) instead of a standalone function keeps Redis proxy state encapsulated without altering behavior.

test/e2e/fixture/fixture.go (2)
109-155: Bounded deletion waits improve test robustness

Capping `EnsureDeletion` and `WaitForDeletion` at 120×1s iterations gives deterministic test-time behavior and avoids potential infinite waits on stuck resources; the structure of the retry loops looks correct.

Also applies to: 160-171
test/e2e/README.md (1)
21-108: E2E flow and Redis TLS documentation are clear and consistent

The restructured steps (environment setup, optional reverse tunnel, start processes, run tests) and the explicit Redis TLS section align well with the new scripts and Procfile; the notes about `InsecureSkipVerify` being test‑fixture only are also clear.

hack/dev-env/Procfile.e2e (1)
1-7: Centralizing port-forwards in Procfile avoids conflicts

Having `pf-*` entries own the Redis and argocd-server port‑forwards and starting principal/agents afterward resolves the previous "double port‑forward on 6380" issue and gives a clear, reproducible startup model for `make start-e2e`.

hack/dev-env/configure-redis-tls.sh (1)
1-246: Redis TLS configuration script is robust and idempotent for dev/E2E use

The script cleanly validates cert material, preserves/restores kube context, scales down dependent Argo CD components, creates the `argocd-redis-tls` secret, patches the deployment with volumes/mounts, and replaces args to enable TLS‑only on 6379. The volume/mount existence checks make it safe to re‑run, which is valuable during iterative E2E setup.
4a2ee37 to dd9cf85
Actionable comments posted: 5
♻️ Duplicate comments (1)
docs/configuration/redis-tls.md (1)
149-156: Tag remaining fenced blocks with a language to satisfy markdownlint.

As flagged in a previous review, these code blocks still need language tags. Apply `text` to the diagram and script output blocks:

````diff
 **How the tunnel works:**
-```
+```text
 Argo CD Server (remote vcluster)
     → rathole Deployment (remote)
     → rathole Container (local Mac)
     → Principal process (local Mac)
 ```
````

The same fix applies to lines 475, 486, and 504.
🧹 Nitpick comments (3)
test/e2e/redis_proxy_test.go (1)
120-124: The hardcoded sleep is a pragmatic workaround, but consider documenting the root cause.

The 5-second delay to wait for Redis SUBSCRIBE propagation is a reasonable workaround for the race condition. The comment explains the issue well.
If this race condition is specific to the test setup, it might be worth adding a TODO to investigate whether the subscription can be verified more deterministically in the future:
```diff
 // Wait for SSE stream to fully establish and Redis SUBSCRIBE to propagate
 // This prevents a race condition where the pod is deleted before the subscription is active
+// TODO: Consider implementing a more deterministic check for subscription readiness
 t.Log("Waiting for SSE stream to fully establish...")
 time.Sleep(5 * time.Second)
```

principal/redisproxy/redisproxy.go (1)
853-894: Consider logging a warning when server TLS is enabled but upstream TLS is not configured.

The upstream TLS connection is only established when `rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure)` (line 853). This means that if the Redis proxy server has TLS enabled for incoming connections but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP within the cluster.

While this may be intentional for some deployment scenarios (e.g., trusting the internal cluster network), it creates an inconsistent security posture that operators should be aware of.
Consider adding a warning log when this configuration mismatch occurs:
```diff
+	// Warn if server TLS is enabled but no upstream TLS is configured
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy server has TLS enabled for incoming connections, but upstream connection to principal Redis will be unencrypted")
+	}
+
 	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled && hasUpstreamTLSConfig {
```

This improves operator awareness without enforcing a specific security policy.
hack/dev-env/configure-redis-tls.sh (1)
199-215: Add error handling to args patch.

The args patch lacks the error checking (`|| exit 1`) present in other patches (lines 153, 162, 183, 192). If this critical patch fails, the deployment won't have TLS configured, but the script reports success.

Add error handling to the args patch:

```diff
 kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
       "--requirepass", "$(REDIS_PASSWORD)",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
-]'
+]' || { echo "Failed to patch Redis args"; exit 1; }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/e2e/sync_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- principal/resource.go
- principal/listen.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- hack/dev-env/start-principal.sh
- hack/dev-env/start-agent-autonomous.sh
- hack/dev-env/configure-argocd-redis-tls.sh
- test/e2e/rp_test.go
- test/e2e/fixture/argoclient.go
- test/e2e/clusterinfo_test.go
- Makefile
- hack/dev-env/start-e2e.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/e2e/README.md
- test/run-e2e.sh
- hack/dev-env/start-agent-managed.sh
- hack/dev-env/Procfile.e2e
🧬 Code graph analysis (5)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
`Event` (112-115)

internal/logging/logfields/logfields.go (1)

`Event` (34-34)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
`WithRedisTLSEnabled` (112-117)

principal/options.go (6)

`WithRedisTLSEnabled` (493-498), `WithRedisServerTLSFromPath` (501-507), `WithRedisServerTLSFromSecret` (510-520), `WithRedisUpstreamTLSInsecure` (543-548), `WithRedisUpstreamTLSCAFromFile` (523-528), `WithRedisUpstreamTLSCAFromSecret` (531-540)

internal/env/env.go (2)

`BoolWithDefault` (30-39), `StringWithDefault` (46-55)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
`Warn` (300-302)

internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
principal/redisproxy/redisproxy.go (3)
internal/logging/logfields/logfields.go (3)
`Config` (127-127), `Error` (101-101), `Client` (37-37)

principal/listen.go (1)

`Listener` (54-60)

internal/logging/logging.go (2)

`Error` (305-307), `Warn` (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md
229-229: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Lint Go code
- GitHub Check: Run unit tests
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (42)
test/run-e2e.sh (2)
104-115: Verify environment variable names match test code expectations.

Environment variables are exported for local development with inconsistent naming conventions: `REDIS_SERVER_ADDRESS` vs. `REDIS_ADDR` suffixes, and inconsistent `AGENT` prefixes. Confirm these names match what the test code actually consumes.
81-102: The `lsof -i :6380 -i :6381 -i :6382` syntax is standard and correct. Multiple `-i` flags are ORed by default in lsof (matching any of the specified ports), which is the intended behavior: the script warns if none of the three ports are detected. The proposed fix using `&&` operators would incorrectly require all three ports to be listening, changing the detection logic. No action needed.

principal/tracker/tracking.go (1)
75-78: Verify that buffered capacity 1 is sufficient and handles edge cases correctly.

The change from an unbuffered to a buffered channel addresses potential deadlocks when the sender runs before the receiver is ready. However, verify:
- One event per request: Confirm that each tracked request receives exactly one response event, ensuring capacity 1 is sufficient.
- Closed channel handling: Ensure `processRedisEventResponse` doesn't send on a closed channel if `StopTracking` is called prematurely (would cause a panic).
- Abandoned channel cleanup: Verify that `sendSynchronousRedisMessageToAgent` always consumes from the channel or that proper timeout/cleanup mechanisms exist to prevent goroutine leaks.

test/e2e/sync_test.go (1)
371-371: Verify the Job name "before" matches the test data definition.

The pre-sync hook Job name has been updated to "before" in both `Test_TerminateOperationManaged` (line 371) and `Test_TerminateOperationAutonomous` (line 466). Confirm this name matches the actual Job resource defined in `test/data/pre-sync`, since the tests use this name to verify Job cleanup within a 120-second timeout; a mismatch will cause test failures.

Also note: this change appears unrelated to the PR's Redis TLS enablement objectives and may warrant a separate commit for clarity.
Also applies to: 466-466
hack/dev-env/Procfile.e2e (1)
1-7: LGTM! Well-structured E2E process configuration.

The port-forward setup correctly maps Redis ports for each vcluster (6380-6382 → 6379) and the argocd-server port (8444 → 443). The sleep delays appropriately sequence the startup to allow port-forwards to establish before services start.
docs/getting-started/kubernetes/index.md (2)
159-230: Clear and comprehensive Redis TLS setup documentation.

The certificate generation steps are secure (4096-bit RSA keys, appropriate SANs). The warning about Redis TLS being required and the step-by-step kubectl patches are well-documented.
337-381: Good approach to workload cluster TLS setup.

Using distinct file names (`redis-workload.key/crt`) while reusing the same CA is a clean pattern that prevents confusion. The instructions maintain consistency with the control-plane setup in Section 2.4.
677-697: Solid security best practices section.

The recommendations for strong keys (4096-bit RSA), certificate rotation, and the explicit warning against insecure options in production are valuable. Good callout about using `readOnly: true` for volume mounts.
1-50: Comprehensive and well-organized Redis TLS documentation.

The documentation provides clear guidance from quick start through production deployment, with thorough troubleshooting. The architecture diagram effectively illustrates the TLS configuration points.
test/e2e/README.md (2)
21-108: Clear and practical E2E test documentation updates.

The multi-terminal workflow is well-explained, and the distinction between local/remote cluster setups helps users understand when reverse tunnel is needed. The Redis TLS section appropriately documents that TLS is mandatory and provides manual reconfiguration steps.
107-108: Good clarification on InsecureSkipVerify usage.

The documentation correctly explains that `InsecureSkipVerify: true` is used only in test fixtures for cross-environment compatibility, while TLS encryption remains fully enabled. This aligns with the PR description's request for feedback on this approach.

test/e2e/redis_proxy_test.go (3)
187-208: Improved message draining logic with proper retry semantics.

The `messagesDrained` flag correctly tracks whether any messages were processed, and the drain-all-then-retry pattern is more robust than checking one message at a time. This should reduce flaky test failures.
210-238: Good resilience added to ResourceTree verification.

Wrapping the ResourceTree call in `Eventually` with proper error handling addresses transient Redis EOF errors that can occur during TLS connection resets. The 30-second timeout with 2-second intervals provides adequate retries.
586-653: Well-configured HTTP transport for SSE streams.

The transport settings are appropriate:

- Buffered channel (100) prevents message loss during processing
- `Timeout: 0` is correct for long-lived SSE connections
- `IdleConnTimeout: 300s` keeps connections alive for extended test runs
- `InsecureSkipVerify: true` is documented in the E2E README as test-only behavior

internal/argocd/cluster/cluster.go (3)
18-18: LGTM! The `crypto/tls` import is necessary for the new TLS configuration parameter added to `NewClusterCacheInstance`.

135-142: LGTM! The initialization of `ConnectionState` when it doesn't exist provides a sensible default when cluster cache stats are received before an explicit connection status update. This improves agent connection tracking.
176-184: LGTM! The TLS configuration is properly integrated into the Redis client initialization. The signature change is consistent with the broader TLS enablement across the codebase.
agent/agent.go (3)
19-23: LGTM! The new imports are necessary for TLS configuration and CA certificate loading from files.

323-343: LGTM! The TLS configuration logic is well-structured:

- Properly handles insecure mode with appropriate warning
- Loads and validates CA certificates from file
- Provides clear error messages on failure

345-349: LGTM! The TLS configuration is correctly passed to the cluster cache instance creation, matching the updated signature.
cmd/argocd-agent/principal.go (3)
90-97: LGTM! The Redis TLS flag variables are well-named and cover all necessary configuration options for both server and upstream TLS.

419-440: LGTM! The CLI flag definitions are comprehensive and follow consistent naming conventions. Environment variable support is properly integrated with sensible defaults.

471-471: LGTM! The increased timeout (30s) for TLS configuration retrieval from Kubernetes is reasonable and aligns with the broader TLS enablement changes.
principal/redisproxy/redisproxy.go (3)
21-27: LGTM! The TLS-related fields are well-structured with clear separation between server (incoming connections) and upstream (outgoing connections) TLS configurations. The comments provide good context.

Also applies to: 65-76

98-154: LGTM! The TLS configuration methods are well-designed:

- Clean public API for both server and upstream TLS
- Proper handling of both file-based and memory-based certificates
- Appropriate error handling and minimum TLS version

157-183: LGTM! The Start() method cleanly handles both TLS and non-TLS modes with appropriate logging and error handling.
hack/dev-env/start-agent-managed.sh (4)
37-46: LGTM! The Redis TLS detection logic is appropriate for a development script, with helpful guidance when certificates are not found.

48-62: LGTM! The Redis address configuration is well-documented with helpful comments explaining the localhost default and port-forward requirements for TLS validation.

63-74: LGTM! The mTLS certificate extraction properly retrieves client certificates and CA from Kubernetes secrets. The use of temporary files is appropriate for a development script.

76-90: LGTM! The agent startup command properly includes all TLS-related arguments (client certificates, Redis TLS, etc.) in a logical order.
hack/dev-env/gen-redis-tls-certs.sh (4)
14-26: LGTM! The CA generation logic is idempotent and uses strong cryptographic parameters (4096-bit RSA). The conditional generation prevents overwriting existing certificates.

28-58: LGTM! The certificate generation pattern is well-structured and idempotent. The use of extension files for Subject Alternative Names follows modern OpenSSL practices.
67-90: LGTM! The local IP detection and conditional SAN addition is well-handled. The script correctly avoids adding an empty IP entry when local IP detection fails, which was a previously identified issue.

138-150: LGTM! The cleanup of temporary files is appropriate, and the success message provides a helpful summary of all generated certificates.
test/e2e/fixture/fixture.go (4)
109-112: LGTM! The timeout increases (60s → 120s) are appropriate for TLS-enabled E2E tests, which may experience additional latency from TLS handshakes and port-forwarding in the test environment.

Also applies to: 143-143, 160-160

231-240: LGTM! The error handling changes improve test cleanup resilience by continuing cleanup even when individual deletions fail. This is appropriate for test teardown where partial cleanup is preferable to complete failure, and warnings ensure issues are still visible.

Also applies to: 256-265, 277-278, 290-291, 312-324, 344-356, 371-373

235-240: LGTM! The DeepCopy calls properly prevent mutation of loop variables when modifying objects for deletion waiting. This is correct and defensive programming.

Also applies to: 260-265, 316-324, 349-356

457-470: LGTM! The non-fatal cluster info reset is appropriate for test cleanup scenarios where Redis may be unavailable (e.g., port-forward terminated). Using the cached cache instance is consistent with the broader caching pattern introduced in cluster.go.
test/e2e/fixture/cluster.go (4)
19-22: LGTM! The new imports and TLS-enabled flags appropriately extend the test fixture to support TLS-enabled Redis configurations.

Also applies to: 44-51

181-224: LGTM! The TLS configuration with `InsecureSkipVerify` is appropriately documented for E2E tests. The connection tuning parameters (timeouts, pool sizing, retries) are generous and suitable for handling port-forwarding latency in test environments.

226-267: LGTM! The Redis client caching mechanism properly prevents connection leaks in E2E tests:
- Thread-safe with mutex protection
- Cache key includes address for proper isolation
- Exported cleanup function for test suite teardown
- Reasonable reliance on GC for connection cleanup in test code
299-327: LGTM! The Redis configuration retrieval logic is robust:
- Comprehensive fallback chain (LoadBalancer → LoadBalancerIP → ClusterIP)
- TLS always enabled for E2E tests
- Environment variable overrides support local development workflows
- Clear error messages document the fallback chain
Also applies to: 360-387
dd9cf85 to a349781
Actionable comments posted: 1
♻️ Duplicate comments (4)
cmd/argocd-agent/principal.go (1)
258-288: Harden upstream Redis TLS mode validation to cover all flag combinations

The overall Redis TLS wiring looks good: server-side TLS validates cert/key pairing and falls back to a secret, and upstream TLS correctly selects between insecure mode, CA file, and CA secret. However, the validation currently only rejects `--redis-upstream-tls-insecure` together with `--redis-upstream-ca-path`, while combinations involving `--redis-upstream-ca-secret-name` (e.g., CA path + CA secret, or insecure + CA secret) are still allowed and silently prefer one mode.

To make configuration failures explicit and align with the stricter server TLS validation, consider validating that at most one upstream TLS mode is set (insecure, CA file, or CA secret):
```diff
-	// Validate upstream TLS configuration - insecure and CA path are mutually exclusive
-	if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
-		cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
-	}
+	// Validate upstream TLS configuration - only one mode may be specified
+	modesSet := 0
+	if redisUpstreamTLSInsecure {
+		modesSet++
+	}
+	if redisUpstreamTLSCAPath != "" {
+		modesSet++
+	}
+	if redisUpstreamTLSCASecretName != "" {
+		modesSet++
+	}
+	if modesSet > 1 {
+		cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
+	}
```

This keeps the existing selection logic intact while preventing ambiguous configurations from being silently accepted.
hack/dev-env/configure-argocd-redis-tls.sh (1)
309-312: Fix replica guard logic to enforce minimum of 1 replica.

The current shell operator precedence causes the replica guards to fail. The expression `[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"` is parsed as `cond1 || (cond2 && assignment)`, which means:
- If
REPO_SERVER_REPLICAS="0", the first test succeeds and short-circuits, so the assignment never executes- Only empty values get set to "1"
This contradicts the comment "Ensure we have at least 1 replica" and could lead to components being scaled to 0 replicas during E2E tests.
Apply this diff to fix the logic:
```diff
-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
```

test/run-e2e.sh (2)
33-45: Validate all required certificate files, not just ca.crt.

The script only checks for `ca.crt` but does not validate that `server.crt` and `server.key` exist. If these files are missing, tests will fail downstream with cryptic TLS errors.

Apply this diff:
```diff
 # Check if Redis TLS certificates exist
-if [ ! -f "${REDIS_TLS_DIR}/ca.crt" ]; then
+if [ ! -f "${REDIS_TLS_DIR}/ca.crt" ] || [ ! -f "${REDIS_TLS_DIR}/server.crt" ] || [ ! -f "${REDIS_TLS_DIR}/server.key" ]; then
     echo "ERROR: Redis TLS certificates not found!"
     echo ""
     echo "Redis TLS is REQUIRED for E2E tests (security requirement)."
     echo ""
     echo "Please run the following commands:"
     echo "  ./hack/dev-env/gen-redis-tls-certs.sh"
```
62-66: Replace text grep with proper JSON parsing for TLS validation.

Using `grep -q "tls-port"` on JSON output is fragile:
- Text matching can produce false positives if "tls-port" appears in unexpected locations
- Does not confirm the field is in the correct location within the deployment spec
- Provides no debugging information when validation fails
Replace with robust JSON parsing:
```diff
-    if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+    if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | jq -e '.spec.template.spec.containers[].ports[] | select(.name == "tls-port")' >/dev/null 2>&1; then
         echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
         echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
         exit 1
     fi
```
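The fragility of plain-text matching is easy to reproduce with a contrived manifest (the annotation value below is hypothetical, not from the repo):

```shell
#!/bin/sh
# grep matches "tls-port" anywhere in the JSON text, not just in the
# container port list, so an unrelated field satisfies the check.
json='{"metadata":{"annotations":{"note":"migrated-from-tls-port"}},"spec":{"ports":[]}}'
if printf '%s' "$json" | grep -q "tls-port"; then
  echo "grep: tls-port 'found' (false positive)"
fi
```

A structural query like the `jq` expression above only matches when the named port actually exists in the container spec.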
🧹 Nitpick comments (3)
principal/listen.go (2)
174-196: Inconsistent log formatting and unrelated changes.

Several issues with the new logging statements:
- Emoji in production logs (line 174): The "🔧" emoji may not render correctly in all log aggregation systems and is non-standard for production logging.
- Leading whitespace (lines 176, 196): Messages like `" WebSocket is ENABLED"` and `" gRPC server.Serve() exited"` have leading spaces, creating inconsistent formatting compared to other log statements.
- Disconnect from PR objectives: This PR is focused on enabling Redis TLS encryption by default, but these changes add WebSocket and gRPC server startup logging, which appears unrelated to the stated objectives.
Apply this diff to fix the formatting issues:
```diff
-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Info("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
 		opts := []grpchttp1server.Option{grpchttp1server.PreferGRPCWeb(true)}
 		downgradingHandler := grpchttp1server.CreateDowngradingHandler(s.grpcServer, http.NotFoundHandler(), opts...)
@@ -193,7 +193,7 @@
 	go func() {
 		log().Info("Starting gRPC server.Serve() - server is now accepting connections")
 		err = s.grpcServer.Serve(s.listener.l)
-		log().WithError(err).Warn(" gRPC server.Serve() exited")
+		log().WithError(err).Warn("gRPC server.Serve() exited")
 		errch <- err
 	}()
 }
```
224-230: Service registration logging appears unrelated to PR objectives.

While these logging statements improve observability during service initialization, they appear disconnected from the PR's core objective of enabling Redis TLS encryption by default. Consider whether these observability improvements belong in a separate PR focused on logging enhancements.
test/e2e/fixture/cluster.go (1)
259-267: Consider explicitly closing Redis connections in cleanup.

The current cleanup simply recreates the map and relies on garbage collection to close connections. While this works, explicitly closing the underlying Redis clients would provide more deterministic cleanup.
Consider this enhancement:
```diff
 func CleanupRedisCachedClients() {
 	cachedRedisClientMutex.Lock()
 	defer cachedRedisClientMutex.Unlock()
 	fmt.Printf("Cleaning up %d cached Redis clients\n", len(cachedRedisClients))
-	// Clear the cache map - connections will be garbage collected
+	// Note: Explicit close is not available on appstatecache.Cache
+	// Connections will be garbage collected when map is cleared
 	cachedRedisClients = make(map[string]*appstatecache.Cache)
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- `Makefile` (1 hunks)
- `agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (3 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `principal/listen.go` (3 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (2 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (11 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/e2e/sync_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- hack/dev-env/start-agent-autonomous.sh
🚧 Files skipped from review as they are similar to previous changes (9)
- principal/resource.go
- test/e2e/clusterinfo_test.go
- principal/tracker/tracking.go
- hack/dev-env/Procfile.e2e
- hack/dev-env/start-principal.sh
- hack/dev-env/gen-redis-tls-certs.sh
- test/e2e/README.md
- hack/dev-env/configure-redis-tls.sh
- test/e2e/sync_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- `hack/dev-env/start-e2e.sh`
- `hack/dev-env/start-agent-managed.sh`
- `Makefile`
- `test/run-e2e.sh`
- `test/e2e/rp_test.go`
🧬 Code graph analysis (5)
test/e2e/fixture/fixture.go (2)
internal/backend/interface.go (1)
- Namespace (124-127)

test/e2e/fixture/cluster.go (2)
- AgentManagedName (37-37)
- AgentClusterServerURL (39-39)
principal/listen.go (1)
internal/logging/logging.go (2)
- Info (295-297)
- Warn (300-302)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
- Config (127-127)

internal/logging/logging.go (1)
- Warn (300-302)

internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
principal/redisproxy/redisproxy.go (2)
internal/logging/logfields/logfields.go (3)
- Config (127-127)
- Error (101-101)
- Client (37-37)

internal/logging/logging.go (3)
- Error (305-307)
- Warn (300-302)
- Trace (285-287)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
- GetArgoCDServerEndpoint (315-337)
- GetInitialAdminSecret (302-313)
- NewArgoClient (52-66)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/getting-started/kubernetes/index.md
229-229: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Build & cache Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (30)
install/helm-repo/argocd-agent-agent/values.schema.json (3)
302-330: Well-structured Redis TLS schema with clear configuration options.

The `redisTLS` property cleanly encapsulates TLS-related configuration (enable/disable, CA path/secret, insecure mode). Descriptions are informative, and the `anyOf` pattern for `enabled` and `insecure` allows both native boolean values and string representations for broader compatibility.
332-383: NetworkPolicy schema correctly models Kubernetes policy structure.

The `networkPolicy` property provides appropriate control over network policies for the agent and Redis. The nested structure for selectors (`agentSelector`, `redisSelector`) properly supports Kubernetes label matching while allowing flexibility via `additionalProperties: true`. The scoping and descriptions are clear.
302-330: The type inconsistency between `redisTLS` fields (using `anyOf` for string/boolean) and `networkPolicy.enabled` (using boolean only) was already flagged in a prior review and marked as addressed. Skipping duplicate comment; developer can investigate commit 6247404 if clarification is needed on the design rationale.

cmd/argocd-agent/principal.go (3)
90-98: Redis TLS option variables are coherent with intended configuration

The newly added Redis TLS variables are clearly named and match the later flag wiring and option usage; no issues here.
419-441: Redis TLS flags and env wiring look consistent

Flag names, env variable keys, defaults (notably the shared `argocd-redis-tls` secret), and help strings are all consistent with the new Redis TLS behavior; no issues from a CLI/config surface perspective.
471-471: Increasing resource proxy TLS secret fetch timeout to 30s is reasonable

Extending the timeout to 30 seconds for loading proxy TLS material from Kubernetes is a safe change and should better tolerate slow API servers without introducing new correctness risks.
test/e2e/fixture/argoclient.go (1)
316-336: LGTM! Environment variable override improves test flexibility.

The addition of the `ARGOCD_SERVER_ADDRESS` environment variable check before falling back to Kubernetes API queries is a good optimization for test environments. The fallback logic is preserved correctly, ensuring backward compatibility.

hack/dev-env/start-agent-managed.sh (1)
37-90: LGTM! Redis TLS and mTLS configuration properly integrated.

The script correctly:
- Detects Redis TLS certificates and provides helpful guidance when missing
- Sets appropriate defaults for local development with clear documentation
- Extracts mTLS certificates from Kubernetes secrets
- Passes all necessary TLS arguments to the agent startup
The explicit comments about port-forward requirements are particularly helpful for developers.
docs/configuration/redis-tls.md (1)
1-700: Excellent comprehensive Redis TLS documentation.

This documentation provides thorough coverage of Redis TLS configuration including:
- Clear architecture diagrams and TLS configuration points
- Step-by-step quick start for development/testing
- Detailed certificate management guidance
- Complete Kubernetes installation instructions
- Comprehensive troubleshooting section with common issues and solutions
- Security best practices
The documentation is well-structured with a table of contents and clear separation between development/testing and production scenarios.
test/e2e/fixture/fixture.go (2)
109-171: LGTM! Extended timeouts improve resilience for TLS-enabled Redis.

The timeout increase from 60 to 120 seconds in `EnsureDeletion` and `WaitForDeletion` is appropriate for TLS-enabled Redis connections, which may have slightly higher latency during connection establishment and teardown.
200-462: LGTM! Non-fatal cleanup warnings prevent cascading test failures.

The changes to log warnings instead of returning errors during cleanup are appropriate for handling transient issues like port-forward failures. Key improvements:
- Uses `DeepCopy()` to avoid mutating loop variables (lines 235, 260, 317, 350)
- Logs warnings for cleanup failures instead of failing the entire test
- Gracefully handles Redis unavailability during cluster info reset (lines 457-461)
This makes the test suite more robust in environments with flaky port-forwards or temporary connectivity issues.
test/e2e/rp_test.go (1)
162-169: LGTM! Refactoring to fixture helpers improves consistency.

The refactoring to use `fixture.GetArgoCDServerEndpoint` and `fixture.GetInitialAdminSecret` eliminates code duplication and centralizes the logic for retrieving test credentials. This aligns with the environment variable override capability added to the fixture helpers.

Also applies to: 295-305
docs/getting-started/kubernetes/index.md (2)
159-229: LGTM! Clear Redis TLS setup instructions with proper warnings.

The new section 2.4 provides comprehensive Redis TLS setup guidance:
- Clear warning that Redis TLS is required
- Step-by-step certificate generation with appropriate SANs
- Deployment patching commands
- Verification steps
- Note about automatic TLS configuration in manifests
The instructions are well-structured and include all necessary details for setting up Redis TLS on the control plane.
337-381: LGTM! Workload cluster Redis TLS setup mirrors control plane.

Section 4.4 appropriately repeats the Redis TLS setup for workload clusters with a clear note to reuse the same CA from Step 2.4. The instructions maintain consistency with the control plane setup while properly scoping the certificate generation to the workload cluster context.
hack/dev-env/start-e2e.sh (1)
50-61: LGTM! Static localhost addresses enable TLS certificate validation.

The use of static `localhost` addresses with fixed ports is appropriate for E2E tests because:

- `localhost` is included in the Redis TLS certificate SANs
- Port-forwards (managed by goreman) provide stable local endpoints
- Enables proper TLS certificate validation during tests
The Redis password retrieval correctly separates assignment from export, addressing the previous shellcheck warning.
hack/dev-env/configure-argocd-redis-tls.sh (1)
1-342: Overall script design is solid for E2E Redis TLS configuration.

The script provides comprehensive Redis TLS configuration for Argo CD components:
- Idempotent volume and volumeMount additions with existence checks
- Clear error messages and exit codes
- Appropriate handling of different cluster contexts (control-plane vs agent)
- Graceful scaling with rollout status waits
The replica guard logic issue aside, the script structure and approach are well-designed for the E2E test environment.
agent/agent.go (2)
323-343: LGTM! TLS configuration properly implemented.

The TLS config construction for the cluster cache correctly handles:
- Warning log when InsecureSkipVerify is enabled (matching principal code)
- CA certificate loading with clear error messages
- Proper certificate pool validation
445-460: LGTM! Immediate startup update improves observability.

Sending cluster cache info immediately on startup (before the first ticker interval) ensures the principal receives agent state promptly, improving observability and reducing the delay in initial metrics.
internal/argocd/cluster/cluster.go (2)
175-191: LGTM! TLS integration properly implemented.

The signature change to `NewClusterCacheInstance` and TLS configuration wiring are correct. The TLSConfig is properly passed through to the Redis client options.
135-142: LGTM! Defensive initialization of ConnectionState.

Initializing ConnectionState when absent ensures cluster info is properly set even when the agent sends cache stats before connection status is explicitly set, preventing nil-reference issues.
test/e2e/fixture/cluster.go (2)
181-201: LGTM! InsecureSkipVerify acceptable for E2E tests.

Using `InsecureSkipVerify: true` in E2E tests is appropriate given the dynamic LoadBalancer addresses in test environments. The PR objectives explicitly mention this trade-off to retain TLS encryption while accommodating test infrastructure limitations.
298-327: LGTM! Comprehensive address resolution with TLS enforcement.

The multi-level fallback approach (LoadBalancer → spec.loadBalancerIP → ClusterIP) handles various deployment scenarios well. TLS enforcement and environment variable overrides for local development are appropriate for E2E tests.
Also applies to: 359-387
principal/redisproxy/redisproxy.go (3)
98-128: LGTM! Clean and composable TLS configuration API.

The public setter methods provide a clear and flexible API for configuring TLS. Separating in-memory and file-based certificate configuration is appropriate, and validation is deferred to `createServerTLSConfig` where it's needed.
130-154: LGTM! Robust TLS configuration with proper error handling.

The `createServerTLSConfig` method correctly handles both file-based and in-memory certificates, with clear error messages and appropriate TLS version constraints (min TLS 1.2).
852-894: LGTM! TLS handshake and upstream connection properly implemented.

The upstream TLS implementation correctly:
- Handles InsecureSkipVerify with warning log
- Supports CA certificate pool from memory or file
- Configures SNI based on hostname
- Performs explicit handshake with error handling
Note: A past review suggested warning when server TLS is enabled but upstream TLS is not configured (to avoid unencrypted connections within the cluster). This remains a potential enhancement but is not blocking.
test/run-e2e.sh (1)
81-115: LGTM! macOS development support with helpful warnings.The port-forward detection and environment variable setup provide a good developer experience for local testing. The non-blocking warning allows CI environments (with MetalLB) to proceed normally.
test/e2e/redis_proxy_test.go (4)
120-123: LGTM! Wait period reduces race condition with subscription activation.

The 5-second wait after SSE stream establishment allows Redis SUBSCRIBE commands to fully propagate before the pod is deleted, reducing race conditions in the test.
Also applies to: 326-329
188-208: LGTM! Message draining improves test reliability.

The enhanced message-handling logic drains all available messages before retrying, preventing false negatives when messages arrive in bursts. The extended 120-second timeout accommodates realistic network latency.
Also applies to: 407-427
211-237: LGTM! Retry logic handles transient Redis connection issues.

Wrapping `ResourceTree` calls in `Eventually` blocks with logging handles transient EOF errors and Redis connection issues gracefully, improving test stability in distributed environments.

Also applies to: 430-456
642-653: LGTM! HTTP client configuration optimized for SSE streams.

The HTTP transport settings (no overall timeout, extended idle timeout, connection pooling) are appropriate for long-lived SSE streams. `InsecureSkipVerify: true` is acceptable for E2E tests.
Force-pushed from a349781 to 6b246bf (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (5)
hack/dev-env/configure-argocd-redis-tls.sh (1)
301-311: Replica guard still fails to enforce minimum of 1 replica

The guard:

```shell
# Ensure we have at least 1 replica
[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
```

is parsed as `cond1 || (cond2 && assign)`, so if a value is `"0"` the first test succeeds and the assignment never runs. That means components can be scaled back to 0 replicas, contradicting the comment and breaking the "scale back up after TLS config" intent.

Use explicit `if` blocks so both `"0"` and empty values are normalized to `"1"`:

```diff
-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
```

This matches the comment and guarantees components are brought back to at least one replica for TLS-enabled testing.
cmd/argocd-agent/principal.go (1)
90-98: Upstream Redis TLS modes still aren't fully mutually exclusive

Within the `redisTLSEnabled` block you only guard against `--redis-upstream-tls-insecure` being combined with `--redis-upstream-ca-path`:

```go
if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
	cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
}
```

But there are three upstream TLS modes:

- `--redis-upstream-tls-insecure`
- `--redis-upstream-ca-path`
- `--redis-upstream-ca-secret-name`

Users can still specify conflicting combinations such as insecure+secret or ca-path+secret; the current `if/else if/else` chain will silently prefer one, dropping the others.

To make configuration predictable (and consistent with the strict pairing you already enforce for server cert/key), validate that at most one mode is set before applying the options, e.g.:
```go
if redisTLSEnabled {
	// Validate upstream TLS configuration - only one mode allowed
	modesSet := 0
	if redisUpstreamTLSInsecure {
		modesSet++
	}
	if redisUpstreamTLSCAPath != "" {
		modesSet++
	}
	if redisUpstreamTLSCASecretName != "" {
		modesSet++
	}
	if modesSet > 1 {
		cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
	}

	// existing server TLS + upstream TLS selection logic...
}
```

This prevents ambiguous configurations and aligns the upstream TLS behavior with the rest of the principal's TLS validation.
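The same at-most-one-mode check can be sketched in shell; the variable names below are illustrative stand-ins for the three flags, not the principal's actual configuration surface:

```shell
#!/bin/sh
# Hypothetical flag values: insecure mode AND a CA secret are both set.
INSECURE="true"
CA_PATH=""
CA_SECRET="argocd-redis-tls"

# Count how many mutually exclusive upstream TLS modes were requested.
modes=0
[ "$INSECURE" = "true" ] && modes=$((modes + 1))
[ -n "$CA_PATH" ] && modes=$((modes + 1))
[ -n "$CA_SECRET" ] && modes=$((modes + 1))

if [ "$modes" -gt 1 ]; then
  echo "error: only one Redis upstream TLS mode may be set (got $modes)"
fi
```

Counting set modes and rejecting `modes > 1` scales to any number of mutually exclusive options without enumerating every pair.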
Also applies to: 258-288, 419-441
hack/dev-env/start-e2e.sh (1)
50-59: Fail fast if Redis password lookup fails.

If the `kubectl get secret` or `base64 --decode` step fails, `REDIS_PASSWORD` will be empty but the script will still start the E2E stack, leading to confusing Redis auth failures later.

Recommend checking the command result and the value before exporting:
```diff
-REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode)
-export REDIS_PASSWORD
+REDIS_PASSWORD=$(kubectl get secret argocd-redis \
+  --context=vcluster-agent-managed \
+  -n argocd \
+  -o jsonpath='{.data.auth}' | base64 --decode)
+if [ -z "${REDIS_PASSWORD}" ]; then
+  echo "Error: Failed to retrieve Redis password from argocd-redis secret in vcluster-agent-managed/argocd" >&2
+  exit 1
+fi
+export REDIS_PASSWORD
```

This makes Redis auth problems surface immediately when starting E2E.
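Why the guard matters: command substitution reports the failure only through `$?`, and the script keeps running with an empty value unless it checks. A standalone sketch:

```shell
#!/bin/sh
# `false` stands in for a failing kubectl/base64 pipeline.
# The assignment itself "succeeds" - VAL is just empty.
VAL=$(false)
echo "substitution exit=$? value='${VAL}'"

# Without an explicit emptiness check, execution continues regardless.
[ -z "$VAL" ] && echo "guard: refusing to continue with empty value"
```

The same pattern applies to the secret lookup above: test the captured value, not just the command line.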
hack/dev-env/configure-redis-tls.sh (1)
198-215: Fix literal `"$(REDIS_PASSWORD)"` in Redis args patch (TLS + auth will break).

Inside the JSON patch, `"$(REDIS_PASSWORD)"` is single-quoted, so the shell never expands the environment variable. Redis will literally be configured with the password `$(REDIS_PASSWORD)`, which won't match the secret and will break all authenticated connections.

You should expand `REDIS_PASSWORD` before or during patch construction and (optionally) warn if it's unset. For example:

```diff
-# Update Redis args for TLS
-kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
+# Update Redis args for TLS
+REDIS_PASSWORD="${REDIS_PASSWORD:-}"
+if [ -z "${REDIS_PASSWORD}" ]; then
+  echo "Warning: REDIS_PASSWORD not set; Redis will be configured without a usable password value"
+fi
+
+kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
-      "--requirepass", "$(REDIS_PASSWORD)",
+      "--requirepass", "'"${REDIS_PASSWORD}"'",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
]'
```
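The quoting behavior can be checked in isolation; this sketch mirrors the script's variable name but is otherwise standalone:

```shell
#!/bin/sh
REDIS_PASSWORD="s3cret"

# Single quotes suppress all expansion: Redis would see the literal text.
printf 'literal : %s\n' '"$(REDIS_PASSWORD)"'

# Breaking out of the single quotes ('"' + "${VAR}" + '"') expands as intended.
printf 'expanded: %s\n' '"'"${REDIS_PASSWORD}"'"'
```

The quote-break idiom (`'...'"${VAR}"'...'`) is what the suggested diff uses inside the JSON patch string.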
REDIS_PASSWORDis empty to avoid silently misconfiguring Redis in dev/e2e.principal/redisproxy/redisproxy.go (1)
836-897: Warn (or fail) when proxy TLS is enabled but upstream TLS is not, to avoid silent plaintext hops.

In `establishConnectionToPrincipalRedis`, upstream TLS is only used when:

```go
if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
	// wrap with TLS
}
```

If `tlsEnabled` is true but no upstream TLS config is provided, the proxy will:
- Terminate TLS from Argo CD on the proxy, but
- Connect to principal Redis over plain TCP,
creating a surprising “TLS‑terminated at proxy only” hop that contradicts the PR goal of “Redis TLS encryption enabled by default for all connections”.
Consider making this mismatch explicit, e.g.:
```diff
-	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy TLS is enabled, but no upstream TLS configuration provided; connection to principal Redis will be unencrypted")
+	}
+
+	// If TLS is enabled for upstream, wrap the connection with TLS
+	if rp.tlsEnabled && hasUpstreamTLSConfig {
 		tlsConfig := &tls.Config{
 			MinVersion: tls.VersionTLS12,
 		}
 		// ... existing CA / InsecureSkipVerify logic ...
```

Optionally, you might also allow `hasUpstreamTLSConfig` to trigger TLS even when `tlsEnabled` is false, if you anticipate scenarios where only the proxy→Redis hop should be encrypted.
🧹 Nitpick comments (4)
hack/dev-env/start-agent-managed.sh (1)
63-74: Consider cleanup for temporary credential files.

The mTLS certificates are extracted to `/tmp` files but never cleaned up. While acceptable for development, consider adding a trap to remove these files on script exit:

```diff
 TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
 TLS_KEY_PATH="/tmp/agent-managed-tls.key"
 ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+
+# Cleanup temp files on exit
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
+
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
```
/tmpand follows security best practices.test/e2e/fixture/fixture.go (1)
109-171: Cleanup robustness improvements are reasonable; consider minor hardening

The extended 120s deletion waits and the shift to warning-only errors in `CleanUp`, plus use of `DeepCopy()` for principal/managed `Application` and `AppProject` waits, all improve e2e stability without changing production behavior. One small follow-up you might consider (optional) is:

- Guarding `resetManagedAgentClusterInfo` against a nil `clusterDetails` to make it safer if `CleanUp` is ever reused outside `BaseSuite.SetupSuite`.
- If deletion timing keeps growing, factoring the "spin for up to N seconds with 1s sleep" pattern into a helper that can use `context` deadlines instead of manual counters.
Also applies to: 218-291, 295-375, 457-471
hack/dev-env/start-agent-autonomous.sh (1)
37-47: Redis TLS and mTLS wiring in dev script looks correct; consider ephemeral key files

The script correctly:

- Detects the Redis TLS CA and enables `--redis-tls-enabled` / `--redis-tls-ca-path`.
- Defaults `--redis-addr` to a localhost port-forward.
- Extracts agent client cert/key/CA and passes them via `--tls-client-cert` / `--tls-client-key` / `--root-ca-path`.

For local dev this is fine. As an optional hardening, you could write the TLS material to `mktemp` paths and `trap` a cleanup (`rm`) on exit to avoid leaving private keys in `/tmp` across runs.

Also applies to: 48-62, 63-75, 79-83
test/e2e/redis_proxy_test.go (1)
120-137: SSE stream and Redis proxy e2e reliability improvements look solid

The added 5s post-connect delay, buffered `msgChan` with "drain all messages" semantics, and the `Eventually`-wrapped `ResourceTree` calls with logging should significantly reduce flakes from subscription races and transient EOFs. The SSE client transport is correctly tuned for long-lived streams (no global timeout, longer idle), and using `InsecureSkipVerify` is acceptable here given these are TLS-only e2e tests, not production code.

Also applies to: 184-237, 326-337, 402-457, 584-670
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- `Makefile` (1 hunks)
- `agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (3 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `principal/listen.go` (3 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (2 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (11 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/e2e/sync_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- principal/resource.go
🚧 Files skipped from review as they are similar to previous changes (8)
- principal/listen.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- test/run-e2e.sh
- internal/argocd/cluster/cluster.go
- test/e2e/clusterinfo_test.go
- principal/tracker/tracking.go
- Makefile
- test/e2e/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- `test/e2e/rp_test.go`
- `hack/dev-env/start-agent-managed.sh`
- `hack/dev-env/Procfile.e2e`
- `hack/dev-env/start-e2e.sh`
🧬 Code graph analysis (4)
test/e2e/rp_test.go (1)
  test/e2e/fixture/argoclient.go (3): GetArgoCDServerEndpoint (315-337), GetInitialAdminSecret (302-313), NewArgoClient (52-66)
cmd/argocd-agent/principal.go (4)
  agent/options.go (1): WithRedisTLSEnabled (112-117)
  principal/options.go (6): WithRedisTLSEnabled (493-498), WithRedisServerTLSFromPath (501-507), WithRedisServerTLSFromSecret (510-520), WithRedisUpstreamTLSInsecure (543-548), WithRedisUpstreamTLSCAFromFile (523-528), WithRedisUpstreamTLSCAFromSecret (531-540)
  internal/logging/logging.go (1): Warn (300-302)
  internal/env/env.go (2): BoolWithDefault (30-39), StringWithDefault (46-55)
agent/agent.go (3)
  internal/logging/logfields/logfields.go (1): Config (127-127)
  internal/logging/logging.go (1): Warn (300-302)
  internal/argocd/cluster/cluster.go (1): NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (2)
  principal/listen.go (1): Listener (54-60)
  internal/logging/logging.go (3): Error (305-307), Warn (300-302), Trace (285-287)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md
229-229: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Build & cache Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (22)
test/e2e/fixture/argoclient.go (1)
316-336: LGTM! The environment variable check is a clean optimization that avoids unnecessary Kubernetes API calls when the server address is explicitly provided. The fallback logic is preserved, maintaining backward compatibility.
hack/dev-env/start-agent-managed.sh (3)
37-46: LGTM! The Redis TLS detection logic is clean and provides helpful user guidance. Checking for the CA certificate presence is the right approach to determine whether TLS should be enabled.
48-61: LGTM! The Redis address handling with sensible defaults and clear port-forward instructions is well-designed for local development workflows.
76-90: LGTM! The startup command properly integrates the Redis TLS arguments with existing mTLS configuration. The variable expansion and flag ordering are correct.
hack/dev-env/start-principal.sh (2)
23-29: LGTM! The Redis address defaulting is correctly implemented. As noted in the past review, this script no longer starts its own port-forward, avoiding conflicts with Procfile.e2e while providing a sensible default for TLS-friendly connections.
42-62: LGTM! The Redis TLS configuration properly checks for all required certificate files and constructs the appropriate arguments. The inline comments about SANs (localhost, rathole-container-internal, local IP) are helpful for understanding the certificate requirements.
agent/agent.go (2)
323-343: LGTM! The TLS configuration for cluster cache Redis is well-implemented:
- Proper TLS 1.2 minimum version
- Warning log for insecure mode (addresses past review feedback)
- Clean CA certificate loading with descriptive error messages
- Appropriate error handling
443-460: LGTM! The updated cluster cache info logic is an improvement:
- Immediate update on startup provides faster feedback
- Consistent behavior for both agent modes
- Proper cleanup with ticker.Stop()
test/e2e/fixture/cluster.go (5)
180-201: LGTM! Using `InsecureSkipVerify: true` for E2E tests is acceptable to accommodate dynamic LoadBalancer addresses (as noted in the PR objectives). The TLS encryption is retained, which still provides value for testing the TLS code paths.
206-218: LGTM! The generous timeout and connection pool settings are appropriate for E2E test environments, especially considering the port-forward latency mentioned in the comments. The retry configuration with exponential backoff is sensible.
298-327: LGTM! The Redis address resolution with multiple fallbacks (LoadBalancer ingress → LoadBalancerIP → ClusterIP) is robust and handles various cluster configurations. The environment variable override for local development is a good addition. Setting `ManagedAgentRedisTLSEnabled = true` aligns with the PR objective of Redis TLS being required for E2E tests.
359-387: LGTM! The principal Redis configuration mirrors the managed agent approach with the same robust fallback chain. Consistent behavior across both configurations is good for maintainability.
226-267: Verify cleanup function is invoked at test suite end. The Redis client caching infrastructure prevents connection leaks. Confirm that `CleanupRedisCachedClients()` is called in your test suite's teardown or cleanup phase to ensure cached connections are properly released.
test/e2e/sync_test.go (1)
369-378: LGTM! The pre-sync hook Job name correction from `"pre-post-sync-before"` to `"before"` aligns the test expectations with the actual test fixture. This is a straightforward test maintenance update.
Also applies to: 464-473
docs/configuration/redis-tls.md (4)
1-68: LGTM! The overview and architecture sections provide clear explanations of the three TLS configuration points (Redis Proxy Server TLS, Upstream Redis TLS, Agent Redis TLS) and how they fit together. The text-based architecture diagram effectively illustrates the flow.
70-247: LGTM! The quick start and local development sections are comprehensive and practical:
- Clear statement that Redis TLS is required for E2E tests
- Well-structured explanations of local vcluster vs. remote vcluster setups
- Reverse tunnel documentation addresses a real need
- Manual testing steps align with the provided scripts
306-368: LGTM! The configuration tables provide excellent reference documentation with all flags, environment variables, and defaults clearly documented. The distinction between Kubernetes/Helm defaults (`true`) and CLI defaults (`false`) is important and well-noted.
468-700: LGTM! The troubleshooting section addresses real-world issues comprehensively:
- Script output examples help users understand what to expect
- Common problems (connection refused, certificate verification, port-forward instability) are well-covered
- Security best practices provide appropriate guidance
- Solutions are actionable and clear
hack/dev-env/Procfile.e2e (1)
1-7: Verify environment variables are set. The Procfile uses environment variables `MANAGED_AGENT_REDIS_ADDR` and `AUTONOMOUS_AGENT_REDIS_ADDR` on lines 6-7 for agent startup. Ensure these are exported by the parent script (e.g., `start-e2e` or the Makefile) before invoking this Procfile.
295-305: Fixture-based Argo endpoint and admin secret lookup looks good. Switching to `fixture.GetArgoCDServerEndpoint` and `fixture.GetInitialAdminSecret` keeps this suite consistent with other e2e tests and centralizes TLS/endpoint logic; the updated `NewArgoClient` usage is sound and maintains the existing behavior.
14-27: Redis TLS cert generation script is sound and addresses earlier concerns. The script now:

- Lets `openssl` errors surface (no `2>/dev/null`) under `set -e`.
- Generates appropriate SANs for control-plane, proxy, and agent Redis instances.
- Only appends `IP.3 = ${LOCAL_IP}` when `LOCAL_IP` is non-empty, avoiding invalid SANs.

This looks correct and is appropriately idempotent for dev/e2e usage.
Also applies to: 60-91, 105-136
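The conditional-SAN behavior called out above can be demonstrated in isolation (paths and SAN values here are illustrative, not the script's actual ones):

```shell
#!/usr/bin/env bash
# Demonstrates the conditional-SAN pattern: IP.3 is only emitted when LOCAL_IP
# is non-empty, so openssl never sees a dangling SAN entry.
set -euo pipefail

LOCAL_IP=""   # set to your host IP to add IP.3; intentionally empty here

cat > /tmp/redis-san-demo.cnf <<EOF
[alt_names]
DNS.1 = localhost
DNS.2 = redis.control-plane.svc
IP.1 = 127.0.0.1
IP.2 = ::1
EOF

if [ -n "${LOCAL_IP}" ]; then
  echo "IP.3 = ${LOCAL_IP}" >> /tmp/redis-san-demo.cnf
fi

# Count the IP SANs that actually made it into the config
grep -c '^IP\.' /tmp/redis-san-demo.cnf
```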
docs/getting-started/kubernetes/index.md (1)
159-230: Redis TLS documentation is consistent with the tooling and default behavior. The new sections:

- Clearly mark Redis TLS as required and walk through CA/server cert generation, `argocd-redis-tls` secret creation, and TLS enablement on both control-plane and workload clusters.
- Reuse the same CA for workload Redis and mirror the deployment patches used by the dev/e2e scripts.
- Cross-link to the dedicated Redis TLS configuration doc.
This aligns with the CLI flags and helper scripts added in this PR and provides a practical, copy-pastable path for users.
Also applies to: 337-381, 646-646
Force-pushed c546df9 to 65cd0ff
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e/fixture/cluster.go (1)
226-267: Cached Redis clients: lock scope and explicit cleanup; note prior concern about closing. The cached client map and mutex give you reuse across calls, but there are a couple of details to consider:

- `getCachedCacheInstance` currently holds `cachedRedisClientMutex` while constructing a new client via `getCacheInstance`. It's cheap today, but you could narrow the critical section by computing the cache key and constructing the client outside the lock, only locking around the map access/update.
- `CleanupRedisCachedClients` only resets the map; it doesn't close underlying connections, so the comment "stores Redis clients to prevent connection leaks" is a bit misleading. If `appstatecache.Cache` ever exposes a `Close`/`Shutdown` method, or if you can track the underlying `*redis.Client` alongside the cache in a small wrapper struct, it would be preferable to call `Close()` here before dropping references. If that's not feasible, consider updating the comment to clarify that the fixture relies on process teardown/GC for actual connection cleanup.

This restates an earlier review note about explicit connection closing in `CleanupRedisCachedClients`.
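The lock-narrowing suggestion can be sketched with a double-checked map lookup (all names here are stand-ins for the fixture's types, not the actual code):

```go
package main

import (
	"fmt"
	"sync"
)

type fakeClient struct{ addr string }

var (
	mu      sync.Mutex
	clients = map[string]*fakeClient{}
)

// getCached constructs the (possibly expensive) client outside the mutex and
// only locks around map access. Another goroutine may win the race; in that
// case the freshly built client is discarded and the first one is kept.
func getCached(addr string) *fakeClient {
	mu.Lock()
	c, ok := clients[addr]
	mu.Unlock()
	if ok {
		return c
	}
	fresh := &fakeClient{addr: addr} // expensive construction happens unlocked
	mu.Lock()
	defer mu.Unlock()
	if existing, ok := clients[addr]; ok {
		return existing // lost the race; keep the first client
	}
	clients[addr] = fresh
	return fresh
}

func main() {
	a := getCached("redis:6379")
	b := getCached("redis:6379")
	fmt.Println(a == b)
}
```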
🧹 Nitpick comments (15)
hack/dev-env/setup-vcluster-env.sh (1)
159-190: Well-structured environment-driven Redis endpoint configuration with clear scenario documentation. The branching logic correctly handles the three deployment scenarios (in-cluster principal, CI, and local development) and substitutes the appropriate Redis endpoint for each. Comments (lines 159–170) effectively document the control flow and explain why different strategies are needed across environments.
Minor suggestion: Add error handling for IP address discovery in local development scenario.
On lines 182–186, if `ipconfig getifaddr en0` or the `ip r show default` parsing fails, `ARGO_AGENT_IPADDR` will be empty, causing the sed replacement on line 189 to produce invalid configuration. While this is a dev environment and errors would surface quickly, adding a check would prevent silent misconfiguration:

```sh
if [[ "$OSTYPE" == "darwin"* ]]; then
  ARGO_AGENT_IPADDR=$(ipconfig getifaddr en0 2>/dev/null) || true
else
  ARGO_AGENT_IPADDR=$(ip r show default 2>/dev/null | sed -e 's,.*\ src\ ,,' | sed -e 's,\ metric.*$,,' | head -n 1) || true
fi
if [[ -z "$ARGO_AGENT_IPADDR" ]]; then
  echo "WARNING: Failed to resolve local IP address for Redis proxy; using fallback" >&2
  ARGO_AGENT_IPADDR="localhost"  # or handle appropriately for your use case
fi
```
316-334: Consider also checking for Ingress IP. The current logic captures `LoadBalancerIP` first, then overrides with `Hostname` if present. However, if `LoadBalancerIP` is empty and the Ingress has an IP (not hostname), that IP won't be used.

```diff
 argoEndpoint := srvService.Spec.LoadBalancerIP
 if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-	if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
-		argoEndpoint = hostname
+	ingress := srvService.Status.LoadBalancer.Ingress[0]
+	if ingress.Hostname != "" {
+		argoEndpoint = ingress.Hostname
+	} else if ingress.IP != "" {
+		argoEndpoint = ingress.IP
 	}
 }
```
323-343: Missing CA configuration when TLS is enabled but no explicit CA is provided. When `redisTLSEnabled` is true but `redisTLSInsecure` is false and `redisTLSCAPath` is empty, the TLS config is created with only `MinVersion` set (lines 326-328) and no `RootCAs`. This means the system CA pool will be used by default, which may or may not be the intended behavior. Consider adding a log message to clarify this fallback, or explicitly setting `RootCAs` to the system pool for clarity:

```diff
 if a.redisProxyMsgHandler.redisTLSEnabled {
 	clusterCacheTLSConfig = &tls.Config{
 		MinVersion: tls.VersionTLS12,
 	}
 	if a.redisProxyMsgHandler.redisTLSInsecure {
 		log().Warn("INSECURE: Not verifying Redis TLS certificate for cluster cache")
 		clusterCacheTLSConfig.InsecureSkipVerify = true
 	} else if a.redisProxyMsgHandler.redisTLSCAPath != "" {
 		caCertPEM, err := os.ReadFile(a.redisProxyMsgHandler.redisTLSCAPath)
 		if err != nil {
 			return nil, fmt.Errorf("failed to read CA certificate for cluster cache: %w", err)
 		}
 		certPool := x509.NewCertPool()
 		if !certPool.AppendCertsFromPEM(caCertPEM) {
 			return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s", a.redisProxyMsgHandler.redisTLSCAPath)
 		}
 		clusterCacheTLSConfig.RootCAs = certPool
+	} else {
+		log().Debug("Using system CA pool for Redis TLS verification")
 	}
 }
```
96-102: Cleanup function doesn't restore kubectl context. Other scripts in the repo (e.g., `setup-vcluster-env.sh`, `configure-redis-tls.sh`) save and restore the initial kubectl context in their cleanup functions. This script's cleanup only stops goreman but doesn't restore context, which could leave the environment in an unexpected state if the script switches contexts during execution. Consider adding context restoration for consistency:

```diff
+initial_context=$(kubectl config current-context)
+
 # Function to cleanup on exit
 cleanup() {
     echo "Stopping goreman..."
     kill $GOREMAN_PID 2>/dev/null || true
     wait $GOREMAN_PID 2>/dev/null || true
+    kubectl config use-context ${initial_context} 2>/dev/null || true
 }
```

principal/redisproxy/redisproxy.go (1)
852-894: Consider adding a warning when server TLS is enabled but upstream TLS is not configured. When `rp.tlsEnabled` is true but none of the upstream TLS options are set (line 853 condition is false), the connection to principal Redis will be unencrypted. This creates an asymmetric security posture where incoming connections are encrypted but outgoing connections are not. While this may be intentional for some deployments, a warning would help operators understand the configuration:

```diff
 // If TLS is enabled for upstream, wrap the connection with TLS
-if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+
+if rp.tlsEnabled && !hasUpstreamTLSConfig {
+	logCtx.Warn("Redis proxy server has TLS enabled, but no upstream TLS configuration provided. Connection to principal Redis will be unencrypted.")
+}
+
+if rp.tlsEnabled && hasUpstreamTLSConfig {
 	tlsConfig := &tls.Config{
```

This helps operators identify potential security misconfigurations during deployment.
hack/dev-env/start-agent-managed.sh (1)
63-74: Consider adding cleanup of temporary TLS files on script exit. The script extracts mTLS certificates to temp files (`/tmp/agent-managed-tls.{crt,key,ca}`), but does not clean them up. If this script is run repeatedly during development, temp files could accumulate. Consider adding a trap to clean up on exit:

```sh
trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
```

Or verify that cleanup happens elsewhere (e.g., in goreman or make targets).
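A runnable sketch of the trap pattern, using throwaway `mktemp` files instead of the script's real certificate paths:

```shell
#!/usr/bin/env bash
# Trap-based cleanup sketch: the temp files are removed on ANY exit,
# including failures under set -e. mktemp stands in for the real
# /tmp/agent-managed-tls.* paths used by the dev-env script.
set -euo pipefail

TLS_CERT_PATH=$(mktemp)
TLS_KEY_PATH=$(mktemp)
ROOT_CA_PATH=$(mktemp)
trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT

echo "dummy-cert" > "${TLS_CERT_PATH}"
# ...the real script would extract mTLS material and start the agent here...
ls "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}" >/dev/null && echo "temp files in place"
```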
docs/getting-started/kubernetes/index.md (1)
226-229: Fix markdown code-block style issue (indented vs fenced). Static analysis flags line 229 as using indented code block syntax when fenced syntax should be used. The block starting at line 226 should use fenced backticks instead of indentation to maintain consistency and linting compliance.
```) instead of indentation to maintain consistency and linting compliance.principal/listen.go (2)
174-199: Tighten logging and error handling in WebSocket vs native gRPC branches. The added logs are helpful, but a few small tweaks would improve clarity and reduce surprises:

- Line 174: Consider making this a debug log and dropping the emoji so logs stay machine‑friendly and less noisy in production.
- Line 176: There's a leading space in `" WebSocket is ENABLED..."` which will look odd in log output.
- Lines 185–189 and 193–198: Using the outer `err` variable inside goroutines is safe here but non‑idiomatic; it's clearer to use a local `err` inside each goroutine and keep the outer error for `serveGRPC` itself.
- Line 196: `log().WithError(err).Warn(" gRPC server.Serve() exited")` will log `error=<nil>` on a graceful shutdown. It's usually better to log at `Warn` only when `err != nil`, and otherwise log a simple Info for a clean exit. The WebSocket `ServeTLS` path currently doesn't log on exit at all, only forwards the error to `errch`, so you might also want a symmetric exit log there.

For example, you could refactor the goroutines like this to address all of the above while keeping the `errch` behavior the same:

```diff
-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Debug("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
@@
-		go func() {
-			log().Info("Starting WebSocket downgrading server")
-			err = downgradingServer.ServeTLS(s.listener.l, s.options.tlsCertPath, s.options.tlsKeyPath)
-			errch <- err
-		}()
+		go func() {
+			log().Info("Starting WebSocket downgrading server")
+			err := downgradingServer.ServeTLS(s.listener.l, s.options.tlsCertPath, s.options.tlsKeyPath)
+			if err != nil {
+				log().WithError(err).Warn("WebSocket downgrading server exited with error")
+			} else {
+				log().Info("WebSocket downgrading server exited gracefully")
+			}
+			errch <- err
+		}()
 	} else {
@@
-		go func() {
-			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
-			err = s.grpcServer.Serve(s.listener.l)
-			log().WithError(err).Warn(" gRPC server.Serve() exited")
-			errch <- err
-		}()
+		go func() {
+			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
+			err := s.grpcServer.Serve(s.listener.l)
+			if err != nil {
+				log().WithError(err).Warn("gRPC server.Serve() exited with error")
+			} else {
+				log().Info("gRPC server.Serve() exited gracefully")
+			}
+			errch <- err
+		}()
 	}
```
224-231: Service registration logs look good; consider making them more structured. The new Info logs around service registration are useful and low‑overhead at startup. If you want to make them a bit more compact and query‑friendly, you could use a structured `service` field instead of separate messages:

```diff
-	log().Info("Registering gRPC services on principal")
-	authapi.RegisterAuthenticationServer(s.grpcServer, authSrv)
-	log().Info("Authentication service registered successfully")
-	versionapi.RegisterVersionServer(s.grpcServer, version.NewServer(s.authenticate))
-	log().Info("Version service registered successfully")
-	eventstreamapi.RegisterEventStreamServer(s.grpcServer, eventstream.NewServer(s.queues, s.eventWriters, metrics, s.clusterMgr, eventstream.WithNotifyOnConnect(s.notifyOnConnect)))
-	log().Info("EventStream service registered successfully")
+	log().Info("Registering gRPC services on principal")
+	authapi.RegisterAuthenticationServer(s.grpcServer, authSrv)
+	log().WithField("service", "Authentication").Info("gRPC service registered")
+	versionapi.RegisterVersionServer(s.grpcServer, version.NewServer(s.authenticate))
+	log().WithField("service", "Version").Info("gRPC service registered")
+	eventstreamapi.RegisterEventStreamServer(s.grpcServer, eventstream.NewServer(s.queues, s.eventWriters, metrics, s.clusterMgr, eventstream.WithNotifyOnConnect(s.notifyOnConnect)))
+	log().WithField("service", "EventStream").Info("gRPC service registered")
```

Not strictly necessary, but it can make log search/aggregation simpler if you start adding more services over time.
test/e2e/fixture/cluster.go (2)
170-218: TLS config and Redis client tuning are reasonable for E2E, scoped by flags. Wiring `redis.Options.TLSConfig` with `MinVersion: tls.VersionTLS12` and `InsecureSkipVerify: true` behind the `*RedisTLSEnabled` booleans matches the PR intent for E2E: you get encrypted transport without needing stable LB hostnames. The extended timeouts, pool sizing, and retry/backoff settings are also sane defaults for noisy CI environments. If in the future you want to test full certificate validation, you could optionally add a test‑only env toggle to switch `InsecureSkipVerify` off and set `RootCAs`/`ServerName`, but the current behavior is fine for this fixture.
288-327: Redis address discovery and TLS flags are robust; consider optional TLS override for local dev. The updated `getManagedAgentRedisConfig`/`getPrincipalRedisConfig` logic to try LoadBalancer ingress, then `spec.LoadBalancerIP`, then `ClusterIP` provides a much more resilient way to find Redis, and the error messages are clear. Setting `ManagedAgentRedisTLSEnabled`/`PrincipalRedisTLSEnabled` to `true` by default, with env vars (`MANAGED_AGENT_REDIS_ADDR`, `ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS`) only overriding the address, aligns with the "TLS‑everywhere for tests" goal.
If you later need to support non‑TLS Redis in ad‑hoc local setups, you might add a parallel env knob (e.g. `*_REDIS_TLS_DISABLED` or a scheme‑based address) to flip the `*RedisTLSEnabled` flags, but that's not required for the current E2E path.
Also applies to: 359-387
test/e2e/redis_proxy_test.go (4)
120-124: SSE readiness sleep works but is still inherently racy. The extra 5-second sleep will reduce the race between SSE connection and Redis `SUBSCRIBE`, but it's still a fixed guess and may be too short/long on some clusters. If the server reliably emits at least one initial SSE event after subscription, an optional follow-up would be to gate on "first message observed on `msgChan`" via `require.Eventually` instead of a hard `Sleep`, so the test waits just long enough and becomes deterministic.
Also applies to: 326-330
188-208: Channel-drain loops look correct; could be factored into a shared helper. The "drain all available SSE messages" loops correctly avoid blocking (thanks to the `default` branch) and ensure each `Eventually` tick processes the full backlog before deciding whether to retry. However, the logic is duplicated across the two tests; consider extracting a small helper like `drainSSEUntilPodSeen(t *testing.T, msgChan <-chan string, podName string) bool` to centralize this behavior and logging, which will simplify future changes to the drain semantics.
Also applies to: 407-427
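The suggested helper could look roughly like this (the name `drainSSEUntilPodSeen` is hypothetical, taken from the suggestion above; the `*testing.T` parameter is dropped to keep the sketch self-contained):

```go
package main

import (
	"fmt"
	"strings"
)

// drainSSEUntilPodSeen drains everything currently buffered on the SSE channel
// without blocking, returning whether the pod name was observed. The default
// branch lets the caller's Eventually retry once the channel is empty.
func drainSSEUntilPodSeen(msgChan <-chan string, podName string) bool {
	for {
		select {
		case msg := <-msgChan:
			if strings.Contains(msg, podName) {
				return true
			}
		default:
			return false // channel empty; let the caller retry
		}
	}
}

func main() {
	ch := make(chan string, 3)
	ch <- "data: {...}"
	ch <- "data: pod/my-app-1234"
	fmt.Println(drainSSEUntilPodSeen(ch, "my-app-1234"))
}
```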
211-237: ResourceTree retry logic is solid; duplication could be reduced. Wrapping `appClient.ResourceTree` in `requires.Eventually` with explicit logging on transient errors / nil trees is a good way to handle EOFs and Redis hiccups. The almost-identical blocks in the managed vs. autonomous tests (only differing in which `Application` is referenced) could be pulled into a small helper (e.g., `waitForPodInResourceTree`) to reduce repetition and keep the Redis/Argo retry policy in one place.
Also applies to: 430-456
588-588: Buffered SSE channel and HTTP/TLS settings are appropriate for tests; consider clarifying test-only TLS behavior. Using a buffered `msgChan` (size 100) plus a `Timeout: 0` client and `ResponseHeaderTimeout: 0` is consistent with long-lived SSE streams and avoids reader backpressure in these tests. The explicit `*tls.Config{InsecureSkipVerify: true}` is acceptable here because this helper lives under `test/e2e` and you still exercise TLS on the wire, but it would be worth adding a short comment stating that this is intentionally insecure and test-only (due to self-signed certs and dynamic LoadBalancer endpoints) to discourage copy-paste into production paths. If you expect a high reconnection rate, you might also consider hoisting the `http.Transport`/`http.Client` construction outside the `for` loop to avoid reallocating them on every retry, though this is non-critical in test code.
Please double-check against your current Go version's `net/http` documentation that `Timeout: 0` plus context cancellation behaves as expected for SSE (i.e., no hidden default deadline). For example, verify locally that a hung SSE server causes the request to terminate when the context is canceled, not earlier due to client-side timeouts.
Also applies to: 642-649, 650-653, 661-663
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- .github/workflows/ci.yaml (1 hunks)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/setup-vcluster-env.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/e2e/sync_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- test/e2e/sync_test.go
- principal/resource.go
- test/run-e2e.sh
- hack/dev-env/configure-redis-tls.sh
- hack/dev-env/start-agent-autonomous.sh
- test/e2e/clusterinfo_test.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- cmd/argocd-agent/agent.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- Makefile
- hack/dev-env/Procfile.e2e
- .github/workflows/ci.yaml
- hack/dev-env/start-e2e.sh
- test/e2e/README.md
- test/e2e/rp_test.go
🧬 Code graph analysis (6)
test/e2e/fixture/fixture.go (1)
  test/e2e/fixture/cluster.go (3): ClusterDetails (42-56), AgentManagedName (37-37), AgentClusterServerURL (39-39)
hack/dev-env/configure-argocd-redis-tls.sh (1)
  hack/dev-env/setup-vcluster-env.sh (1): apply (94-271)
agent/agent.go (2)
  internal/logging/logging.go (1): Warn (300-302)
  internal/argocd/cluster/cluster.go (1): NewClusterCacheInstance (176-192)
hack/dev-env/start-e2e.sh (2)
  hack/dev-env/configure-redis-tls.sh (1): cleanup (50-52)
  hack/dev-env/setup-vcluster-env.sh (1): cleanup (39-41)
cmd/argocd-agent/principal.go (3)
  agent/options.go (1): WithRedisTLSEnabled (112-117)
  principal/options.go (6): WithRedisTLSEnabled (493-498), WithRedisServerTLSFromPath (501-507), WithRedisServerTLSFromSecret (510-520), WithRedisUpstreamTLSInsecure (543-548), WithRedisUpstreamTLSCAFromFile (523-528), WithRedisUpstreamTLSCAFromSecret (531-540)
  internal/env/env.go (2): BoolWithDefault (30-39), StringWithDefault (46-55)
test/e2e/rp_test.go (1)
  test/e2e/fixture/argoclient.go (3): GetArgoCDServerEndpoint (315-337), GetInitialAdminSecret (302-313), NewArgoClient (52-66)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md
229-229: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (42)
principal/tracker/tracking.go (1)
75-78: LGTM! Valid concurrency fix. Buffering the channel with size 1 is the correct solution for this single request/response pattern. It prevents the sender from blocking when the receiver goroutine hasn't started yet, eliminating the potential deadlock. The comments clearly explain the rationale.
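The buffered request/response pattern can be shown in a few lines (a generic sketch, not the tracker's actual code):

```go
package main

import "fmt"

func main() {
	// With a buffer of 1 the single send below completes immediately, even
	// though no receiver is running yet. An unbuffered channel would block
	// the sender here, which is exactly the deadlock the fix avoids.
	resp := make(chan string, 1)

	resp <- "tracking-result"

	// The receiver picks the value up later.
	fmt.Println(<-resp)
}
```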
test/e2e/fixture/argoclient.go (1)
27-27: LGTM! Adding the `os` import to support environment variable reading is appropriate for the new functionality.
hack/dev-env/Procfile.e2e (1)
1-7: LGTM! The port-forward setup correctly exposes Redis services from each vcluster to distinct localhost ports, and the sleep delays ensure they're ready before starting the principal/agents. Using environment variables for Redis addresses aligns with the TLS configuration approach.
internal/argocd/cluster/cluster.go (2)
135-142: LGTM! Good defensive addition. Initializing `ConnectionState` when it doesn't exist ensures newly connected agents are properly reflected in the cluster info, avoiding a scenario where cache stats are set but connection status appears unknown.
176-191: LGTM! The TLS configuration is correctly wired into the Redis client options. Passing `nil` for `tlsConfig` maintains backward compatibility with non-TLS connections.
109-112: LGTM!Increasing the deletion timeout from the previous value to 120 iterations (2 minutes) provides more headroom for finalizer processing and TLS connection establishment overhead.
229-240: Good use of DeepCopy to avoid mutating loop variables.Creating a deep copy before modifying the namespace prevents unintended side effects on the original list item. The warning-based error handling is appropriate for cleanup operations that shouldn't fail the entire test.
254-265: Consistent pattern applied here as well.The same deep copy and warning-based cleanup approach maintains consistency across the cleanup logic.
315-324: Deep copy for AppProject cleanup.Correctly creates a copy before modifying name and namespace, avoiding mutation of the loop variable.
457-461: Non-fatal Redis cleanup is appropriate.Logging a warning instead of failing when Redis is unavailable (e.g., port-forward died) ensures cleanup completes and doesn't block subsequent test runs.
465-470: VerifygetCachedCacheInstanceis defined and properly configured.The function
getCachedCacheInstanceis referenced but cannot be verified in the available context. Ensure it is properly implemented in the fixture package and handles TLS configuration support as expected.test/e2e/rp_test.go (3)
162-169: Good refactoring to use fixture helpers. Consolidating endpoint and secret retrieval into reusable fixture functions improves maintainability and ensures consistent behavior across tests, especially with the new TLS configuration support.
295-304: Consistent usage of fixture helpers. The same pattern applied here maintains consistency across the test suite.
509-510: Minor formatting change. No functional change - the URL is now on a single line.
cmd/argocd-agent/principal.go (4)
89-98: LGTM! Clear Redis TLS configuration variables. The TLS configuration fields are well-organized, covering server TLS (cert/key from path or secret) and upstream TLS (CA from path, secret, or insecure mode).
258-299: LGTM! Comprehensive TLS configuration with proper validation. The mutual exclusivity validation (lines 272-286) correctly ensures only one upstream TLS mode is specified. The special handling at line 281 that excludes the default secret name from the count is appropriate: it allows users to explicitly set only insecure mode or CA path without being blocked by the default value.
The configuration flow properly mirrors the existing server cert/key validation pattern (lines 262-266).
430-451: LGTM! Well-documented CLI flags with sensible defaults. The Redis TLS flags follow existing patterns with environment variable fallbacks. Enabling TLS by default (`true` at line 432) aligns with the PR objective of "TLS encryption enabled by default."
482-482: Verify the 30-second timeout is appropriate. The timeout was increased from 2 seconds to 30 seconds. While this accommodates TLS secret retrieval which may take longer, 30 seconds is quite generous and could delay startup failures. Consider whether 10-15 seconds might be sufficient, or document why 30 seconds is needed.
agent/agent.go (1)
445-460: LGTM! Improved cluster cache info update logic. The refactored goroutine now:
- Sends an initial update immediately on startup (line 448)
- Uses a single ticker for periodic updates
- Works for both managed and autonomous agent modes
This is cleaner than mode-specific goroutines and ensures timely initial synchronization.
hack/dev-env/start-e2e.sh (3)
50-59: LGTM! Clean variable setup with proper declaration. The Redis password assignment now correctly separates declaration and assignment (lines 58-59), which properly surfaces `kubectl` failures. The static localhost addresses for Redis endpoints are appropriate for the TLS certificate validation during E2E tests.

104-170: LGTM! Robust readiness check with excellent diagnostics. The Redis proxy readiness check includes:

- Fallback from `nc` to bash TCP redirection (lines 109-121)
- Progress reporting (lines 123-127)
- Comprehensive failure diagnostics including goreman status, port checks, and log tails (lines 134-169)
This will significantly aid debugging E2E environment issues.
192-227: LGTM! Proper rollout handling with timeout. The Argo CD component restart and rollout status checks are well implemented with:
- Individual rollout status checks per component
- 90-second timeout (reasonable for component restarts)
- Failure aggregation before exit
- Pod status output on failure for debugging
principal/redisproxy/redisproxy.go (3)
98-128: LGTM! Clean TLS configuration API. The setter methods provide a clear interface for configuring TLS:
- Server TLS from certificate/key objects or file paths
- Upstream TLS CA from pool, file path, or insecure mode
The separation of concerns makes the configuration flexible for different deployment scenarios.
130-154: LGTM! Robust TLS config creation with proper error handling. The `createServerTLSConfig` method correctly:

- Prioritizes path-based loading over in-memory certificates
- Properly constructs a `tls.Certificate` from the raw cert/key
- Sets the minimum TLS version to 1.2
- Returns descriptive errors on failure
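That priority order can be sketched like this; `buildServerTLSConfig` and its error strings are hypothetical, not the actual `redisproxy` API, and only illustrate "paths win over in-memory material, with a TLS 1.2 floor":

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// buildServerTLSConfig prefers certificate/key file paths over in-memory PEM
// material, and pins the minimum TLS version to 1.2.
func buildServerTLSConfig(certPath, keyPath string, certPEM, keyPEM []byte) (*tls.Config, error) {
	var cert tls.Certificate
	var err error
	switch {
	case certPath != "" && keyPath != "":
		cert, err = tls.LoadX509KeyPair(certPath, keyPath)
	case len(certPEM) > 0 && len(keyPEM) > 0:
		cert, err = tls.X509KeyPair(certPEM, keyPEM)
	default:
		return nil, fmt.Errorf("no server certificate configured")
	}
	if err != nil {
		return nil, fmt.Errorf("could not load server certificate: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// With neither paths nor PEM material, a descriptive error is returned.
	_, err := buildServerTLSConfig("", "", nil, nil)
	fmt.Println(err)
}
```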
156-200: LGTM! Clean TLS listener implementation. The `Start()` method properly branches between TLS and non-TLS listeners, with appropriate logging to indicate which mode is active.

Makefile (1)
59-79: Add error handling to TLS configuration scripts to fail fast on errors. The TLS setup steps (lines 59-79) execute multiple scripts sequentially without error handling. If any script fails, the Makefile continues to the next step, potentially leaving the E2E environment in a partially configured state. Add `|| exit 1` after each script invocation to stop execution immediately if a step fails.

docs/configuration/redis-tls.md (3)
226-230: Confirm that `hack/dev-env/reverse-tunnel/README.md` exists. Line 162 references `hack/dev-env/reverse-tunnel/README.md` for detailed reverse-tunnel setup. Ensure this documentation file is included in the PR, or update the link if it's located elsewhere.

1-700: Excellent, comprehensive Redis TLS documentation. This is a well-structured, thorough guide covering overview, architecture, certificate management, configuration, Kubernetes installation, troubleshooting, and security best practices. The examples are clear and practical, and the documentation aligns well with the actual TLS implementation across the codebase. The table structures for CLI flags and environment variables are particularly helpful for users.

31-49: Resolve remaining markdownlint fenced-code-block language tags (MD040). The documentation uses `text` tags for most code blocks, but the static analysis tool flags remaining bare code fences at lines 150, 475, 486, and 504. Ensure all architecture diagrams and script output sections are tagged with `text` to satisfy linting.

hack/dev-env/start-agent-managed.sh (1)
37-110: LGTM: Redis TLS configuration and argument passing look correct. The Redis TLS detection, certificate extraction, address defaulting, and dual-path invocation (dist vs go run) are all properly implemented. The script provides clear user guidance when certificates are missing, and TLS arguments are consistently passed to both binary paths.
hack/dev-env/start-principal.sh (2)
23-86: LGTM: Principal TLS startup configuration is well-structured. The Redis TLS detection, certificate checks, and dual-path argument passing are correct. The script properly handles the default Redis address (localhost:6380) and provides good comments about certificate SANs and reverse tunnel support.

44-62: Verify certificate file naming consistency between the cert generation and usage scripts. The script checks for `redis-proxy.crt`, `redis-proxy.key`, and `ca.crt` files (lines 46-48). Confirm that the `gen-redis-tls-certs.sh` script generates files with these exact names. If the naming differs between the generation and startup scripts, update one to match the other or document the intentional difference.
159-230: Excellent Redis TLS setup instructions for Kubernetes. The sections provide clear, step-by-step guidance for configuring Redis TLS on both control-plane and workload clusters, including certificate generation, secret creation, Redis patching, and verification. The instructions are well organized and include helpful commands. One minor note: the patches use JSON array append syntax (`-` in the path), which should work correctly for idempotent re-runs when arrays already exist.

Also applies to: 337-381
hack/dev-env/gen-redis-tls-certs.sh (1)
1-150: LGTM: Certificate generation script is well-structured and idempotent. The script properly generates Redis TLS certificates for all required components (CA, control-plane, proxy, autonomous, managed) with appropriate SANs, including local IP detection, localhost, cluster DNS, and the reverse-tunnel hostname. Error handling uses `set -e`, and temporary files are cleaned up. The idempotent checks for existing keys/certs make the script safe to re-run.
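For comparison, the same SAN coverage can be produced programmatically. This Go sketch (not the shell script's openssl flow) builds a self-signed certificate with illustrative DNS and IP SANs; the names are examples, while `gen-redis-tls-certs.sh` derives its SAN list dynamically:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// makeCertWithSANs builds a self-signed server certificate whose SANs cover
// localhost, an in-cluster DNS name, and the loopback IP.
func makeCertWithSANs() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "argocd-redis"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		DNSNames:     []string{"localhost", "argocd-redis.argocd.svc.cluster.local"},
		IPAddresses:  []net.IP{net.ParseIP("127.0.0.1")},
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := makeCertWithSANs()
	if err != nil {
		panic(err)
	}
	fmt.Println("SANs:", cert.DNSNames, cert.IPAddresses)
}
```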
83-107: Document the E2E_READY marker output requirement for `make start-e2e`. The CI workflow (`.github/workflows/ci.yaml`) waits for an `E2E_READY:` marker in the logs from the `make start-e2e` step. This README should document that the `start-e2e` target (or its underlying script) must output this marker after all components are ready, so that CI's readiness check functions correctly.
1-137: LGTM: Clear and well-organized E2E test documentation. The multi-terminal workflow is clearly explained with proper step numbering, and the Redis TLS requirement is prominently documented. The addition of the reverse-tunnel section for remote clusters is excellent, and the note about `InsecureSkipVerify` in test fixtures appropriately clarifies that TLS encryption is still enabled. The environment auto-detection guidance (local vs CI) is helpful.
hack/dev-env/configure-argocd-redis-tls.sh (3)
37-57: Verify the redis.server configuration logic for control-plane vs agent clusters. The script skips `redis.server` configuration for `vcluster-control-plane` (line 52), assuming it uses the Redis proxy. For agent clusters, it sets `redis.server` to `argocd-redis:6379` (line 41). Verify that this logic matches the actual cluster configuration and that the control-plane's `redis.server` is correctly set by other means (e.g., `setup-vcluster-env.sh`).

59-304: Verify that the secret name `argocd-redis-tls` is used consistently across all setup scripts. Lines 80, 98, 174, etc., reference the secret `argocd-redis-tls` with the `ca.crt` key. Confirm that this matches:

- The secret created by `hack/dev-env/configure-redis-tls.sh`
- The secret created in the Kubernetes installation docs (section 2.4)
- Any other TLS configuration in the codebase

1-354: LGTM: Comprehensive and idempotent Argo CD TLS configuration script. The script robustly configures Redis TLS across all Argo CD components (server, repo-server, application-controller) with proper idempotency checks for volumes, mounts, and arguments. Error handling is explicit, and the replica scaling logic has been fixed to correctly ensure a minimum of 1 replica. The use of conditional component checks makes the script flexible for clusters with different deployments.
.github/workflows/ci.yaml (1)
199-214: Verify that the `E2E_READY:` marker is generated during E2E setup. The readiness logic now waits for an event-driven marker instead of a fixed delay, which is a solid improvement. However, this depends on `start-e2e.sh` (or one of its components) outputting the `E2E_READY:` marker to `/tmp/e2e-argocd-agent.log`. Confirm this marker is consistently generated as part of the E2E setup flow and that logs are properly redirected to this file.
42-52: TLS flags and cached Redis usage in cluster info helpers look consistent. The added `ManagedAgentRedisTLSEnabled`/`PrincipalRedisTLSEnabled` fields and the switch to `getCachedCacheInstance` in `GetManagedAgentClusterInfo`/`GetPrincipalClusterInfo` are consistent with the TLS-enabled E2E design, and the extra logging around principal lookups should make Redis connectivity issues easier to debug in tests. Ensure `getClusterConfigurations` is always called before these helpers so the addresses/TLS flags are populated on `ClusterDetails`.

test/e2e/redis_proxy_test.go (1)

184-184: Extended pod replacement windows are reasonable. Bumping the `requires.Eventually` wait duration for new pod creation from 30s to 60s is a pragmatic way to accommodate slower schedulers without changing semantics, and looks safe for both managed and autonomous flows.

Also applies to: 402-402
Force-pushed from b199f13 to 6fb6d33
Actionable comments posted: 6
♻️ Duplicate comments (2)
hack/dev-env/configure-redis-tls.sh (1)
68-70: Add error handling for the context switch. If `kubectl config use-context` fails (e.g., the context doesn't exist), the script continues and may operate on the wrong cluster. This was flagged in a previous review but not yet addressed.

```diff
 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
```

hack/dev-env/configure-argocd-redis-tls.sh (1)
240-255: Same volumes array handling issue. Like argocd-repo-server, argocd-application-controller assumes the volumes array exists. Apply the same defensive approach as suggested for argocd-repo-server to handle cases where the volumes array might not exist initially.
🧹 Nitpick comments (9)
docs/configuration/redis-tls.md (2)
150-156: Add a language specifier to the fenced code block. This code block is still missing a language identifier, which triggers markdownlint MD040. The past review comment indicated this was addressed, but the current code still shows a bare fence.

````diff
-```
+```text
 Argo CD Server (remote vcluster) → rathole Deployment (remote) → rathole Container (local Mac) → Principal process (local Mac)
-```
+```
````
475-520: Add language specifiers to the remaining code blocks. Several code blocks in the "Understanding Script Output" section are missing language specifiers (markdownlint MD040). Tag them as `text` for consistency:

**gen-redis-tls-certs.sh:**

````diff
-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...
 ...
````

**configure-redis-tls.sh:**

````diff
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ...
````

**configure-argocd-redis-tls.sh:**

````diff
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ...
````

docs/getting-started/kubernetes/index.md (1)
207-211: Clarify that `$(REDIS_PASSWORD)` is a Redis environment variable reference. The `$(REDIS_PASSWORD)` syntax in the Redis args may confuse users, who might think it's shell variable expansion. Consider adding a brief note that this is how Redis references its internal environment variable, or ensure the existing Argo CD Redis deployment already has this env var defined.

Consider adding a note:

```markdown
!!! note "Redis Password"
    The `$(REDIS_PASSWORD)` syntax references the Redis container's environment variable, which is typically set from the `argocd-redis` secret.
```
83-107: Accurate Redis TLS documentation with proper script references. The Redis TLS section correctly documents the automatic setup and provides manual reconfiguration steps using the scripts added in this PR. The note about `InsecureSkipVerify` in test fixtures appropriately explains the trade-off for testing convenience.

Optional: consider adding a comma after "SANs)" in line 107 for improved readability:

```diff
-...localhost port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
+...localhost port-forwards (which match the certificate SANs), TLS encryption is fully enabled...
```
108-172: Extended deletion timeouts are reasonable for E2E usage. Bumping the deletion/wait loops from 60 to 120 seconds (in both `EnsureDeletion` and `WaitForDeletion`) is a pragmatic way to reduce flakiness under slow CI; the polling logic and error handling remain sane. If this ever needs tuning per suite, consider lifting the `120` into a shared constant, but it's fine as-is.
206-266: Consider waiting for the application-controller rollout as well. When Argo CD needs Redis reconfiguration, you restart `argocd-server` and `argocd-repo-server` and wait for their rollouts, but only check those two. Since the application controller also depends on Redis, you might want to add a rollout wait for `argocd-application-controller` too, to catch early failures:

```diff
-    kubectl --context vcluster-control-plane -n argocd rollout restart statefulset argocd-application-controller 2>/dev/null || true
+    kubectl --context vcluster-control-plane -n argocd rollout restart statefulset argocd-application-controller 2>/dev/null || true
+    if ! kubectl --context vcluster-control-plane -n argocd rollout status statefulset argocd-application-controller --timeout=$ROLLOUT_TIMEOUT 2>/dev/null; then
+        echo "  ERROR: argocd-application-controller rollout timed out"
+        kubectl --context vcluster-control-plane -n argocd get pods -l app.kubernetes.io/name=argocd-application-controller
+        ROLLOUT_FAILED=true
+    fi
```
170-223: Redis client TLS config for tests favors simplicity over strict verification. Enabling TLS with `MinVersion: TLS1.2` and `InsecureSkipVerify: true` for principal/managed Redis in `getCacheInstance` matches the PR description of "TLS on but skip verification" for E2E. This is acceptable for test harness code, but it means tests won't catch CA/SAN misconfigurations. If you later want stricter coverage, consider:

- Allowing an env flag to turn verification on (and wiring a CA pool), while keeping the current behavior as the default; or
- At least logging when `InsecureSkipVerify` is active so it's obvious in test logs.
136-148: Consider documenting configuration precedence. When both file paths and in-memory certificates are configured, the file paths take precedence (lines 136-141 execute first). Consider documenting this behavior in the method comment or in the struct field comments to make the priority explicit for callers.
150-153: Consider TLS 1.3 as the minimum version for new implementations. The code sets `MinVersion: tls.VersionTLS12`. For new implementations handling sensitive data, TLS 1.3 (`tls.VersionTLS13`) provides stronger security guarantees and is widely supported. TLS 1.2 is acceptable, but consider upgrading if compatibility allows.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
- `.github/workflows/ci.yaml` (1 hunks)
- `Makefile` (1 hunks)
- `agent/agent.go` (3 hunks)
- `agent/inbound_redis.go` (3 hunks)
- `agent/options.go` (1 hunks)
- `agent/outbound_test.go` (1 hunks)
- `cmd/argocd-agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (3 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/setup-vcluster-env.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/README.md` (3 hunks)
- `install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml` (2 hunks)
- `install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml` (1 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `install/helm-repo/argocd-agent-agent/values.yaml` (1 hunks)
- `install/kubernetes/agent/agent-deployment.yaml` (3 hunks)
- `install/kubernetes/agent/agent-params-cm.yaml` (1 hunks)
- `install/kubernetes/principal/principal-deployment.yaml` (3 hunks)
- `install/kubernetes/principal/principal-params-cm.yaml` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `internal/argocd/cluster/cluster_test.go` (3 hunks)
- `internal/argocd/cluster/informer_test.go` (6 hunks)
- `internal/argocd/cluster/manager.go` (3 hunks)
- `internal/argocd/cluster/manager_test.go` (3 hunks)
- `principal/listen.go` (3 hunks)
- `principal/options.go` (2 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/server.go` (3 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (2 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (11 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (22)
- test/e2e/fixture/argoclient.go
- install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml
- test/e2e/rp_test.go
- test/run-e2e.sh
- internal/argocd/cluster/manager.go
- .github/workflows/ci.yaml
- test/e2e/clusterinfo_test.go
- agent/inbound_redis.go
- test/e2e/redis_proxy_test.go
- principal/listen.go
- hack/dev-env/start-agent-autonomous.sh
- install/kubernetes/agent/agent-deployment.yaml
- agent/outbound_test.go
- install/kubernetes/principal/principal-deployment.yaml
- hack/dev-env/gen-redis-tls-certs.sh
- hack/dev-env/start-agent-managed.sh
- cmd/argocd-agent/agent.go
- install/kubernetes/agent/agent-params-cm.yaml
- install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
- principal/tracker/tracking.go
- install/kubernetes/principal/principal-params-cm.yaml
- principal/resource.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- `Makefile`
- `hack/dev-env/start-e2e.sh`
- `install/helm-repo/argocd-agent-agent/values.yaml`
- `hack/dev-env/Procfile.e2e`
- `test/e2e/README.md`
🧬 Code graph analysis (11)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)

- `ClusterDetails` (42-56)
- `AgentManagedName` (37-37)
- `AgentClusterServerURL` (39-39)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)

- `NewManager` (71-119)

test/fake/kube/kubernetes.go (1)

- `NewFakeKubeClient` (31-44)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)

- `Config` (127-127)

internal/logging/logging.go (1)

- `Warn` (300-302)

internal/argocd/cluster/cluster.go (1)

- `NewClusterCacheInstance` (176-192)
hack/dev-env/start-e2e.sh (2)
hack/dev-env/setup-vcluster-env.sh (2)

- `apply` (94-271)
- `cleanup` (39-41)

hack/dev-env/configure-redis-tls.sh (1)

- `cleanup` (50-52)
agent/options.go (2)
principal/options.go (1)

- `WithRedisTLSEnabled` (493-498)

agent/agent.go (2)

- `AgentOption` (136-136)
- `Agent` (65-117)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)

- `apply` (94-271)
principal/redisproxy/redisproxy.go (1)
internal/logging/logging.go (3)

- `Error` (305-307)
- `Warn` (300-302)
- `Trace` (285-287)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)

- `NewManager` (71-119)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)

- `NewFakeKubeClient` (31-44)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)

- `WithRedisTLSEnabled` (112-117)

principal/options.go (6)

- `WithRedisTLSEnabled` (493-498)
- `WithRedisServerTLSFromPath` (501-507)
- `WithRedisServerTLSFromSecret` (510-520)
- `WithRedisUpstreamTLSInsecure` (543-548)
- `WithRedisUpstreamTLSCAFromFile` (523-528)
- `WithRedisUpstreamTLSCAFromSecret` (531-540)

internal/env/env.go (2)

- `BoolWithDefault` (30-39)
- `StringWithDefault` (46-55)
principal/options.go (5)
agent/options.go (1)

- `WithRedisTLSEnabled` (112-117)

principal/apis/auth/auth.go (2)

- `ServerOption` (56-56)
- `Server` (33-39)

principal/mocks/ServerOption.go (1)

- `ServerOption` (11-13)

principal/server.go (1)

- `Server` (72-164)

internal/tlsutil/kubernetes.go (2)

- `TLSCertFromSecret` (44-66)
- `X509CertPoolFromSecret` (106-128)
🪛 LanguageTool
docs/configuration/redis-tls.md
[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...
(AI_HYDRA_LEO_MISSING_COMMA)
test/e2e/README.md
[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md
229-229: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Lint Go code
- GitHub Check: Build & cache Go code
- GitHub Check: Run unit tests
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (37)
install/helm-repo/argocd-agent-agent/values.yaml (1)
138-162: Verify that referenced secrets and container paths are properly mounted in deployment templates. The configuration references the secret `argocd-redis-tls` and the container path `/app/config/redis-tls/ca.crt` for Redis TLS. Ensure that:

- The secret `argocd-redis-tls` is created (either by the Helm templates or installation scripts).
- The Helm templates include volumeMounts to mount this secret at the specified path.
- The pod selectors used in networkPolicy (e.g., `app.kubernetes.io/name: argocd-redis` and `app.kubernetes.io/name: argocd-agent-agent`) match the actual pod labels in the deployments.

internal/argocd/cluster/manager_test.go (2)
57-57: LGTM! The `NewManager` call correctly matches the updated signature with the new `redisCompressionType` and `tlsConfig` parameters. Passing `nil` for `tlsConfig` is appropriate for unit tests that don't require TLS encryption.
78-78: Consistent test initialization. The test correctly uses the same pattern as
Test_StartStop, maintaining consistency across test functions.docs/configuration/redis-tls.md (1)
1-49: Comprehensive and well-structured TLS documentation. The document provides excellent coverage of Redis TLS architecture, configuration points, and the relationship between principal/agent components. The architecture diagram clearly illustrates the three TLS configuration points.
hack/dev-env/configure-redis-tls.sh (2)
81-121: Good implementation of component scaling during the TLS transition. The script properly scales down ArgoCD components before enabling TLS on Redis, preventing SSL errors during the transition. Storing replica counts in a ConfigMap for later restoration is a thoughtful approach.

138-196: Well-implemented idempotency checks. The script correctly checks for existing volumes and volume mounts before patching, making it safe to run multiple times. The handling of both empty arrays and existing arrays is thorough.
hack/dev-env/setup-vcluster-env.sh (2)
159-190: Clear environment-specific Redis configuration. The branching logic for different environments (in-cluster, CI, local development) is well documented and handles each scenario appropriately. The comments explaining why each configuration is needed are helpful.
182-186: Verify IP address detection robustness on Linux. The `ip r show default` parsing may not work reliably on all Linux distributions or network configurations (e.g., multiple default routes, VPNs). Consider adding fallback logic or error handling so that IP address detection fails gracefully when this command doesn't produce the expected output.
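One possible fallback, sketched in Go for illustration rather than shell: ask the kernel which local address the default route would use by "dialing" UDP (no packets are actually sent). The target address is arbitrary, and `outboundIP` is a hypothetical helper, not part of the script:

```go
package main

import (
	"fmt"
	"net"
)

// outboundIP returns the local address the OS would use to reach an external
// host. net.Dial with "udp" performs no handshake, so nothing is transmitted;
// it only resolves the route.
func outboundIP() (net.IP, error) {
	conn, err := net.Dial("udp", "8.8.8.8:80") // destination is illustrative
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).IP, nil
}

func main() {
	ip, err := outboundIP()
	if err != nil {
		fmt.Println("no default route:", err)
		return
	}
	fmt.Println("local IP:", ip)
}
```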
159-230: Comprehensive Redis TLS setup documentation. The new section provides clear, step-by-step instructions for setting up Redis TLS on the control plane, including certificate generation, secret creation, deployment patching, and verification. The warning admonition properly emphasizes that TLS is required.

337-381: Good parallel structure for workload cluster TLS setup. The section correctly instructs users to reuse the same CA from Step 2.4, ensuring certificate chain consistency. The commands mirror the control plane setup appropriately.

646-646: Good addition of a cross-reference. Adding the Redis TLS Configuration link to Related Documentation helps users find detailed TLS information.
hack/dev-env/configure-argocd-redis-tls.sh (3)
16-57: LGTM! Clean context-aware Redis configuration. The script correctly differentiates between the control-plane (which uses the Redis proxy) and agent clusters (which connect to local Redis), with proper error handling and informative messaging.

59-158: Robust idempotent patching with proper error handling. The configuration logic correctly handles both missing and existing volumes arrays, includes clear error messages, and ensures idempotency. The assumption that container index 0 is the main container aligns with the standard Argo CD deployment structure.

306-355: Well-structured scaling and cleanup logic. The replica guard logic correctly ensures at least 1 replica using explicit if statements (addressing the past review comment). The cleanup of the temporary ConfigMap is good practice. Rollout status checks with timeouts provide proper feedback.
test/e2e/README.md (1)
21-82: Clear and comprehensive workflow documentation. The step-by-step E2E test workflow is well structured, with excellent coverage of local vs. remote cluster scenarios, reverse tunnel setup, and the distinction between port-forward and direct LoadBalancer access.
Makefile (1)
59-79: TLS configuration properly integrated with Make's default error handling. The sequential TLS setup steps rely on Make's default behavior of stopping on the first non-zero exit code, which is standard practice. Combined with `set -e` in the individual scripts, this provides adequate error handling.

internal/argocd/cluster/cluster_test.go (1)
31-44: Test correctly updated for the new NewManager signature. The test setup appropriately passes `nil` for the new `tlsConfig` parameter, which is suitable for test scenarios using miniredis.
111-133: Well-structured Redis TLS configuration options. The new option functions follow the established pattern, include appropriate documentation, and correctly note that `WithRedisTLSInsecure` is for testing only. The implementation properly sets fields on the `redisProxyMsgHandler`.
17-126: Consistent test updates for extended function signatures. All test cases are properly updated to pass the compression type and a `nil` TLS config. The changes maintain test functionality while accommodating the new signature requirements.
23-29: Properly delegates port-forward to an external process. The script now correctly expects an external port-forward (from Procfile.e2e or manual setup) rather than creating its own, avoiding the port conflict issue flagged in the previous review.

44-62: Robust TLS certificate validation and user guidance. The script properly checks for required TLS certificates and provides helpful guidance when they're missing. The TLS arguments correctly cover both server-side TLS (cert/key) and upstream TLS (CA path), with appropriate SANs noted in comments.

64-86: TLS arguments consistently propagated across execution modes. The dual execution path (pre-built binary vs. go run) ensures TLS arguments are applied regardless of the execution method, supporting both CI and local development workflows seamlessly.
internal/argocd/cluster/cluster.go (2)
135-142: Sensible ConnectionState initialization for new agent connections. The initialization provides appropriate defaults when ConnectionState doesn't exist, preventing nil values and ensuring consistent status reporting when cache stats are first received from a newly connected agent.

176-191: Clean TLS configuration wiring into the Redis client. The TLS config is properly integrated into the Redis client options, following the standard go-redis pattern. The nullable `tlsConfig` parameter correctly supports both TLS and non-TLS configurations.

test/e2e/fixture/fixture.go (2)
219-375: Non-fatal cleanup errors and deep-copy usage look appropriate. Switching the various `EnsureDeletion`/`WaitForDeletion` failures to `fmt.Printf` warnings while continuing cleanup matches the goal of not failing tests due to residual resources, and the use of `DeepCopy()` when changing namespace/name on loop variables avoids subtle aliasing issues. Just be aware that leaked resources will now only show up in logs, not as hard test failures.

Would you like a small helper to aggregate and surface a summary of cleanup warnings at the end of the suite, so persistent leaks are easier to spot without failing every run?

487-501: Graceful handling of Redis unavailability during cleanup. Treating `resetManagedAgentClusterInfo` failures as a warning instead of a hard error is a good trade-off for E2E runs where Redis port-forwards may already be gone. The error wrapping in `resetManagedAgentClusterInfo` also gives clearer diagnostics when debugging Redis-related issues.
349-372: Redis proxy and cluster manager TLS wiring is consistent and robust. The new Redis TLS wiring in `NewServer` looks solid: server TLS is configured from either file paths or secrets, upstream TLS supports insecure mode or a CA from file/pool, and the same upstream options are reused for the cluster manager via `clusterMgrRedisTLSConfig` with `MinVersion: TLS1.2`. The explicit warning for `InsecureSkipVerify` is also helpful. No changes needed here.

Also applies to: 400-428
agent/agent.go (1)
323-345: Cluster cache Redis TLS configuration matches principal-side behaviorReusing the Redis TLS options for
clusterCacheTLSConfig(with TLS 1.2 minimum, optional insecure mode, and CA loading from path) keeps agent-side cluster cache consistent with the proxy/upstream config. The warning when running insecure is a good touch. Looks good.hack/dev-env/start-e2e.sh (1)
50-123: Localhost Redis endpoint wiring and the in-/out-of-cluster proxy bridge look good. Using fixed localhost ports (6380/6381/6382) with goreman-managed port-forwards, plus the vcluster `argocd-agent-redis-proxy` Service/Endpoints bridge for out-of-cluster mode, is a clear and predictable setup for TLS-enabled Redis in E2E. The `REDIS_PASSWORD` export and Redis pod readiness checks are also straightforward.
258-299: Redis TLS configuration and upstream mode validation are well-structured

The Redis TLS block cleanly separates server TLS (file vs secret) from upstream TLS (insecure vs CA file vs CA secret), and the `modesSet` mutual-exclusivity check prevents conflicting upstream modes with a clear fatal message. Skipping counting the default CA secret name avoids spurious errors while still allowing an explicit override. This is a solid configuration surface.
430-452: CLI flags and resource-proxy TLS timeout align with the new TLS surface

The new Redis TLS flags (enabled-by-default, server cert/key or secret, upstream CA path/secret, and insecure flag) match the `principal.ServerOption` API and default to secure behavior. Switching `getResourceProxyTLSConfigFromKube` to a 30s timeout avoids potential hangs on secret reads without changing semantics. No issues here.

Also applies to: 482-490
hack/dev-env/Procfile.e2e (1)
1-7: Procfile port-forwards and startup gating match the new Redis topology

The added `pf-*` entries and the principal/agent processes that wait for local Redis ports (6380/6381/6382) before starting line up cleanly with the localhost-based TLS endpoints configured in `start-e2e.sh`. This should make Redis startup ordering much more deterministic in CI.

test/e2e/fixture/cluster.go (1)
226-267: Redis cache client caching and test-focused config look fine

The `cachedRedisClients` map with a mutex and per-address cache key is a reasonable way to avoid reconnect churn in tests. Given `appstatecache.Cache` doesn't expose a close API, resetting the map in `CleanupRedisCachedClients` and relying on GC is an acceptable compromise for short-lived E2E runs; just ensure the suite calls this cleanup once at the end. The Redis address discovery and env overrides (`MANAGED_AGENT_REDIS_ADDR`, `ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS`) also align with the new port-forwarded TLS setup.

Also applies to: 320-387
principal/options.go (1)
80-88: Redis TLS option fields and helpers integrate cleanly with existing ServerOptions

Adding Redis TLS fields into `ServerOptions` and exposing `WithRedisTLSEnabled`, `WithRedisServerTLSFromPath`/`Secret`, and `WithRedisUpstreamTLSCAFromFile`/`Secret`/`Insecure` follows the existing options pattern and keeps TLS concerns encapsulated. Using `tlsutil` for secret-based loading is consistent with other TLS helpers. This API surface looks well-designed for both CLI wiring and tests.

Also applies to: 492-548
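For readers unfamiliar with the pattern, here is a minimal sketch of functional options in this style; the field and option names below are simplified stand-ins for the real `ServerOptions` API, not its actual definition:

```go
package main

import "fmt"

// ServerOptions holds configuration assembled from option functions.
// Fields are illustrative, not the real principal options struct.
type ServerOptions struct {
	redisTLSEnabled bool
	redisUpstreamCA string
}

// ServerOption mutates ServerOptions and may fail validation.
type ServerOption func(*ServerOptions) error

func WithRedisTLSEnabled(enabled bool) ServerOption {
	return func(o *ServerOptions) error {
		o.redisTLSEnabled = enabled
		return nil
	}
}

func WithRedisUpstreamTLSCAFromFile(path string) ServerOption {
	return func(o *ServerOptions) error {
		if path == "" {
			return fmt.Errorf("CA path must not be empty")
		}
		o.redisUpstreamCA = path
		return nil
	}
}

// NewServerOptions applies each option in order, failing fast on error.
func NewServerOptions(opts ...ServerOption) (*ServerOptions, error) {
	o := &ServerOptions{}
	for _, opt := range opts {
		if err := opt(o); err != nil {
			return nil, err
		}
	}
	return o, nil
}

func main() {
	o, err := NewServerOptions(WithRedisTLSEnabled(true), WithRedisUpstreamTLSCAFromFile("/certs/ca.crt"))
	if err != nil {
		panic(err)
	}
	fmt.Println(o.redisTLSEnabled, o.redisUpstreamCA)
}
```

Validation inside each option is what lets the CLI surface a clear error at startup rather than failing later at connection time.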
principal/redisproxy/redisproxy.go (3)
65-128: LGTM! Well-structured TLS configuration API.

The TLS fields and setter methods are cleanly designed, allowing flexible configuration via both in-memory objects and file paths. The separation between server TLS (incoming connections) and upstream TLS (outgoing connections) is clear and appropriate.
157-200: LGTM! Clean TLS listener setup.

The Start() method correctly branches between TLS and non-TLS listener creation with appropriate error handling and logging. The distinction between modes is clear at startup.
853-894: Note: Upstream TLS mismatch concern previously flagged.

A past review comment flagged that when `rp.tlsEnabled` is true but no upstream TLS configuration is provided, the connection to principal Redis will be unencrypted. This creates a potential security gap where the proxy accepts encrypted connections from Argo CD but forwards traffic in plaintext to Redis.

The past review suggested adding a warning log when this mismatch occurs. Consider addressing this feedback to improve the security posture.
Based on learnings, past review comments should be referenced when similar concerns are identified.
fec1462 to bf0d4f8
Actionable comments posted: 0
♻️ Duplicate comments (5)
Makefile (1)
59-79: Error handling still missing between TLS configuration steps.

The sequential script calls lack error handling: if any script fails (cert generation, Redis config, or ArgoCD config), subsequent steps still execute, potentially leaving the environment partially configured. The past review suggested chaining commands with `||` to fail fast, but this appears unaddressed in the current code.

Apply error handling to fail fast:
@echo "" @echo "Configuring Redis TLS (required for E2E)..." - ./hack/dev-env/gen-redis-tls-certs.sh + ./hack/dev-env/gen-redis-tls-certs.sh || (echo "ERROR: Certificate generation failed" && exit 1) @echo "" @echo "Configuring each cluster for Redis TLS (Redis + ArgoCD components together)" @echo "Note: Redis and ArgoCD components are configured together per-cluster to avoid" @echo " connection errors during the transition period." @echo "" @echo "=== Control Plane ===" - ./hack/dev-env/configure-redis-tls.sh vcluster-control-plane - ./hack/dev-env/configure-argocd-redis-tls.sh vcluster-control-plane + ./hack/dev-env/configure-redis-tls.sh vcluster-control-plane || (echo "ERROR: Redis TLS config failed for control-plane" && exit 1) + ./hack/dev-env/configure-argocd-redis-tls.sh vcluster-control-plane || (echo "ERROR: ArgoCD TLS config failed for control-plane" && exit 1) @echo "" @echo "=== Agent Managed ===" - ./hack/dev-env/configure-redis-tls.sh vcluster-agent-managed - ./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-managed + ./hack/dev-env/configure-redis-tls.sh vcluster-agent-managed || (echo "ERROR: Redis TLS config failed for agent-managed" && exit 1) + ./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-managed || (echo "ERROR: ArgoCD TLS config failed for agent-managed" && exit 1) @echo "" @echo "=== Agent Autonomous ===" - ./hack/dev-env/configure-redis-tls.sh vcluster-agent-autonomous - ./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-autonomous + ./hack/dev-env/configure-redis-tls.sh vcluster-agent-autonomous || (echo "ERROR: Redis TLS config failed for agent-autonomous" && exit 1) + ./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-autonomous || (echo "ERROR: ArgoCD TLS config failed for agent-autonomous" && exit 1)hack/dev-env/configure-redis-tls.sh (2)
68-70: Verify context switch succeeded before proceeding.

If `kubectl config use-context` fails (context doesn't exist or kubectl error), the script continues and may operate on the wrong cluster. This is dangerous in a multi-cluster setup.

Add error checking:

```diff
 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
```
198-206: Fail when Redis password secret is missing.

Continuing with an empty password when the `argocd-redis` secret is missing will cause Argo CD components to fail with NOAUTH errors. For E2E environments, the password secret should exist before Redis TLS configuration. Fail fast to surface the missing prerequisite.

Apply this fix:

```diff
 # Get the Redis password from the secret
 REDIS_PASSWORD=$(kubectl -n ${NAMESPACE} get secret argocd-redis -o jsonpath='{.data.auth}' | base64 --decode 2>/dev/null || echo "")
 if [ -z "$REDIS_PASSWORD" ]; then
-    echo "Warning: Redis password not found in secret argocd-redis"
-    echo "Redis will be configured without password authentication"
-    REDIS_PASSWORD=""
+    echo "Error: Redis password not found in secret argocd-redis"
+    echo "Redis password is required for secure configuration"
+    exit 1
 fi
```

agent/agent.go (1)
445-460: Guard against zero `cacheRefreshInterval` before creating ticker.

`time.NewTicker(a.cacheRefreshInterval)` will panic if `cacheRefreshInterval` is zero or negative ("non-positive interval for NewTicker"). If no `AgentOption` sets this field, the goroutine will crash at runtime.

Add validation:

```diff
+	// Validate cache refresh interval
+	interval := a.cacheRefreshInterval
+	if interval <= 0 {
+		interval = 30 * time.Second
+		log().Warnf("cacheRefreshInterval not set, using default: %v", interval)
+	}
+
 	// Send initial update immediately on startup (don't wait for first ticker)
 	a.addClusterCacheInfoUpdateToQueue()
-	ticker := time.NewTicker(a.cacheRefreshInterval)
+	ticker := time.NewTicker(interval)
```

principal/redisproxy/redisproxy.go (1)
846-850: Note: Timeout concerns previously flagged.

Past reviews identified missing timeouts on the TCP dial (lines 846-850) and TLS handshake (lines 886-890). These remain valid concerns but have already been documented.
Also applies to: 886-890
🧹 Nitpick comments (1)
cmd/argocd-agent/principal.go (1)
272-286: Consider clarifying the default secret name exclusion logic.

The mutual exclusivity validation at line 281 excludes the default secret name `"argocd-redis-tls"` from the count. This is necessary because the flag has a default value (line 447), but the logic is subtle and could confuse future maintainers.

Consider adding a comment explaining this:

```diff
 // Only count non-default secret name to allow default value
 if redisUpstreamTLSCASecretName != "" && redisUpstreamTLSCASecretName != "argocd-redis-tls" {
+	// Note: We skip counting the default value because the flag always has a value
+	// (see line 447). This allows users to specify --redis-upstream-ca-path
+	// without explicitly clearing the default secret name.
 	modesSet++
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/setup-vcluster-env.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- hack/dev-env/setup-vcluster-env.sh
🚧 Files skipped from review as they are similar to previous changes (11)
- principal/listen.go
- test/e2e/rp_test.go
- principal/resource.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- hack/dev-env/start-e2e.sh
- test/e2e/fixture/argoclient.go
- hack/dev-env/start-agent-autonomous.sh
- cmd/argocd-agent/agent.go
- hack/dev-env/start-principal.sh
- hack/dev-env/gen-redis-tls-certs.sh
- test/e2e/redis_proxy_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/run-e2e.sh
- Makefile
- hack/dev-env/Procfile.e2e
- hack/dev-env/start-agent-managed.sh
- test/e2e/README.md
🧬 Code graph analysis (6)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (2)
`cleanup` (39-41), `apply` (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
`Warn` (300-302)
internal/argocd/cluster/cluster.go (1)
`NewClusterCacheInstance` (176-192)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
`apply` (94-247)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
`Event` (112-115)
internal/logging/logfields/logfields.go (1)
`Event` (34-34)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
`ClusterDetails` (42-56), `AgentManagedName` (37-37), `AgentClusterServerURL` (39-39)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
`HasConnectionStatus` (60-74), `AgentManagedName` (37-37), `ClusterDetails` (42-56), `AgentAutonomousName` (38-38)
🪛 LanguageTool
docs/configuration/redis-tls.md
[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...
(AI_HYDRA_LEO_MISSING_COMMA)
test/e2e/README.md
[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
test/e2e/README.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run unit tests
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (26)
principal/tracker/tracking.go (1)
75-78: Good concurrency fix; verify deadlock resolution and single-response guarantee.

Buffering the channel with size 1 is a valid pattern to decouple sender and receiver timing, preventing deadlock when the send occurs before the receive is ready. This is appropriate for the request/response tracking pattern.
Please verify:
- The deadlock is actually resolved with this change (e.g., through stress testing or reproduction steps).
- The usage pattern guarantees exactly one response per tracked request: if multiple events could be sent to the same channel, the second would block (or be lost), since the buffer size is 1.
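A compact illustration of why the size-1 buffer removes the deadlock (generic Go, not the tracker's actual types):

```go
package main

import "fmt"

func main() {
	// Unbuffered: a send with no waiting receiver blocks forever.
	// Buffered with capacity 1: one response can be parked in the channel.
	resp := make(chan string, 1)

	// The responder finishes first; with an unbuffered channel this send
	// would deadlock because nobody is receiving yet.
	resp <- "event-123"

	// The requester picks up the response later, at its own pace.
	fmt.Println(<-resp)
}
```

Note the caveat from the review: a second send to the same channel would still block, so the pattern only works if exactly one response is ever produced per request.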
internal/argocd/cluster/cluster.go (2)
135-142: LGTM!

Defensive initialization of `ConnectionState` when it doesn't exist yet is appropriate. This ensures cluster info always has a valid connection state after cache stats are updated.

176-185: LGTM!

TLS configuration is cleanly threaded through to the Redis client. The signature change is backward-compatible (callers can pass nil for non-TLS connections), and the implementation properly wires TLS into Redis options.
hack/dev-env/configure-redis-tls.sh (1)
61-66: LGTM!

Certificate validation comprehensively checks all required TLS files (CA, certificate, and key) before proceeding. This prevents kubectl secret creation from failing with unclear errors.
hack/dev-env/start-agent-managed.sh (2)
37-61: LGTM!

The script gracefully handles TLS configuration with clear user guidance. The Redis address defaults are appropriate for local development, and the port-forward requirements are well-documented. This developer-friendly approach helps prevent common setup mistakes.
63-91: LGTM!

Certificate extraction and agent invocation properly wired for TLS. The `/tmp` storage for certificates is acceptable for development environments, and all necessary TLS flags are passed to the agent.
110-156: LGTM!

Extended timeouts (60s → 120s) appropriately accommodate TLS handshake overhead and slower cluster operations in E2E environments. This prevents spurious timeout failures while maintaining reasonable bounds.
201-492: LGTM!

Converting cleanup errors to warnings (rather than failing fast) ensures E2E test teardown completes as much as possible even when resources are partially unavailable. This is particularly appropriate when Redis connections may be unstable (e.g., port-forward died), preventing orphaned resources from accumulating across test runs.
236-267: LGTM!

Using `DeepCopy()` before adjusting namespaces prevents mutation of loop variables, a classic Go pitfall. This ensures each wait operation acts on an independent copy with the correct namespace.
83-108: LGTM!

The Redis TLS documentation accurately describes the automatic setup and manual reconfiguration procedures. The scripts referenced (`gen-redis-tls-certs.sh`, `configure-redis-tls.sh`, `configure-argocd-redis-tls.sh`) are provided in this PR. The explanation of `InsecureSkipVerify` usage in test fixtures versus proper TLS validation in agents/principal is clear and appropriate.

Note: Past review comments flagged these scripts as non-existent, but they're introduced in this PR.
agent/agent.go (1)
323-349: LGTM!

TLS configuration for cluster cache follows established patterns and includes proper error handling. CA loading validates the certificate pool, and the warning log for insecure mode (Line 330) provides appropriate security awareness, matching the principal implementation.
hack/dev-env/Procfile.e2e (1)
1-7: LGTM!

Process orchestration properly coordinates TLS-enabled E2E testing. Port-forwards enable Redis TLS connections via localhost (matching certificate SANs), and staggered startup delays (3s for principal, 5s for agents) ensure proper initialization ordering. Environment variables correctly pass cluster-specific Redis addresses to agent startup scripts.
cmd/argocd-agent/principal.go (1)
430-452: LGTM! Well-structured Redis TLS flag definitions.

The flag definitions follow consistent patterns with appropriate defaults (TLS enabled by default) and clear help text. The environment variable mappings align with the codebase conventions.
test/e2e/fixture/cluster.go (3)
182-201: LGTM! Appropriate TLS configuration for E2E tests.

The use of `InsecureSkipVerify: true` is well-documented in the comments and aligns with the PR objectives, which note this accommodation for dynamic LoadBalancer addresses in E2E tests while preserving TLS encryption.
206-224: LGTM! Well-tuned connection pool settings for E2E tests.

The timeout and pool size configuration is appropriately tuned for E2E test conditions with port-forward latency. The comments clearly explain the rationale for increased values, making future adjustments easier.
232-257: LGTM! Effective Redis client caching for E2E tests.

The caching mechanism prevents connection leaks by reusing Redis clients across test operations. The cache key construction (source + address) is appropriate for distinguishing between different Redis instances.
test/e2e/clusterinfo_test.go (1)
108-115: LGTM! Appropriate timeout adjustments for port-forward latency.

The increased timeouts (30s→60s, 1s→2s polling) are well-justified by the inline comments explaining port-forward latency in long test runs. The consistent application across related assertions ensures reliable test behavior.
Also applies to: 123-129, 136-142
docs/configuration/redis-tls.md (1)
1-700: LGTM! Comprehensive and well-structured Redis TLS documentation.

This documentation provides excellent coverage of Redis TLS configuration across development, E2E testing, and production scenarios. The architecture diagrams, troubleshooting guidance, and security best practices sections are particularly valuable.
Note: The markdown linting issues (missing language tags on fenced code blocks at lines 150, 475, 486, 504) were already addressed in previous commits per past review comments.
docs/getting-started/kubernetes/index.md (1)
159-230: LGTM! Clear and actionable Redis TLS setup instructions.

The step-by-step Redis TLS configuration for both control plane and workload clusters is well-structured. The inclusion of verification commands and the cross-reference to the detailed Redis TLS configuration guide provides good user experience.
Also applies to: 337-381
test/run-e2e.sh (2)
62-73: LGTM! Robust TLS validation using jq.

The JSON parsing with jq (lines 64-65) properly validates both the TLS port argument and TLS volume configuration, addressing the past review concern about fragile text grep. The error messages provide clear diagnostics and next steps.
89-122: LGTM! Helpful port-forward detection for local development.

The macOS-specific port-forward detection provides clear guidance for local development scenarios while allowing CI environments with MetalLB to proceed without warnings. The environment variable setup is well-documented.
hack/dev-env/configure-argocd-redis-tls.sh (2)
316-325: LGTM! Clear replica count validation.

The explicit if statements properly ensure at least 1 replica for each component, addressing the past review concern about operator precedence in compound expressions. The logic is now clear and unambiguous.
37-57: LGTM! Appropriate redis.server configuration logic.

The conditional logic correctly sets `redis.server` for agent clusters while preserving the Redis proxy configuration for the control plane. The debug output helps troubleshoot configuration issues.
98-128: LGTM! Clean TLS configuration API.The new public methods provide a clear interface for configuring Redis proxy TLS. The separation of server TLS (SetServerTLS, SetServerTLSFromPath) and upstream TLS (SetUpstreamTLSCA, SetUpstreamTLSCAPath, SetUpstreamTLSInsecure) is well-designed.
130-154: LGTM! Robust TLS config creation with proper error handling.

The `createServerTLSConfig` method handles both file-based and in-memory certificate sources with clear error messages. Setting `MinVersion: tls.VersionTLS12` is a good security baseline.
162-183: LGTM! Clear conditional TLS listener setup.

The TLS-enabled listener setup is well-structured with appropriate logging distinguishing TLS from non-TLS modes. The error handling provides clear diagnostics.
bf0d4f8 to e4b8ca8
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)
150-156: Inconsistent timeout for autonomous agent re-connection check.

The re-connection check for the autonomous agent still uses `30*time.Second, 1*time.Second`, while the managed agent re-connection check (Lines 108-115) was increased to `60*time.Second, 2*time.Second`. This asymmetry may cause flaky tests in TLS-enabled environments.

Apply this diff for consistency:

```diff
 	// Verify that connection status is updated again when agent is re-connected
 	requires.Eventually(func() bool {
 		return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 			Status:     appv1.ConnectionStatusSuccessful,
 			Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 			ModifiedAt: &metav1.Time{Time: time.Now()},
 		}, clusterDetail)
-	}, 30*time.Second, 1*time.Second)
+	}, 60*time.Second, 2*time.Second)
 }
```
♻️ Duplicate comments (5)
principal/redisproxy/redisproxy.go (2)
845-850: Add timeout to TCP dial operation.

The `net.DialTCP` call has no timeout, which can cause connection attempts to hang indefinitely if the upstream Redis is unresponsive. This blocks the goroutine handling the Argo CD connection.

Apply this diff to add a dial timeout:

```diff
-	// Dial the resolved address
-	conn, err := net.DialTCP("tcp", nil, addr)
+	// Dial the resolved address with timeout
+	dialer := &net.Dialer{
+		Timeout: 30 * time.Second,
+	}
+	connTmp, err := dialer.Dial("tcp", addr.String())
 	if err != nil {
 		logCtx.WithError(err).WithField("redisAddress", rp.principalRedisAddress).Error("Connection error")
 		return nil, fmt.Errorf("unable to connect to redis '%s': %w", rp.principalRedisAddress, err)
 	}
+	conn := connTmp.(*net.TCPConn)
```
886-890: Add timeout to TLS handshake.

The TLS handshake has no timeout, which can cause connections to hang indefinitely if the upstream Redis TLS endpoint is unresponsive during negotiation.

Apply this diff to add a handshake timeout:

```diff
+	// Set deadline for handshake
+	if err := conn.SetDeadline(time.Now().Add(30 * time.Second)); err != nil {
+		conn.Close()
+		return nil, fmt.Errorf("failed to set handshake deadline: %w", err)
+	}
+
 	tlsConn := tls.Client(conn, tlsConfig)
 	if err := tlsConn.Handshake(); err != nil {
 		conn.Close()
 		return nil, fmt.Errorf("TLS handshake failed: %w", err)
 	}
+
+	// Clear deadline after successful handshake
+	if err := tlsConn.SetDeadline(time.Time{}); err != nil {
+		tlsConn.Close()
+		return nil, fmt.Errorf("failed to clear handshake deadline: %w", err)
+	}
```

hack/dev-env/configure-redis-tls.sh (1)
202-206: Empty Redis password proceeds without authentication.

This continues with an empty password when the secret is missing. While this provides dev flexibility, ArgoCD components may fail with NOAUTH errors if they expect authentication.
Consider failing fast if password authentication is required in your environment, or document this behavior clearly.
test/e2e/fixture/cluster.go (1)
259-267: Cleanup still doesn't explicitly close Redis connections.

This concern from the previous review remains unaddressed. The function only clears the map and relies on garbage collection to clean up connections. As noted in the previous review, `appstatecache.Cache` may not expose a `Close()` method, making explicit cleanup difficult without tracking the underlying `redis.Client` instances separately.

If explicit connection cleanup is needed, consider:

- Verifying whether `appstatecache.Cache` or the underlying Redis client can be closed
- Tracking `*redis.Client` instances alongside cache instances and closing them explicitly
- Documenting that garbage collection handles cleanup if explicit close isn't feasible
agent/agent.go (1)
445-460: Guard against zero `cacheRefreshInterval` before creating ticker.

This concern from the previous review remains unaddressed. If `cacheRefreshInterval` is not set by any `AgentOption`, `time.NewTicker(a.cacheRefreshInterval)` at line 450 will panic with `non-positive interval for NewTicker`.

Consider applying the previously suggested fix:

```diff
-	ticker := time.NewTicker(a.cacheRefreshInterval)
+	interval := a.cacheRefreshInterval
+	if interval <= 0 {
+		interval = 30 * time.Second
+	}
+	ticker := time.NewTicker(interval)
```

Alternatively, initialize `a.cacheRefreshInterval` to a sensible default in `NewAgent`.
🧹 Nitpick comments (5)
principal/redisproxy/redisproxy.go (2)
853-854: Consider warning when server TLS enabled without upstream TLS.

The condition `if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure)` means that if the Redis proxy server has TLS enabled but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP. This could expose sensitive data in transit within the cluster.

Consider adding a warning when this mismatch occurs:

```diff
+	// Warn if server TLS is enabled but no upstream TLS configured
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy server has TLS enabled, but no upstream TLS configuration provided. Connection to principal Redis will be unencrypted.")
+	}
+
 	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled && hasUpstreamTLSConfig {
```
858-877: Consider warning if CA configured but ignored due to InsecureSkipVerify.

The `else if` structure means that if `rp.upstreamTLSInsecure` is true, any configured CA pool or CA path is silently ignored. While this may be intentional for test environments, it could lead to confusion.

Consider logging a warning if CA configuration is provided but ignored:

```diff
 	if rp.upstreamTLSInsecure {
 		logCtx.Warn("INSECURE: Not verifying upstream Redis TLS certificate")
 		tlsConfig.InsecureSkipVerify = true
+		if rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" {
+			logCtx.Warn("CA configuration provided but ignored due to InsecureSkipVerify=true")
+		}
```

hack/dev-env/configure-redis-tls.sh (1)
68-70: Context switch error handling relies on `set -e`.

While `set -e` will cause the script to exit on failure, adding explicit error handling provides clearer feedback when the context doesn't exist.

```diff
 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
```

test/e2e/redis_proxy_test.go (2)
120-123: Sleep as race condition workaround.

The 5-second sleep addresses a race between SSE stream establishment and Redis subscription activation. While pragmatic, a more robust approach would be to verify subscription is active before proceeding (e.g., receiving an initial heartbeat or confirmation message).
Consider adding a mechanism to confirm subscription is active rather than relying on a fixed delay, which may be insufficient under high load or too long in fast environments.
187-208: Duplicate drain logic between test methods.

The SSE message drain logic is nearly identical between `Test_RedisProxy_ManagedAgent_Argo` and `Test_RedisProxy_AutonomousAgent_Argo`. Consider extracting to a helper function:

```go
func drainChannelForPod(t *testing.T, msgChan chan string, podName string) bool {
	messagesDrained := false
	for {
		select {
		case msg := <-msgChan:
			messagesDrained = true
			t.Logf("Processing SSE message (looking for pod %s)", podName)
			if strings.Contains(msg, podName) {
				t.Logf("Found new pod name in SSE stream: %s", podName)
				return true
			}
		default:
			if messagesDrained {
				t.Log("Drained all available messages, pod not found yet, will retry...")
			}
			return false
		}
	}
}
```

Also applies to: 406-427
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/setup-vcluster-env.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/fixture/toxyproxy.go (1 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
- principal/resource.go
- principal/tracker/tracking.go
- principal/listen.go
- hack/dev-env/setup-vcluster-env.sh
- test/e2e/fixture/argoclient.go
- test/run-e2e.sh
- install/helm-repo/argocd-agent-agent/values.schema.json
- test/e2e/rp_test.go
- hack/dev-env/start-e2e.sh
- Makefile
- cmd/argocd-agent/agent.go
- hack/dev-env/configure-argocd-redis-tls.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- hack/dev-env/start-principal.sh
- hack/dev-env/start-agent-managed.sh
- test/e2e/README.md
- hack/dev-env/Procfile.e2e
🧬 Code graph analysis (5)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (2)
`cleanup` (39-41), `apply` (94-247)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
`ClusterDetails` (42-56), `AgentManagedName` (37-37), `AgentClusterServerURL` (39-39)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
`HasConnectionStatus` (60-74), `AgentManagedName` (37-37), `ClusterDetails` (42-56), `AgentAutonomousName` (38-38)
agent/agent.go (2)
internal/logging/logging.go (1)
`Warn` (300-302)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
cmd/argocd-agent/principal.go (4)
principal/options.go (6)
`WithInformerSyncTimeout` (439-444), `WithRedisTLSEnabled` (493-498), `WithRedisServerTLSFromPath` (501-507), `WithRedisServerTLSFromSecret` (510-520), `WithRedisUpstreamTLSCAFromFile` (523-528), `WithRedisUpstreamTLSCAFromSecret` (531-540)
agent/options.go (1)
`WithRedisTLSEnabled` (112-117)
internal/logging/logging.go (1)
`Warn` (300-302)
internal/env/env.go (3)
`DurationWithDefault` (168-177), `BoolWithDefault` (30-39), `StringWithDefault` (46-55)
🪛 LanguageTool
test/e2e/README.md
[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
(AI_HYDRA_LEO_MISSING_COMMA)
docs/configuration/redis-tls.md
[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...
(AI_HYDRA_LEO_MISSING_COMMA)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build & cache Go code
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (17)
internal/argocd/cluster/cluster.go (2)
135-142: Good defensive initialization of ConnectionState.

This ensures that when `SetClusterCacheStats` is called before `SetAgentConnectionStatus` (e.g., the agent just connected and sends cache stats first), the ConnectionState is properly initialized rather than left empty.

176-185: TLS configuration properly wired to Redis client.

The signature change and TLSConfig wiring are appropriate. The go-redis library handles a `nil` TLSConfig gracefully (no TLS when nil).

hack/dev-env/start-agent-autonomous.sh (1)
85-85: `--insecure-tls` skips server certificate verification.

This is acceptable for local development but should not be used in production. The flag is appropriately placed for the dev script.
hack/dev-env/gen-redis-tls-certs.sh (1)
1-150: Well-structured certificate generation script.

The script addresses previous review feedback (stderr handling, LOCAL_IP conditional). It's idempotent, uses appropriate validity periods, and has clear separation for each certificate type.
test/e2e/fixture/fixture.go (4)
110-113: Timeout increases appropriate for TLS environments.

Doubling the wait iterations from 60 to 120 seconds accommodates the additional latency from TLS handshakes and port-forward operations in the test environment.
232-241: Warning-and-continue pattern for cleanup resilience.

Changing from hard failures to warnings during cleanup improves test stability, especially in TLS environments where transient connection issues are more common. However, this may mask legitimate issues.
Ensure test logs are monitored for recurring warnings that might indicate systemic problems rather than transient issues.
317-325: Good use of DeepCopy to avoid modifying loop variables.

Using `DeepCopy()` before modifying the namespace/name prevents unintended side effects on the original loop variable.
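The aliasing hazard that DeepCopy avoids can be shown with a minimal, self-contained sketch. The struct and its hand-written `Clone` are simplified stand-ins for a Kubernetes object and its generated `DeepCopy()`:

```go
package main

import "fmt"

// App stands in for a Kubernetes object; the fields are simplified
// for illustration and are not the fixture's actual types.
type App struct {
	Namespace string
	Name      string
}

// Clone plays the role of DeepCopy(): mutate the copy, not the original.
func (a *App) Clone() *App {
	c := *a
	return &c
}

func main() {
	apps := []App{{Namespace: "agent-managed", Name: "app-1"}}
	for i := range apps {
		// Without Clone(), writing through &apps[i] would mutate the
		// slice element and leak the change into later iterations.
		target := apps[i].Clone()
		target.Namespace = "argocd"
		fmt.Println(apps[i].Namespace, target.Namespace) // agent-managed argocd
	}
}
```

The same reasoning applies to any loop that adjusts namespace or name before issuing a delete: copy first, then mutate.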
487-491: Graceful handling of Redis unavailability during cleanup.

Logging a warning instead of failing when Redis is unavailable (e.g., port-forward died) prevents cleanup failures from cascading to test failures.
test/e2e/redis_proxy_test.go (3)
586-588: Buffered channel prevents message loss.

The buffer size of 100 is reasonable for preventing message loss when the consumer is temporarily slow. This aligns with the drain-and-retry pattern used in the tests.
211-237: ResourceTree retry pattern handles transient errors.

Wrapping the ResourceTree call in `Eventually` with proper error logging handles transient Redis connection issues (EOF errors) gracefully.
642-653: HTTP client properly configured for SSE streams.

Setting `Timeout: 0` and `ResponseHeaderTimeout: 0` is correct for SSE streams, which are long-lived. The `IdleConnTimeout: 300s` helps maintain connection pools.

hack/dev-env/Procfile.e2e (1)
1-7: LGTM! Port-forward and startup configuration supports TLS setup.

The port-forward entries and updated startup commands with sleep delays properly support the TLS-enabled Redis configuration. The sleep delays ensure proper startup ordering (principal starts first, then agents), and the environment variable overrides for Redis addresses align with the TLS configuration changes.
agent/agent.go (1)
323-343: LGTM! TLS configuration correctly mirrors Redis proxy setup.

The TLS configuration for the cluster cache Redis client is well-structured:
- Warning log added when using InsecureSkipVerify (line 330), addressing the previous review feedback
- CA certificate loading correctly reads the file, creates a cert pool, and parses the PEM data
- Error handling appropriately reports read and parse failures
The implementation aligns with the Redis proxy TLS logic and properly integrates with the updated `NewClusterCacheInstance` signature.

test/e2e/fixture/cluster.go (4)
181-201: LGTM! TLS configuration appropriate for E2E testing.

The TLS configuration correctly enables encryption for both principal and managed-agent Redis connections when the respective TLS flags are set. The use of `InsecureSkipVerify: true` is intentional for E2E tests (as noted in the PR description) to accommodate dynamic LoadBalancer addresses while preserving TLS encryption.
206-217: LGTM! Generous timeouts and pool settings appropriate for E2E tests.

The extended timeouts (ReadTimeout: 30s, DialTimeout: 10s) and connection pool configuration (PoolSize: 10, retry settings) are well-justified by the inline comments. These settings accommodate port-forward latency and concurrent test operations, improving E2E test stability.
226-257: LGTM! Caching mechanism prevents connection proliferation.

The caching mechanism with mutex-protected map access correctly prevents creating multiple Redis clients for the same source and address. The cache key construction properly distinguishes between principal and managed-agent clients.
298-327: LGTM! Redis address resolution with appropriate fallbacks.

Both `getManagedAgentRedisConfig` and `getPrincipalRedisConfig` implement sensible fallback logic:

- Primary: LoadBalancer ingress (IP or Hostname)
- Secondary: `spec.LoadBalancerIP` (for local vcluster development)
- Tertiary: ClusterIP (last resort)

The environment variable overrides (`MANAGED_AGENT_REDIS_ADDR`, `ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS`) provide flexibility for local development with port-forward scenarios, while TLS is appropriately enabled by default for E2E tests.

Also applies to: 359-387
# Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
echo "Extracting mTLS client certificates and CA from Kubernetes..."
TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
echo " mTLS client certificates and CA extracted"
Certificate extraction lacks error handling and uses world-readable paths.
The kubectl commands extract sensitive TLS credentials to /tmp without error checking. If the secrets don't exist, the script continues with empty files, causing cryptic TLS errors later.
Consider adding error handling:
# Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
echo "Extracting mTLS client certificates and CA from Kubernetes..."
TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
-kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
- -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
-kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
- -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
-kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
- -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
+ -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}" || { echo "Failed to extract TLS cert"; exit 1; }
+kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
+ -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}" || { echo "Failed to extract TLS key"; exit 1; }
+kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
+ -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}" || { echo "Failed to extract CA cert"; exit 1; }
echo " mTLS client certificates and CA extracted"🤖 Prompt for AI Agents
In hack/dev-env/start-agent-autonomous.sh around lines 63 to 74, the kubectl
extraction of TLS certs writes sensitive data to predictable, world-readable
/tmp paths and lacks error handling; update the script to create secure temp
files (use mktemp), restrict permissions to 600 (or set umask) before writing,
check each kubectl command's exit status and verify the resulting file is
non-empty, print a clear error and exit non-zero on failure, and add a trap to
securely remove the temp files on script exit.
e4b8ca8 to 5147959 (Compare)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/fixture/argoclient.go (1)
316-338: Potential silent failure when endpoint is empty.

When `ARGOCD_SERVER_ADDRESS` is not set and the Service has neither `LoadBalancerIP` nor an Ingress hostname, `argoEndpoint` will be an empty string, yet the function returns a `nil` error. Callers may not expect an empty string endpoint.

Consider validating before returning:

+ if argoEndpoint == "" {
+     return "", fmt.Errorf("unable to determine argocd-server endpoint: no LoadBalancerIP or Ingress hostname found")
+ }
+ return argoEndpoint, nil
}
♻️ Duplicate comments (6)
hack/dev-env/configure-argocd-redis-tls.sh (1)
164-182: Inconsistent volume array handling for argocd-repo-server.

Unlike `argocd-server` (lines 67-108), which checks if the volumes array exists before deciding whether to create or append, `argocd-repo-server` directly uses `"path": "/spec/template/spec/volumes/-"`, which will fail if the volumes array doesn't exist. The past review comment flagged this as addressed, but the current code still shows the inconsistency.

Consider applying the same defensive pattern used for argocd-server:
if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
  echo " Adding redis-tls-ca volume..."
+
+ # Check if volumes array exists
+ VOLUMES_EXIST=$(kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes}' 2>/dev/null || echo "")
+
+ if [ -z "$VOLUMES_EXIST" ] || [ "$VOLUMES_EXIST" = "null" ]; then
+   # Create volumes array with first element
+   if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+     {
+       "op": "add",
+       "path": "/spec/template/spec/volumes",
+       "value": [{ ... }]
+     }
+   ]'; then
+     echo " ERROR: Failed to create volumes array"
+     exit 1
+   fi
+ else
    if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[

test/e2e/fixture/cluster.go (1)
259-267: CleanupRedisCachedClients doesn't explicitly close connections.

This was flagged in a past review. The `appstatecache.Cache` wraps a Redis client that should ideally be closed explicitly. If `appstatecache.Cache` doesn't expose a `Close()` method, consider tracking the underlying `redis.Client` separately for proper cleanup, or verify that garbage collection handles this appropriately for test scenarios.

#!/bin/bash
# Check if appstatecache.Cache or its underlying types expose a Close method
ast-grep --pattern 'func ($RECV *Cache) Close() $_'
# Also check the redis client interface
rg -n "type.*Client.*interface" --type go -A 20 | head -50

agent/agent.go (1)
445-460: Guard against a zero `cacheRefreshInterval` before creating the ticker.

The goroutine uses `time.NewTicker(a.cacheRefreshInterval)` without validating that the interval is positive. If `cacheRefreshInterval` is not set via options, this will panic with "non-positive interval for NewTicker".

Apply a guard before creating the ticker:

go func() {
    // Send initial update immediately on startup (don't wait for first ticker)
    a.addClusterCacheInfoUpdateToQueue()
+   interval := a.cacheRefreshInterval
+   if interval <= 0 {
+       interval = 10 * time.Second // Default to 10 seconds if not configured
+   }
-   ticker := time.NewTicker(a.cacheRefreshInterval)
+   ticker := time.NewTicker(interval)
    defer ticker.Stop()

#!/bin/bash
# Check if cacheRefreshInterval has a default value set in options or NewAgent
rg -n "cacheRefreshInterval" --type go -B 2 -A 2

cmd/argocd-agent/principal.go (1)
277-291: Clarify upstream TLS mode validation when using the default CA secret

The mutual-exclusivity check intentionally ignores the default `argocd-redis-tls` secret name, so combinations like:

- `--redis-upstream-ca-path=...` with the default `--redis-upstream-ca-secret-name`
- `--redis-upstream-tls-insecure=true` with the default secret

do not trip `modesSet > 1` even though two "modes" are effectively configured, and the secret is silently ignored. That can be surprising for users relying on the default secret.

Consider either:

- Counting any non-empty `redisUpstreamTLSCASecretName` (including the default), or
- Detecting whether the flag was explicitly set (via `c.Flags().Changed("redis-upstream-ca-secret-name")`) and only incrementing `modesSet` when the user actually chose it.

This would make the validation behavior match the "only one mode" promise more closely and avoid silently dropping a configured CA.

principal/redisproxy/redisproxy.go (1)
839-897: Add dial + handshake timeouts and clarify insecure upstream TLS behavior
`establishConnectionToPrincipalRedis` currently:

- Uses `net.DialTCP` with no timeout, and
- Performs `tlsConn.Handshake()` with no deadline,

so a slow or blackholed upstream Redis can block this goroutine indefinitely. In addition, when `upstreamTLSInsecure` is true, any configured CA (pool or path) is silently ignored.

Consider:

- Replacing `net.DialTCP` with a `net.Dialer` (or `net.DialTimeout`) using a reasonable connect timeout, and
- Setting a deadline on `conn` (or `tlsConn`) before `Handshake()` and clearing it afterwards, so both connect and handshake failures fail fast instead of hanging.
- Optionally logging a warning if `upstreamTLSInsecure` is true while `upstreamTLSCA` or `upstreamTLSCAPath` is also set, to make it clear that CA config is being ignored.

This materially improves robustness under network issues and makes insecure mode behavior more transparent.
hack/dev-env/configure-redis-tls.sh (1)
198-207: Fail fast when the Redis password is missing instead of silently configuring empty auth

If `.data.auth` on the `argocd-redis` secret is missing or empty, the script logs a warning and proceeds with:

REDIS_PASSWORD="" … "--requirepass", "'"${REDIS_PASSWORD}"'",

so Redis is configured with an empty password. This diverges from typical Argo CD expectations (components usually assume a non-empty password when the secret exists) and can lead to confusing NOAUTH or auth mismatch errors.

Given this script is part of the dev/E2E setup path, it would be safer to:

- Treat a missing/empty `auth` value as a hard error (print a clear message and `exit 1`), or
- Explicitly document and require a no-auth Redis configuration instead of silently falling back.

That keeps the TLS setup deterministic and avoids subtle runtime failures later.
🧹 Nitpick comments (7)
test/e2e/fixture/toxyproxy.go (1)
119-133: LGTM - reasonable timeout adjustment for TLS-enabled principal readiness.

The extended timeout for principal (180s) appropriately accounts for the informer sync timeout (120s) mentioned in the comment. The dynamic timeout approach is clean.
Consider extracting these timeout values as named constants if they're used elsewhere or likely to change:
const (
    defaultReadinessTimeout   = 120 * time.Second
    principalReadinessTimeout = 180 * time.Second
)

cmd/argocd-agent/agent.go (1)
184-199: Redis TLS configuration validation is well-structured.

The mutual-exclusivity check between `--redis-tls-insecure` and `--redis-tls-ca-path` is appropriate. When TLS is enabled without either flag, the system CA pool will be used (via the default `tls.Config` behavior in agent/agent.go), which is a reasonable default.

One consideration: when TLS is enabled but neither insecure mode nor a CA path is specified, there's no log message indicating the default behavior. Consider adding an informational log for clarity.

if redisTLSInsecure {
    logrus.Warn("INSECURE: Not verifying Redis TLS certificate")
    agentOpts = append(agentOpts, agent.WithRedisTLSInsecure(true))
} else if redisTLSCAPath != "" {
    logrus.Infof("Loading Redis CA certificate from file %s", redisTLSCAPath)
    agentOpts = append(agentOpts, agent.WithRedisTLSCAPath(redisTLSCAPath))
+ } else {
+     logrus.Info("Redis TLS enabled with system CA pool")
}

test/e2e/redis_proxy_test.go (1)
120-123: Sleep-based synchronization for SSE stream establishment.

The 5-second sleep is a pragmatic workaround for Redis subscription race conditions in E2E tests. While not ideal, this is acceptable for test reliability. Consider extracting it as a named constant for clarity.

+const sseStreamEstablishmentWait = 5 * time.Second
+
 // Wait for SSE stream to fully establish and Redis SUBSCRIBE to propagate
 // This prevents a race condition where the pod is deleted before the subscription is active
 t.Log("Waiting for SSE stream to fully establish...")
-time.Sleep(5 * time.Second)
+time.Sleep(sseStreamEstablishmentWait)

cmd/argocd-agent/principal.go (1)
259-261: Align informer-sync-timeout default behavior with help text

The flag is wired with a default of `0` and only applied when `informerSyncTimeout > 0`, while the help text says "default: 60s". In practice this means "0 = use server default (likely 60s)", but `argocd-agent principal --help` will show `0` as the CLI default.

Either:

- Set the env default to `60s` and always pass it through, or
- Clarify in the description that `0` means "use the built-in default (60s)" instead of stating a literal 60s default.

This avoids confusing operators reading the CLI help.

Also applies to: 434-436
docs/configuration/redis-tls.md (2)
149-156: Tag remaining fenced code blocks with a language

The "How the tunnel works" block and the script output examples (`gen-redis-tls-certs.sh`, `configure-redis-tls.sh`, `configure-argocd-redis-tls.sh`) still use bare triple-backtick fences, which markdownlint flags (MD040). Recommend tagging them as plain text (```text), and similarly for the three script output sections. This keeps content unchanged while satisfying linting.
Also applies to: 475-520
331-340: Align documented principal Redis flags/defaults with the actual CLI

In the "All Principal Redis TLS Options" table:

- The flag is listed as `--redis-addr`, but the principal command actually exposes `--redis-server-address` (see cmd/argocd-agent/principal.go).
- The default for `--redis-tls-enabled` is documented as `true (Kubernetes/Helm), false (CLI)`, while the code uses `env.BoolWithDefault("ARGOCD_PRINCIPAL_REDIS_TLS_ENABLED", true)`, so the CLI default is effectively `true` as well unless overridden.

To avoid confusing users, please:

- Rename the documented flag to `--redis-server-address` (or explicitly mention both if you decide to add an alias), and
- Update the default column for `--redis-tls-enabled` to reflect the actual behavior (e.g., "`true` (enabled by default for all deployments)" or similar).

hack/dev-env/start-agent-managed.sh (1)
37-46: Clarify when it's acceptable to run the managed agent without Redis TLS

The script enables Redis TLS when `creds/redis-tls/ca.crt` exists and otherwise logs a warning and runs without TLS:

if [ -f "${SCRIPTPATH}/creds/redis-tls/ca.crt" ]; then … else echo "Redis TLS certificates not found, running without TLS" fi

Given the docs state that Redis TLS is required for all E2E tests, this silent fallback to plaintext could mask misconfigured dev/E2E environments.

Consider:

- Failing fast when TLS creds are missing in E2E flows (e.g., when `make setup-e2e` has been run or under a guard env var), or
- Explicitly documenting that this script allows non-TLS Redis only for ad-hoc local development and that E2E runs must ensure TLS creds exist.

The mTLS client cert/CA extraction and wiring into `go run ... agent` otherwise look solid.

Also applies to: 49-62, 63-75, 76-86
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- Makefile (1 hunks)
- agent/agent.go (3 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/setup-vcluster-env.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/listen.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/application_test.go (2 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (3 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (11 hunks)
- test/e2e/fixture/toxyproxy.go (1 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- principal/resource.go
- Makefile
- hack/dev-env/start-agent-autonomous.sh
- hack/dev-env/start-principal.sh
- install/helm-repo/argocd-agent-agent/values.schema.json
- principal/tracker/tracking.go
- test/e2e/clusterinfo_test.go
- principal/listen.go
- hack/dev-env/setup-vcluster-env.sh
- hack/dev-env/Procfile.e2e
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- test/e2e/rp_test.go
- test/run-e2e.sh
- hack/dev-env/start-agent-managed.sh
- test/e2e/application_test.go
- test/e2e/README.md
- hack/dev-env/start-e2e.sh
🧬 Code graph analysis (7)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
`ClusterDetails` (42-56), `AgentManagedName` (37-37), `AgentClusterServerURL` (39-39)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
cleanup(39-41)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
`GetArgoCDServerEndpoint` (316-338), `GetInitialAdminSecret` (303-314), `NewArgoClient` (53-67)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
KubeClient(67-73)
agent/agent.go (2)
internal/logging/logging.go (1)
`Warn` (300-302)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
cmd/argocd-agent/agent.go (4)
agent/options.go (3)
`WithRedisTLSEnabled` (112-117), `WithRedisTLSInsecure` (128-133), `WithRedisTLSCAPath` (120-125)
principal/options.go (1)
`WithRedisTLSEnabled` (493-498)
internal/logging/logging.go (1)
`Warn` (300-302)
internal/env/env.go (2)
`BoolWithDefault` (30-39), `StringWithDefault` (46-55)
test/e2e/application_test.go (1)
test/e2e/fixture/argoclient.go (1)
IsArgoCDRepoServerReady(492-505)
🪛 LanguageTool
test/e2e/README.md
[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
(AI_HYDRA_LEO_MISSING_COMMA)
docs/configuration/redis-tls.md
[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (30)
test/e2e/fixture/argoclient.go (1)
489-505: LGTM - clean readiness helper.

The function correctly checks for at least one available replica and appropriately returns `false` on any error, which is suitable for readiness checks in test fixtures.

test/e2e/rp_test.go (3)
158-169: LGTM - good refactoring to use centralized fixture helpers.

Replacing inline K8s API calls with `fixture.GetArgoCDServerEndpoint` and `fixture.GetInitialAdminSecret` improves consistency and maintainability across tests.

294-306: LGTM - consistent with other test refactoring.

Same pattern applied here as in `Test_ResourceProxy_Argo`, using the centralized fixture helpers for endpoint and credential retrieval.

509-510: LGTM - minor formatting improvement.

No behavioral change, just cleaner request construction.
internal/argocd/cluster/cluster.go (2)
135-142: LGTM - sensible default initialization for ConnectionState.

Initializing `ConnectionState` with `Successful` status when receiving cache stats (and no prior state exists) is logically correct: receiving cache info implies the agent is connected.

176-191: LGTM - TLS configuration correctly wired to Redis client.

The `tlsConfig *tls.Config` parameter is properly passed to `redis.Options.TLSConfig`. When `tlsConfig` is `nil`, the go-redis client will use non-TLS connections, which maintains backward compatibility.

Ensure that all existing callers of `NewClusterCacheInstance` have been updated to pass the new `tlsConfig` parameter.

test/e2e/fixture/fixture.go (5)
110-113: LGTM - appropriate timeout increase for TLS-enabled environment.

Doubling the deletion timeout from 60s to 120s accommodates potential TLS handshake overhead and slower Redis connections in the new TLS-enabled infrastructure.

230-241: LGTM - improved cleanup robustness and fixed potential mutation bug.

Two important improvements:

- Using `DeepCopy()` before modifying the namespace prevents mutating the loop variable
- Warning-based error handling prevents cleanup failures from cascading into test failures

255-267: LGTM - consistent pattern with autonomous agent cleanup.

Same DeepCopy and warning-based error handling pattern applied correctly here.

310-326: LGTM - AppProject cleanup follows the same robust pattern.

DeepCopy usage for `principalAppProject` and warning-based error handling correctly applied.

494-500: LGTM - improved error wrapping.

Using `%w` for error wrapping provides a better error chain for debugging when Redis cache operations fail.

cmd/argocd-agent/agent.go (1)
241-250: Redis TLS flags correctly wired with secure defaults.

The flags use sensible defaults:

- `redis-tls-enabled` defaults to `true` (secure by default)
- `redis-tls-insecure` defaults to `false`

This aligns with the PR objective to enable Redis TLS by default.
hack/dev-env/configure-argocd-redis-tls.sh (1)
316-325: Replica guard logic correctly implemented.

The fix from the past review comment has been properly applied using explicit `if` statements, ensuring both empty and "0" values are correctly handled.

test/e2e/redis_proxy_test.go (4)
186-208: SSE message draining logic is well-structured.

The drain-all-then-retry pattern correctly handles the buffered channel without blocking indefinitely. The `messagesDrained` flag ensures proper logging behavior. The 120-second timeout with 5-second intervals provides reasonable resilience for E2E tests.
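The receiver side of drain-all-then-retry can be sketched as a non-blocking loop over the buffered channel (`drain` is a hypothetical helper, not the test's actual code):

```go
package main

import "fmt"

// drain empties whatever is currently buffered on ch without blocking,
// returning the drained events; a caller can inspect them and retry the
// surrounding operation if the expected event has not arrived yet.
func drain(ch chan string) []string {
	var out []string
	for {
		select {
		case msg := <-ch:
			out = append(out, msg)
		default: // channel empty right now - stop instead of blocking
			return out
		}
	}
}

func main() {
	ch := make(chan string, 100) // buffered, as in the test
	ch <- "MODIFIED pod/a"
	ch <- "DELETED pod/a"
	got := drain(ch)
	fmt.Println(len(got), len(drain(ch))) // 2 0
}
```

The `select`/`default` pair is what keeps the drain from blocking when the stream is quiet, which is exactly the property the test's retry loop depends on.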
210-237: ResourceTree retry logic handles transient Redis errors gracefully.

Wrapping the `ResourceTree` call in `Eventually` with proper nil and error checks addresses the transient EOF errors mentioned in the comments. The 30-second timeout with 2-second intervals is appropriate for this verification step.
642-653: HTTP transport configuration appropriate for SSE streams.

The transport settings are well-suited for long-lived SSE connections:

- `Timeout: 0` allows indefinite streaming
- `IdleConnTimeout: 300s` keeps connections alive
- `InsecureSkipVerify: true` is documented as intentional for E2E tests with dynamic LoadBalancer addresses
588-588: Buffered channel size is reasonable for E2E tests.

A buffer of 100 messages should handle typical SSE event bursts. If tests become flaky due to message loss, consider increasing this or adding overflow detection.
test/e2e/fixture/cluster.go (3)
180-201: TLS configuration for E2E tests is appropriate.

`InsecureSkipVerify: true` is acceptable for E2E tests, where certificate validation complexity would add friction. The inline comments clearly document that this is for E2E tests only.
206-217: Connection pool and timeout settings are generous for E2E stability.

The increased timeouts (30s read, 10s dial/write) and pool settings (size 10, retry backoff) help handle E2E test latency and concurrent operations. These are reasonable for test environments.
319-326: Environment variable override for local development is a good addition.

Allowing a `MANAGED_AGENT_REDIS_ADDR` override enables local development with port-forward while defaulting to the discovered address for E2E tests.

agent/agent.go (2)
323-343: TLS configuration for cluster cache is correctly implemented.

The TLS config construction properly handles:
- Insecure mode with warning log (line 330)
- CA certificate loading with error handling
- Consistency with Redis proxy TLS logic
The warning message now aligns with the principal code pattern.
19-23: New imports for TLS support are appropriate. The added imports (`crypto/tls`, `crypto/x509`, `os`) are necessary for TLS configuration and CA certificate loading.
test/e2e/application_test.go (1)
3-6: Repo-server readiness gate in SetupSuite looks good. Waiting up to 120s with `Require().Eventually` on `IsArgoCDRepoServerReady` before creating the Argo client is a solid way to reduce test flakiness when the repo-server is slow to become available. No issues spotted.
Also applies to: 28-35
hack/dev-env/gen-redis-tls-certs.sh (1)
14-27: Redis TLS certificate generation script looks solid. The script cleanly generates a CA and per-component Redis certificates with appropriate SANs, avoids suppressing OpenSSL errors, and conditionally adds the local IP to the proxy certificate. Cleanup of temporary CSR/EXT/SRL files at the end is also a nice touch. No changes needed from my side.
Also applies to: 34-58, 60-103, 105-135
test/run-e2e.sh (3)
32-45: Verify certificate validation completeness. The validation checks only for `ca.crt` on the host filesystem. The past review requested validation of all three certificate files (`ca.crt`, `server.crt`, `server.key`). Clarify whether `server.crt` and `server.key` are:
- Expected on the host and should be validated here, or
- Deployed to the pod by Kubernetes and thus validated only through the deployment check on lines 62–77.
If they should be present on the host, update the validation to check all three files.
62-77: Robust TLS detection using jq. The TLS configuration validation properly uses `jq` to check for both the `--tls-port` argument and `redis-tls` volume, with clear per-condition error messages. This addresses the prior concern about fragile text-based grep matching.
88-122: macOS port-forward detection and environment configuration. The script properly detects the macOS environment, checks for required port-forwards, and sets appropriate Redis address environment variables for local development. The warning+continue approach allows for both local and CI scenarios.
hack/dev-env/start-e2e.sh (1)
50-59: Static Redis addresses and proper export handling. The script now uses static `localhost` addresses with explicit ports for all Redis endpoints, which simplifies TLS certificate validation and aligns with the port-forward strategy in `test/run-e2e.sh`. The `REDIS_PASSWORD` assignment and export are properly separated (lines 58–59), addressing the shellcheck SC2155 concern about masking return values.
test/e2e/README.md (2)
21-88: Clear multi-step workflow documentation. The README provides a well-structured, multi-step workflow covering setup, optional reverse tunnel configuration, process startup, and test execution. The optional reverse tunnel scenario (Step 1b) is clearly marked and well-explained, and the environment detection (local vs. CI) addresses platform-specific concerns. The documentation properly reflects the TLS-mandatory requirement stated on line 29.
94-105: Verify existence of referenced Redis TLS scripts in hack/dev-env/. The README documents manual reconfiguration using three scripts:
- `./hack/dev-env/gen-redis-tls-certs.sh`
- `./hack/dev-env/configure-redis-tls.sh`
- `./hack/dev-env/configure-argocd-redis-tls.sh`
Past reviews flagged these scripts as non-existent. Confirm that these scripts are present in `hack/dev-env/` and that they implement the documented behavior: certificate generation, Redis TLS configuration, and Argo CD component reconfiguration for each vcluster variant (control-plane, agent-managed, agent-autonomous).
5147959 to 3b0283f
Actionable comments posted: 3
♻️ Duplicate comments (9)
principal/redisproxy/redisproxy.go (4)
846-850: Add timeout to TCP dial operation. The `net.DialTCP` call still lacks a timeout, which can cause the connection attempt to hang indefinitely if the upstream Redis is unresponsive. This blocks the goroutine handling the Argo CD connection.
This issue was previously flagged and remains unresolved. See the past review comment on lines 846-850 for the suggested fix using `net.Dialer` with a timeout.
853-853: Ensure TLS is required for upstream when server TLS is enabled. The upstream TLS connection is only established when both `rp.tlsEnabled` is true AND upstream TLS configuration is provided (Line 853). If the Redis proxy server has TLS enabled but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP, potentially exposing sensitive data in transit.
This security issue was previously flagged and remains unresolved. See the past review comment on lines 853-894 for the suggested fix to enforce upstream TLS or log a warning when this configuration mismatch occurs.
858-877: InsecureSkipVerify takes precedence over CA configuration. The `if/else if` structure means that when `rp.upstreamTLSInsecure` is true, any configured CA pool or CA path is silently ignored. While this may be intentional for test environments, it could be unexpected behavior.
This issue was previously flagged and remains unresolved. See the past review comment on lines 858-877 for the suggested warning log when CA is configured but ignored due to insecure mode.
886-890: Add timeout to TLS handshake. The TLS handshake has no timeout, which can cause the connection to hang indefinitely if the upstream Redis TLS endpoint is unresponsive during negotiation.
This issue was previously flagged and remains unresolved. See the past review comment on lines 886-890 for the suggested fix using `conn.SetDeadline()` before and after the handshake.
docs/configuration/redis-tls.md (1)
149-156: Tag remaining fenced blocks with a language (`text`) to satisfy markdownlint. The "How the tunnel works" diagram and the three script output examples (`gen-redis-tls-certs.sh`, `configure-redis-tls.sh`, `configure-argocd-redis-tls.sh`) still use bare triple-backtick fences, triggering MD040. Consider tagging them by adding `text` after the opening fence, and similarly for the script output sections around lines 475–520.
Also applies to: 475-483, 485-501, 503-520
docs/getting-started/kubernetes/index.md (1)
205-212: Fix `$(REDIS_PASSWORD)` in JSON patch examples (no expansion inside single quotes). In both Redis TLS patch examples, `--requirepass` uses `"$(REDIS_PASSWORD)"` inside a single-quoted `-p='[...]'` argument, so the shell never expands it and Redis ends up with the literal string `"$(REDIS_PASSWORD)"` as the password. Consider either:
- Using a clear placeholder, e.g. `"--requirepass", "<redis-password>"`, and explaining how to obtain it from the `argocd-redis` secret, or
- Showing an interpolated pattern, e.g.:
```shell
REDIS_PASSWORD="$(kubectl -n argocd get secret argocd-redis -o jsonpath='{.data.auth}' | base64 -d)"
kubectl patch deployment argocd-redis -n argocd --context <context> --type='json' -p="$(
cat <<EOF
[
  {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": [
    "--save", "", "--appendonly", "no",
    "--requirepass", "$REDIS_PASSWORD",
    "--tls-port", "6379", "--port", "0",
    "--tls-cert-file", "/app/tls/tls.crt",
    "--tls-key-file", "/app/tls/tls.key",
    "--tls-ca-cert-file", "/app/tls/ca.crt",
    "--tls-auth-clients", "no"
  ]}
]
EOF
)"
```
and apply the same fix in both Step 2.4 and Step 4.4.
Also applies to: 372-378
test/e2e/fixture/fixture.go (1)
487-491: Minor: extra leading space in warning message. Line 489 has a leading space in the format string:
" Warning: Failed...". This is inconsistent with other warning messages that start without a leading space.- fmt.Printf(" Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err) + fmt.Printf("Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)agent/agent.go (1)
445-460: Guard against zero `cacheRefreshInterval` before creating ticker. The goroutine uses `time.NewTicker(a.cacheRefreshInterval)` without ensuring the interval is > 0. If no `AgentOption` sets `cacheRefreshInterval`, this will panic at runtime with "non-positive interval for NewTicker".
```diff
 go func() {
 	// Send initial update immediately on startup (don't wait for first ticker)
 	a.addClusterCacheInfoUpdateToQueue()

+	interval := a.cacheRefreshInterval
+	if interval <= 0 {
+		interval = 30 * time.Second // Default fallback
+	}
-	ticker := time.NewTicker(a.cacheRefreshInterval)
+	ticker := time.NewTicker(interval)
 	defer ticker.Stop()
```
259-267: CleanupRedisCachedClients doesn't explicitly close connections. The cleanup function only clears the map, relying on garbage collection to close connections. For proper resource management, the underlying Redis clients should be explicitly closed.
Since `appstatecache.Cache` doesn't expose the underlying Redis client for closing, consider either:
- Tracking `redis.Client` instances separately alongside the cache
- Verifying through testing that GC properly closes connections
This may be acceptable for E2E tests but is worth monitoring for connection leaks during test runs.
🧹 Nitpick comments (3)
test/e2e/fixture/argoclient.go (1)
489-513: Consider refactoring the return type for idiomatic Go. The `(bool, string)` return pattern is unconventional. Idiomatic Go typically uses `(bool, error)` or just `error` to distinguish between "not ready" states and actual failures (e.g., permission errors, deployment doesn't exist).
Current behavior treats API errors the same as "deployment exists but isn't ready," which may mask actual problems in wait loops. While this might be intentional for test resilience with transient conditions, the semantic distinction would be clearer with an error type.
Consider this refactor:
```diff
-func IsArgoCDRepoServerReady(k8sClient KubeClient, namespace string) (bool, string) {
+func IsArgoCDRepoServerReady(k8sClient KubeClient, namespace string) (bool, error) {
 	ctx := context.Background()

 	// Try to get the repo-server deployment
 	deployment := &appsv1.Deployment{}
 	key := types.NamespacedName{Name: "argocd-repo-server", Namespace: namespace}
 	err := k8sClient.Get(ctx, key, deployment, metav1.GetOptions{})
 	if err != nil {
-		return false, fmt.Sprintf("Failed to get deployment: %v", err)
+		return false, fmt.Errorf("failed to get deployment: %w", err)
 	}

 	// Check if the deployment has at least one available replica
 	if deployment.Status.AvailableReplicas > 0 {
-		return true, ""
+		return true, nil
 	}

 	// Return diagnostic information about why it's not ready
-	return false, fmt.Sprintf("Replicas: %d/%d available, Conditions: %v",
+	return false, fmt.Errorf("not ready - replicas: %d/%d available, conditions: %v",
 		deployment.Status.AvailableReplicas, deployment.Status.Replicas, deployment.Status.Conditions)
 }
```
This preserves diagnostic information while providing clearer error semantics for callers.
test/e2e/redis_proxy_test.go (1)
120-124: SSE stream robustness changes look good; keep `InsecureSkipVerify` test-only. The added wait before pod deletion, buffered SSE channel, "drain all messages then retry" logic, and `Eventually` wrappers around ResourceTree calls should all help eliminate race-based flakiness in these Redis proxy tests.
The SSE client's `http.Transport` uses `&tls.Config{InsecureSkipVerify: true}`, which is acceptable here since this code lives under `test/e2e` and exists purely for test connectivity to dynamically addressed endpoints. Just ensure this pattern stays confined to test code and doesn't leak into production clients.
Also applies to: 186-209, 326-330, 406-456, 588-653
test/run-e2e.sh (1)
49-77: Redis TLS preflight checks look robust; consider documenting `jq` as a test prerequisite. The per-context checks for the `argocd-redis-tls` secret plus `--tls-port` arg and `redis-tls` volume on the `argocd-redis` Deployment are a solid way to enforce Redis TLS before running e2e tests.
Since this now relies on `jq` for JSON inspection, it would be helpful to ensure `jq` is listed as a prerequisite for running `make test-e2e` (e.g., in contributor docs or a comment near the top of this script) so failures due to a missing `jq` binary are less surprising.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- `Makefile` (1 hunks)
- `agent/agent.go` (3 hunks)
- `cmd/argocd-agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (3 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/setup-vcluster-env.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `principal/listen.go` (3 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/application_test.go` (2 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (3 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (11 hunks)
- `test/e2e/fixture/toxyproxy.go` (1 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
- principal/resource.go
- principal/tracker/tracking.go
- test/e2e/application_test.go
- hack/dev-env/start-agent-autonomous.sh
- test/e2e/fixture/toxyproxy.go
- principal/listen.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- hack/dev-env/configure-argocd-redis-tls.sh
- hack/dev-env/start-principal.sh
- cmd/argocd-agent/agent.go
- test/e2e/rp_test.go
- hack/dev-env/setup-vcluster-env.sh
- test/e2e/clusterinfo_test.go
- Makefile
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- `test/run-e2e.sh`
- `test/e2e/README.md`
- `hack/dev-env/start-e2e.sh`
- `hack/dev-env/Procfile.e2e`
- `hack/dev-env/start-agent-managed.sh`
🧬 Code graph analysis (4)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (2)
`cleanup` (39-41), `apply` (94-247)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
`KubeClient` (67-73)
agent/agent.go (2)
internal/logging/logging.go (1)
`Warn` (300-302)
internal/argocd/cluster/cluster.go (1)
`NewClusterCacheInstance` (176-192)
principal/redisproxy/redisproxy.go (1)
internal/logging/logging.go (3)
`Error` (305-307), `Warn` (300-302), `Trace` (285-287)
🪛 LanguageTool
test/e2e/README.md
[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
(AI_HYDRA_LEO_MISSING_COMMA)
docs/configuration/redis-tls.md
[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run unit tests
- GitHub Check: Run end-to-end tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Build and push image
- GitHub Check: Analyze (go)
🔇 Additional comments (32)
hack/dev-env/gen-redis-tls-certs.sh (1)
1-150: Script looks solid; previous issues have been resolved. The two previously flagged issues are now addressed:
- Error suppression in OpenSSL commands has been removed (lines 17, 20, 25).
- Empty `LOCAL_IP` is handled gracefully with conditional append (lines 87–90).
The script is well-structured: it's idempotent (checks for existing keys/certs before regenerating), uses `set -e` for error handling, cleans up temporary files, and handles both macOS and Linux for local IP detection. The graceful fallback to empty `LOCAL_IP` with conditional SAN entry is correct.
hack/dev-env/configure-redis-tls.sh (6)
198-206: Verify password requirement for Redis configuration. A past review (lines 202-206) explicitly requested that the script "fail hard" when the Redis password secret is missing, citing that ArgoCD components expect authentication. However, the current code (lines 202-206) issues a warning and continues with an empty password.
Clarify the intended behavior:
- If Redis authentication is required for E2E tests, the script should fail when the `argocd-redis` secret is missing.
- If graceful degradation to unauthenticated Redis is acceptable, document this assumption and update the warning message to reflect the impact.
136-196: Volume and volumeMount patching logic is sound and idempotent. The conditional checks (lines 139, 169) and JSON patch operations correctly handle cases where volumes/volumeMounts may or may not already exist. Re-running the script safely skips redundant patches. The logic is correct.
18-54: Error handling structure is solid. The combination of `set -e`, trap-based cleanup, and explicit error checks on critical operations provides good defense-in-depth. Early validation (lines 61-76) catches issues before state mutations. The cleanup trap (line 54) ensures the initial context is restored regardless of exit path.
Also applies to: 61-76
85-96: Replica count storage is idempotent and safe. The use of `--dry-run=client -o yaml | kubectl apply` correctly handles both creation and update, ensuring the pattern is re-entrant. If this step fails, the script exits and cleanup restores the initial context. Downstream scripts reading this ConfigMap should handle potential missing data gracefully.
239-253: Verification section appropriately informational. Post-rollout verification provides helpful feedback without blocking on transient states. The earlier `rollout status` command (line 231) enforces correctness, while this final check (lines 239-253) is user-friendly diagnostics.
68-71: Add explicit error check for context switch. Although `set -e` provides implicit safety (script exits on failure), explicit error checks with clear messages improve debuggability and document intent.
Apply this diff to add explicit error handling:
```diff
 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
```
test/e2e/fixture/argoclient.go (3)
27-27: LGTM! Imports are correctly added. The new imports (`os` and `appsv1`) are properly used in the added functionality.
Also applies to: 30-30
317-320: LGTM! Good optimization to avoid unnecessary K8s API calls. The environment variable check provides a simple override mechanism and improves test performance.
322-322: LGTM! Formatting and defensive checks improve code quality. The added comment clarifies the fallback logic, and the hostname check is good defensive programming.
Also applies to: 330-335
principal/redisproxy/redisproxy.go (6)
21-23: LGTM! The new imports for TLS support (`crypto/tls`, `crypto/x509`, `os`) are appropriate and necessary for the TLS functionality added in this file.
Also applies to: 27-27
65-75: LGTM! The TLS configuration fields are well-structured, clearly separating server-side and upstream TLS concerns. Supporting both in-memory certificates and path-based loading provides good flexibility.
98-128: LGTM! The TLS configuration setters provide a clean API surface. The comment on `SetUpstreamTLSInsecure` appropriately warns that it's for testing only.
130-154: LGTM! The TLS configuration builder correctly handles both path-based and in-memory certificates, with appropriate error handling. Setting `MinVersion` to TLS 1.2 provides a good balance between security and compatibility.
157-200: LGTM! The `Start()` method cleanly integrates TLS support with clear branching between TLS and plaintext modes. Error handling and logging are appropriate for both paths.
221-221: LGTM! Converting the connection establishment to a method call enables access to the TLS configuration stored in the `RedisProxy` instance.
hack/dev-env/start-agent-managed.sh (1)
37-75: Redis TLS and mTLS wiring in dev agent startup looks consistent. TLS detection via `creds/redis-tls/ca.crt`, defaulting the Redis address to `localhost:6381`, and extracting client cert/CA from Kubernetes secrets into /tmp all look correct for the dev/E2E workflow and align with the documented Redis TLS setup.
Also applies to: 76-90
internal/argocd/cluster/cluster.go (1)
135-142: TLS-enabled cluster cache wiring and connection state initialization look correct. Passing `*tls.Config` into `redis.Options.TLSConfig` in `NewClusterCacheInstance` is the right way to enable Redis TLS for the cluster cache, and the logic in `SetClusterCacheStats` to initialize `ConnectionState` when none exists avoids empty status for newly reporting agents while preserving any existing state.
Also applies to: 175-191
hack/dev-env/start-e2e.sh (1)
50-59: Localhost address exports and Redis password wiring align with TLS setup. Exporting the principal, agent, and Argo CD server addresses as `localhost` with fixed ports (6380/6381/6382/8444) matches the documented certificate SANs and simplifies the dev/e2e environment. Fetching `REDIS_PASSWORD` from the managed agent's `argocd-redis` secret once and exporting it is also a clean way to keep the Redis auth in sync with the cluster.
110-113: Timeout increases look reasonable for TLS-enabled Redis. The increased timeouts from 60s to 120s accommodate the additional latency that TLS handshakes and encrypted operations may introduce, especially during cleanup operations. This is a sensible adjustment for the TLS-enabled environment.
Also applies to: 143-144, 161-161
232-241: Good resilience improvement: continue cleanup despite individual failures. Converting hard errors to warnings during cleanup prevents a single failing deletion from blocking the entire cleanup process. This is especially useful in TLS-enabled environments where transient connection issues may occur.
Also applies to: 255-266, 276-279, 288-292
236-238: Correct use of DeepCopy to avoid mutating loop variables. Using `DeepCopy()` before modifying namespace ensures the original loop variable isn't mutated, which could cause subtle bugs in subsequent iterations. This is a proper fix.
497-499: LGTM: Proper error wrapping and cache instance switching. Using `getCachedCacheInstance` aligns with the caching strategy in cluster.go, and wrapping the error with `%w` enables proper error chain inspection.
agent/agent.go (2)
323-343: LGTM: TLS configuration for cluster cache is well-implemented. The TLS setup properly:
- Sets minimum TLS version to 1.2
- Logs a warning when using insecure mode (line 330)
- Loads and validates CA certificates from the filesystem
- Returns clear error messages for failure cases
345-349: Correct integration with updated NewClusterCacheInstance signature. The TLS config is properly passed to the cluster cache constructor, maintaining consistency with the Redis proxy's TLS configuration.
hack/dev-env/Procfile.e2e (2)
1-7: LGTM: Procfile properly sets up port-forwards and process dependencies. The configuration correctly:
- Sets up Redis port-forwards on distinct ports (6380-6382) to avoid conflicts
- Uses appropriate delays to ensure port-forwards are established before starting dependent processes
- Passes Redis addresses via environment variables for flexibility
6-7: Ensure environment variables are set before running goreman. The agents depend on `$MANAGED_AGENT_REDIS_ADDR` and `$AUTONOMOUS_AGENT_REDIS_ADDR` environment variables. Verify these are exported by `start-e2e` or documented in the README for manual execution.
180-201: Appropriate use of InsecureSkipVerify for E2E tests. Using `InsecureSkipVerify: true` in test fixtures is acceptable given the documented rationale in README.md. The tests need to connect via dynamic addresses (port-forwards, LoadBalancer IPs) that may not match certificate SANs.
206-217: Good defensive configuration for E2E test stability. The generous timeouts and connection pool settings appropriately handle:
- Port-forward latency (10s dial, 30s read)
- Concurrent test operations (pool size 10)
- Connection lifecycle management (idle timeouts, retries)
232-257: LGTM: Redis client caching prevents connection leaks. The caching mechanism correctly:
- Uses a mutex for thread-safe access
- Creates cache keys based on source and address
- Reuses existing clients instead of creating new ones per call
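The get-or-create pattern summarized in those three points can be sketched in a few lines. `fakeClient` stands in for the real Redis-backed cache type; the method name `get` and the key format are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient stands in for a Redis-backed cache instance in this sketch.
type fakeClient struct{ addr string }

// clientCache is a mutex-guarded get-or-create map keyed by source+address,
// so repeated lookups reuse one client instead of opening a new pool each call.
type clientCache struct {
	mu      sync.Mutex
	clients map[string]*fakeClient
}

func (c *clientCache) get(source, addr string) *fakeClient {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := source + "|" + addr
	if cl, ok := c.clients[key]; ok {
		return cl // reuse the cached client
	}
	cl := &fakeClient{addr: addr}
	c.clients[key] = cl
	return cl
}

func main() {
	cache := &clientCache{clients: map[string]*fakeClient{}}
	a := cache.get("principal", "localhost:6380")
	b := cache.get("principal", "localhost:6380")
	fmt.Println(a == b)
}
```

Holding the mutex across the lookup-and-insert is what makes concurrent test goroutines safe: two callers racing on the same key cannot both create a client.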
308-327: Robust fallback chain for Redis address resolution. The address resolution order (LoadBalancer ingress → spec.loadBalancerIP → ClusterIP) with environment variable override provides flexibility for:
- CI environments with MetalLB (LoadBalancer)
- Local development (port-forwards via env override)
- Fallback scenarios
Also applies to: 369-387
test/e2e/README.md (1)
83-105: Confirm that all three Redis TLS configuration scripts are included in this PR. The manual reconfiguration section references three scripts:
- `./hack/dev-env/gen-redis-tls-certs.sh`
- `./hack/dev-env/configure-redis-tls.sh`
- `./hack/dev-env/configure-argocd-redis-tls.sh`
These scripts were previously flagged as missing. Ensure they are committed as part of this PR, or remove this section if they are not available.
97e15ae to 291dd51
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)
150-156: Inconsistent timeout in re-connect assertion. This assertion uses 30s/1s while the earlier checks in `Test_ClusterInfo_Autonomous` (lines 124-129, 136-142) use the increased 60s/2s timeouts. For consistency and to avoid flaky tests, consider aligning this timeout with the others.
Apply this diff:
```diff
 requires.Eventually(func() bool {
 	return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 		Status:     appv1.ConnectionStatusSuccessful,
 		Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 		ModifiedAt: &metav1.Time{Time: time.Now()},
 	}, clusterDetail)
-}, 30*time.Second, 1*time.Second)
+}, 60*time.Second, 2*time.Second)
 }
```
♻️ Duplicate comments (6)
hack/dev-env/start-agent-autonomous.sh (1)
63-74: Add error handling for certificate extraction. The kubectl commands extract TLS credentials without error checking. If secrets are missing or extraction fails, the script continues with empty files, causing TLS failures at runtime.
Apply this diff to add error handling:
```diff
 # Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
 TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
 TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
 ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
+  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}" || { echo "Failed to extract TLS cert"; exit 1; }
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
+  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}" || { echo "Failed to extract TLS key"; exit 1; }
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
-  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}" || { echo "Failed to extract CA cert"; exit 1; }
 echo "✅ mTLS client certificates and CA extracted"
```
hack/dev-env/configure-redis-tls.sh (1)
68-76: Verify context switch succeeded before proceeding. Line 70 switches the kubectl context without checking for errors. If the context doesn't exist or the switch fails, subsequent kubectl commands may target the wrong cluster, risking unintended configuration changes.
Apply this diff to add error handling:
```diff
 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+  echo "Error: Failed to switch to context ${CONTEXT}"
+  echo "Please verify the context exists: kubectl config get-contexts"
+  exit 1
+}

 # Check Redis Deployment exists
 if ! kubectl get deployment argocd-redis -n ${NAMESPACE} &>/dev/null; then
```
test/e2e/fixture/cluster.go (1)
309-317: CleanupRedisCachedClients doesn't explicitly close connections. The cleanup function only clears the map, relying on garbage collection. The `appstatecache.Cache` wraps a Redis client that ideally should be explicitly closed for deterministic resource cleanup in tests.
This was flagged in a previous review. If `appstatecache.Cache` doesn't expose a `Close()` method, this may be acceptable, but worth tracking as technical debt.
285-291: Validation still allows conflicting upstream TLS modes when using the default secret name. The mutual exclusivity check excludes the default secret name `"argocd-redis-tls"` from the mode count (line 286-287). This means a user can specify `--redis-upstream-ca-path=/some/path` while `--redis-upstream-ca-secret-name` remains at its default, and the validation won't catch this conflict. The if-else chain will silently prefer the CA path.
This was flagged in a previous review. Consider either:
- Counting all non-empty values regardless of default, or
- Tracking whether the flag was explicitly set vs. using the default
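The second option can be sketched independently of any CLI library: the flag layer (e.g. pflag's `Changed()`) would populate a set of explicitly-set flag names, and validation counts conflicting modes from that set. The function name, flag names' grouping, and error wording below are illustrative, not the actual principal code.

```go
package main

import "fmt"

// validateUpstreamTLSModes enforces mutual exclusivity over flags the user
// explicitly set, rather than comparing values against defaults.
func validateUpstreamTLSModes(changed map[string]bool) error {
	modes := 0
	for _, flag := range []string{
		"redis-upstream-ca-path",
		"redis-upstream-ca-secret-name",
		"redis-upstream-tls-insecure",
	} {
		if changed[flag] {
			modes++
		}
	}
	if modes > 1 {
		return fmt.Errorf("only one of CA path, CA secret, or insecure mode may be set (got %d)", modes)
	}
	return nil
}

func main() {
	// Two explicitly-set modes conflict, even if one matches its default value.
	err := validateUpstreamTLSModes(map[string]bool{
		"redis-upstream-ca-path":        true,
		"redis-upstream-ca-secret-name": true,
	})
	fmt.Println(err != nil)
}
```

This sidesteps the default-value ambiguity entirely: a flag left at its default simply never enters the count.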
434-436: `informer-sync-timeout` help text is misleading. The help text says "(0 = use default of 60s)" but the flag's actual default via `env.DurationWithDefault` is `0`. The description should clarify whether 0 means "use internal default" or if there's no timeout.
This was flagged in a previous review. The help text should accurately reflect the behavior.
test/e2e/fixture/fixture.go (1)
487-491: Minor: extra leading space in warning message still present. Line 489 still has a leading space in the format string: `" Warning: Failed..."`. This was flagged in a previous review.
```diff
- fmt.Printf(" Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
+ fmt.Printf("Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
```
🧹 Nitpick comments (6)
docs/configuration/redis-tls.md (1)
150-150: Optional: tag remaining fenced blocks to satisfy markdownlint. A few fenced code blocks still lack language tags (triggering MD040). Consider tagging them as `text`:
- Line 150 ("How the tunnel works" architecture block)
- Lines 475-520 (script output examples)
Example: change the bare opening fence to one tagged with `text`. This is a low-priority linting issue; the content is already clear.
Also applies to: 475-520
hack/dev-env/configure-argocd-redis-tls.sh (2)
164-182: Consider defensive volume array handling for consistency. The argocd-repo-server configuration (lines 167-182) directly appends to `/spec/template/spec/volumes/-` without first checking whether the volumes array exists. While this may work in practice (if repo-server always has pre-existing volumes), it's inconsistent with the defensive pattern used for argocd-server (lines 68-108), which handles the case where the array might not exist. For consistency and robustness, consider applying the same defensive check used for argocd-server. This ensures the script handles edge cases uniformly across all components.
237-255: Consider defensive volume array handling for the StatefulSet. Similar to repo-server, the argocd-application-controller configuration directly appends to the volumes array without checking whether it exists. While this may work in practice, the defensive pattern from argocd-server (lines 68-108) would make the script more robust and consistent across all components.
test/e2e/fixture/cluster.go (1)
183-256: Consider extracting TLS configuration into a helper function to reduce duplication. The TLS configuration logic for `PrincipalName` (lines 185-216) and `AgentManagedName` (lines 225-256) is nearly identical. This duplication increases the maintenance burden. Extract a helper function:

```go
func buildRedisTLSConfig(enabled bool, caPath string) *tls.Config {
	if !enabled {
		return nil
	}
	tlsConfig := &tls.Config{
		MinVersion: tls.VersionTLS12,
	}
	if caPath != "" {
		if _, err := os.Stat(caPath); err == nil {
			caCertPEM, err := os.ReadFile(caPath)
			if err != nil {
				panic(fmt.Sprintf("failed to read Redis CA certificate: %v", err))
			}
			certPool := x509.NewCertPool()
			if !certPool.AppendCertsFromPEM(caCertPEM) {
				panic(fmt.Sprintf("failed to parse Redis CA certificate from %s", caPath))
			}
			tlsConfig.RootCAs = certPool
		} else {
			fmt.Printf("Warning: Redis CA certificate not found at %s, skipping verification\n", caPath)
			tlsConfig.InsecureSkipVerify = true
		}
	} else {
		tlsConfig.InsecureSkipVerify = true
	}
	return tlsConfig
}
```

test/e2e/fixture/argoclient.go (1)
330-335: LoadBalancer IP fallback may miss Ingress IP. The logic sets `argoEndpoint = srvService.Spec.LoadBalancerIP` first, then only overwrites it with the hostname from Ingress. If `Ingress[0].IP` is set (but not the hostname), it won't be used. Consider checking both IP and Hostname from Ingress:

```diff
 argoEndpoint := srvService.Spec.LoadBalancerIP
 if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-	if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
+	ingress := srvService.Status.LoadBalancer.Ingress[0]
+	if ingress.IP != "" {
+		argoEndpoint = ingress.IP
+	} else if ingress.Hostname != "" {
-		argoEndpoint = hostname
+		argoEndpoint = ingress.Hostname
 	}
 }
```

test/e2e/redis_proxy_test.go (1)
120-124: Using `time.Sleep` for synchronization is fragile. While the 5-second sleep helps mitigate a race condition between SSE stream establishment and Redis SUBSCRIBE propagation, this approach is timing-dependent and may still be flaky under load or in slower environments.
Consider implementing a more deterministic synchronization mechanism, such as waiting for a specific initial SSE message or heartbeat that confirms the subscription is active.
Also applies to: 326-329
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
- `Makefile` (1 hunks)
- `agent/agent.go` (4 hunks)
- `cmd/argocd-agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/agent/configuration.md` (1 hunks)
- `docs/configuration/agent/pki-certificates.md` (1 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (3 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/setup-vcluster-env.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/application_test.go` (2 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (3 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (12 hunks)
- `test/e2e/fixture/toxyproxy.go` (1 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/configuration/agent/configuration.md
- docs/configuration/agent/pki-certificates.md
🚧 Files skipped from review as they are similar to previous changes (10)
- hack/dev-env/setup-vcluster-env.sh
- hack/dev-env/gen-redis-tls-certs.sh
- hack/dev-env/start-principal.sh
- test/e2e/rp_test.go
- test/e2e/application_test.go
- test/e2e/README.md
- hack/dev-env/Procfile.e2e
- Makefile
- install/helm-repo/argocd-agent-agent/values.schema.json
- test/e2e/fixture/toxyproxy.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- `hack/dev-env/start-agent-autonomous.sh`
- `hack/dev-env/start-e2e.sh`
- `hack/dev-env/start-agent-managed.sh`
- `test/run-e2e.sh`
🧬 Code graph analysis (6)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
- `KubeClient` (67-73)
cmd/argocd-agent/agent.go (4)
agent/options.go (3)
- `WithRedisTLSEnabled` (112-117)
- `WithRedisTLSInsecure` (128-133)
- `WithRedisTLSCAPath` (120-125)

principal/options.go (1)

- `WithRedisTLSEnabled` (493-498)

internal/logging/logging.go (1)

- `Warn` (300-302)

internal/env/env.go (2)

- `BoolWithDefault` (30-39)
- `StringWithDefault` (46-55)
agent/agent.go (3)
internal/manager/manager.go (1)
- `NewDeletionTracker` (253-257)

internal/cache/resource_cache.go (1)

- `NewSourceCache` (32-38)

internal/argocd/cluster/cluster.go (1)

- `NewClusterCacheInstance` (176-192)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
- `apply` (94-247)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
- `Event` (112-115)

internal/logging/logfields/logfields.go (1)

- `Event` (34-34)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (3)
- `HasConnectionStatus` (63-77)
- `AgentManagedName` (38-38)
- `ClusterDetails` (43-59)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
157-157: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
178-178: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
202-202: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
211-211: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
228-228: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
250-250: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
271-271: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
285-285: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
292-292: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
299-299: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
338-338: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
367-367: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
docs/getting-started/kubernetes/index.md
178-178: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
202-202: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
211-211: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
228-228: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
367-367: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and push image
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Build & cache Go code
- GitHub Check: Lint Go code
- GitHub Check: Analyze (go)
🔇 Additional comments (36)
docs/getting-started/kubernetes/index.md (2)
204-216: ✓ Redis password variable expansion properly fixed. The previous review flagged that `$(REDIS_PASSWORD)` would not expand within single-quoted patches. You've correctly addressed this by:
- Extracting the password into a shell variable first (lines 205–206, 376–377)
- Using double-quoted `-p="..."` syntax for variable interpolation (lines 209, 380)
- Employing `${REDIS_PASSWORD}` syntax inside the JSON patch (lines 211, 382)
- Adding clarifying comments (lines 208, 379)
The fix prevents subtle authentication misconfiguration and makes the instructions accurate for users.
Also applies to: 375-387
166-228: Static analysis warnings appear to be false positives. The markdownlint alerts about missing language specifiers are incorrect; all code blocks in section 2.4 have an explicit `bash` language specified (lines 166, 197, 221). The flagged lines (178, 202, 211, 228) are interior lines within bash blocks, and the linter is likely confused by heredoc syntax or JSON patch escaping. No code block syntax corrections are needed.

principal/tracker/tracking.go (1)
75-78: Buffered channel prevents deadlock; verify broader synchronization. The change to a buffered channel (capacity 1) is appropriate for async event delivery between goroutines and prevents blocking when the sender and receiver are not synchronized.
Verify the following:
- Proper synchronization exists to prevent send-on-closed-channel panics. Since
`StopTracking()` closes the channel (line 90), ensure that `processRedisEventResponse` (or any sender) cannot attempt a send after the channel is closed.
- Buffer size of 1 is sufficient: confirm that only one event is sent per tracked request and no events are lost.
```shell
#!/bin/bash
# Verify synchronization between channel send and close operations

# Find all sends to evCh channel (tracked request wrappers)
echo "=== Finding sends to tracked event channels ==="
ast-grep --pattern 'evCh <- $_'

echo ""
echo "=== Finding processRedisEventResponse function (sender) ==="
ast-grep --pattern $'func processRedisEventResponse($$$) { $$$ }'

echo ""
echo "=== Finding calls to StopTracking (closes channel) ==="
rg -n -C3 'StopTracking\('

echo ""
echo "=== Finding sendSynchronousRedisMessageToAgent function (receiver) ==="
ast-grep --pattern $'func sendSynchronousRedisMessageToAgent($$$) { $$$ }'
```

principal/resource.go (1)
39-39: LGTM; the timeout increase is appropriate for TLS overhead. Tripling the timeout from 10s to 30s is reasonable given the additional latency introduced by TLS handshakes in Redis connections.
test/run-e2e.sh (1)
124-124: LGTM; the timeout increase is appropriate for TLS-enabled tests. Doubling the E2E test timeout from 30m to 60m is reasonable given the additional overhead from TLS handshakes and certificate validation across multiple vclusters.
hack/dev-env/start-e2e.sh (1)
50-61: LGTM; static addresses simplify TLS certificate validation. The shift to static localhost addresses for Redis endpoints (6380, 6381, 6382) is a good simplification. It eliminates dynamic IP detection complexity and ensures TLS certificates can include `localhost` in their SANs, making local development and E2E testing more reliable.

test/e2e/fixture/cluster.go (2)
370-374: Hardcoded CA certificate path is test-specific. The path `hack/dev-env/creds/redis-tls/ca.crt` is hardcoded for E2E tests. This is acceptable for test fixtures, but consider adding a comment explaining that this is intentional for the dev environment setup. The hardcoded path aligns with the dev-env scripts mentioned in the PR objectives.
261-267: Generous timeouts are appropriate for E2E tests with TLS overhead. The extended timeouts (DialTimeout: 10s, ReadTimeout: 30s) appropriately account for TLS handshake latency and port-forward operations in E2E test environments.
cmd/argocd-agent/principal.go (2)
263-275: Redis TLS server certificate configuration is well-validated. The validation ensures both cert and key are provided together (lines 270-271), and gracefully falls back to loading from a Kubernetes secret when paths aren't specified. This follows the same pattern used for gRPC TLS configuration.
438-459: Redis TLS enabled by default is a good security posture. Enabling TLS by default (`env.BoolWithDefault("ARGOCD_PRINCIPAL_REDIS_TLS_ENABLED", true)`) aligns with the PR objective and security best practices. The flags provide appropriate flexibility for different deployment scenarios.
108-115: Timeout increases are appropriate for TLS-enabled E2E tests. The increased timeouts (60s/2s) account for additional latency from TLS handshakes and potential port-forward delays. The inline comments explaining the rationale are helpful.
test/e2e/fixture/argoclient.go (2)
316-338: Environment variable override for the ArgoCD server endpoint improves the local development experience. Checking `ARGOCD_SERVER_ADDRESS` first avoids unnecessary K8s API calls and provides flexibility for local testing. The fallback logic is preserved for cluster deployments.
489-513: The `IsArgoCDRepoServerReady` helper is well-implemented. The function provides useful diagnostics when the repo-server isn't ready, including replica counts and conditions. This aids debugging E2E test failures.
internal/argocd/cluster/cluster.go (2)
176-191: TLS configuration cleanly integrated into Redis client initialization. The signature change to accept `*tls.Config` is well-designed: callers can pass `nil` when TLS is not required, and the config is directly assigned to `redis.Options.TLSConfig`. This maintains backward compatibility while enabling TLS support.
135-142: Initializing `ConnectionState` on the first cache stats update improves UX. When `SetClusterCacheStats` is called but no `ConnectionState` exists yet (agent just connected), initializing it with a successful status ensures the connection info is populated promptly rather than waiting for a separate connection status update.

cmd/argocd-agent/agent.go (3)
184-199: Redis TLS configuration logic is correctly implemented. The validation ensures mutual exclusivity between `--redis-tls-insecure` and `--redis-tls-ca-path`, and the configuration is only applied when TLS is enabled. The warning for insecure mode is appropriate.
241-250: Redis TLS enabled by default aligns with security objectives. The default of `true` for `ARGOCD_AGENT_REDIS_TLS_ENABLED` ensures TLS encryption is used by default, matching the PR objective and the principal's configuration.
184-199: Agent lacks the secret-based CA loading option available in the principal. The principal supports loading the Redis upstream CA from a Kubernetes secret (`--redis-upstream-ca-secret-name`), but the agent only supports a file-based CA (`--redis-tls-ca-path`). This asymmetry may be intentional (the agent runs in a different context), but it is worth verifying whether secret-based CA loading should be added for feature parity in Kubernetes deployments.

test/e2e/redis_proxy_test.go (4)
588-588: Buffered channel size of 100 looks reasonable. The buffered channel helps prevent message loss during SSE stream processing. The size of 100 provides adequate headroom for burst scenarios while the consumer drains messages.
642-653: HTTP transport configuration improvements for SSE streams. The transport settings are appropriate for long-lived SSE connections:
- `IdleConnTimeout: 300s` keeps connections alive
- `ResponseHeaderTimeout: 0` and client `Timeout: 0` are correct for SSE streams that may take time to produce events
- `InsecureSkipVerify: true` is acceptable in E2E tests per the PR description
188-208: Drain-and-retry logic is well-structured. The message draining approach correctly processes all available messages before returning false to retry, preventing missed messages due to timing issues. The logging provides good visibility into test progress.
Also applies to: 407-427
210-237: ResourceTree retry logic handles transient Redis connection issues. The `Eventually` wrapper with error handling for EOF and nil results provides resilience against transient connection issues during TLS-enabled Redis operations. The 30-second timeout with 2-second intervals is appropriate.

Also applies to: 430-456
agent/agent.go (3)
141-146: Default initialization addresses a potential ticker panic. Setting `cacheRefreshInterval: 30 * time.Second` as a default in the Agent struct initialization prevents the `time.NewTicker` panic that could occur with a zero duration. This addresses the previous review concern.
324-344: TLS configuration for the cluster cache is well-implemented. The TLS setup correctly:
- Sets `MinVersion: tls.VersionTLS12`
- Logs a warning when using insecure mode (line 331)
- Properly reads and parses CA certificate from path
- Returns descriptive errors for CA loading failures
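For reference, a minimal sketch (illustrative function name, not the agent's actual API) of the file-based CA flow that `--redis-tls-ca-path` and `--redis-tls-insecure` imply:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// clientTLSFromCA builds a client tls.Config that trusts only the CA at
// caPath (mirroring --redis-tls-ca-path); insecure toggles verification
// off, mirroring --redis-tls-insecure. Illustrative names only.
func clientTLSFromCA(caPath string, insecure bool) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if insecure {
		cfg.InsecureSkipVerify = true
		return cfg, nil
	}
	if caPath == "" {
		return cfg, nil // fall back to the system trust store
	}
	pemBytes, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("reading Redis CA certificate: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, fmt.Errorf("no valid certificates in %s", caPath)
	}
	cfg.RootCAs = pool
	return cfg, nil
}

func main() {
	cfg, _ := clientTLSFromCA("", true)
	fmt.Println(cfg.InsecureSkipVerify)
}
```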
446-461: The unified cluster cache info update goroutine is cleaner. The refactored goroutine sends an immediate update on startup and then uses a single ticker for periodic updates. This consolidates the previous mode-specific logic and ensures both managed and autonomous agents send cluster cache info.
test/e2e/fixture/fixture.go (3)
110-113: Extended timeouts for deletion operations are appropriate. Increasing the deletion wait timeouts from 60s to 120s accommodates TLS handshake overhead and potential Redis connection delays in the TLS-enabled environment.
Also applies to: 144-144, 161-161
236-241: The DeepCopy pattern prevents loop variable mutation. Using `DeepCopy()` before modifying namespace/name ensures the original loop variable isn't mutated, which could cause subtle bugs in subsequent iterations or when the list is reused.

Also applies to: 261-266
232-233: Cleanup now logs warnings instead of failing tests. Converting cleanup errors to warnings with `fmt.Printf` and continuing execution improves test resilience. This prevents cascading test failures when non-critical cleanup operations fail (e.g., due to transient Redis unavailability).
principal/redisproxy/redisproxy.go (8)
98-128: TLS configuration setters are well-designed. The setter methods provide a clean API for configuring TLS:
- Separation between in-memory cert/key and file paths
- Upstream TLS can use CA pool, CA path, or insecure mode
- Clear method naming indicates purpose
130-154: Server TLS configuration handles both cert sources correctly.
`createServerTLSConfig` properly prioritizes file paths over in-memory certificates and sets `MinVersion: tls.VersionTLS12`. The error messages are descriptive.
162-183: TLS listener initialization is correct. The `Start()` method properly branches between TLS and non-TLS listeners, with appropriate logging for each mode.
846-854: TCP dial timeout prevents indefinite hangs. Using `net.Dialer` with a 30-second timeout addresses the previous review concern about the connection attempt hanging indefinitely.
861-863: Security warning for an unencrypted upstream connection. The warning when server TLS is enabled but upstream TLS is not configured helps operators identify potential security gaps. The message clearly indicates the data exposure risk.
871-877: InsecureSkipVerify warns when CA config is ignored. The additional warning when CA configuration is provided but ignored due to `InsecureSkipVerify=true` addresses the previous review concern about silent configuration conflicts.
903-919: TLS handshake timeout prevents indefinite hangs. Setting a deadline before the handshake and clearing it afterward addresses the previous review concern. The 30-second timeout is consistent with the dial timeout.
896-901: SNI hostname extraction is correct. Extracting the hostname from the address for SNI configuration ensures proper TLS server name verification when connecting to upstream Redis.
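A minimal sketch of the SNI derivation (illustrative function, not the proxy's actual API): split the host out of `host:port` and use it as the TLS `ServerName`:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
)

// upstreamTLSConfig derives the SNI ServerName from an upstream Redis
// address so certificate verification matches the host being dialed.
// The function name and fallback behavior are assumptions for this demo.
func upstreamTLSConfig(addr string) *tls.Config {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		host = addr // no explicit port: use the address as-is
	}
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		ServerName: host, // used for SNI and certificate verification
	}
}

func main() {
	cfg := upstreamTLSConfig("argocd-redis.argocd.svc.cluster.local:6379")
	fmt.Println(cfg.ServerName)
}
```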
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
agent/agent.go (1)
108-109: Guard against zero/negativecacheRefreshIntervalbefore creating ticker
NewAgentnow initializescacheRefreshIntervalto 30 seconds, which is good, butAgentOptions can still overwrite it with 0 or a negative value. In that case,time.NewTicker(a.cacheRefreshInterval)will panic at runtime (“non-positive interval for NewTicker”).To keep this safe even when options misconfigure the interval, validate it at ticker creation time:
- go func() { - // Send initial update immediately on startup (don't wait for first ticker) - a.addClusterCacheInfoUpdateToQueue() - - ticker := time.NewTicker(a.cacheRefreshInterval) + go func() { + // Send initial update immediately on startup (don't wait for first ticker) + a.addClusterCacheInfoUpdateToQueue() + + interval := a.cacheRefreshInterval + if interval <= 0 { + interval = 30 * time.Second + } + ticker := time.NewTicker(interval) defer ticker.Stop() for { select { case <-ticker.C: a.addClusterCacheInfoUpdateToQueue() case <-a.context.Done(): return } } }()Optionally factor
30 * time.Secondinto aconst defaultCacheRefreshIntervalto avoid duplication with the constructor.Also applies to: 141-146, 447-461
test/e2e/fixture/argoclient.go (1)
316-338: Fix the LoadBalancer endpoint fallback to handle IP-only ingress. The new env override is great, but the Kubernetes fallback currently does:

```go
argoEndpoint := srvService.Spec.LoadBalancerIP
if len(srvService.Status.LoadBalancer.Ingress) > 0 {
	if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
		argoEndpoint = hostname
	}
}
```

On many providers (e.g., bare-metal + MetalLB, some cloud setups), `Ingress[0].IP` is populated while `Spec.LoadBalancerIP` and `Ingress[0].Hostname` are empty. In that case, this function now returns an empty endpoint and callers will fail.

Consider handling both hostname and IP, and erroring explicitly if still empty:

```diff
 func GetArgoCDServerEndpoint(k8sClient KubeClient) (string, error) {
 	// Check environment variable first (avoids unnecessary K8s API call)
 	if envAddr := os.Getenv("ARGOCD_SERVER_ADDRESS"); envAddr != "" {
 		return envAddr, nil
 	}
-	// Fall back to querying K8s service
+	// Fall back to querying K8s service
 	srvService := &corev1.Service{}
 	err := k8sClient.Get(context.Background(), types.NamespacedName{Namespace: "argocd", Name: "argocd-server"}, srvService, metav1.GetOptions{})
 	if err != nil {
 		return "", err
 	}
-	argoEndpoint := srvService.Spec.LoadBalancerIP
-	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-		if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
-			argoEndpoint = hostname
-		}
-	}
+	argoEndpoint := srvService.Spec.LoadBalancerIP
+	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
+		ingress := srvService.Status.LoadBalancer.Ingress[0]
+		if ingress.Hostname != "" {
+			argoEndpoint = ingress.Hostname
+		} else if ingress.IP != "" {
+			argoEndpoint = ingress.IP
+		}
+	}
+	if argoEndpoint == "" {
+		return "", fmt.Errorf("argocd-server service has no LoadBalancer IP or hostname")
+	}
 	return argoEndpoint, nil
 }
```

This restores compatibility with IP-only LoadBalancers while keeping the env override behavior.
♻️ Duplicate comments (6)
hack/dev-env/start-agent-autonomous.sh (1)
63-75: Use secure temp files and cleanup for extracted TLS keys. The script currently writes the client cert, key, and CA to predictable `/tmp/agent-autonomous-*.{crt,key}` paths. On most systems those files will be world-readable by default (depending on `umask`), and they aren't removed after the process exits. Given these are long-lived TLS credentials, it's better to:
- Use `mktemp` to create unique temp files.
- Restrict permissions to 600 (or rely on a tighter `umask`).
- Add a `trap` to delete them on exit.

For example:
```diff
-echo "Extracting mTLS client certificates and CA from Kubernetes..."
-TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
-TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
-ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
+echo "Extracting mTLS client certificates and CA from Kubernetes..."
+TLS_CERT_PATH="$(mktemp /tmp/agent-autonomous-tls.crt.XXXXXX)"
+TLS_KEY_PATH="$(mktemp /tmp/agent-autonomous-tls.key.XXXXXX)"
+ROOT_CA_PATH="$(mktemp /tmp/agent-autonomous-ca.crt.XXXXXX)"
+chmod 600 "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
   -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
```

This keeps the dev script behavior the same while tightening handling of sensitive material.
Also applies to: 79-81
hack/dev-env/start-agent-managed.sh (1)
63-75: Harden temp file handling for extracted TLS keys. As in the autonomous script, the managed-agent script writes the client cert, key, and CA to predictable `/tmp/agent-managed-*.{crt,key}` files and leaves them behind. To better protect these credentials, especially on shared dev boxes, it's worth:
- Using `mktemp` to generate unique paths.
- Locking down permissions (e.g., `chmod 600`).
- Cleaning them up with a `trap` on exit.

For example:
```diff
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
-TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
-TLS_KEY_PATH="/tmp/agent-managed-tls.key"
-ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+TLS_CERT_PATH="$(mktemp /tmp/agent-managed-tls.crt.XXXXXX)"
+TLS_KEY_PATH="$(mktemp /tmp/agent-managed-tls.key.XXXXXX)"
+ROOT_CA_PATH="$(mktemp /tmp/agent-managed-ca.crt.XXXXXX)"
+chmod 600 "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
```

The rest of the extraction logic can remain the same.
Also applies to: 79-81
docs/configuration/redis-tls.md (1)
149-155: Tag remaining fenced code blocks with a language to satisfy markdownlint (MD040). markdownlint is still flagging a few bare code fences here ("How the tunnel works" and the three script-output examples). To quiet MD040 and keep formatting explicit, please tag them as plain text, e.g.:

````diff
-**How the tunnel works:**
-```
+**How the tunnel works:**
+```text
 Argo CD Server (remote vcluster)
     → rathole Deployment (remote)
     → rathole Container (local Mac)
     → Principal process (local Mac)
````

And similarly for the script output sections:

````diff
-**gen-redis-tls-certs.sh:**
-```
+**gen-redis-tls-certs.sh:**
+```text
 ...
-**configure-redis-tls.sh:**
-```
+**configure-redis-tls.sh:**
+```text
 ...
-**configure-argocd-redis-tls.sh:**
-```
+**configure-argocd-redis-tls.sh:**
+```text
 ...
````

This keeps content unchanged while making the markdown linter happy.
Also applies to: 475-483, 485-501, 503-520
hack/dev-env/configure-argocd-redis-tls.sh (2)
29-31: Optional: add explicit error handling for context switch
`set -e` will stop the script if `kubectl config use-context ${CONTEXT}` fails, but the user only sees a generic kubectl error. Wrapping it with a short, explicit message would make failures clearer:

```diff
-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+echo "Switching to context: ${CONTEXT}"
+kubectl config use-context "${CONTEXT}" || {
+  echo "Error: Failed to switch to context ${CONTEXT}" >&2
+  exit 1
+}
```

This keeps the safety while improving debuggability.
160-183: Harden the repo-server and app-controller volume patches like argocd-server. For `argocd-repo-server` and `argocd-application-controller`, the JSON patches assume `/spec/template/spec/volumes` already exists and append with `"/volumes/-"`. This works with current upstream Argo CD manifests but will fail if those Deployments/StatefulSets are ever created without a `volumes` array. To mirror the more defensive pattern used for
`argocd-server`, consider checking for the existence of `/spec/template/spec/volumes` and creating it when missing before appending:

```diff
-  if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
-    echo "  Adding redis-tls-ca volume..."
-    if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
-      {
-        "op": "add",
-        "path": "/spec/template/spec/volumes/-",
-        "value": {
-          "name": "redis-tls-ca",
-          "secret": {
-            "secretName": "argocd-redis-tls",
-            "items": [{"key": "ca.crt", "path": "ca.crt"}]
-          }
-        }
-      }
-    ]'; then
+  if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
+    echo "  Adding redis-tls-ca volume..."
+
+    VOLUMES_EXIST=$(kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes}' 2>/dev/null || echo "")
+    if [ -z "$VOLUMES_EXIST" ] || [ "$VOLUMES_EXIST" = "null" ]; then
+      # Create volumes array with first element
+      if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+        {
+          "op": "add",
+          "path": "/spec/template/spec/volumes",
+          "value": [{
+            "name": "redis-tls-ca",
+            "secret": {
+              "secretName": "argocd-redis-tls",
+              "items": [{"key": "ca.crt", "path": "ca.crt"}]
+            }
+          }]
+        }
+      ]'; then
+        echo "  ERROR: Failed to create volumes array and add redis-tls-ca volume to argocd-repo-server"
+        exit 1
+      fi
+    else
+      # Append to existing volumes array
+      if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+        {
+          "op": "add",
+          "path": "/spec/template/spec/volumes/-",
+          "value": {
+            "name": "redis-tls-ca",
+            "secret": {
+              "secretName": "argocd-redis-tls",
+              "items": [{"key": "ca.crt", "path": "ca.crt"}]
+            }
+          }
+        }
+      ]'; then
+        echo "  ERROR: Failed to add redis-tls-ca volume to argocd-repo-server"
+        exit 1
+      fi
+    fi
```
argocd-application-controllerto keep behavior consistent and robust.Also applies to: 237-252
hack/dev-env/configure-redis-tls.sh (1)
68-71: Optional: improve the error message on context switch. As with the other script, `set -e` will abort if `kubectl config use-context ${CONTEXT}` fails, but a short explicit message would make failures easier to diagnose:

```diff
-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+echo "Switching to context: ${CONTEXT}"
+kubectl config use-context "${CONTEXT}" || {
+  echo "Error: Failed to switch to context ${CONTEXT}" >&2
+  exit 1
+}
```

Functionality is already safe; this is just UX polish.
🧹 Nitpick comments (6)
test/e2e/fixture/toxyproxy.go (1)
119-124: Dynamic readiness timeout for the principal avoids informer-sync flakes. Using a 120s default and extending to 180s for `compName == "principal"` is a pragmatic way to account for the principal's longer informer sync time and should reduce test flakiness. If you ever see drift with `ARGOCD_PRINCIPAL_INFORMER_SYNC_TIMEOUT`, consider deriving this timeout from that value instead of hard-coding it, but it's fine as-is.

Also applies to: 126-134
test/run-e2e.sh (1)
89-122: macOS port-forward detection is reasonable as a soft check

The lsof-based check and the explanatory warning around `make start-e2e` provide a helpful signal for local development without blocking CI. It only guarantees that some of the required forwards are running (union of 6380/6381/6382), but since it’s advisory and not fatal, that trade-off is fine for now.

test/e2e/fixture/fixture.go (1)
487-501: Optional: guard against nil `clusterDetails` in cluster-info reset

`resetManagedAgentClusterInfo` assumes `clusterDetails` is non-nil when calling `getCachedCacheInstance(AgentManagedName, clusterDetails)`. Today that’s true for the existing BaseSuite usage, but a future caller could accidentally pass `nil` and trigger a panic during cleanup.

A small defensive check would make this safer:

```diff
 func resetManagedAgentClusterInfo(clusterDetails *ClusterDetails) error {
-	// Reset cluster info in redis cache
-	if err := getCachedCacheInstance(AgentManagedName, clusterDetails).SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil {
+	if clusterDetails == nil {
+		return fmt.Errorf("resetManagedAgentClusterInfo: clusterDetails is nil")
+	}
+	// Reset cluster info in redis cache
+	if err := getCachedCacheInstance(AgentManagedName, clusterDetails).SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil {
 		return fmt.Errorf("resetManagedAgentClusterInfo: %w", err)
 	}
 	return nil
 }
```

Not required for current tests, but it future-proofs the helper.
test/e2e/fixture/cluster.go (1)
43-60: Redis TLS wiring, timeouts, and address resolution look solid; consider logging for implicit InsecureSkipVerify.

The overall shape here looks good: TLS is enabled by default for both principal and managed-agent Redis in E2E, with a CA-path override, sane `tls.Config{MinVersion: tls.VersionTLS12}`, generous dial/read/write timeouts, and clear LoadBalancer → `spec.LoadBalancerIP` → `ClusterIP` fallbacks plus env overrides for local runs. This should significantly reduce flakiness in tests.

One small ergonomics improvement: when no CA path is specified you silently set `InsecureSkipVerify = true` (Lines 210-213, 251-253), whereas the “file missing” case logs a warning. For misconfigurations, an explicit warning in the “no CA path” branch would make it much easier to spot that certificate verification isn’t happening, without changing behavior.

For example:

```diff
-	} else {
-		// No CA path specified, skip verification
-		tlsConfig.InsecureSkipVerify = true
-	}
+	} else {
+		// No CA path specified, skip verification
+		tlsConfig.InsecureSkipVerify = true
+		fmt.Printf("Warning: Principal Redis CA certificate path not specified, skipping verification\n")
+	}
 ...
-	} else {
-		// No CA path specified, skip verification
-		tlsConfig.InsecureSkipVerify = true
-	}
+	} else {
+		// No CA path specified, skip verification
+		tlsConfig.InsecureSkipVerify = true
+		fmt.Printf("Warning: Managed agent Redis CA certificate path not specified, skipping verification\n")
+	}
```

Given this is test-only wiring, this remains a low‑risk, nice‑to‑have for operator visibility rather than a functional change.
Also applies to: 173-217, 224-267, 338-400, 402-463
cmd/argocd-agent/principal.go (1)
90-99: Redis TLS CLI wiring and informer sync timeout semantics are consistent; only minor UX nits possible.The new Redis TLS surface looks coherent:
redis-tls-enabledis on by default and fed throughWithRedisTLSEnabled, with server TLS sourced either from explicit cert/key paths or a secret, with proper paired‑flag validation.- Upstream TLS “modes” (insecure, CA file, CA secret) are mutually exclusive for explicit configurations via the
modesSetcount, while still allowing the default secret name to act as a fallback when no mode is chosen, which is a sensible behavior.- The upstream wiring (
WithRedisUpstreamTLSInsecure,WithRedisUpstreamTLSCAFromFile,WithRedisUpstreamTLSCAFromSecret) lines up with that validation, so you won’t silently drop user‑specified upstream TLS settings.
informer-sync-timeoutnow clearly documents0 = use default of 60sand is only applied when > 0, which matches the help text and avoids surprising behavior for existing installs.The increased 30s timeout in
getResourceProxyTLSConfigFromKubeis also a good call for slow or loaded clusters.If you want to polish further, a small optional improvement would be to log a warning when
redis-tls-enabled=falsebut any of the TLS‑specific flags (server cert/key, upstream CA path/secret, insecure) are set, to surface misconfigurations that are currently silently ignored.Also applies to: 259-305, 434-459, 490-510
principal/redisproxy/redisproxy.go (1)
65-76: RedisProxy server & upstream TLS implementation is robust; only minor configurability tweaks are optional.The new TLS support in
RedisProxylooks well‑structured:
Server‑side TLS:

- `SetTLSEnabled`, `SetServerTLS`, and `SetServerTLSFromPath` cleanly separate concerns between enabling TLS and configuring certificate sources.
- `createServerTLSConfig` correctly prefers explicit paths when present, falls back to in‑memory `*x509.Certificate`/`crypto.PrivateKey`, and enforces `MinVersion: tls.VersionTLS12`.

Upstream TLS:

- `establishConnectionToPrincipalRedis` now uses a `net.Dialer` with a 30s timeout and adds a 30s deadline around the TLS handshake, which should prevent the previous “hang forever on connect/handshake” failure mode.
- `hasUpstreamTLSConfig` plus the `rp.tlsEnabled` gate ensures you only wrap the upstream connection when both sides expect TLS, and the explicit `Warn` when server TLS is on but upstream TLS isn’t configured is a good safety net.
- CA handling from either an in‑memory pool or a file is correct, and the warning when CA config is present but ignored due to `InsecureSkipVerify=true` is helpful from a security‑visibility standpoint.
- SNI `ServerName` derived via `net.SplitHostPort` from `principalRedisAddress` is appropriate for hostname‑based certs while still working with IP‑SAN certs.

Overall, this is a nice, self‑contained TLS upgrade of the proxy. If, in the future, you decide you need “upstream TLS even when the proxy listens in plaintext” for phased migrations, you could relax the `if rp.tlsEnabled && hasUpstreamTLSConfig` condition to key only on `hasUpstreamTLSConfig` and control the two halves independently, but that’s a design choice rather than a requirement for this PR.

Also applies to: 98-155, 159-183, 836-926
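The SNI derivation described above can be sketched as a small self-contained example; `sniConfig` and the sample upstream address are illustrative stand-ins, not the proxy's actual identifiers:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
)

// sniConfig derives the TLS ServerName from a host:port upstream address
// and enforces TLS 1.2 as the minimum version, mirroring the approach
// the review describes for the Redis proxy's upstream connection.
func sniConfig(addr string) (*tls.Config, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		ServerName: host,             // SNI for hostname-based certs
		MinVersion: tls.VersionTLS12, // floor for all upstream handshakes
	}, nil
}

func main() {
	// Hypothetical in-cluster address; real code uses principalRedisAddress.
	cfg, err := sniConfig("argocd-redis.argocd.svc.cluster.local:6379")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ServerName)
}
```

Because `ServerName` is set from the host part only, the same config works whether the server certificate carries DNS SANs or IP SANs.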
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
Makefile(1 hunks)agent/agent.go(4 hunks)cmd/argocd-agent/agent.go(3 hunks)cmd/argocd-agent/principal.go(4 hunks)docs/configuration/agent/configuration.md(1 hunks)docs/configuration/agent/pki-certificates.md(1 hunks)docs/configuration/redis-tls.md(1 hunks)docs/getting-started/kubernetes/index.md(3 hunks)hack/dev-env/Procfile.e2e(1 hunks)hack/dev-env/configure-argocd-redis-tls.sh(1 hunks)hack/dev-env/configure-redis-tls.sh(1 hunks)hack/dev-env/gen-redis-tls-certs.sh(1 hunks)hack/dev-env/start-agent-autonomous.sh(1 hunks)hack/dev-env/start-agent-managed.sh(1 hunks)hack/dev-env/start-e2e.sh(1 hunks)hack/dev-env/start-principal.sh(2 hunks)install/helm-repo/argocd-agent-agent/values.schema.json(1 hunks)internal/argocd/cluster/cluster.go(3 hunks)principal/redisproxy/redisproxy.go(5 hunks)principal/resource.go(1 hunks)principal/tracker/tracking.go(1 hunks)test/e2e/README.md(1 hunks)test/e2e/application_test.go(2 hunks)test/e2e/clusterinfo_test.go(2 hunks)test/e2e/fixture/argoclient.go(3 hunks)test/e2e/fixture/cluster.go(9 hunks)test/e2e/fixture/fixture.go(12 hunks)test/e2e/fixture/toxyproxy.go(1 hunks)test/e2e/redis_proxy_test.go(6 hunks)test/e2e/rp_test.go(2 hunks)test/run-e2e.sh(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/configuration/agent/configuration.md
🚧 Files skipped from review as they are similar to previous changes (11)
- principal/resource.go
- Makefile
- principal/tracker/tracking.go
- hack/dev-env/gen-redis-tls-certs.sh
- test/e2e/clusterinfo_test.go
- test/e2e/application_test.go
- hack/dev-env/start-e2e.sh
- hack/dev-env/Procfile.e2e
- test/e2e/rp_test.go
- cmd/argocd-agent/agent.go
- install/helm-repo/argocd-agent-agent/values.schema.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
hack/dev-env/start-agent-managed.shdocs/configuration/agent/pki-certificates.mdhack/dev-env/start-agent-autonomous.shhack/dev-env/configure-argocd-redis-tls.shtest/run-e2e.shtest/e2e/README.mdhack/dev-env/start-principal.sh
🧬 Code graph analysis (6)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
agent/agent.go (2)
internal/cache/resource_cache.go (1)
NewSourceCache(32-38)internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
KubeClient(67-73)
principal/redisproxy/redisproxy.go (4)
internal/logging/logfields/logfields.go (2)
Config(127-127)Error(101-101)principal/listen.go (1)
Listener(54-60)internal/logging/logging.go (3)
Error(305-307)Warn(300-302)Trace(285-287)internal/env/env.go (1)
String(96-107)
test/e2e/fixture/fixture.go (3)
internal/logging/logfields/logfields.go (1)
Name(59-59)internal/backend/interface.go (1)
Namespace(124-127)test/e2e/fixture/cluster.go (3)
ClusterDetails(43-59)AgentManagedName(38-38)AgentClusterServerURL(40-40)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md
233-233: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Analyze (go)
🔇 Additional comments (23)
docs/configuration/agent/pki-certificates.md (1)
568-576: Clarified default certificate validation wording looks goodStating explicitly that certificate validation is the default behavior matches the CLI/env defaults and improves guidance without changing semantics. No further changes needed.
internal/argocd/cluster/cluster.go (2)
129-142: ConnectionState initialization on first cache stats update is reasonablePreserving an existing
ConnectionStatewhile initializing it toConnectionStatusSuccessfulwhen it was previously zeroed gives clusters a sensible “connected” state once they start reporting stats, without overwriting prior status. This behavior is consistent with the rest of the manager logic.
175-185: Redis cluster cache now honors TLS configuration—verify updated call sites

Accepting a `*tls.Config` in `NewClusterCacheInstance` and wiring it through to `redis.Options{ TLSConfig: tlsConfig }` is the correct way to enable TLS for the go-redis client while allowing `nil` to mean "no TLS". Please double-check that:

- All callers of `NewClusterCacheInstance` have been updated to pass the new `tlsConfig` parameter.
- Callers pass `nil` when Redis TLS is disabled, and a correctly populated `*tls.Config` (including CA roots / `InsecureSkipVerify` as appropriate) when enabled.

docs/configuration/redis-tls.md (1)
1-701: Comprehensive Redis TLS documentation aligns with the new default behaviorThis document clearly explains that Redis TLS is enabled by default, how principal/upstream/agent TLS fit together, and how the dev scripts and Kubernetes manifests interact. The flag/env/ConfigMap examples match the described behavior and give a practical path from dev/E2E to production.
docs/getting-started/kubernetes/index.md (1)
159-229: Redis TLS setup steps are consistent and fix the previous password interpolation issueThe new Redis TLS sections for control-plane (Step 2.4) and workload cluster (Step 4.4) are clear and aligned: you generate CA/server certs, create the
argocd-redis-tlssecret, patch the deployment for TLS, and verify withredis-cli --tls. ReadingREDIS_PASSWORDfrom the existingargocd-redissecret and using a double-quoted JSON patch correctly interpolates the password instead of treating it as a literal. The cross-links to the dedicated Redis TLS configuration doc tie the getting-started flow into the more detailed reference nicely.Also applies to: 341-390, 655-655
hack/dev-env/start-agent-autonomous.sh (1)
37-46: Redis TLS and Redis address defaults for autonomous agent are wired correctlyDetecting the local Redis TLS CA under
creds/redis-tls, building--redis-tls-enabled/--redis-tls-ca-pathargs, and defaultingARGOCD_AGENT_REDIS_ADDRESStolocalhost:6382(with explicit port-forward guidance) keeps the autonomous agent E2E startup behavior consistent with the Redis TLS docs and the managed-agent script. Passing$REDIS_TLS_ARGSand$REDIS_ADDRESS_ARGinto the agent invocation preserves flexibility for overriding via env while maintaining secure defaults.Also applies to: 48-62, 79-83
hack/dev-env/start-principal.sh (1)
23-29: Principal dev script now cleanly expects Redis port-forward and wires Redis TLS correctly

Defaulting `ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS` to `localhost:6380` (and relying on an external port-forward) avoids the prior double port-forward problem, while still matching the SANs used in the Redis TLS certificates. The new `ARGOCD_PRINCIPAL_INFORMER_SYNC_TIMEOUT` default of `120s` lines up with the extended readiness timeout in tests, and the `REDIS_TLS_ARGS` block correctly enables Redis TLS and passes server cert, key, and upstream CA path into the principal. The overall startup flow looks consistent with the Redis TLS docs and E2E expectations.

Also applies to: 42-44, 47-65, 73-74
hack/dev-env/start-agent-managed.sh (1)
37-46: Managed agent Redis TLS and Redis address defaults are consistent with the autonomous scriptChecking for the Redis TLS CA under
creds/redis-tls, enabling--redis-tls-enabled/--redis-tls-ca-pathwhen present, and defaultingARGOCD_AGENT_REDIS_ADDRESStolocalhost:6381(with clear port-forward instructions) align this script with both the Redis TLS docs and the autonomous-agent startup. Injecting$REDIS_TLS_ARGSand$REDIS_ADDRESS_ARGinto the agent invocation gives secure-by-default behavior while allowing overrides via env.Also applies to: 48-62, 79-83
hack/dev-env/configure-argocd-redis-tls.sh (1)
310-347: Replica restoration and cleanup flow looks solidReading replica counts from
argocd-redis-tls-replicas, enforcing a minimum of 1 for each component, scaling back up only if the resources exist, and finally deleting the temporary ConfigMap matches the intended “scale down for TLS cutover, then restore” flow. No issues here.hack/dev-env/configure-redis-tls.sh (1)
61-66: Redis TLS configuration script is robustNice job on:
- Validating all required cert files, including
ca.crt.- Capturing and persisting replica counts before scaling down Argo CD components.
- Safely adding
redis-tlsvolumes and mounts even when the arrays are initially missing.- Reading the Redis password from the
argocd-redissecret and failing fast with a clear message if it’s missing.- Replacing the Redis args with a TLS-only configuration and guarding the patch with explicit error handling.
This should give very predictable TLS cutovers in dev/e2e.
Also applies to: 123-131, 135-193, 198-229
test/e2e/redis_proxy_test.go (3)
105-124: SSE establishment wait effectively removes subscribe raceAdding a short delay after the SSE stream is established before manipulating pods is a pragmatic way to avoid the “delete before SUBSCRIBE active” race that intermittently broke the tests. The 5-second sleep is reasonable given the overall 5-minute Eventually window.
186-208: Buffered channel + draining loops make SSE verification resilient

The combination of:

- A buffered `msgChan` (`make(chan string, 100)`), and
- The inner loops that drain all currently available SSE messages before returning `false` to `Eventually`

greatly reduces the chance of missing the pod-name event due to bursty traffic or timing. The non-blocking `select { case msg := <-msgChan ... default: ... }` pattern keeps the Eventually closures fast and avoids deadlocks.

Looks good for stabilizing these Redis proxy tests.
Also applies to: 406-427, 588-589
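The drain-then-retry shape described above can be sketched as a minimal self-contained example; `drainFor` and the channel contents are illustrative, not the test's actual identifiers:

```go
package main

import "fmt"

// drainFor reads every message currently buffered on ch without blocking and
// reports whether the wanted value was seen, the same shape an Eventually
// closure can use so bursty SSE traffic cannot hide the target event.
func drainFor(ch chan string, want string) bool {
	for {
		select {
		case msg := <-ch:
			if msg == want {
				return true
			}
		default:
			// Channel empty right now; return false and let Eventually retry.
			return false
		}
	}
}

func main() {
	msgChan := make(chan string, 100) // buffered so the SSE reader never blocks
	msgChan <- "some-other-pod"
	msgChan <- "target-pod"
	fmt.Println(drainFor(msgChan, "target-pod"))
}
```

The `default` branch is what keeps the closure fast: it never waits on the channel, so the polling loop retains control of timing.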
210-237: ResourceTree retries and SSE HTTP client tuning are appropriateWrapping the
ResourceTreecalls inRequires.Eventuallywith logging on errors and nil trees is a good way to cope with transient Redis/SSE issues while still asserting that the new pod eventually appears.The dedicated HTTP transport with
IdleConnTimeout, disabled compression, and no overall timeout is appropriate for long-lived SSE connections in tests; usingInsecureSkipVerify: truehere is acceptable given this is e2e-only code and the surrounding README calls out the TLS model.No changes needed.
Also applies to: 430-456, 642-653
agent/agent.go (2)
141-146: Good initialization of Agent internals and cache refresh defaultInitializing
version,deletions,sourceCache, and a sane defaultcacheRefreshInterval(30s) directly inNewAgentmakes the Agent more self-contained and predictable. It also sets a clear baseline for the periodic cluster cache info updates.
324-344: Cluster cache TLS config correctly mirrors Redis proxy settings

The new `clusterCacheTLSConfig` wiring:

- Enables TLS only when `redisTLSEnabled` is true.
- Logs a clear warning when `redisTLSInsecure` is set and flips `InsecureSkipVerify` accordingly.
- Loads and validates the CA from `redisTLSCAPath` into a `CertPool` and assigns it to `RootCAs`.

Passing this TLS config into `cluster.NewClusterCacheInstance` ensures the cluster cache talks to Redis with the same security posture as the proxy. Error handling on CA read/parse and cache creation is appropriate.

Also applies to: 346-351
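The CA-loading step can be sketched as follows; the throwaway in-memory CA stands in for the file the agent reads from `redisTLSCAPath`, and `poolFromPEM` is a hypothetical helper name:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// poolFromPEM mirrors the validation the review describes: reject bytes
// that do not parse as at least one PEM certificate instead of silently
// continuing with an empty pool.
func poolFromPEM(caPEM []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("failed to parse CA certificate")
	}
	return pool, nil
}

func main() {
	// Generate a throwaway self-signed CA in memory for the demo.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-redis-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})

	pool, err := poolFromPEM(caPEM)
	if err != nil {
		panic(err)
	}
	cfg := &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}
	fmt.Println(cfg.RootCAs != nil)
}
```

Failing fast on an unparseable CA is what turns a misconfigured path into an immediate startup error rather than a confusing handshake failure later.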
test/e2e/fixture/argoclient.go (1)
489-513: Repo-server readiness helper is simple and useful
IsArgoCDRepoServerReady’s “available replica > 0” check with a diagnostic string on failure is a good fit for e2e polling. Usingtypes.NamespacedNameand returning both a bool and message makes it easy for tests to log context without special-casing API errors.test/run-e2e.sh (2)
24-45: Redis TLS preflight checks are clear and effectiveEnforcing presence of
creds/redis-tls/ca.crtwith explicit instructions, plus validating per-context state (argocd-redis-tlssecret,--tls-portarg, andredis-tlsvolume) before running tests, gives very good feedback when TLS setup is incomplete.This should prevent most of the confusing “Redis not TLS” failures in e2e.
Also applies to: 49-77
124-124: E2E invocation with race detector and long timeout looks appropriateRunning
go testwith-raceand a 60-minute timeout against the e2e package is aligned with the complexity and duration of these flows. No issues.test/e2e/fixture/fixture.go (3)
79-98: Extended AppProject sync timeout is appropriateBumping the AppProject sync Eventually window to 120 seconds gives more breathing room for cross-cluster propagation under TLS and heavier environments, without impacting test logic. Looks good.
108-172: Deletion helpers are clearer and more robustThe explicit 120-iteration loops in
EnsureDeletionandWaitForDeletionwith 1-second sleeps make the timing behavior obvious, and the second pass that strips finalizers after the first timeout is a pragmatic way to deal with stuck resources in tests.No functional issues here.
201-267: Cleanup now favors best-effort behavior, which is appropriate for e2eSwitching many fatal cleanup errors to warnings (while still logging which Application/AppProject failed and in which cluster) makes suites more resilient to transient issues without hiding the problems.
The use of DeepCopy for principal/managed App[Project] objects before
WaitForDeletionalso avoids mutating loop variables, which is safer. Overall, this is a solid improvement for test stability.Also applies to: 295-358, 377-487
test/e2e/README.md (1)
21-105: E2E README now accurately reflects TLS-enabled workflowThe stepwise instructions (setup, optional reverse tunnel, start principal/agents, then run tests) combined with the dedicated “Redis TLS” section and concrete script invocations match the new tooling you’ve added (
gen-redis-tls-certs.sh,configure-redis-tls.sh,configure-argocd-redis-tls.sh). This should significantly reduce confusion when running TLS-enabled e2e locally or in CI.test/e2e/fixture/cluster.go (1)
276-317: Redis cache client reuse & cleanup behavior is acceptable for E2E scope.The cached
map[string]*appstatecache.Cacheguarded bycachedRedisClientMutexavoids repeatedly constructing Redis clients across calls, andCleanupRedisCachedClientscorrectly resets the map under the lock. Given thatappstatecache.Cachedoesn’t expose an explicit close and this is only used in the E2E fixture, relying on process‑lifetime GC for the underlying connections is a reasonable trade‑off; the new debug print makes the lifecycle visible during test teardown.
Signed-off-by: Rizwana777 <[email protected]>
06673ef to
ada2bb0
Compare
Actionable comments posted: 2
♻️ Duplicate comments (6)
hack/dev-env/configure-argocd-redis-tls.sh (3)
29-31: Verify context switch succeeded before proceeding.The script switches kubectl context without error handling. If the switch fails, subsequent operations may target the wrong cluster, potentially misconfiguring production Argo CD components.
Apply this diff:
# Switch context echo "Switching to context: ${CONTEXT}" -kubectl config use-context ${CONTEXT} +kubectl config use-context ${CONTEXT} || { + echo "Error: Failed to switch to context ${CONTEXT}" + echo "Please verify the context exists: kubectl config get-contexts" + exit 1 +}
167-182: Inconsistent handling of missing volumes array.The argocd-repo-server configuration assumes the volumes array exists (lines 167-182), while argocd-server handles the case where it might not exist (lines 68-88). This inconsistency could cause the script to fail if argocd-repo-server has no pre-existing volumes array.
Consider applying the same defensive approach used for argocd-server - check if the volumes array exists before attempting to append to it, and create it if necessary.
237-258: Same inconsistency exists for argocd-application-controller.The StatefulSet configuration also directly appends to volumes without checking if the array exists, unlike the defensive handling in argocd-server. Apply the same pattern for consistency.
test/e2e/fixture/fixture.go (1)
487-491: Minor: extra leading space in warning message.

Line 489 has a leading space in the format string: `" Warning: Failed..."`. This is inconsistent with other warning messages that start without a leading space.

```diff
-	fmt.Printf(" Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
+	fmt.Printf("Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
```

hack/dev-env/start-agent-autonomous.sh (1)
37-75: Harden temporary handling of mTLS certs/keys extracted to/tmpThe Redis TLS wiring and default
--redis-addrbehaviour look good, but mTLS credentials are still extracted to predictable/tmp/agent-autonomous-*.{crt,key}paths without tightening permissions or cleaning them up:
- Paths are static and may be world‑readable depending on umask.
- Files are never removed after the agent exits.
Given these are client TLS keys, even in dev/e2e it’s safer to:
- Use
mktempfor each ofTLS_CERT_PATH,TLS_KEY_PATH, andROOT_CA_PATH.- Immediately
chmod 600the files (or set a restrictiveumaskbefore writing).- Add a
traptorm -fthe temp files on exit.For example:
```sh
TLS_CERT_PATH="$(mktemp /tmp/agent-autonomous-tls.crt.XXXXXX)"
TLS_KEY_PATH="$(mktemp /tmp/agent-autonomous-tls.key.XXXXXX)"
ROOT_CA_PATH="$(mktemp /tmp/agent-autonomous-ca.crt.XXXXXX)"
chmod 600 "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"
trap 'rm -f "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"' EXIT
```

Error handling is already decent due to `set -e -o pipefail`; this change would mainly tighten the security story for local runs.

Also applies to: 79-83
hack/dev-env/start-agent-managed.sh (1)
37-75: Use secure temp files for extracted mTLS credentialsAs in the autonomous script, mTLS certs/keys/CA here are written to fixed
/tmp/agent-managed-*.{crt,key}paths with default permissions and no cleanup. That’s workable for local e2e, but stronger hygiene is easy:
- Allocate each path via
mktempinstead of a static filename.- Restrict permissions (
chmod 600).- Register a
trapto delete the files on script exit.Pattern example:
```sh
TLS_CERT_PATH="$(mktemp /tmp/agent-managed-tls.crt.XXXXXX)"
TLS_KEY_PATH="$(mktemp /tmp/agent-managed-tls.key.XXXXXX)"
ROOT_CA_PATH="$(mktemp /tmp/agent-managed-ca.crt.XXXXXX)"
chmod 600 "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"
trap 'rm -f "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"' EXIT
```

This keeps the nice Redis TLS integration while avoiding leaving long‑lived, guessable TLS key files under `/tmp`.

Also applies to: 79-83
🧹 Nitpick comments (8)
test/e2e/fixture/toxyproxy.go (1)
119-124: Dynamic timeout logic is sound; consider centralizing per-component configThe new
timeouthandling correctly preserves the 120s default and extends the principal’s window to 180s to cover informer sync, which should reduce flakiness while keeping other components unchanged. As a small cleanup, you could centralize bothhealthzAddrandtimeoutselection in a singleswitchor helper that takescompNameto keep these settings co-located and avoid drift if principal timings change again later.principal/options.go (1)
80-88: Redis TLS options on the principal are coherent with the rest of the TLS surfaceThe added
ServerOptionsfields andWithRedis*helpers cleanly separate server‑side TLS (proxy listener) from upstream TLS (CA / insecure), and the secret‑based variants reusetlsutilas expected. Only minor thought:WithRedisUpstreamTLSCAFromFilecurrently just stores the path and defers reading/validation to connection time; if this ever shows up as a hot path, you could mirrorWithTLSRootCaFromFileand eagerly build aCertPoolonce during option application.Also applies to: 492-548
principal/redisproxy/redisproxy.go (1)
836-926: Upstream TLS dial logic is correct, with a couple of low‑risk refinements to considerFunctionally this method looks good: you now have a dial timeout, SNI set from
principalRedisAddress, optional CA loading from memory or path, a distinct insecure mode with loud logging, and a bounded TLS handshake via deadlines.Two small, non‑blocking tweaks you might want to consider:
- Avoid the concrete
*net.TCPConnassertionYou don’t seem to use any TCP‑specific methods:
connTmp, err := dialer.Dial("tcp", addr.String()) if err != nil { // ... } conn := connTmp.(*net.TCPConn)You can keep
connas anet.Connand drop the assertion to avoid a potential panic if the implementation ever ceases to return*net.TCPConn:- connTmp, err := dialer.Dial("tcp", addr.String()) + conn, err := dialer.Dial("tcp", addr.String()) if err != nil { // ... - } - conn := connTmp.(*net.TCPConn) + }
- Optional: cache the CA pool when using `upstreamTLSCAPath`

Right now `os.ReadFile(rp.upstreamTLSCAPath)` + `x509.NewCertPool()` runs on every new upstream connection. If connection churn is high, you might want to build and store the `CertPool` once (e.g., when applying options) and reuse it, similar to how `WithRedisUpstreamTLSCAFromSecret` sets `redisUpstreamTLSCA` directly.

Neither of these is a correctness blocker; the current implementation should behave as intended.
test/e2e/fixture/argoclient.go (1)
27-27: Env override for Argo CD server endpoint is helpful; just ensure expected format is clearLetting
GetArgoCDServerEndpointshort‑circuit onARGOCD_SERVER_ADDRESSis a nice way to avoid K8s API calls in constrained environments and to support custom endpoints.One thing to keep in mind: the
ArgoRestClientconstructs URLs viaurl.URL{Scheme: "https", Host: c.endpoint}, soARGOCD_SERVER_ADDRESSshould be a bare host (or host:port), not a full URL with scheme. If that’s not already documented where this env var is introduced, it’s worth calling out to avoid confusing “https://…” values.Also applies to: 387-403
hack/dev-env/configure-redis-tls.sh (1)
37-46: Script flow and error handling look good; one minor redundant branchOverall TLS setup (cert checks, secret creation, volume patches, and arg updates) is sound and nicely idempotent.
The second CA-based check at Lines 39–46 is now effectively redundant because Lines 61–66 already hard-fail if
ca.crt(and the cert/key pair) are missing, so theelsepath (“running without TLS”) is unreachable. You can safely drop that branch or merge the messages into the initial cert check to simplify the control flow.Also applies to: 61-66
docs/configuration/redis-tls.md (1)
114-121: Clean up tab characters flagged by markdownlint (MD010)markdownlint is still reporting MD010 “no-hard-tabs” around these lines. There are likely literal tab characters in the bullet/paragraph indentation here even though they render fine.
Replacing the tabs with spaces in this section (and any similar spots) will satisfy MD010 without changing rendered output.
install/helm-repo/argocd-agent-agent/values.schema.json (1)
302-383: Consider documenting the type flexibility for Redis TLS boolean fields.The schema allows
redisTLS.enabledandredisTLS.insecureto accept both boolean and string types viaanyOf, whilenetworkPolicy.enabledaccepts only boolean. This inconsistency might confuse users who expect uniform boolean handling across the chart.If the string support is needed for environment variable compatibility (e.g., Kubernetes ConfigMap values), consider adding this rationale to the field descriptions:
"enabled": { "anyOf": [ { "type": "string", "enum": ["true", "false"] }, { "type": "boolean" } ], "description": "Enable TLS for Redis connections (can be boolean or string for ConfigMap compatibility)" }Otherwise, consider standardizing all boolean flags to use the same type validation pattern.
cmd/argocd-agent/principal.go (1)
277-291: Validation logic for default secret name may be confusing.

The mutual exclusivity check excludes the default secret name `"argocd-redis-tls"` from the mode count (lines 286-287). This means:

- If a user specifies `--redis-upstream-ca-path=/some/path` and doesn't specify `--redis-upstream-ca-secret-name`, the validation passes (modesSet=1) even though the secret name has the default value
- The if-else chain at lines 294-303 prioritizes the path, so the default secret is ignored

While this works correctly in practice, it's unintuitive. Users might expect that:

- Not specifying `--redis-upstream-ca-secret-name` means "don't use a secret"
- The default secret is only used when no other mode is specified
Consider either:
- Removing the default value from the flag (empty string means "not specified")
- Adding a comment explaining why the default is excluded from validation
- Checking if the flag was explicitly set by the user (not just using the default)
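The counting rule described above can be sketched as a small pure function; `countModes` is a hypothetical name, and only the default secret name `argocd-redis-tls` is taken from the PR:

```go
package main

import "fmt"

// countModes counts how many upstream-TLS modes were explicitly chosen.
// The default secret name is treated as "not specified", which is why a
// CA path plus an untouched secret flag still passes validation.
func countModes(insecure bool, caPath, caSecret string) int {
	n := 0
	if insecure {
		n++
	}
	if caPath != "" {
		n++
	}
	if caSecret != "" && caSecret != "argocd-redis-tls" {
		n++
	}
	return n
}

func main() {
	// CA path set, secret left at its default: one mode, validation passes.
	fmt.Println(countModes(false, "/certs/ca.crt", "argocd-redis-tls"))
	// Insecure plus an explicit secret: two modes, validation rejects.
	fmt.Println(countModes(true, "", "my-custom-secret"))
}
```

Checking whether the flag was explicitly set (e.g., via the flag library's "changed" state) instead of comparing against the default string would make the same rule hold even if a user explicitly passes the default name.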
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (48)
- `Makefile` (1 hunks)
- `agent/agent.go` (4 hunks)
- `agent/inbound_redis.go` (3 hunks)
- `agent/options.go` (1 hunks)
- `agent/outbound_test.go` (1 hunks)
- `cmd/argocd-agent/agent.go` (3 hunks)
- `cmd/argocd-agent/principal.go` (4 hunks)
- `docs/configuration/agent/configuration.md` (1 hunks)
- `docs/configuration/agent/pki-certificates.md` (1 hunks)
- `docs/configuration/redis-tls.md` (1 hunks)
- `docs/getting-started/kubernetes/index.md` (3 hunks)
- `hack/dev-env/Procfile.e2e` (1 hunks)
- `hack/dev-env/configure-argocd-redis-tls.sh` (1 hunks)
- `hack/dev-env/configure-redis-tls.sh` (1 hunks)
- `hack/dev-env/gen-redis-tls-certs.sh` (1 hunks)
- `hack/dev-env/start-agent-autonomous.sh` (1 hunks)
- `hack/dev-env/start-agent-managed.sh` (1 hunks)
- `hack/dev-env/start-e2e.sh` (1 hunks)
- `hack/dev-env/start-principal.sh` (2 hunks)
- `install/helm-repo/argocd-agent-agent/README.md` (3 hunks)
- `install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml` (2 hunks)
- `install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml` (1 hunks)
- `install/helm-repo/argocd-agent-agent/values.schema.json` (1 hunks)
- `install/helm-repo/argocd-agent-agent/values.yaml` (1 hunks)
- `install/kubernetes/agent/agent-deployment.yaml` (3 hunks)
- `install/kubernetes/agent/agent-params-cm.yaml` (1 hunks)
- `install/kubernetes/principal/principal-deployment.yaml` (3 hunks)
- `install/kubernetes/principal/principal-params-cm.yaml` (1 hunks)
- `internal/argocd/cluster/cluster.go` (3 hunks)
- `internal/argocd/cluster/cluster_test.go` (3 hunks)
- `internal/argocd/cluster/informer_test.go` (6 hunks)
- `internal/argocd/cluster/manager.go` (3 hunks)
- `internal/argocd/cluster/manager_test.go` (3 hunks)
- `principal/options.go` (2 hunks)
- `principal/redisproxy/redisproxy.go` (5 hunks)
- `principal/resource.go` (1 hunks)
- `principal/server.go` (3 hunks)
- `principal/tracker/tracking.go` (1 hunks)
- `test/e2e/README.md` (1 hunks)
- `test/e2e/application_test.go` (2 hunks)
- `test/e2e/clusterinfo_test.go` (2 hunks)
- `test/e2e/fixture/argoclient.go` (3 hunks)
- `test/e2e/fixture/cluster.go` (9 hunks)
- `test/e2e/fixture/fixture.go` (12 hunks)
- `test/e2e/fixture/toxyproxy.go` (1 hunks)
- `test/e2e/redis_proxy_test.go` (6 hunks)
- `test/e2e/rp_test.go` (2 hunks)
- `test/run-e2e.sh` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (17)
- principal/tracker/tracking.go
- test/run-e2e.sh
- internal/argocd/cluster/manager.go
- Makefile
- test/e2e/rp_test.go
- cmd/argocd-agent/agent.go
- internal/argocd/cluster/manager_test.go
- docs/configuration/agent/pki-certificates.md
- hack/dev-env/start-e2e.sh
- principal/resource.go
- install/kubernetes/principal/principal-deployment.yaml
- docs/configuration/agent/configuration.md
- internal/argocd/cluster/cluster.go
- hack/dev-env/start-principal.sh
- agent/inbound_redis.go
- install/kubernetes/principal/principal-params-cm.yaml
- install/kubernetes/agent/agent-params-cm.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- `hack/dev-env/start-agent-autonomous.sh`
- `install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml`
- `hack/dev-env/configure-argocd-redis-tls.sh`
- `test/e2e/application_test.go`
- `test/e2e/README.md`
- `hack/dev-env/start-agent-managed.sh`
- `install/kubernetes/agent/agent-deployment.yaml`
- `hack/dev-env/Procfile.e2e`
- `install/helm-repo/argocd-agent-agent/values.yaml`
🧬 Code graph analysis (14)

agent/outbound_test.go (1)
- internal/argocd/cluster/manager.go (1): `NewManager` (71-119)

hack/dev-env/configure-argocd-redis-tls.sh (1)
- hack/dev-env/setup-vcluster-env.sh (1): `apply` (94-247)

test/e2e/application_test.go (1)
- test/e2e/fixture/argoclient.go (1): `IsArgoCDRepoServerReady` (562-583)

hack/dev-env/configure-redis-tls.sh (1)
- hack/dev-env/setup-vcluster-env.sh (1): `apply` (94-247)

agent/options.go (2)
- principal/options.go (1): `WithRedisTLSEnabled` (493-498)
- agent/agent.go (2): `AgentOption` (139-139), `Agent` (65-120)

internal/argocd/cluster/informer_test.go (2)
- internal/argocd/cluster/manager.go (1): `NewManager` (71-119)
- test/fake/kube/kubernetes.go (1): `NewFakeKubeClient` (31-44)

test/e2e/fixture/fixture.go (1)
- test/e2e/fixture/cluster.go (3): `ClusterDetails` (43-59), `AgentManagedName` (38-38), `AgentClusterServerURL` (40-40)

test/e2e/clusterinfo_test.go (2)
- test/e2e/fixture/cluster.go (4): `HasConnectionStatus` (63-77), `AgentManagedName` (38-38), `ClusterDetails` (43-59), `AgentAutonomousName` (39-39)
- internal/logging/logfields/logfields.go (2): `Status` (130-130), `Message` (132-132)

cmd/argocd-agent/principal.go (3)
- principal/options.go (6): `WithInformerSyncTimeout` (439-444), `WithRedisTLSEnabled` (493-498), `WithRedisServerTLSFromPath` (501-507), `WithRedisServerTLSFromSecret` (510-520), `WithRedisUpstreamTLSCAFromFile` (523-528), `WithRedisUpstreamTLSCAFromSecret` (531-540)
- agent/options.go (1): `WithRedisTLSEnabled` (112-117)
- internal/env/env.go (3): `DurationWithDefault` (168-177), `BoolWithDefault` (30-39), `StringWithDefault` (46-55)

principal/server.go (1)
- internal/argocd/cluster/manager.go (1): `NewManager` (71-119)

internal/argocd/cluster/cluster_test.go (1)
- test/fake/kube/kubernetes.go (1): `NewFakeKubeClient` (31-44)

test/e2e/redis_proxy_test.go (1)
- internal/logging/logfields/logfields.go (5): `Name` (59-59), `Kind` (58-58), `Config` (127-127), `Client` (37-37), `Timeout` (78-78)

agent/agent.go (1)
- internal/argocd/cluster/cluster.go (1): `NewClusterCacheInstance` (176-192)

principal/options.go (3)
- agent/options.go (1): `WithRedisTLSEnabled` (112-117)
- principal/server.go (1): `Server` (73-166)
- internal/tlsutil/kubernetes.go (2): `TLSCertFromSecret` (44-66), `X509CertPoolFromSecret` (106-128)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
115-115: Hard tabs
Column: 1
(MD010, no-hard-tabs)
123-123: Hard tabs
Column: 1
(MD010, no-hard-tabs)
132-132: Hard tabs
Column: 1
(MD010, no-hard-tabs)
docs/getting-started/kubernetes/index.md
233-233: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
🔇 Additional comments (37)
agent/outbound_test.go (1)

464-464: LGTM! API update correctly applied. The addition of the `nil` parameter aligns with the extended `cluster.NewManager` signature that now accepts a TLS config. Passing `nil` is appropriate here since this test uses an in-memory miniredis instance that doesn't require TLS encryption.

internal/argocd/cluster/informer_test.go (1)

19-19: LGTM! Test calls correctly updated to pass the new compression type parameter (`cacheutil.RedisCompressionGZip`) and `nil` for the TLS config, matching the expanded `NewManager` signature.

Also applies to: 33-33, 50-50, 87-87, 115-115

internal/argocd/cluster/cluster_test.go (1)

36-36: LGTM! Tests correctly updated to pass `nil` for the new `tlsConfig` parameter in `NewManager` calls.

Also applies to: 225-225
principal/server.go (2)

354-373: LGTM! Redis proxy TLS configuration is well-structured, with proper separation between server-side TLS (incoming from Argo CD) and upstream TLS (outgoing to argocd-redis). The conditional logic correctly handles both path-based and direct certificate configuration.

402-427: LGTM! Solid TLS configuration with appropriate logging. The cluster manager TLS setup properly:

- Creates a TLS config with MinVersion TLS 1.2
- Logs a warning when `InsecureSkipVerify` is enabled (line 410)
- Loads and validates CA certificates from disk with clear error messages
- Handles the CA pool from both direct provision and file path
hack/dev-env/configure-argocd-redis-tls.sh (1)

316-325: LGTM! Replica guard logic correctly implemented. The explicit `if` statements properly ensure at least 1 replica for each component, handling both empty and "0" cases correctly. This addresses the shell operator precedence issue that could have occurred with compound conditions.

test/e2e/fixture/fixture.go (2)

97-97: LGTM! Extended timeouts appropriate for TLS. The increased timeouts (to 120 seconds) account for the additional overhead of TLS handshakes and certificate validation during test setup and teardown.

Also applies to: 110-110, 113-113, 144-144, 161-161

236-241: LGTM! Proper use of DeepCopy prevents loop variable mutation. Creating copies via `DeepCopy()` before modifying namespace/name fields ensures the original loop variables aren't mutated, which is correct and prevents subtle bugs in cross-cluster deletion checks.

Also applies to: 261-266, 318-325, 351-357
agent/agent.go (2)

328-348: LGTM! TLS configuration properly constructed with appropriate warnings. The cluster cache TLS setup correctly:

- Creates a TLS config with MinVersion TLS 1.2
- Logs an "INSECURE" warning when certificate verification is skipped (line 335)
- Loads and validates CA certificates from disk with clear error messages
- Handles both insecure mode and CA-based verification

149-149: LGTM! Default interval prevents ticker panic. Setting `cacheRefreshInterval` to 30 seconds by default (line 149) ensures `time.NewTicker` never receives a zero or negative duration, which would panic. The unified goroutine (lines 450-465) performs an immediate initial update before entering the ticker loop, which is good for startup behavior.

Also applies to: 450-465
test/e2e/fixture/cluster.go (4)

184-216: LGTM! TLS configuration with graceful CA fallback. The TLS setup correctly:

- Creates a TLS config with MinVersion TLS 1.2
- Loads CA certificates from disk with proper error handling
- Falls back to `InsecureSkipVerify` if the CA file doesn't exist (with a warning)
- Applies the same pattern for both Principal and ManagedAgent

The fallback to insecure mode is appropriate for test backward compatibility, though the warning message makes the degraded security posture clear.

Also applies to: 224-256

261-267: LGTM! Generous timeouts for E2E port-forward latency. The extended timeouts (30s read, 10s dial/write) with retry backoff are appropriate for E2E tests that may use port-forward or run in resource-constrained environments.

283-317: Client caching prevents connection leaks. The cached client approach (with mutex protection) ensures Redis connections are reused across test assertions rather than creating new connections for every query. The `CleanupRedisCachedClients()` function clears the cache map.

Note: As flagged in a previous review, `appstatecache.Cache` may not expose a `Close()` method for explicit connection cleanup. If connections need explicit closure, you may need to track the underlying `redis.Client` instances separately. Consider verifying whether `appstatecache.Cache` or its underlying Redis client exposes a `Close()` method; if explicit cleanup is needed, you may want to track the raw `redis.Client` alongside the cache and close it in `CleanupRedisCachedClients()`.

348-368: LGTM! Robust address resolution with multiple fallbacks. The address resolution logic tries:

- LoadBalancer ingress (IP or hostname)
- `spec.LoadBalancerIP` (for local vcluster)
- `spec.ClusterIP` (last resort)

This covers various deployment scenarios and provides a clear error message if all methods fail.

Also applies to: 412-432
test/e2e/clusterinfo_test.go (1)

108-115: Timeout increases for connection status checks look reasonable. Bumping these `Eventually` timeouts and intervals (with clear comments) is a pragmatic way to absorb extra latency from port-forward/TLS in long e2e runs; no logic concerns from my side.

Also applies to: 123-129, 142-142
principal/redisproxy/redisproxy.go (1)

65-75: Server-side Redis proxy TLS wiring looks solid. The new TLS fields and setters on `RedisProxy`, plus `createServerTLSConfig` and the `Start()` branching into `tls.Listen`, are all consistent and give you clear separation between plaintext and TLS modes with explicit logging. Enforcing TLS ≥ 1.2 is also a good default for an internal proxy.

Also applies to: 98-154, 159-183
test/e2e/application_test.go (1)

5-5: Repo-server readiness gate before application tests is a good addition. Waiting on `IsArgoCDRepoServerReady` with a bounded 180s/5s poll and logging status deltas should cut down on timing-related flakes when creating applications that rely on repo-server, without affecting core test logic.

Also applies to: 28-41
install/helm-repo/argocd-agent-agent/values.yaml (1)

136-152: Secure-by-default Redis TLS + NetworkPolicy values are reasonable but do require matching cluster setup. The new defaults (`tlsRootCAPath`, `redisTLS.*`, and `networkPolicy.*`) align with the goal of having Redis TLS and restricted network access enabled out of the box. That said, these defaults assume:

- An `argocd-redis-tls` secret with `ca.crt` exists and is mounted at `/app/config/redis-tls`.
- Redis and agent workloads are labeled with the selectors used in the `networkPolicy` section.

Installations that don't meet those assumptions will need to either provision the secret/labels or override these values. It'd be worth making sure the chart/docs call out these expectations clearly.

Also applies to: 153-163

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: Redis TLS parameters are wired cleanly into the agent ConfigMap. The new `agent.redis.tls.*` keys mirror the `redisTLS` values and follow the existing pattern of stringified booleans in the params ConfigMap, so they should drop into the CLI/env parsing on the agent side without surprises.

test/e2e/fixture/argoclient.go (1)

30-30: Repo-server readiness helper is straightforward and gives useful diagnostics. `IsArgoCDRepoServerReady`'s check on `AvailableReplicas > 0` is a simple, robust readiness signal for the repo-server, and returning a human-readable message with replica counts and conditions makes the higher-level tests' logs much easier to interpret when readiness fails. No changes needed here.

Also applies to: 559-583

agent/options.go (1)

111-133: Agent Redis TLS options are consistent with other AgentOption helpers. `WithRedisTLSEnabled`, `WithRedisTLSCAPath`, and `WithRedisTLSInsecure` follow the same pattern as the existing Redis options and line up with the new Helm/params keys, so the agent can now be configured cleanly for Redis TLS just like the principal. Looks good.
159-234: Redis TLS setup steps are clear and consistent with the TLS docsThe new Sections 2.4 and 4.4 do a good job of:
- Generating a CA and per‑cluster Redis server certs with appropriate SANs,
- Creating
argocd-redis-tlssecrets on both control-plane and workload clusters, and- Patching Redis arguments with a pattern that correctly expands
REDIS_PASSWORDin a double‑quoted JSON patch.The final “Related Documentation” link back to Redis TLS Configuration also helps keep the duplication under control. I don’t see any functional issues here.
Also applies to: 341-390, 655-655
hack/dev-env/gen-redis-tls-certs.sh (1)
14-26: Redis TLS cert generation script looks solid and idempotentThis script cleanly covers:
- Idempotent CA + per‑cluster cert generation (control-plane, proxy, autonomous, managed),
- Reasonable SANs for k8s DNS, localhost, and optional local IP,
- Cleanup of temporary CSR/extension/serial files.
With the prior fixes (no
2>/dev/nullsuppression and conditional LOCAL_IP SAN), it’s in good shape for dev/e2e purposes.Also applies to: 28-135
test/e2e/redis_proxy_test.go (1)

120-238: Improved SSE handling and retries should significantly reduce Redis-proxy test flakiness. The combination of:

- A short post-connect delay before mutating pods,
- A buffered SSE channel plus "drain all available messages" loops, and
- Retried `ResourceTree` calls with logging on transient errors/nil trees,

is a pragmatic way to address timing/race issues in these e2e flows without overcomplicating the tests. The explicit log messages will also make diagnosing future flakes easier.

Given this code is confined to the e2e test package and uses TLS only for test traffic, I'm comfortable with the `InsecureSkipVerify` transport here.

Also applies to: 326-456, 588-665
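The "drain all available messages" loop mentioned above can be sketched like this; the names are illustrative, not taken from the test file:

```go
package main

import "fmt"

// drainAvailable empties every message currently buffered on the channel
// without blocking, returning the last message seen and how many were drained.
// The default case fires as soon as the channel is momentarily empty, which is
// what makes the loop safe to call from a test's assertion path.
func drainAvailable(ch <-chan string) (last string, n int) {
	for {
		select {
		case msg := <-ch:
			last, n = msg, n+1
		default:
			// Channel is empty right now; stop without blocking.
			return last, n
		}
	}
}

func main() {
	// A buffered channel mirrors the buffered SSE message channel in the tests.
	ch := make(chan string, 8)
	ch <- "tree-update-1"
	ch <- "tree-update-2"
	ch <- "tree-update-3"
	last, n := drainAvailable(ch)
	fmt.Println(last, n) // → tree-update-3 3
}
```

Draining to the most recent message is what lets an assertion compare against the latest resource-tree state instead of an intermediate one that arrived while the test was busy.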
hack/dev-env/Procfile.e2e (1)

1-7: LGTM! Process orchestration properly configured for TLS-enabled E2E environment. The port-forward mappings and startup sequences are well-structured:

- Redis port-forwards correctly target each vcluster (control-plane:6380, managed:6381, autonomous:6382)
- Startup delays ensure proper initialization order (principal starts at 3s, agents at 5s)
- Agent processes include the required REDIS_ADDRESS environment variables for TLS-enabled Redis connections

test/e2e/README.md (4)

21-29: LGTM! Clear documentation of mandatory Redis TLS requirement. The documentation properly emphasizes that Redis TLS is required and automatically configured, with a helpful reference to the detailed Redis TLS section below.

31-53: LGTM! Excellent documentation of reverse tunnel setup for remote clusters. The conditional flow for remote vs. local clusters is clearly explained, including:

- When the reverse tunnel is needed (remote clusters only)
- What the setup script does
- How to keep the tunnel running

55-82: LGTM! Clear step-by-step workflow for running E2E tests. The multi-terminal workflow is well-documented, including:

- Port-forward requirements
- Principal and agent process management
- Conditional tunnel usage
- Automatic connection method detection (local vs. CI)

83-105: Verify that the referenced TLS configuration scripts exist and are executable. The documentation references three scripts for manual Redis TLS reconfiguration:

- `./hack/dev-env/gen-redis-tls-certs.sh`
- `./hack/dev-env/configure-redis-tls.sh`
- `./hack/dev-env/configure-argocd-redis-tls.sh`

Please confirm these scripts exist in the repository and are executable. If these scripts were added in a recent commit (such as 3b0283f), verify that file permissions were correctly set during the commit.

install/kubernetes/agent/agent-deployment.yaml (2)

149-166: LGTM! Redis TLS environment variables properly configured. The three new environment variables for Redis TLS configuration are correctly wired:

- `ARGOCD_AGENT_REDIS_TLS_ENABLED` - enables/disables TLS
- `ARGOCD_AGENT_REDIS_TLS_CA_PATH` - path to CA certificate
- `ARGOCD_AGENT_REDIS_TLS_INSECURE` - skip verification flag (dev/test only)

All variables use `optional: true` to allow graceful degradation if the ConfigMap keys are not present.

193-195: LGTM! Redis TLS CA volume properly configured with security best practices. The volume mount and volume definition follow Kubernetes security best practices:

- Mount is `readOnly: true` (prevents accidental modification)
- Secret reference uses `optional: true` (allows deployment without TLS secret)
- CA certificate properly mapped from `argocd-redis-tls` secret to `/app/config/redis-tls/ca.crt`

Also applies to: 205-211
cmd/argocd-agent/principal.go (6)

259-261: LGTM! Informer sync timeout properly wired with conditional application. The timeout is only applied when explicitly set (> 0), which allows the internal default (60s) to be used when not specified. This aligns with the flag description at line 436.

265-275: LGTM! Redis server TLS configuration properly validated. The validation ensures both cert and key are provided together or neither is provided, preventing partial TLS configuration. The fallback to Kubernetes secret is appropriate.

294-303: LGTM! Upstream TLS configuration priority is clear and well-logged. The if-else chain provides a clear priority order (insecure > CA file > CA secret) with appropriate warning messages for insecure mode.

434-436: LGTM! Informer sync timeout flag properly documented. The flag description clearly explains the behavior: 0 uses the internal default of 60s, and users can increase it for slow environments. This matches the implementation at lines 259-261.

438-459: LGTM! Redis TLS flags comprehensively cover all configuration scenarios. The flags provide flexible TLS configuration with:

- Global TLS enable/disable flag (default: true)
- Server TLS cert/key from file or Kubernetes secret
- Upstream CA from file or Kubernetes secret
- Insecure mode for development/testing (properly marked as INSECURE)

Default values are reasonable (TLS enabled by default, using the argocd-redis-tls secret).

490-490: Verify the timeout increase from 2s to 30s is necessary. The timeout for fetching TLS configuration from Kubernetes was increased from 2 seconds to 30 seconds. While this provides more tolerance for slow clusters or cold starts, it's a significant increase (15x).

Please confirm this timeout is needed based on observed failures in your environment. If the increase is necessary, consider:

- Whether 30s is the right value (perhaps 10s would be sufficient?)
- Whether this affects the startup time or user experience
- Whether similar timeouts elsewhere in the codebase should be adjusted

Based on learnings, this aligns with the PR's goal of accommodating slow environments (e.g., the informer sync timeout increase), but verification ensures the value is appropriate.
        - name: ARGOCD_AGENT_REDIS_TLS_ENABLED
          valueFrom:
            configMapKeyRef:
              name: argocd-agent-params
              key: agent.redis.tls.enabled
              optional: true
        - name: ARGOCD_AGENT_REDIS_TLS_CA_PATH
          valueFrom:
            configMapKeyRef:
              name: argocd-agent-params
              key: agent.redis.tls.ca-path
              optional: true
        - name: ARGOCD_AGENT_REDIS_TLS_INSECURE
          valueFrom:
            configMapKeyRef:
              name: argocd-agent-params
              key: agent.redis.tls.insecure
              optional: true
Wire Redis TLS env vars to the templated ConfigMap name and mount the CA volume.

Two issues here will prevent Redis TLS from working reliably:

- Hard-coded ConfigMap name for new env vars

Lines 136-153 use `name: argocd-agent-params` instead of the existing `{{ include "argocd-agent-agent.paramsConfigMapName" . }}` helper used everywhere else. This breaks deployments that override the params ConfigMap name via values.

You likely want:
- - name: ARGOCD_AGENT_REDIS_TLS_ENABLED
- valueFrom:
- configMapKeyRef:
- name: argocd-agent-params
+ - name: ARGOCD_AGENT_REDIS_TLS_ENABLED
+ valueFrom:
+ configMapKeyRef:
+ name: {{ include "argocd-agent-agent.paramsConfigMapName" . }}
key: agent.redis.tls.enabled
optional: true
- - name: ARGOCD_AGENT_REDIS_TLS_CA_PATH
- valueFrom:
- configMapKeyRef:
- name: argocd-agent-params
+ - name: ARGOCD_AGENT_REDIS_TLS_CA_PATH
+ valueFrom:
+ configMapKeyRef:
+ name: {{ include "argocd-agent-agent.paramsConfigMapName" . }}
key: agent.redis.tls.ca-path
optional: true
- - name: ARGOCD_AGENT_REDIS_TLS_INSECURE
- valueFrom:
- configMapKeyRef:
- name: argocd-agent-params
+ - name: ARGOCD_AGENT_REDIS_TLS_INSECURE
+ valueFrom:
+ configMapKeyRef:
+ name: {{ include "argocd-agent-agent.paramsConfigMapName" . }}
key: agent.redis.tls.insecure
      optional: true

- Volume is defined but never mounted into the container

You add the `redis-tls-ca` volume under `volumes:` (Lines 253-260), but the only `volumeMounts` entry is still `userpass-passwd` (Lines 229-232). Without a mount, the agent process won't see the CA file at the path configured in `agent.redis.tls.ca-path` (e.g., `/app/config/redis-tls/ca.crt`).

Consider adding a mount, e.g.:
Consider adding a mount, e.g.:
volumeMounts:
- name: userpass-passwd
mountPath: /app/config/creds
+{{- if .Values.redisTLS.secretName }}
+ - name: redis-tls-ca
+ mountPath: /app/config/redis-tls
+ readOnly: true
+{{- end }}(and keep agent.redis.tls.ca-path consistent with this directory, e.g. /app/config/redis-tls/ca.crt).
These two fixes are needed for TLS-enabled Redis connections to actually succeed in-cluster.
Also applies to: 229-260
Actionable comments posted: 3
♻️ Duplicate comments (7)

hack/dev-env/configure-redis-tls.sh (1)

68-70: Add error handling for context switch. If `kubectl config use-context` fails, subsequent kubectl operations may target the wrong cluster. The past review comment flagged this, but error handling appears to still be missing.

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+  echo "Error: Failed to switch to context ${CONTEXT}"
+  exit 1
+}

docs/configuration/redis-tls.md (1)

415-432: Documentation shows non-expanded `$(REDIS_PASSWORD)` in kubectl patch. The example uses single-quoted `-p='[...]'` which prevents shell expansion of `$(REDIS_PASSWORD)`. Users copying this will configure Redis with the literal string `$(REDIS_PASSWORD)` as the password. This was flagged in a past review but not addressed.

Update the example to show proper password retrieval and interpolation:

+# First, get the Redis password from the secret
+REDIS_PASSWORD=$(kubectl -n argocd get secret argocd-redis -o jsonpath='{.data.auth}' | base64 --decode)
+
 # Update Redis args for TLS
-kubectl patch deployment argocd-redis -n argocd --type='json' -p='[
+kubectl patch deployment argocd-redis -n argocd --type='json' -p="[
   {
     \"op\": \"replace\",
     \"path\": \"/spec/template/spec/containers/0/args\",
     \"value\": [
       \"--save\", \"\",
       \"--appendonly\", \"no\",
-      \"--requirepass\", \"$(REDIS_PASSWORD)\",
+      \"--requirepass\", \"${REDIS_PASSWORD}\",
       ...
     ]
   }
-]'
+]"

hack/dev-env/configure-argocd-redis-tls.sh (2)

29-31: Add error handling for context switch. Same issue as in `configure-redis-tls.sh`: if `kubectl config use-context` fails, subsequent operations may target the wrong cluster. The past review flagged this but it wasn't addressed.

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+  echo "Error: Failed to switch to context ${CONTEXT}"
+  exit 1
+}

164-182: Inconsistent handling of missing volumes array for repo-server. The `argocd-server` configuration (lines 68-108) defensively handles the case where the volumes array doesn't exist, but `argocd-repo-server` directly appends using `/spec/template/spec/volumes/-`, which will fail if the array is missing. The past review flagged this inconsistency.

Apply the same defensive pattern used for `argocd-server`:

 if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
   echo "  Adding redis-tls-ca volume..."
+
+  # Check if volumes array exists
+  VOLUMES_EXIST=$(kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes}' 2>/dev/null || echo "")
+
+  if [ -z "$VOLUMES_EXIST" ] || [ "$VOLUMES_EXIST" = "null" ]; then
+    # Create volumes array with first element
+    if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+      {
+        "op": "add",
+        "path": "/spec/template/spec/volumes",
+        "value": [{"name": "redis-tls-ca", "secret": {"secretName": "argocd-redis-tls", "items": [{"key": "ca.crt", "path": "ca.crt"}]}}]
+      }
+    ]'; then
+      echo "  ERROR: Failed to create volumes array for argocd-repo-server"
+      exit 1
+    fi
+  else
+    # Append to existing volumes array (existing code)
     if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[

hack/dev-env/start-agent-managed.sh (1)

63-74: Certificate extraction lacks error handling. The kubectl commands extract TLS credentials to temporary files without checking for errors. If the secrets don't exist or extraction fails, the script continues with empty or corrupt files, causing cryptic TLS errors when the agent starts. This issue was previously flagged in earlier review comments.

hack/dev-env/start-agent-autonomous.sh (1)

63-74: Certificate extraction lacks error handling. The kubectl commands extract TLS credentials without error checking, which can cause cryptic failures if secrets don't exist. This issue was previously flagged and applies identically to the managed agent script.

test/e2e/fixture/cluster.go (1)

309-317: `CleanupRedisCachedClients` does not actually close Redis clients (behaviour vs. comment mismatch). The comment says this "closes all cached Redis clients", but the implementation only resets the `cachedRedisClients` map and relies on GC / process exit to clean up connections. This was already raised previously; reiterating with a concrete suggestion.

If you want real connection cleanup between tests, you'll need a way to call `Close()` on the underlying `*redis.Client`s created in `getCacheInstance`. For example:

- Change the cache to store a small struct:

  type redisCachedClient struct {
      cache  *appstatecache.Cache
      client *redis.Client
  }

  var (
      cachedRedisClients     = make(map[string]redisCachedClient)
      cachedRedisClientMutex sync.Mutex
  )

- Have `getCachedCacheInstance` populate both fields (by refactoring `getCacheInstance` or adding a helper that returns both cache and client).
- Then `CleanupRedisCachedClients` can iterate the map, call `client.Close()` for each, and finally reset the map.

If you intentionally rely on process teardown and don't want to plumb through `*redis.Client`, at least consider updating the comment to describe that this only clears the cache map, not active TCP connections.
🧹 Nitpick comments (9)

test/e2e/fixture/toxyproxy.go (1)

119-134: Principal-specific readiness timeout logic is reasonable; consider avoiding duplicated magic numbers (optional). The new `timeout` handling with a longer window for `compName == "principal"` aligns with the informer sync behavior and should help reduce flakes. Non-principal components still use the previous 120s behavior, which keeps semantics stable.

If there is (or ends up being) a shared constant or configuration for the principal informer sync timeout elsewhere, consider wiring this code to that single source instead of hard-coding `120 * time.Second` here to avoid drift in future changes. This is non-blocking and can be deferred.

test/e2e/redis_proxy_test.go (2)

120-124: SSE "settling" sleep is pragmatic but could be made condition-based. The extra 5s wait after establishing the SSE stream should help avoid the subscription race you described and seems reasonable for now. Longer term, consider replacing the fixed sleep with a condition-based wait (e.g., wait until at least one initial SSE/resource-tree update is observed, with a timeout) so test duration isn't tied to an arbitrary constant.

Also applies to: 326-330

588-588: Buffered SSE channel and HTTP transport tuning are reasonable; consider a few bounds and test-only guardrails. The buffered `msgChan` plus the tuned `http.Transport`/`http.Client` (keep-alives, idle timeout, no overall timeout for SSE) are aligned with long-lived SSE streams and should reduce connection churn and message loss in these e2e tests.

A few non-blocking considerations:

- With `Timeout: 0` and `ResponseHeaderTimeout: 0`, if the endpoint is misconfigured/unreachable and the context lacks a deadline, `client.Do` can block for a long time. If `suite.Ctx` doesn't already enforce a global test timeout, consider using a context with a finite deadline for the SSE stream creation path.
- `InsecureSkipVerify: true` is understandable here given dynamically provisioned endpoints and test scope. It'd be good to keep this clearly isolated to e2e (which you're doing) and maybe add a short comment/TODO about tightening it when CI has a stable CA / hostname story.

Overall, these changes look appropriate for the current test environment.

Also applies to: 643-653, 661-663
hack/dev-env/gen-redis-tls-certs.sh (1)
68-72: Linux IP detection may fail silently on some systems.The
ip r show defaultcommand may not output the expected format on all Linux distributions or network configurations (e.g., multiple default routes, missingsrcfield). Consider adding a fallback or validation.else - LOCAL_IP=$(ip r show default 2>/dev/null | sed -e 's,.*\ src\ ,,' | sed -e 's,\ metric.*$,,' | head -n 1 || echo "") + LOCAL_IP=$(ip r show default 2>/dev/null | grep -oP 'src \K[\d.]+' | head -n 1 || \ + hostname -I 2>/dev/null | awk '{print $1}' || echo "") fihack/dev-env/configure-redis-tls.sh (1)
116-118: Suppressed errors during pod termination wait may hide issues.The
2>/dev/null || truepattern suppresses all errors fromkubectl wait. While this allows the script to continue if pods don't exist, it also hides legitimate errors (e.g., API server connectivity issues).Consider logging a message when the wait command fails:
-kubectl wait --for=delete pod -l app.kubernetes.io/name=argocd-repo-server -n ${NAMESPACE} --timeout=60s 2>/dev/null || true +kubectl wait --for=delete pod -l app.kubernetes.io/name=argocd-repo-server -n ${NAMESPACE} --timeout=60s 2>/dev/null || echo " (no pods to wait for or wait timed out)"principal/redisproxy/redisproxy.go (1)
130-154: Consider validation for mutually exclusive certificate configuration.The method allows both file-based and in-memory certificate configuration to be set simultaneously, with file-based taking precedence (lines 136-145). While this works, it could lead to confusion if both are configured. Consider adding validation at configuration time to ensure only one method is used, or document this precedence behavior clearly.
Additionally, consider upgrading the minimum TLS version to 1.3 for enhanced security:
return &tls.Config{ Certificates: []tls.Certificate{cert}, - MinVersion: tls.VersionTLS12, + MinVersion: tls.VersionTLS13, }, niltest/e2e/fixture/cluster.go (3)
276-307: Cached Redis clients: keying and lifecycle considerationsThe global
cachedRedisClientsmap keyed by"<source>:<addr>"with a mutex gives you a simple and thread-safe cache and should avoid repeated client creation during E2E runs. Two follow-ups to consider:
- The cache key ignores password/TLS settings. If a test ever changes credentials or TLS parameters while keeping the same address, you’ll silently reuse an old client. For current E2E usage this is probably fine, but worth keeping in mind if the fixture is extended.
- If you decide to explicitly close clients (see comment on
CleanupRedisCachedClients), you’ll likely want to change the map value to a small struct that also carries the underlying*redis.Client, or maintain a parallel map keyed the same way.Given this is test-only code, these are more about future-proofing than immediate correctness.
348-381: Managed-agent Redis address discovery and TLS defaults look solid

The address resolution order (LoadBalancer ingress → `spec.loadBalancerIP` → ClusterIP) plus a clear error when none is available is sensible for E2E environments. Always enabling TLS for managed-agent Redis and wiring the CA path, with a final override of the address via `MANAGED_AGENT_REDIS_ADDR`, nicely matches the "TLS by default, easy local override" goal.

Only minor note: the override assumes the env value already includes the port; that's fine, but it may be worth documenting in test setup docs/comments if not already done elsewhere.
412-445: Principal Redis address discovery and TLS wiring consistent with managed-agent path

The principal-side `getPrincipalRedisConfig` mirrors the managed-agent logic: same address resolution strategy, TLS enabled by default, CA path wired, and an env override (`ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS`) applied last. This symmetry makes the fixture predictable and easier to reason about.

No functional issues spotted here; just ensure any test documentation mentions the expected format of the override env var (host:port).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- Makefile (1 hunks)
- agent/agent.go (4 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/agent/configuration.md (1 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/application_test.go (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (2 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (12 hunks)
- test/e2e/fixture/toxyproxy.go (1 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- test/e2e/application_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
- principal/resource.go
- test/e2e/fixture/argoclient.go
- test/e2e/clusterinfo_test.go
- docs/getting-started/kubernetes/index.md
- test/e2e/rp_test.go
- install/helm-repo/argocd-agent-agent/values.schema.json
- principal/tracker/tracking.go
- test/run-e2e.sh
- docs/configuration/agent/configuration.md
- test/e2e/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- hack/dev-env/start-agent-managed.sh
- hack/dev-env/start-agent-autonomous.sh
- hack/dev-env/configure-argocd-redis-tls.sh
- hack/dev-env/start-principal.sh
- hack/dev-env/Procfile.e2e
- Makefile
- hack/dev-env/start-e2e.sh
🧬 Code graph analysis (6)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
- ClusterDetails (43-59)
- AgentManagedName (38-38)
- AgentClusterServerURL (40-40)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
cmd/argocd-agent/agent.go (2)
agent/options.go (3)
- WithRedisTLSEnabled (112-117)
- WithRedisTLSInsecure (128-133)
- WithRedisTLSCAPath (120-125)

internal/env/env.go (2)
- BoolWithDefault (30-39)
- StringWithDefault (46-55)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
apply(94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
principal/redisproxy/redisproxy.go (1)
internal/logging/logging.go (3)
- Error (305-307)
- Warn (300-302)
- Trace (285-287)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Analyze (go)
🔇 Additional comments (46)
test/e2e/fixture/fixture.go (9)
97-97: LGTM: Timeout increase improves test reliability.

The doubled timeout (60→120 seconds) for AppProject synchronization is appropriate given the TLS overhead introduced in this PR and aligns with other timeout increases throughout the file.

110-156: LGTM: Consistent timeout increases for resource deletion.

The wait loop increases (lines 113 and 144) from 60 to 120 iterations provide adequate time for finalizer-based cleanup in TLS-enabled environments. The two-phase approach (wait for finalizers, then force-remove if needed) remains sound.

161-172: LGTM: Timeout increase aligned with other wait functions.

The wait loop increase (60→120 iterations) in `WaitForDeletion` is consistent with the changes to `EnsureDeletion` and appropriate for TLS-enabled environments.

232-266: LGTM: DeepCopy prevents mutation and warning-based cleanup improves resilience.

The use of `DeepCopy()` at lines 236 and 261 correctly prevents mutation of loop variables when adjusting namespace/name for cross-cluster deletion waits. The warning-based error handling (instead of early returns) ensures cleanup continues even when individual resources fail, which is appropriate for test teardown logic.

278-292: LGTM: Consistent warning-based cleanup for remaining applications.

The warning-based error handling for remaining applications (lines 278-279, 291-292) is consistent with the approach used earlier in the cleanup flow and ensures maximum cleanup coverage.

312-325: LGTM: Correct DeepCopy usage with proper name transformation.

The `DeepCopy()` at line 318 with the "agent-autonomous-" name prefix (line 319) correctly maps autonomous agent AppProjects to their principal-side counterparts. The warning-based error handling with explanatory comments improves clarity.

345-374: LGTM: Proper DeepCopy usage and consistent warning handling.

The `DeepCopy()` at line 351 correctly prevents loop variable mutation. The namespace adjustment to "argocd" for the managed agent is appropriate, and the warning-based error handling maintains consistency with the rest of the cleanup logic.

487-491: LGTM: Non-fatal Redis reset improves cleanup resilience.

Treating Redis reset failures as warnings (line 489) rather than fatal errors is appropriate for test cleanup, especially when the Redis connection may be unavailable due to environment issues (e.g., port-forward termination). The explanatory message provides helpful context.

497-498: Verify the function rename is correct.

The function call changed from `getCacheInstance` to `getCachedCacheInstance` at line 497. Ensure that `getCachedCacheInstance` exists with the expected signature and that no remaining references to the old `getCacheInstance` function are left elsewhere in the codebase. The error wrapping with `%w` at line 498 follows best practices for error chain preservation.

test/e2e/redis_proxy_test.go (2)
184-184: Extended pod-replacement Eventually window looks appropriate

Bumping the pod-creation Eventually timeout to 60s with a 5s interval is a sensible adjustment given TLS + Redis + cluster variability; the values still look bounded and won't excessively slow failures.

Also applies to: 402-402

211-237: ResourceTree Eventually with transient error handling looks solid

Wrapping the post-deletion ResourceTree check in `require.Eventually` with explicit handling for non-nil errors and nil trees is a good improvement. It should help tolerate transient EOF/Redis/SSE issues while still failing deterministically if the new pod never appears in the tree. The logging around retries is also useful for diagnosing flakes.

Also applies to: 430-456
hack/dev-env/gen-redis-tls-certs.sh (2)
1-26: LGTM - Certificate generation structure is sound.

The script properly uses `set -e` for error handling, generates a 4096-bit RSA CA with 10-year validity, and is idempotent by checking for existing files before regeneration.

106-136: LGTM - Agent certificate loop is well-structured.

The loop generates certificates for both `autonomous` and `managed` agents with appropriate SANs. The idempotent checks for existing files are correct.

hack/dev-env/configure-redis-tls.sh (2)
61-66: LGTM - Certificate validation is comprehensive.

The validation now correctly checks for the server certificate, key, and CA certificate as suggested in past reviews.

198-229: LGTM - Redis password handling is correct.

The script properly retrieves the Redis password from the secret, fails fast if missing, and correctly interpolates it into the JSON patch using shell variable expansion with proper quoting.
hack/dev-env/configure-argocd-redis-tls.sh (1)
316-325: LGTM - Replica guard logic correctly uses explicit if statements.

The replica validation now properly handles both empty and "0" values, ensuring at least 1 replica is scaled up. This addresses the past review concern about shell operator precedence.
Makefile (1)
59-79: LGTM - Redis TLS setup sequence is well-organized.

The setup follows a logical per-cluster pattern: certificate generation → Redis TLS → ArgoCD TLS for each vcluster. Make's default behavior will stop on first script failure, and the scripts use `set -e` internally.

docs/configuration/redis-tls.md (2)
1-49: LGTM - Documentation overview and architecture are clear.

The introduction, architecture diagram, and TLS configuration points are well-documented and provide a clear understanding of the Redis TLS setup.

329-340: LGTM - Principal options table is accurate.

The flag names, environment variables, and defaults are correctly documented, matching the implementation in the codebase.
cmd/argocd-agent/agent.go (3)
184-199: LGTM - Redis TLS configuration logic is well-structured.

The mutual exclusion check prevents conflicting `--redis-tls-insecure` and `--redis-tls-ca-path` options. The security warning for insecure mode is appropriate.

241-250: LGTM - Redis TLS flags with secure defaults.

TLS is enabled by default (`true`), insecure mode is disabled by default (`false`), and the environment variable naming follows the established `ARGOCD_AGENT_*` convention.

73-77: LGTM - Redis TLS variable declarations.

The new TLS configuration variables are properly scoped within the command function alongside other configuration options.
hack/dev-env/start-agent-managed.sh (3)
37-46: LGTM!

The Redis TLS certificate detection logic is clear and appropriate for the dev/e2e environment. The messaging guides developers to generate certificates when needed.

48-61: LGTM!

The Redis address configuration appropriately defaults to `localhost:6381` for local development, with clear documentation about port-forward requirements. This approach supports TLS certificate validation since `localhost` is included in the certificate SANs.

76-89: LGTM!

The agent startup command cleanly integrates TLS-related arguments through variable injection. The ordering and structure are appropriate for the managed agent mode.
hack/dev-env/start-agent-autonomous.sh (2)
37-61: LGTM!

The Redis TLS detection and address configuration mirror the managed agent script with appropriate port differentiation (6382 for autonomous vs 6381 for managed). This supports running multiple agents locally with distinct port-forwards.

76-91: LGTM!

The autonomous agent startup command properly integrates TLS arguments with mode-specific configuration (autonomous mode with distinct metrics/healthz ports).
hack/dev-env/start-principal.sh (3)
23-28: LGTM!

The Redis address configuration appropriately relies on external port-forward management (via Procfile.e2e or manual setup), avoiding the port conflict that was addressed in previous review iterations.

42-43: LGTM!

Setting a longer informer sync timeout (120s) for E2E tests is appropriate for CI environments where cluster startup and informer synchronization may be slower.

47-65: LGTM!

The Redis TLS configuration is thorough, checking for all required files (cert, key, and CA) and properly configuring both server TLS (for incoming connections from Argo CD) and upstream CA (for connections to Redis). The documentation of SANs is helpful for understanding the certificate requirements.
agent/agent.go (3)
328-348: LGTM!

The cluster cache TLS configuration is well-structured, with appropriate handling of insecure mode (with warning), CA certificate loading, and error propagation. The TLS 1.2 minimum version is a secure default.

350-354: LGTM!

The cluster cache initialization cleanly integrates the TLS configuration, with appropriate error handling and assignment to the agent's clusterCache field.

448-465: LGTM!

The cluster cache refresh logic is well-implemented with an immediate startup update followed by periodic updates via ticker. Both managed and autonomous agents appropriately send cluster cache info updates, and context cancellation is properly handled.
internal/argocd/cluster/cluster.go (2)
175-192: LGTM!

The signature change to `NewClusterCacheInstance` cleanly adds TLS configuration support. Since this is an internal package, the breaking change is acceptable. The TLS config is properly wired into the Redis client options.

135-141: LGTM!

Initializing the ConnectionState when it doesn't exist yet is appropriate for handling the initial agent connection. The default values (Successful status, descriptive message, current timestamp) are reasonable.
hack/dev-env/start-e2e.sh (2)
50-56: LGTM!

The Redis address configuration uses localhost with distinct ports for each component, which enables TLS certificate validation (localhost is included in the certificate SANs) while supporting multiple concurrent agents. This aligns with the port-forward setup in Procfile.e2e.

58-59: LGTM!

The Redis password retrieval properly separates the assignment from the export, addressing the shellcheck warning that was raised in previous review comments.
hack/dev-env/Procfile.e2e (1)
1-7: LGTM!

The Procfile.e2e cleanly orchestrates the E2E test environment:
- Port-forward entries provide Redis and Argo CD server access with distinct ports for each component
- Sleep delays ensure port-forwards establish before components start (3s for principal, 5s for agents)
- Environment variable passing enables per-agent Redis address configuration
This structure supports running multiple agents with TLS-enabled Redis connections in the E2E test environment.
cmd/argocd-agent/principal.go (5)
259-261: LGTM!

The informer sync timeout is conditionally applied only when greater than zero, allowing the 0 default to use the internal default while supporting explicit override for slow environments. This addresses the previous comment about clarifying default semantics.

263-275: LGTM!

The Redis server TLS configuration properly handles both path-based and secret-based modes with validation ensuring cert and key are provided together. The logging clearly indicates which source is being used.

277-304: LGTM!

The upstream TLS validation correctly ensures mutual exclusivity between the three modes (insecure, CA from file, CA from secret). The exclusion of the default secret name from the validation count is appropriate because it allows the default to be used when no other mode is explicitly configured, while still catching conflicts when users explicitly set multiple modes.

434-459: LGTM!

The CLI flags for informer sync timeout and Redis TLS configuration are well-documented with clear descriptions. Redis TLS is appropriately enabled by default for security, and the default secret name is consistent across related flags.

490-490: LGTM!

Increasing the timeout from 2s to 30s for fetching resource proxy TLS configuration from Kubernetes is appropriate. The original timeout was tight and could cause spurious failures in busy clusters or CI environments.
principal/redisproxy/redisproxy.go (1)
157-200: LGTM!

The TLS-enabled and plaintext listener setup is well-structured with clear conditional logic and appropriate logging for both modes.
test/e2e/fixture/cluster.go (2)
128-141: Use of cached Redis cache instances in Get*ClusterInfo looks good

Switching `GetManagedAgentClusterInfo` and `GetPrincipalClusterInfo` to use `getCachedCacheInstance` reduces repeated client creation and matches the new cached client infrastructure. The additional debug logging around principal lookups and cache misses is also helpful for E2E failures.

Also applies to: 157-171

173-216: Verify TLS error handling and code duplication patterns in cluster.go

The review suggests two improvements to the TLS configuration code in `getCacheInstance`:

- Restrict `os.Stat` error handling to explicitly check `os.IsNotExist(err)` rather than treating all errors as "file not found", and fail fast on permission or transient errors
- Extract duplicated TLS setup logic (for principal and managed-agent branches) into a helper function to reduce duplication and keep both call sites in sync

The suggestions are sound best practices, but require verification that:

- The referenced lines 224-256 and 261-267 contain substantially identical TLS configuration code
- The current error handling genuinely treats all `os.Stat` errors as "missing file"
- These improvements align with the test fixture's actual implementation
ada2bb0 to bd9c416
Actionable comments posted: 0
♻️ Duplicate comments (2)
hack/dev-env/start-agent-managed.sh (1)
63-74: Verify error handling for certificate extraction.

A previous review requested adding error handling to the kubectl certificate extraction commands (lines 68-73), and the review was marked as addressed. However, the current code still lacks explicit error checks. Please verify that proper error handling is in place or consider adding it to prevent silent failures:

 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
-    -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
+    -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}" || { echo "Failed to extract TLS cert"; exit 1; }

test/e2e/fixture/cluster.go (1)
173-317: Consider factoring TLS configuration and revisiting cached client cleanup.

Two small, non-blocking improvements to keep this fixture maintainable:

- The TLS wiring for principal and managed-agent Redis in `getCacheInstance` is nearly identical (min version, CA loading with optional InsecureSkipVerify). Pulling that into a small helper like `buildTLSConfig(enabled bool, caPath string, who string) *tls.Config` would avoid drift if you later tweak verification behavior for one side.
- `CleanupRedisCachedClients` currently just clears the `cachedRedisClients` map and relies on GC to close underlying connections. If `appstatecache.Cache` ever exposes a `Close()` or if you decide to track the underlying `*redis.Client` alongside the cache, this would be the natural place to explicitly close them before resetting the map.

Neither is urgent, but both would make future TLS changes and resource management a bit safer.
🧹 Nitpick comments (6)
test/e2e/fixture/toxyproxy.go (1)
119-134: Dynamic principal readiness timeout looks good; consider centralizing the 120s constant

The new `timeout` logic (120s by default, 180s for `principal`) aligns with the comment about informer sync and should help reduce principal readiness flakes without impacting other components.

As a minor improvement only if convenient, consider sourcing the `120 * time.Second` value from a shared constant or config (if one already exists for the principal informer sync timeout), so this check automatically tracks future changes to that timeout instead of relying on a duplicated magic number and comment.

test/e2e/logs_test.go (2)
118-120: Consider polling for readiness instead of a hard sleep.

The 15-second sleep addresses timing issues but makes the test unconditionally slower. The comment suggests potential test isolation problems ("recover from previous test state"). Consider polling for a specific readiness condition (e.g., checking if the log streaming endpoint responds, or verifying agent connectivity) instead of an arbitrary delay.
If a polling target is unclear in the E2E environment, you could combine a shorter initial delay with a readiness check:
-	// Wait for log streaming proxy to be ready (especially when running after other tests)
-	// The managed agent needs more time to recover from previous test state
-	time.Sleep(15 * time.Second)
+	// Wait for log streaming proxy to be ready with exponential backoff
+	backoff := 1 * time.Second
+	for i := 0; i < 5; i++ {
+		// Quick health check: attempt to fetch fresh app metadata
+		testApp := &v1alpha1.Application{}
+		if err := suite.PrincipalClient.Get(suite.Ctx, types.NamespacedName{Namespace: "agent-managed", Name: appName}, testApp, metav1.GetOptions{}); err == nil {
+			break
+		}
+		time.Sleep(backoff)
+		backoff *= 2
+	}
243-245: Hard sleep suggests connection cleanup issues; consider investigating proper teardown.

The 5-second sleep to allow log streaming connections to close suggests that TLS-enabled connections may not be closing promptly. This could indicate missing context cancellation, improper defer statements, or connection pool cleanup issues in the log streaming implementation.
Consider investigating the log streaming connection lifecycle to ensure proper cleanup. If connections aren't closing immediately:
- Verify that contexts passed to log streaming are properly canceled.
- Check if Redis TLS connections have appropriate timeouts and are being closed in defer statements.
- Consider if connection pooling in TLS mode requires explicit drain/close calls.
If a sleep is necessary for E2E stability, at least reduce it and add a comment explaining the specific resource being awaited:
-	// Allow time for log streaming connections to fully close before next test
-	time.Sleep(5 * time.Second)
+	// Brief delay to allow Redis TLS connections to fully close
+	// TODO: Investigate proper connection cleanup to eliminate this sleep
+	time.Sleep(2 * time.Second)

test/e2e/fixture/argoclient.go (1)
387-409: Add LoadBalancer ingress IP as a fallback in `GetArgoCDServerEndpoint`.

If `spec.LoadBalancerIP` is empty and the service has only an ingress IP (no hostname), `argoEndpoint` will stay empty and callers will fail. Consider also falling back to `Ingress[0].IP`:

-	argoEndpoint := srvService.Spec.LoadBalancerIP
-	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-		if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
-			argoEndpoint = hostname
-		}
-	}
+	argoEndpoint := srvService.Spec.LoadBalancerIP
+	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
+		ingress := srvService.Status.LoadBalancer.Ingress[0]
+		if ingress.Hostname != "" {
+			argoEndpoint = ingress.Hostname
+		} else if ingress.IP != "" {
+			argoEndpoint = ingress.IP
+		}
+	}

This keeps the env override behavior while making the K8s-based path more robust.
hack/dev-env/configure-redis-tls.sh (1)
68-71: Consider surfacing a clear error when `kubectl config use-context` fails.

With `set -e`, a bad context will stop the script but without an explicit message. A small guard improves UX:

 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+    echo "Error: Failed to switch to context ${CONTEXT}" >&2
+    exit 1
+}

This makes it obvious why the script exited when the context is misconfigured.
test/run-e2e.sh (1)
88-121: Tighten macOS port-forward detection to verify all three ports.

Using a single `lsof -i :6380 -i :6381 -i :6382` only guarantees that at least one of the ports is open. You can have a misconfigured start-e2e (e.g., 6380 only) and still pass this check.

Consider checking each port individually:

-if [[ "$OSTYPE" == "darwin"* ]]; then
-  if ! lsof -i :6380 -i :6381 -i :6382 >/dev/null 2>&1; then
+if [[ "$OSTYPE" == "darwin"* ]]; then
+  missing=0
+  for port in 6380 6381 6382; do
+    if ! lsof -i ":${port}" >/dev/null 2>&1; then
+      echo "  Port-forward for localhost:${port} not detected"
+      missing=1
+    fi
+  done
+
+  if [[ $missing -ne 0 ]]; then
     echo ""
     echo "  WARNING: Port-forwards not detected!"
@@
-    sleep 3
-  else
-    echo "✓ Port-forwards detected (localhost:6380, 6381, 6382)"
-    echo ""
-  fi
+    sleep 3
+  else
+    echo "✓ Port-forwards detected (localhost:6380, 6381, 6382)"
+    echo ""
+  fi

This makes the warning trigger whenever any of the three Redis forwards is missing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- Makefile (1 hunks)
- agent/agent.go (4 hunks)
- cmd/argocd-agent/agent.go (3 hunks)
- cmd/argocd-agent/principal.go (4 hunks)
- docs/configuration/agent/configuration.md (1 hunks)
- docs/configuration/redis-tls.md (1 hunks)
- docs/getting-started/kubernetes/index.md (3 hunks)
- hack/dev-env/Procfile.e2e (1 hunks)
- hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
- hack/dev-env/configure-redis-tls.sh (1 hunks)
- hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
- hack/dev-env/start-agent-autonomous.sh (1 hunks)
- hack/dev-env/start-agent-managed.sh (1 hunks)
- hack/dev-env/start-e2e.sh (1 hunks)
- hack/dev-env/start-principal.sh (2 hunks)
- install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
- internal/argocd/cluster/cluster.go (3 hunks)
- principal/redisproxy/redisproxy.go (5 hunks)
- principal/resource.go (1 hunks)
- principal/tracker/tracking.go (1 hunks)
- test/e2e/README.md (1 hunks)
- test/e2e/clusterinfo_test.go (2 hunks)
- test/e2e/fixture/argoclient.go (3 hunks)
- test/e2e/fixture/cluster.go (9 hunks)
- test/e2e/fixture/fixture.go (12 hunks)
- test/e2e/fixture/toxyproxy.go (1 hunks)
- test/e2e/logs_test.go (3 hunks)
- test/e2e/redis_proxy_test.go (6 hunks)
- test/e2e/rp_test.go (2 hunks)
- test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
- hack/dev-env/start-agent-autonomous.sh
- install/helm-repo/argocd-agent-agent/values.schema.json
- test/e2e/rp_test.go
- hack/dev-env/start-principal.sh
- hack/dev-env/start-e2e.sh
- hack/dev-env/gen-redis-tls-certs.sh
- internal/argocd/cluster/cluster.go
- hack/dev-env/configure-argocd-redis-tls.sh
- hack/dev-env/Procfile.e2e
- Makefile
- docs/configuration/agent/configuration.md
- test/e2e/redis_proxy_test.go
- test/e2e/clusterinfo_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.
Applied to files:
- hack/dev-env/start-agent-managed.sh
- test/run-e2e.sh
- test/e2e/README.md
🧬 Code graph analysis (5)
test/e2e/fixture/fixture.go (3)
internal/logging/logfields/logfields.go (1)
- Name (59-59)

internal/backend/interface.go (1)
- Namespace (124-127)

test/e2e/fixture/cluster.go (3)
- ClusterDetails (43-59)
- AgentManagedName (38-38)
- AgentClusterServerURL (40-40)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
NewClusterCacheInstance(176-192)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
- Event (115-118)

internal/logging/logfields/logfields.go (1)
- Event (34-34)
cmd/argocd-agent/agent.go (4)
agent/options.go (3)
- WithRedisTLSEnabled (112-117)
- WithRedisTLSInsecure (128-133)
- WithRedisTLSCAPath (120-125)

principal/options.go (1)
- WithRedisTLSEnabled (493-498)

internal/logging/logging.go (1)
- Warn (300-302)

internal/env/env.go (2)
- BoolWithDefault (30-39)
- StringWithDefault (46-55)
principal/redisproxy/redisproxy.go (2)
principal/listen.go (1)
- Listener (55-61)

internal/logging/logging.go (3)
- Error (305-307)
- Warn (300-302)
- Trace (285-287)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md
150-150: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
486-486: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
504-504: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/getting-started/kubernetes/index.md
233-233: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and push image
- GitHub Check: Run end-to-end tests
- GitHub Check: Run unit tests
- GitHub Check: Lint Go code
- GitHub Check: Build & cache Go code
- GitHub Check: Analyze (go)
🔇 Additional comments (17)
test/e2e/logs_test.go (2)
240-240: Verify the need for 2x timeout increase in autonomous mode.

The timeout increase from 30s to 60s (and interval from 1s to 2s) indicates TLS overhead in autonomous mode as well, though less severe than managed mode (2x vs 3x). Since autonomous agents have a simpler connectivity model (direct connection without proxy), this degradation suggests Redis TLS handshake or connection setup delays are affecting both agent types.
Consider gathering metrics on log fetch latency before and after TLS enablement to quantify the performance impact and determine if further optimization is needed.
125-139: Fetching fresh app metadata is a sound improvement; timeout increases warrant investigation.

The logic to fetch fresh application data before requesting logs ensures current metadata and resource versions are used, reducing stale-data issues.
However, the timeout increase from 30s to 90s and polling interval from 1s to 3s represents a significant 3x change. If this was added due to TLS-related overhead, consider investigating the root cause:
- Whether managed agent's Redis TLS connection setup is adding unexpected latency.
- If there are connection pooling or keep-alive issues with TLS-enabled Redis.
- Whether certificate validation is causing delays.
Rather than scaling timeouts indefinitely, understanding the performance impact of the TLS changes would help ensure the solution is robust for production use.
principal/resource.go (1)
42-42: LGTM!

The timeout increase from 10s to 30s is appropriate given the addition of TLS handshakes and potentially higher latency Redis operations in the resource request path. This aligns with timeout values used elsewhere in the TLS implementation.
principal/tracker/tracking.go (1)
75-78: LGTM!

The change from an unbuffered to a buffered channel (capacity 1) is appropriate for preventing potential deadlocks in the request-response pattern. The inline comment clearly documents the rationale, which is helpful for future maintainers.
cmd/argocd-agent/agent.go (1)
184-199: LGTM!

The Redis TLS configuration logic is well-structured with proper validation ensuring mutual exclusivity between insecure mode and CA-based validation. The warning logs for insecure mode are appropriate security reminders.
agent/agent.go (2)
328-354: LGTM!

The TLS configuration for the cluster cache is properly implemented with appropriate security controls:
- Minimum TLS version enforcement (TLS 1.2)
- Warning log when insecure mode is used
- Proper CA certificate loading and validation
- Clear error propagation
The code correctly mirrors the TLS configuration pattern used elsewhere in the codebase.
450-465: LGTM!

The cluster cache info update goroutine is well-structured:
- Performs an immediate update on startup before waiting for the first tick (line 453)
- Uses the validated `cacheRefreshInterval` with a sensible default (30s)
- Properly respects context cancellation for clean shutdown
This pattern ensures the principal receives initial cluster state promptly rather than waiting for the first interval.
principal/redisproxy/redisproxy.go (3)
98-154: LGTM! The TLS configuration API is well-designed with clear separation of concerns:
- Server TLS configuration (cert/key for incoming connections)
- Upstream TLS configuration (CA for outgoing connections to Redis)
- Support for both in-memory certificates and file-based loading
The createServerTLSConfig helper properly handles both configuration sources and enforces the TLS 1.2 minimum.
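The dual-source loading described above (in-memory certificates or file-based) can be sketched like this. Names and precedence are assumptions for illustration, not the helper's actual signature.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// serverTLSConfig prefers an in-memory cert/key pair, falls back to files,
// and always enforces TLS 1.2 for incoming connections.
func serverTLSConfig(certPEM, keyPEM []byte, certFile, keyFile string) (*tls.Config, error) {
	var cert tls.Certificate
	var err error
	switch {
	case len(certPEM) > 0 && len(keyPEM) > 0:
		cert, err = tls.X509KeyPair(certPEM, keyPEM)
	case certFile != "" && keyFile != "":
		cert, err = tls.LoadX509KeyPair(certFile, keyFile)
	default:
		return nil, fmt.Errorf("no server certificate configured")
	}
	if err != nil {
		return nil, err
	}
	return &tls.Config{Certificates: []tls.Certificate{cert}, MinVersion: tls.VersionTLS12}, nil
}

// demoCertKeyPEM generates a throwaway self-signed cert/key pair.
func demoCertKeyPEM() ([]byte, []byte) {
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "redis-proxy"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, _ := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	keyDER, _ := x509.MarshalECPrivateKey(key)
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}),
		pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
}

func main() {
	certPEM, keyPEM := demoCertKeyPEM()
	cfg, err := serverTLSConfig(certPEM, keyPEM, "", "")
	fmt.Println(err == nil, len(cfg.Certificates) == 1)
}
```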
162-183: LGTM! The dual-mode startup (TLS vs plaintext) is clearly implemented, with appropriate logging on each path. The TLS configuration is created and applied before starting the listener, ensuring security is enforced from the first connection.
836-926: LGTM! Excellent TLS implementation. The upstream connection establishment is comprehensive and addresses all previous review concerns:
Security improvements:
- Dial timeout prevents indefinite hangs (line 847)
- TLS handshake deadline prevents stalled connections (line 904)
- Security warning when server TLS enabled but upstream TLS not configured (lines 858-862)
- Warning when CA configuration ignored due to insecure mode (lines 874-877)
Design improvements:
- Upstream TLS decoupled from server TLS (line 866) - allows independent configuration
- Proper SNI hostname extraction for TLS (lines 896-901)
- Deadline cleared after successful handshake (lines 916-919)
- Support for CA from pool, file path, or insecure mode
The implementation correctly handles all TLS scenarios while maintaining clear error messages and security warnings.
cmd/argocd-agent/principal.go (3)
259-261: LGTM! The informer sync timeout configuration is cleanly implemented, only applying the option when a non-zero timeout is explicitly set. This allows the server to use its internal default (60s) when not specified.
263-304: Redis TLS configuration is well-structured. The Redis TLS setup properly handles:
- Server TLS for incoming connections from Argo CD (cert/key from file or secret)
- Upstream TLS for connections to Redis (CA from file, secret, or insecure mode)
- Appropriate warning logs for insecure mode
- Input validation for cert/key pairs
The configuration options provide good flexibility for different deployment scenarios.
277-291: Clarify validation logic for default secret name. The mutual exclusivity validation on lines 286-287 excludes the default secret name "argocd-redis-tls" from the mode count. This allows users to specify --redis-upstream-ca-path while the secret name defaults to "argocd-redis-tls", and validation passes with modesSet = 1. This creates an inconsistency between the validation logic and the error message: the message states "Only one mode can be specified," but the code treats the default secret as non-exclusive. Clarify whether:
- The default secret should count toward the mode limit (making --redis-upstream-ca-path + default secret mutually exclusive), OR
- The default secret is intentionally a fallback that doesn't count as a mode (in which case, update the error message to reflect this)
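The questioned counting behavior can be reproduced in a small sketch. The function name and the exclusion of the default secret are modeled on the review's description; they are illustrative, not the PR's exact code.

```go
package main

import "fmt"

const defaultRedisTLSSecret = "argocd-redis-tls"

// countUpstreamModes models the validation the comment questions: the
// default secret name is excluded from the mode count, so a CA path plus
// the default secret still passes with a count of 1.
func countUpstreamModes(insecure bool, caPath, caSecret string) int {
	modes := 0
	if insecure {
		modes++
	}
	if caPath != "" {
		modes++
	}
	if caSecret != "" && caSecret != defaultRedisTLSSecret {
		modes++ // only non-default secrets count as an explicit mode
	}
	return modes
}

func main() {
	// CA path + default secret: passes validation despite two sources.
	fmt.Println(countUpstreamModes(false, "/certs/ca.crt", defaultRedisTLSSecret))
	// CA path + explicit secret: rejected as two modes.
	fmt.Println(countUpstreamModes(false, "/certs/ca.crt", "my-secret"))
}
```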
docs/configuration/redis-tls.md (1)
150-150: [Rewritten review comment]
[Classification tag]
test/e2e/README.md (1)
83-105: Redis TLS section and script references look consistent. The Redis TLS section now accurately reflects the new dev-env scripts (gen-redis-tls-certs.sh, configure-redis-tls.sh, configure-argocd-redis-tls.sh) and the requirement that TLS be enabled for all E2E runs. No changes needed here.
docs/getting-started/kubernetes/index.md (1)
159-234: Redis TLS getting-started steps are technically sound and aligned with the code. The new Redis TLS sections (2.4 and 4.4) correctly:
- Generate a CA and server certs with appropriate SANs,
- Create the argocd-redis-tls secret with tls.crt, tls.key, and ca.crt,
- Patch the argocd-redis deployment to use TLS-only on port 6379, and
- Reuse the same CA across control-plane and workload clusters.
The REDIS_PASSWORD handling in the JSON patch is now shell-expanded correctly. The added "Redis TLS Configuration" link at the bottom ties this doc into the deeper configuration guide. No further changes required.
Also applies to: 341-390, 655-655
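The getting-started steps use openssl; the same idea can be expressed programmatically in Go: a CA signs a Redis server certificate whose DNS SANs cover the in-cluster service names, and the chain is then verified against that CA. The SAN values below are illustrative assumptions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// buildRedisPKI creates a CA, signs a server certificate with DNS SANs,
// and verifies the resulting chain, mirroring the documented openssl flow.
func buildRedisPKI() error {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "argocd-redis-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	caCert, _ := x509.ParseCertificate(caDER)

	srvKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	srvTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "argocd-redis"},
		// SANs must match however clients address Redis inside the cluster.
		DNSNames:    []string{"argocd-redis", "argocd-redis.argocd.svc.cluster.local"},
		NotBefore:   time.Now(),
		NotAfter:    time.Now().Add(24 * time.Hour),
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	srvDER, err := x509.CreateCertificate(rand.Reader, srvTmpl, caCert, &srvKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	srvCert, _ := x509.ParseCertificate(srvDER)

	roots := x509.NewCertPool()
	roots.AddCert(caCert)
	_, err = srvCert.Verify(x509.VerifyOptions{Roots: roots, DNSName: "argocd-redis"})
	return err
}

func main() {
	fmt.Println(buildRedisPKI() == nil) // chain and SAN check out
}
```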
test/e2e/fixture/fixture.go (1)
108-172: Improved cleanup robustness and safer object handling look good.
- Extending EnsureDeletion/WaitForDeletion to 120s and stripping finalizers on timeout makes test teardown more resilient to slow clusters.
- Using DeepCopy() for Applications and AppProjects before tweaking namespace/name avoids mutating loop variables and is a nice safety improvement.
- The new resetManagedAgentClusterInfo call is correctly treated as best-effort, so transient Redis/port-forward issues don't cause test failures.

No changes needed here.
Also applies to: 230-267, 295-375, 487-500
Assisted-by: Cursor
Signed-off-by: Rizwana777 <[email protected]>
bd9c416 to c6242e3
fixes #454
JIRA - https://issues.redhat.com/browse/GITOPS-8091
gen-redis-tls-certs.sh:
Generates CA and Redis server TLS certificates with appropriate SANs for all vclusters
configure-redis-tls.sh:
Patches Redis deployments to enable TLS-only mode and creates the argocd-redis-tls secret
configure-argocd-redis-tls.sh:
Configures Argo CD components (server, repo-server, application-controller) to connect to Redis using TLS
E2E tests use InsecureSkipVerify: true to skip certificate validation while maintaining TLS encryption, simplifying automated testing with dynamic LoadBalancer addresses that don't match certificate SANs. Please let me know if this is incorrect and needs to be changed.
Summary by CodeRabbit
New Features
Documentation
Configuration