
Conversation


@Rizwana777 Rizwana777 commented Nov 27, 2025

fixes #454

JIRA - https://issues.redhat.com/browse/GITOPS-8091

gen-redis-tls-certs.sh:
Generates a CA and Redis server TLS certificates with appropriate SANs for all vclusters.

configure-redis-tls.sh:
Patches the Redis deployments to enable TLS-only mode and creates the argocd-redis-tls secret.

configure-argocd-redis-tls.sh:
Configures the Argo CD components (server, repo-server, application-controller) to connect to Redis over TLS.

E2E tests use InsecureSkipVerify: true to skip certificate validation while maintaining TLS encryption, simplifying automated testing with dynamic LoadBalancer addresses that don't match certificate SANs. Please let me know if this is incorrect and needs to be changed.

Summary by CodeRabbit

  • New Features

    • End-to-end Redis TLS support: secure Redis connections for principal, agents, and proxy (server/upstream), with file- or secret-based certs and an optional insecure/dev mode.
    • Improved local/E2E tooling to generate certs, configure TLS, and run TLS-enabled test environments.
  • Documentation

    • New Redis TLS guide and updated Kubernetes getting-started steps covering cert generation, installation, and verification.
  • Configuration

    • New CLI flags, environment variables, Helm values, and manifest options to enable and supply Redis TLS and related volume mounts.



coderabbitai bot commented Nov 27, 2025

Caution

Review failed

An error occurred during the review process. Please try again later.

Walkthrough

Adds end-to-end Redis TLS support: CLI flags, option wiring, TLS config propagation to Redis clients/proxy/cluster cache, Kubernetes manifests and Helm values, dev scripts for cert generation and TLS setup, and E2E/test updates to use and validate Redis TLS.

Changes

  • Agent code & options (agent/agent.go, agent/inbound_redis.go, agent/options.go): Add Redis TLS flags/fields, CA path and insecure options; load the CA file into a tls.Config; pass the tls.Config into the cluster cache; add AgentOption helpers.
  • Agent CLI wiring (cmd/argocd-agent/agent.go): Add CLI flags for Redis TLS, validate mutual exclusivity, and wire options into agent startup.
  • Principal server & proxy (cmd/argocd-agent/principal.go, principal/options.go, principal/server.go, principal/redisproxy/redisproxy.go): Add server- and upstream-TLS surfaces, load certs/CA from paths or secrets, start a TLS listener when enabled, wrap upstream connections with TLS (CA, CA path, or insecure), and handle SNI and handshakes.
  • Cluster cache / manager (internal/argocd/cluster/cluster.go, internal/argocd/cluster/manager.go, internal/argocd/cluster/*_test.go): Extend the NewClusterCacheInstance / NewManager signatures to accept a tls.Config; wire tlsConfig into Redis client options; update tests for the new arity and compression argument.
  • Kubernetes manifests & Helm (install/kubernetes/.../agent-deployment.yaml, install/kubernetes/.../principal-deployment.yaml, install/helm-repo/.../agent-params-cm.yaml, install/helm-repo/.../agent-deployment.yaml, install/helm-repo/.../values.yaml, install/helm-repo/.../values.schema.json, install/helm-repo/.../README.md): Add env vars for Redis TLS, mount TLS secret volumes, add TLS keys to params ConfigMaps and Helm values/schema, and add a networkPolicy block with defaults.
  • Dev / E2E scripts (hack/dev-env/gen-redis-tls-certs.sh, hack/dev-env/configure-redis-tls.sh, hack/dev-env/configure-argocd-redis-tls.sh, hack/dev-env/start-*.sh, hack/dev-env/Procfile.e2e, hack/dev-env/start-e2e.sh): Add cert generation and cluster TLS configuration scripts; update E2E/dev startup scripts to support TLS, port-forward defaults, and new env vars.
  • E2E fixtures & tests (test/e2e/fixture/*, test/e2e/*_test.go, test/run-e2e.sh, test/e2e/README.md): Enable TLS by default in fixtures, add cached Redis clients with tls.Config, increase timeouts, buffer SSE channels, improve HTTP client settings, and add TLS validation checks for E2E runs.
  • Docs (docs/configuration/redis-tls.md, docs/getting-started/kubernetes/index.md, docs/configuration/agent/configuration.md): New Redis TLS documentation, Kubernetes TLS setup steps, and minor doc formatting; link TLS guidance into getting-started.
  • Misc / small changes (principal/resource.go, principal/tracker/tracking.go, various tests): Increase the resource request timeout, make an event channel buffered, and propagate timeout/test adjustments for TLS runs.

Sequence Diagram(s)

sequenceDiagram
    participant Argo as ArgoCD (server/repo)
    participant Proxy as Redis Proxy (principal)
    participant RedisP as Redis (control-plane)
    participant Agent as Agent-side Redis (workload)

    Argo->>Proxy: Connect (TLS) to proxy endpoint
    Note over Proxy: createServerTLSConfig() -> load cert/key
    Proxy->>Argo: TLS Handshake (server cert)
    Argo->>Proxy: Redis protocol over TLS

    Proxy->>RedisP: Dial TCP -> wrap with TLS (upstream)
    Note over Proxy: load CA pool or CA path or set InsecureSkipVerify
    Proxy->>RedisP: TLS Handshake (SNI set)
    RedisP->>Proxy: Handshake OK

    Argo->>Proxy: AUTH / GET/SET (encrypted)
    Proxy->>RedisP: Forward command (encrypted)
    RedisP->>Proxy: Response
    Proxy->>Argo: Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

  • Areas to inspect closely:
    • TLS propagation and signature changes: agent/agent.go, internal/argocd/cluster/manager.go, internal/argocd/cluster/cluster.go
    • Upstream TLS handshake, SNI handling, CA loading, and deadline logic: principal/redisproxy/redisproxy.go
    • CLI/option wiring and secret-from-Kubernetes loading: principal/options.go, cmd/argocd-agent/principal.go, cmd/argocd-agent/agent.go
    • Dev scripts idempotency and Kubernetes patching/rollout logic: hack/dev-env/*.sh
    • E2E fixture caching and test timeout changes: test/e2e/fixture/*, test/e2e/*_test.go

Possibly related PRs

Suggested reviewers

  • chetan-rns
  • jgwest
  • mikeshng
  • jannfis

Poem

🐇 I hopped through certs and tiny key,

I signed a CA beneath the tree,
Now Redis whispers safe and snug,
Encrypted bytes—a cozy hug,
Hop, hop, TLS—secure and free 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning – Docstring coverage is 58.33%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)

  • Description Check ✅ Passed – Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed – The title accurately describes the primary feature: enabling Redis TLS encryption by default across all connections. It is specific, concise, and reflects the main objective of the changeset.
  • Linked Issues Check ✅ Passed – The PR comprehensively implements all coding requirements from issue #454: adds TLS config parameters to agent/principal [agent/options.go, principal/options.go, cmd/argocd-agent/], updates default manifests with TLS enabled [install/kubernetes/, install/helm-repo/], integrates TLS in E2E tests [test/e2e/], adds scripts for certificate generation and TLS configuration [hack/dev-env/*.sh], and enables Redis TLS for all components.
  • Out of Scope Changes Check ✅ Passed – All changes are in-scope and aligned with issue #454. Minor non-TLS improvements (timeout adjustments, buffered channels, port-forward setup) directly support TLS infrastructure testing and are necessary for E2E test stability with the new TLS configuration.


@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 4 times, most recently from 5743de3 to 40254ae Compare November 27, 2025 11:43

codecov-commenter commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 5.95611% with 300 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.04%. Comparing base (09ce442) to head (c6242e3).

Files with missing lines:

  • principal/redisproxy/redisproxy.go – 0.00% (100 lines missing ⚠️)
  • cmd/argocd-agent/principal.go – 0.00% (63 missing ⚠️)
  • principal/options.go – 0.00% (32 missing ⚠️)
  • principal/server.go – 6.25% (28 missing, 2 partials ⚠️)
  • agent/agent.go – 22.58% (23 missing, 1 partial ⚠️)
  • cmd/argocd-agent/agent.go – 0.00% (22 missing ⚠️)
  • agent/inbound_redis.go – 0.00% (16 missing, 1 partial ⚠️)
  • agent/options.go – 0.00% (12 missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
- Coverage   46.15%   45.04%   -1.11%     
==========================================
  Files          92       92              
  Lines       10689    10973     +284     
==========================================
+ Hits         4933     4943      +10     
- Misses       5259     5529     +270     
- Partials      497      501       +4     


@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 40254ae to 3df4a33 Compare November 27, 2025 12:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (11)
hack/dev-env/start-principal.sh (1)

23-43: Port-forward setup looks good; consider addressing the shellcheck hint.

The port-forward logic is sound. The shellcheck warning (SC2064) about using double quotes in the trap is a false positive here since $PORT_FORWARD_PID is set once and won't change. However, you could use single quotes for consistency with shellcheck best practices:

-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       trap 'kill $PORT_FORWARD_PID 2>/dev/null || true' EXIT

Note: With single quotes, the variable will expand when the trap is triggered rather than when it's set, but in this case both work correctly since the PID doesn't change.

internal/argocd/cluster/cluster_test.go (1)

31-37: Cluster manager tests correctly updated for new constructor signature

All NewManager invocations now provide the Redis compression type and a trailing nil TLS config, matching the new constructor; existing test behavior is preserved. If you later want more coverage for the Redis TLS path introduced in this PR, adding dedicated tests around a non-nil TLS config in another test file would be a good follow-up.

Also applies to: 223-226, 303-305

agent/inbound_redis.go (1)

345-372: Consider using TLS 1.3 as minimum version.

The TLS configuration sets MinVersion: tls.VersionTLS12, but for new implementations in 2025, TLS 1.3 should be preferred as the minimum version for better security. TLS 1.2 has known vulnerabilities in certain configurations.

Apply this diff:

 	if a.redisProxyMsgHandler.redisTLSEnabled {
 		tlsConfig = &tls.Config{
-			MinVersion: tls.VersionTLS12,
+			MinVersion: tls.VersionTLS13,
 		}

That said, the CA loading logic and error handling are well-implemented with appropriate warnings for insecure mode and system CA fallback.

test/run-e2e.sh (1)

61-66: Use jq for structured JSON parsing instead of grep.

Line 62 uses grep "tls-port" on JSON output, which is fragile and could produce false positives (e.g., matching in comments, annotations, or labels).

Replace with structured JSON querying using jq:

-        # Check if Redis is configured with TLS (it's a Deployment, not StatefulSet)
-        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+        # Check if Redis is configured with TLS
+        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | \
+             jq -e '.spec.template.spec.containers[].args[] | select(contains("--tls-port"))' >/dev/null; then
             echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
             echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
             exit 1
         fi

This approach reliably checks for the --tls-port argument in the container args array.

hack/dev-env/gen-redis-tls-certs.sh (1)

17-17: Consider ECDSA keys for better performance.

The script generates 4096-bit RSA keys, which are secure but relatively slow. For development and testing, consider using ECDSA P-256 keys instead, which provide equivalent security with better performance and smaller certificate sizes.

Example:

-    openssl genrsa -out "${CREDS_DIR}/ca.key" 4096
+    openssl ecparam -genkey -name prime256v1 -out "${CREDS_DIR}/ca.key"

This is optional for a dev/test certificate generation script, but ECDSA is increasingly preferred in modern TLS implementations.

test/e2e/fixture/cluster.go (1)

40-54: E2E Redis TLS wiring is correct; consider small helper for TLSConfig

The new *RedisTLSEnabled fields and getCacheInstance TLSConfig setup give tests a clear, deterministic TLS path (TLS 1.2, InsecureSkipVerify only in e2e). Defaulting both TLSEnabled flags to true in the config helpers matches the “TLS-only e2e” objective. You might later factor the repeated TLSConfig construction for principal/managed into a tiny helper, but it’s not required.

Also applies to: 165-204, 251-268, 273-315

internal/argocd/cluster/cluster.go (1)

17-32: TLS parameterization of cluster cache is clean and backwards-compatible

Extending NewClusterCacheInstance with a *tls.Config and wiring it directly into redis.Options.TLSConfig cleanly enables TLS while keeping nil as the “no TLS” path. Callers now own policy, which is appropriate. Consider updating any GoDoc on this function to mention the new TLS behavior, but the implementation itself looks solid.

Also applies to: 168-178

hack/dev-env/configure-argocd-redis-tls.sh (1)

1-201: Dev script works; consider restoring context and clarifying the banner

The script does what it needs for dev/e2e, but two improvements would help:

  1. Context restoration – kubectl config use-context ${CONTEXT} permanently switches the user’s context. Mirroring hack/dev-env/configure-redis-tls.sh by capturing the original context and restoring it in a trap would make this safer to run manually.
  2. Clarify the “proper TLS certificate validation” note – Redis connections are indeed validated via --redis-use-tls and --redis-ca-certificate, but argocd-server is started with --insecure, which weakens client→server TLS. Rewording the banner to “Using proper Redis TLS certificate validation (server is insecure for dev only)” would avoid confusion.

These are UX/docs-level tweaks; the functional Redis TLS wiring looks fine.

principal/server.go (1)

349-372: Redis proxy and cluster-manager TLS wiring is coherent and option-driven

The server now cleanly drives Redis TLS from ServerOptions: redisProxy is toggled via redisTLSEnabled, with clear precedence for server cert sources (path vs in‑memory) and upstream verification (insecure vs CA path vs CA pool). The cluster manager reuses the same upstream TLS knobs to build clusterMgrRedisTLSConfig and passes it down to cluster.NewManager, so its Redis cache observes the same trust policy. Error paths on CA file read/parse are explicit and early, which is good.

If you want extra transparency, you could log a brief message when TLS is enabled but neither redisUpstreamTLSInsecure, redisUpstreamTLSCA, nor redisUpstreamTLSCAPath are set (i.e., relying on system CAs), but that’s optional.

Also applies to: 400-427

agent/agent.go (1)

17-24: Cluster cache Redis TLS follows agent Redis TLS options; consider minor reuse/logging tweaks

The new clusterCacheTLSConfig correctly mirrors the agent’s Redis TLS options (enabled flag, insecure mode, CA path) and feeds them into NewClusterCacheInstance, so the cluster cache honors the same security posture as the main Redis client. Error handling on CA read/parse is clear and fails fast.

Two optional refinements to consider later:

  • Factor the TLSConfig construction shared between this file and agent/inbound_redis.go into a small helper to keep behavior perfectly in sync.
  • When TLS is enabled but no CA path is set and redisTLSInsecure is false, add a log line indicating that system CAs are being used for the cluster cache as well (to match the visibility you already give the main Redis client).

Also applies to: 323-345
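A hypothetical shared helper along the lines this comment suggests could map the agent's Redis TLS options (enabled, insecure, optional CA path) to a *tls.Config in one place. The function and parameter names below are assumptions for illustration, not the repository's actual identifiers:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// redisTLSConfig turns the agent's Redis TLS knobs into a client TLS config:
// nil when TLS is off, InsecureSkipVerify in dev mode, an explicit CA pool
// when a CA path is given, and system CAs otherwise.
func redisTLSConfig(enabled, insecure bool, caPath string) (*tls.Config, error) {
	if !enabled {
		return nil, nil // plaintext Redis
	}
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if insecure {
		cfg.InsecureSkipVerify = true // dev/test only
		return cfg, nil
	}
	if caPath != "" {
		pem, err := os.ReadFile(caPath)
		if err != nil {
			return nil, fmt.Errorf("read Redis CA: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no valid certificates in %s", caPath)
		}
		cfg.RootCAs = pool
	}
	// caPath == "": RootCAs stays nil, so the system trust store is used —
	// the case where the suggested log line would fire.
	return cfg, nil
}

func main() {
	cfg, err := redisTLSConfig(true, true, "")
	fmt.Println(err == nil, cfg.InsecureSkipVerify) // → true true
}
```

Keeping one such helper, called from both agent/agent.go and agent/inbound_redis.go, would guarantee the cluster cache and the main Redis client can never drift apart.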

principal/redisproxy/redisproxy.go (1)

131-165: Remove unused key parsing.

Line 154 parses the private key but never uses the result. This validation is redundant since the key has already been marshaled from a valid crypto.PrivateKey interface at line 145.

Apply this diff to remove the dead code:

 		cert.Certificate = [][]byte{certDER}
 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
-
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
 		return nil, fmt.Errorf("no TLS certificate configured")
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 101d4c8 and 3df4a33.

📒 Files selected for processing (37)
  • Makefile (1 hunks)
  • agent/agent.go (2 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (3 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (2 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (2 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/server.go (3 hunks)
  • test/e2e/README.md (2 hunks)
  • test/e2e/fixture/cluster.go (5 hunks)
  • test/run-e2e.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/run-e2e.sh
  • Makefile
  • hack/dev-env/start-e2e.sh
  • install/helm-repo/argocd-agent-agent/values.yaml
  • test/e2e/README.md
  • hack/dev-env/Procfile.e2e
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • install/kubernetes/agent/agent-params-cm.yaml
  • install/kubernetes/agent/agent-deployment.yaml
🧬 Code graph analysis (12)
cmd/argocd-agent/agent.go (2)
agent/options.go (3)
  • WithRedisTLSEnabled (112-117)
  • WithRedisTLSInsecure (128-133)
  • WithRedisTLSCAPath (120-125)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/inbound_redis.go (2)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
principal/server.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
principal/options.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/server.go (1)
  • Server (72-164)
internal/tlsutil/kubernetes.go (2)
  • TLSCertFromSecret (44-66)
  • X509CertPoolFromSecret (106-128)
agent/outbound_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
🪛 Shellcheck (0.11.0)
hack/dev-env/start-e2e.sh

[warning] 58-58: Declare and assign separately to avoid masking return values.

(SC2155)

hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and push image
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Analyze (go)
🔇 Additional comments (43)
hack/dev-env/start-agent-managed.sh (3)

37-46: LGTM!

The TLS certificate detection and conditional enablement logic is clear and well-structured. The user guidance for missing certificates is helpful.


48-61: LGTM!

The Redis address configuration is well-documented and provides clear guidance for local development with TLS. The default localhost address appropriately aligns with certificate SANs.


66-67: LGTM!

The TLS and address arguments are correctly injected into the agent startup command.

hack/dev-env/start-principal.sh (2)

58-74: LGTM!

The TLS certificate detection correctly handles both server-side certificates (for the Redis proxy) and upstream CA validation. The logic is sound and well-documented.


82-82: LGTM!

The TLS arguments are correctly injected into the principal startup command.

Makefile (1)

59-70: LGTM!

The Redis TLS setup sequence is well-structured and correctly configures TLS for all three vclusters. The messaging clearly indicates that TLS is required for E2E tests, aligning with the PR objectives.

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: LGTM!

The Redis TLS configuration keys are well-documented and follow the existing naming conventions. The "INSECURE" warning on the insecure flag is appropriate.

hack/dev-env/start-agent-autonomous.sh (3)

37-46: LGTM!

The TLS certificate detection logic is consistent with the managed agent script and works correctly.


48-61: LGTM!

The Redis address configuration correctly uses localhost:6382, allowing the autonomous agent to run alongside the managed agent without port conflicts.


66-67: LGTM!

The TLS and address arguments are correctly injected into the agent startup command.

agent/outbound_test.go (1)

464-464: LGTM!

The test correctly adapts to the extended NewManager signature by passing nil for the new tlsConfig parameter. This is appropriate for a test that doesn't require TLS configuration.

install/helm-repo/argocd-agent-agent/values.yaml (3)

136-136: LGTM!

The default TLS root CA path provides a sensible default for users and aligns with conventional mount paths.


138-151: LGTM!

The Redis TLS configuration is comprehensive and well-documented. TLS is appropriately enabled by default with secure settings, aligning with the PR objectives. The string values ("true"/"false") are appropriate for ConfigMap usage.


153-163: LGTM!

The network policy configuration is a good security enhancement that allows users to restrict Redis traffic. The default selectors and structure are appropriate.

test/e2e/README.md (1)

41-65: Verify the README's InsecureSkipVerify claim against the actual client configuration.

The documentation states that "principal and agents use InsecureSkipVerify: true" when connecting to Redis via LoadBalancer addresses, but this conflicts with the described behavior of the startup scripts, which use localhost port-forwards (where localhost should be in the certificate SANs and proper certificate validation should work).

This discrepancy needs manual verification by examining:

  1. The actual Redis client configuration in the principal and agent code
  2. Whether E2E tests genuinely use InsecureSkipVerify or perform proper certificate validation
  3. The difference between LoadBalancer connections (mentioned in docs) vs. localhost port-forward connections (described in startup scripts)
internal/argocd/cluster/manager_test.go (1)

11-11: NewManager signature update is wired correctly in tests

Importing cacheutil and passing cacheutil.RedisCompressionGZip plus a trailing nil for the TLS config matches the updated NewManager signature; test behavior remains the same and looks correct.

Also applies to: 57-58, 78-79

docs/configuration/redis-tls.md (1)

1-411: Redis TLS doc is consistent with implementation and manifests

The new documentation cleanly matches the flags, env vars, ConfigMap keys, volume mount paths, and defaults introduced in the code/manifests; it provides enough guidance and the security caveats around *_INSECURE are clear.

cmd/argocd-agent/agent.go (1)

74-77: Agent Redis TLS flags and wiring look correct

The new Redis TLS flags/envs are correctly bound, and the options passed into the agent (WithRedisTLSEnabled, WithRedisTLSInsecure, WithRedisTLSCAPath) implement the documented behavior (CLI default off, K8s/Helm on, optional CA or insecure skip) without obvious edge-case issues.

Also applies to: 184-194, 236-245

install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (1)

136-153: Helm agent deployment TLS wiring is consistent and non-breaking

The new Redis TLS env vars and the conditional redis-tls-ca volume/volumeMount are consistent with the agent ConfigMap keys and agent CLI expectations; the optional: true and .Values.redisTLS.secretName guard make this backwards-compatible.

Also applies to: 232-236, 258-266

install/kubernetes/agent/agent-params-cm.yaml (1)

88-99: Agent Redis TLS ConfigMap defaults align with deployment and docs

The new agent.redis.tls.* entries are consistent with the agent deployment envs, default to TLS-on with a CA path that matches the mounted secret, and keep insecure mode explicitly off by default.

install/kubernetes/principal/principal-deployment.yaml (1)

233-274: Principal deployment Redis TLS configuration is consistent and safe by default

The new Redis TLS env vars, redis-proxy port, and redis-server-tls / redis-upstream-tls-ca volumes are wired consistently with the documented paths and argocd-redis-tls secret; marking the secret volumes as optional: true keeps the manifest robust while still enabling TLS by default when the secret is present.

Also applies to: 280-287, 302-307, 324-339

hack/dev-env/Procfile.e2e (1)

1-6: Confirm that ARGOCD_AGENT_REDIS_ADDRESS is properly consumed by the agent startup scripts

The Procfile sets ARGOCD_AGENT_REDIS_ADDRESS when invoking start-agent-managed.sh and start-agent-autonomous.sh, but the agent code may expect a different environment variable or command-line flag. Verify that:

  1. start-agent-managed.sh and start-agent-autonomous.sh translate ARGOCD_AGENT_REDIS_ADDRESS into a --redis-addr CLI flag or pass it through correctly to the agent process
  2. The agent executable does not default to a hardcoded Redis address if the expected variable is absent
  3. The e2e setup actually uses the forwarded Redis ports (6380/6381/6382) rather than falling back to defaults

If the scripts or agent code expect REDIS_ADDR instead of ARGOCD_AGENT_REDIS_ADDRESS, either rename the variable here or update the scripts accordingly.

agent/inbound_redis.go (1)

51-54: LGTM - Clean TLS configuration fields.

The addition of these three fields provides a clear and straightforward mechanism to control Redis TLS behavior.

install/helm-repo/argocd-agent-agent/README.md (1)

68-72: LGTM - Clear Redis TLS configuration documentation.

The Redis TLS configuration is well-documented with sensible defaults (enabled: "true", CA path, and secret name). The inline documentation clearly indicates that insecure mode is for development only.

agent/options.go (1)

111-133: LGTM - Redis TLS option setters follow established patterns.

The three new option setters (WithRedisTLSEnabled, WithRedisTLSCAPath, WithRedisTLSInsecure) are implemented consistently with existing option setters in the file. The pattern of setting the field and returning nil is appropriate.

test/run-e2e.sh (1)

24-76: Good enforcement of Redis TLS as a hard requirement for E2E tests.

The comprehensive verification checks (certificates, secrets, and deployment configuration) across all vclusters ensure that E2E tests run only in a properly secured environment. The clear error messages with remediation steps are helpful for developers.

hack/dev-env/gen-redis-tls-certs.sh (1)

28-123: Comprehensive certificate generation with proper SANs.

The script generates certificates for all necessary components (control-plane, proxy, and agent vclusters) with appropriate Subject Alternative Names covering localhost, IP addresses, and cluster DNS. The idempotency checks ensure the script can be re-run safely.

docs/getting-started/kubernetes/index.md (2)

159-282: Redis TLS setup steps are consistent with manifests and defaults

Generation of CA/server certs, shared argocd-redis-tls secret, Redis TLS args, and verification flow all line up with the principal/agent manifests and default paths. No issues from a functional perspective.


389-478: Workload-cluster Redis TLS mirrors control-plane flow correctly

The workload-cluster TLS instructions reuse the same CA, secret structure, args, and verification pattern, which keeps the principal/workload Redis configuration aligned and predictable. Looks good.

internal/argocd/cluster/informer_test.go (1)

3-15: Tests correctly adapted to extended NewManager signature

Using cacheutil.RedisCompressionGZip and a trailing nil TLS argument keeps the tests aligned with the new constructor while preserving previous behavior. No further changes needed here.

Also applies to: 17-51, 67-88, 96-116

install/kubernetes/principal/principal-params-cm.yaml (1)

140-166: Principal Redis TLS ConfigMap defaults align with deployment wiring

The new principal.redis.tls.* keys (enable flag, server cert/key paths, server/CA secret names, upstream CA path, and insecure switch) match the principal Deployment’s volume mounts and the ServerOptions fields. Enabling TLS by default here is consistent with the PR’s objective, and the “INSECURE” comment on the upstream flag is clear.

install/kubernetes/agent/agent-deployment.yaml (3)

149-166: LGTM! Redis TLS environment variables properly configured.

The three Redis TLS environment variables follow the established pattern and are appropriately marked as optional, ensuring backward compatibility with existing deployments.


193-195: LGTM! Volume mount configured securely.

The redis-tls-ca volume is correctly mounted as read-only, following security best practices.


205-211: LGTM! Secret-backed volume configured correctly.

The redis-tls-ca volume is properly configured with optional: true, preventing deployment failures when TLS is not enabled while maintaining compatibility with TLS-enabled configurations.

internal/argocd/cluster/manager.go (1)

26-26: LGTM! TLS configuration properly integrated.

The TLS config parameter is correctly threaded through NewManager to NewClusterCacheInstance, enabling TLS-protected Redis connections for cluster caching. The nil-able *tls.Config type allows optional TLS configuration while maintaining backward compatibility at the implementation level.

Also applies to: 71-71, 81-81

hack/dev-env/configure-redis-tls.sh (2)

23-42: LGTM! Context validation is clear and robust.

The case statement properly validates the context parameter and maps it to the appropriate certificate prefix with helpful error messages for invalid inputs.


47-54: LGTM! Cleanup trap follows best practices.

The trap ensures the original kubectl context is restored on exit, preventing side effects from the script execution.

principal/redisproxy/redisproxy.go (4)

65-76: LGTM! TLS configuration fields well-organized.

The TLS configuration fields clearly separate server-side and upstream concerns, and support both in-memory and file-based certificate loading for flexibility.


98-128: LGTM! TLS configuration API is clean and straightforward.

The setter methods provide a clear API for configuring both server-side and upstream TLS, supporting multiple configuration sources.


167-211: LGTM! TLS listener creation properly implemented.

The Start() method cleanly handles both TLS and non-TLS modes with appropriate logging and error handling.


847-908: LGTM! Upstream TLS connection properly implemented.

The TLS upgrade logic for upstream Redis connections correctly handles CA certificate validation from both in-memory and file sources, SNI configuration, and test-mode insecure skip verify. The conditional TLS enablement (line 864) allows for flexible deployment modes including TLS-terminating proxy scenarios.

principal/options.go (2)

80-88: LGTM! Redis TLS configuration fields well-structured.

The new Redis TLS fields follow the established pattern for TLS configuration in ServerOptions, maintaining consistency with existing TLS fields and supporting flexible configuration sources.


492-548: LGTM! Redis TLS option functions follow established patterns.

The six new Redis TLS configuration functions are well-implemented, following the existing ServerOption pattern consistently. They provide flexible configuration through files, secrets, and direct values, with appropriate error handling and integration with the tlsutil package.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 3df4a33 to 211af17 on November 27, 2025 15:43


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
hack/dev-env/gen-redis-tls-certs.sh (1)

14-26: Don’t hide OpenSSL errors; let failures surface and stop the script.

All the openssl calls currently send stderr to /dev/null, which makes certificate generation failures very hard to diagnose, even with set -e in place. It’d be better to let stderr through (and optionally add explicit exit checks) so a broken OpenSSL invocation clearly reports why it failed and the script aborts.

For example:

-    openssl genrsa -out "${CREDS_DIR}/ca.key" 4096 2>/dev/null
+    openssl genrsa -out "${CREDS_DIR}/ca.key" 4096
+    if [[ $? -ne 0 ]]; then
+        echo "Failed to generate CA private key" >&2
+        exit 1
+    fi
-    openssl req -new -x509 -days 3650 -key "${CREDS_DIR}/ca.key" \
-        -out "${CREDS_DIR}/ca.crt" \
-        -subj "/C=US/ST=State/L=City/O=Organization/OU=Unit/CN=Redis CA" 2>/dev/null
+    openssl req -new -x509 -days 3650 -key "${CREDS_DIR}/ca.key" \
+        -out "${CREDS_DIR}/ca.crt" \
+        -subj "/C=US/ST=State/L=City/O=Organization/OU=Unit/CN=Redis CA"
+    if [[ $? -ne 0 ]]; then
+        echo "Failed to generate CA certificate" >&2
+        exit 1
+    fi

and apply the same pattern (remove 2>/dev/null, add clear error messages if desired) to the other openssl genrsa/req/x509 calls in this script.

Also applies to: 31-32, 47-58, 61-64, 79-90, 93-96, 111-121

hack/dev-env/start-e2e.sh (1)

58-58: Avoid masking kubectl errors when exporting REDIS_PASSWORD

Combining export with command substitution can hide failures from kubectl and is what ShellCheck SC2155 is warning about (also raised in an earlier review).

Splitting assignment and export (and checking the exit code) makes failures explicit:

-export REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode)
+REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode) || {
+  echo "Failed to read Redis password from argocd-redis secret in vcluster-agent-managed" >&2
+  exit 1
+}
+export REDIS_PASSWORD
hack/dev-env/configure-redis-tls.sh (1)

61-66: Validate all required certificate/key files before creating the secret

Right now the script only checks for creds/redis-tls/ca.crt. If the per-context *.crt or *.key is missing, kubectl create secret will fail with a less obvious error. This was already raised in an earlier review and is still applicable.

You can make the error clearer by validating all three files up front:

 # Check certificates exist
 if [ ! -f "creds/redis-tls/ca.crt" ]; then
     echo "Error: Redis TLS certificates not found"
     echo "Please run: ./gen-redis-tls-certs.sh"
     exit 1
 fi
+
+if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ]; then
+    echo "Error: Redis TLS certificate or key not found for ${REDIS_CERT_PREFIX}"
+    echo "Please run: ./gen-redis-tls-certs.sh"
+    exit 1
+fi

Also applies to: 81-88

🧹 Nitpick comments (10)
cmd/argocd-agent/principal.go (2)

258-288: Redis TLS wiring is sound; consider clarifying precedence and adding tests.

The overall flow (redisTLSEnabled gate, server TLS from path vs secret, upstream TLS with insecure/CA path/CA secret) mirrors existing TLS patterns and looks correct. Two refinements to consider:

  1. Silent precedence between upstream CA path and secret
    When both a CA path and a (possibly customized) CA secret are configured, the CA path branch wins and the secret is ignored, with no warning. That’s safe but can surprise operators troubleshooting TLS. A lightweight improvement would be to log (or optionally fail fatally on) the case where a non-default CA secret name is set alongside a CA path, e.g. log that the secret is being ignored in favor of the file-based CA. This keeps behavior but makes it explicit.

  2. Tests for flag/env combinations and TLS behavior
    Given the number of new flags and the Codecov report pointing out missing coverage in this file, it would be valuable to add unit tests around:

    • redisTLSEnabled true/false.
    • Server TLS: (cert+key), partial, and secret-based.
    • Upstream TLS: insecure vs CA path vs default secret, including precedence behavior.
      Even a small table-driven test on NewPrincipalRunCommand option wiring or a constructor helper would help lock in these semantics.

Overall, the wiring itself looks correct; this is mainly about making edge-case behavior explicit and test-backed.


419-441: Redis TLS flags and env bindings look good; minor help-text tweak optional.

The new flags and env variable bindings are consistent with existing patterns (ARGOCD_PRINCIPAL_*), and the separation between server TLS and upstream TLS is clear. As a minor polish, you might clarify in the --redis-tls-enabled description that it controls both the proxy’s listening TLS and the upstream TLS to argocd-redis (since the code configures both) to avoid ambiguity in CLI help output.

agent/options.go (1)

127-133: Consider adding runtime warning for insecure mode.

The comment indicates this option is "for testing only," but there's no runtime warning when this insecure mode is enabled. Consider adding a warning log message when TLS verification is disabled to alert operators of the security implications in production environments.

Example:

func WithRedisTLSInsecure(insecure bool) AgentOption {
	return func(o *Agent) error {
		o.redisProxyMsgHandler.redisTLSInsecure = insecure
		if insecure {
			log().Warn("INSECURE: Redis TLS certificate verification disabled. This should only be used for testing.")
		}
		return nil
	}
}
principal/server.go (1)

400-427: Consider extracting CA loading logic into a helper function.

The CA certificate loading logic (lines 413-424) is duplicated in multiple files (agent/agent.go, principal/redisproxy/redisproxy.go). Consider extracting this into a shared helper function to improve maintainability.

Example helper:

// In internal/tlsutil or similar package
func LoadCACertPool(caPath string) (*x509.CertPool, error) {
	caCert, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read CA certificate from %s: %w", caPath, err)
	}
	caCertPool := x509.NewCertPool()
	if !caCertPool.AppendCertsFromPEM(caCert) {
		return nil, fmt.Errorf("failed to parse CA certificate from %s", caPath)
	}
	return caCertPool, nil
}
principal/redisproxy/redisproxy.go (1)

141-156: Simplify in-memory certificate handling.

The PKCS8 marshaling (line 145) and parsing (line 154) appear to be validation only, as the parsed result is discarded. This roundtrip is unnecessary. The private key can be assigned directly to cert.PrivateKey.

Apply this diff:

 	} else if rp.tlsServerCert != nil && rp.tlsServerKey != nil {
 		// Convert cert and key to tls.Certificate
 		certDER := rp.tlsServerCert.Raw
-		// For private key, we need to marshal it
-		keyDER, err := x509.MarshalPKCS8PrivateKey(rp.tlsServerKey)
-		if err != nil {
-			return nil, fmt.Errorf("failed to marshal private key: %w", err)
-		}
 		cert.Certificate = [][]byte{certDER}
 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
-
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (1)

136-153: Redis TLS env and volume wiring look correct (consider non‑optional CA secret).

The new ARGOCD_AGENT_REDIS_TLS_* env vars and the redis-tls-ca mount/volume are consistent with the ConfigMap keys and documented CA path (/app/config/redis-tls/ca.crt), so the wiring itself looks good.

One behavioral nuance: the redis-tls-ca secret is marked optional: true, so the pod will still start if the TLS secret is missing and the agent will only fail later at runtime. If you’d prefer a fail‑fast configuration error when TLS is enabled but the CA secret is absent, you could drop optional: true on that secret.

Also applies to: 232-236, 257-266

hack/dev-env/start-agent-managed.sh (1)

37-62: Managed-agent Redis TLS handling is correct; consider de-duping with autonomous script.

The managed-agent script’s Redis TLS and address handling matches the autonomous script and the Procfile port‑forwards (localhost:6381), so behavior looks correct.

If these scripts evolve further, you might consider factoring the shared Redis TLS/address logic into a small helper (or sourcing a common start-agent-common.sh) to avoid future drift between managed and autonomous modes.

Also applies to: 66-67

test/run-e2e.sh (1)

24-70: Redis TLS precheck is solid; consider stricter context detection

The TLS gating logic (cert presence + per-context secret and tls-port check) looks good and aligns with the “TLS-only E2E” objective.

Minor robustness improvement: in the loop, kubectl config get-contexts | grep -q "${CONTEXT}" will succeed on substring matches and silently skip missing contexts. If a vcluster context is missing, it might be clearer to fail early.

You could tighten this and fail when a context is absent:

-for CONTEXT in vcluster-control-plane vcluster-agent-autonomous vcluster-agent-managed; do
-    if kubectl config get-contexts | grep -q "${CONTEXT}"; then
+for CONTEXT in vcluster-control-plane vcluster-agent-autonomous vcluster-agent-managed; do
+    if kubectl config get-contexts | awk 'NR>1 { print $2 }' | grep -qx "${CONTEXT}"; then
         echo "Checking Redis TLS in ${CONTEXT}..."
         # ...
         echo "✓ Redis TLS configured in ${CONTEXT}"
-    fi
+    else
+        echo "ERROR: kube context ${CONTEXT} is not configured; missing setup?" >&2
+        exit 1
+    fi
 done
hack/dev-env/start-principal.sh (1)

23-43: Fix trap quoting to avoid ShellCheck SC2064 warning

The port-forward logic looks good, but ShellCheck is right that the trap should avoid expanding $PORT_FORWARD_PID at definition time.

You can keep behavior and silence SC2064 by using single quotes and quoting the variable inside:

-       # Cleanup function to kill port-forward on exit
-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       # Cleanup function to kill port-forward on exit
+       trap 'kill "$PORT_FORWARD_PID" 2>/dev/null || true' EXIT

This expands PORT_FORWARD_PID when the trap runs, not when it’s set, and follows common shell best practices.

test/e2e/fixture/cluster.go (1)

40-50: Redis TLS wiring for E2E cache clients is consistent with the TLS-only test requirement

The additions to ClusterDetails and getCacheInstance correctly gate TLS usage on the new *RedisTLSEnabled flags and build a tls.Config with MinVersion: tls.VersionTLS12. Given this file is strictly under test/e2e, using InsecureSkipVerify: true here is an acceptable trade-off to keep tests working against dynamically addressed Redis endpoints while still enforcing encrypted transport.

The updated getManagedAgentRedisConfig / getPrincipalRedisConfig logic to:

  • Prefer LoadBalancer ingress IP/hostname,
  • Fall back to spec.loadBalancerIP, then ClusterIP,
  • And unconditionally set the *RedisTLSEnabled flags to true,

matches the PR goal that Redis-with-TLS is now the “happy path” for tests and will loudly fail if TLS isn’t actually configured.

If you later stabilize the Redis hostnames to always match certificate SANs, you might consider tightening this further by dropping InsecureSkipVerify and wiring in a RootCAs pool from the test CA, but that’s an optional hardening step and not required for this PR.

Also applies to: 165-195, 225-333

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3df4a33 and 211af17.

📒 Files selected for processing (37)
  • Makefile (1 hunks)
  • agent/agent.go (2 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (3 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (2 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (2 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/server.go (3 hunks)
  • test/e2e/README.md (2 hunks)
  • test/e2e/fixture/cluster.go (7 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (16)
  • install/kubernetes/agent/agent-params-cm.yaml
  • internal/argocd/cluster/cluster_test.go
  • hack/dev-env/configure-argocd-redis-tls.sh
  • install/helm-repo/argocd-agent-agent/README.md
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • agent/inbound_redis.go
  • internal/argocd/cluster/cluster.go
  • cmd/argocd-agent/agent.go
  • principal/options.go
  • install/kubernetes/principal/principal-params-cm.yaml
  • install/helm-repo/argocd-agent-agent/values.yaml
  • install/kubernetes/agent/agent-deployment.yaml
  • agent/outbound_test.go
  • install/kubernetes/principal/principal-deployment.yaml
  • docs/getting-started/kubernetes/index.md
  • Makefile
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/run-e2e.sh
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • hack/dev-env/Procfile.e2e
  • test/e2e/README.md
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (7)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
principal/server.go (2)
internal/logging/logging.go (3)
  • Warn (300-302)
  • Debug (290-292)
  • Info (295-297)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
agent/options.go (1)
agent/agent.go (2)
  • AgentOption (136-136)
  • Agent (65-117)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

hack/dev-env/start-e2e.sh

[warning] 58-58: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Lint Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (18)
cmd/argocd-agent/principal.go (1)

90-97: Redis TLS option variables are well-scoped and consistent.

Names and grouping align with existing Redis options and distinguish clearly between server and upstream TLS concerns; no issues from a readability or correctness standpoint.

agent/options.go (2)

111-117: LGTM!

The WithRedisTLSEnabled option follows the established pattern for agent options and correctly enables TLS for Redis connections.


119-125: LGTM!

The CA path setter is correctly implemented. File validation occurs later during runtime initialization, which is an appropriate design choice.

principal/server.go (1)

352-371: LGTM!

The Redis proxy TLS configuration correctly handles both server-side TLS (for incoming Argo CD connections) and upstream TLS (for connections to principal's Redis), with appropriate support for path-based and in-memory certificates.

internal/argocd/cluster/manager.go (1)

71-81: LGTM!

The TLS configuration parameter is correctly added to NewManager and propagated to NewClusterCacheInstance. The API design allows nil to be passed for non-TLS scenarios, maintaining backward compatibility.

principal/redisproxy/redisproxy.go (4)

65-76: LGTM!

The TLS configuration fields are well-structured, with clear separation between server-side TLS (for incoming Argo CD connections) and upstream TLS (for connections to principal's Redis). Supporting both path-based and in-memory certificates provides good flexibility.


98-128: LGTM!

The TLS setter methods follow a clean, straightforward pattern for configuration. Validation occurs later during TLS config creation or connection establishment, which is an appropriate design choice.


173-194: LGTM!

The TLS listener setup is correctly implemented with appropriate error handling and clear logging to distinguish between TLS and non-TLS modes.


863-905: Verify the TLS wrapping condition.

Line 864 checks if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure). This means TLS wrapping only occurs if both tlsEnabled is true and at least one upstream TLS option is set.

Is this the intended behavior? Should upstream TLS be enabled whenever tlsEnabled is true, even without CA configuration? The current logic might skip TLS wrapping if tlsEnabled is true but none of the CA/insecure options are set.

Please verify the intended behavior:

  • Should TLS be used for upstream Redis whenever tlsEnabled is true?
  • Or should it only use TLS when CA/insecure options are explicitly configured?

If upstream TLS should always be enabled when tlsEnabled is true, consider:

-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled {
 		tlsConfig := &tls.Config{
 			MinVersion: tls.VersionTLS12,
 		}
internal/argocd/cluster/manager_test.go (1)

57-57: LGTM!

The test updates correctly pass nil for the TLS configuration parameter, maintaining test coverage for non-TLS scenarios. The API change is properly reflected in all test cases.

Also applies to: 78-78

internal/argocd/cluster/informer_test.go (1)

19-19: LGTM!

All test cases are consistently updated to pass the new TLS configuration parameter. The nil values are appropriate for testing non-TLS scenarios, and the tests continue to validate the core cluster informer functionality.

Also applies to: 33-33, 50-50, 87-87, 115-115

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: Redis TLS agent params wiring looks consistent.

The new agent.redis.tls.* keys and defaults line up with the Helm values and deployment env wiring; I don’t see issues here.

hack/dev-env/start-agent-autonomous.sh (1)

37-62: Dev Redis TLS/address handling is coherent with the new TLS setup.

Detecting the local CA, constructing --redis-tls-* flags, and defaulting to localhost:6382 (with matching port‑forward guidance) all look correct and align with the cert generation script and Procfile wiring.

Also applies to: 66-67

test/e2e/README.md (1)

27-66: E2E Redis TLS documentation aligns well with the new tooling.

The new note and “Redis TLS” section clearly describe how TLS is auto‑configured for make setup-e2e, how to regenerate/reconfigure certs, and why InsecureSkipVerify is used in tests. Wording and commands match the added dev‑env scripts.

hack/dev-env/Procfile.e2e (1)

1-6: Procfile wiring for Redis port‑forwards and agents looks consistent.

Mapping Redis to 6380/6381/6382 and then starting principal/agents (with the agents honoring ARGOCD_AGENT_REDIS_ADDRESS) lines up with the start‑agent scripts and the TLS/localhost assumptions; this looks good.

hack/dev-env/start-principal.sh (1)

56-74: TLS enablement via REDIS_TLS_ARGS is consistent with the new Redis TLS flow

The detection of Redis TLS certs under ${SCRIPTPATH}/creds/redis-tls and conditional population of REDIS_TLS_ARGS is clear and matches the new --redis-* flags exposed by the principal. Passing $REDIS_TLS_ARGS into the go run ... principal invocation is a straightforward way to keep TLS optional but default-on when certs are present.

No functional issues spotted here.

hack/dev-env/start-e2e.sh (1)

50-55: Localhost Redis addresses fit the TLS/port-forward design

Hard-coding the Redis addresses to localhost:6380/6381/6382 and delegating port-forwards to goreman/Procfile.e2e matches the certificate SAN strategy and simplifies TLS validation for E2E.

This looks consistent with the new principal dev script and the Redis TLS configuration flow.

hack/dev-env/configure-redis-tls.sh (1)

47-55: Redis TLS configuration script is thorough and idempotent

The script’s overall flow—mapping context to cert prefix, preserving/restoring the initial kube context via a trap, creating/updating the argocd-redis-tls secret with --dry-run=client | kubectl apply, and patching volumes, volumeMounts, and args only when needed—looks well thought out and idempotent for the dev/e2e environment.

The JSON patches for both initial and append cases on volumes/volumeMounts and the rollout wait logic are appropriate for ensuring Redis ends up in TLS-only mode before proceeding.

Also applies to: 93-181, 188-204


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)

150-156: Inconsistent timeout: this assertion still uses 30s/1s.

The final re-connected check for the autonomous agent uses 30*time.Second, 1*time.Second, while all other similar assertions in this file were updated to 60*time.Second, 2*time.Second. This inconsistency could lead to flaky tests under the same port-forward latency conditions mentioned in the comments.

Apply this diff for consistency:

 	requires.Eventually(func() bool {
 		return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 			Status:     appv1.ConnectionStatusSuccessful,
 			Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 			ModifiedAt: &metav1.Time{Time: time.Now()},
 		}, clusterDetail)
-	}, 30*time.Second, 1*time.Second)
+	}, 60*time.Second, 2*time.Second)
🧹 Nitpick comments (8)
principal/resource.go (1)

39-39: Provide justification for the 3x timeout increase.

The timeout has been increased from 10 to 30 seconds without explanation. While this may be necessary to accommodate TLS handshake and encryption overhead introduced by this PR, the lack of documentation makes it unclear whether this change masks underlying performance issues or is genuinely required.

Please clarify why this increase is needed and consider documenting it in a comment. Additionally, as noted in the TODO above, making this timeout configurable would allow better tuning for different deployment scenarios, especially given the variance in TLS overhead across environments.

principal/listen.go (1)

174-196: Helpful logging additions for debugging the startup flow.

The logging statements provide useful visibility into the WebSocket enablement path and server startup sequence, which will help with troubleshooting.

Minor formatting suggestion: consider removing the emoji (line 174) and leading spaces in log messages (lines 176, 196) for consistency with standard structured logging conventions.

Apply this diff for consistent log message formatting:

-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Info("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
 		opts := []grpchttp1server.Option{grpchttp1server.PreferGRPCWeb(true)}
 
 		downgradingHandler := grpchttp1server.CreateDowngradingHandler(s.grpcServer, http.NotFoundHandler(), opts...)
 		go func() {
 			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
 			err = s.grpcServer.Serve(s.listener.l)
-			log().WithError(err).Warn(" gRPC server.Serve() exited")
+			log().WithError(err).Warn("gRPC server.Serve() exited")
 			errch <- err
 		}()
hack/dev-env/start-agent-managed.sh (1)

63-74: Consider restricting permissions on extracted TLS credentials.

The extracted TLS private key is written to /tmp/agent-managed-tls.key with default permissions, potentially making it readable by other users on shared systems.

Apply restrictive permissions before writing sensitive files:

 # Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
 TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
 TLS_KEY_PATH="/tmp/agent-managed-tls.key"
 ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+
+# Set restrictive permissions for private key
+umask 077
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-ca \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+# Restore default umask
+umask 022
 echo " mTLS client certificates and CA extracted"
hack/dev-env/start-principal.sh (1)

41-42: Use single quotes in trap to defer variable expansion (shellcheck SC2064).

While the current code works because $PORT_FORWARD_PID is set before the trap, using single quotes is the conventional and safer pattern.

-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       trap 'kill $PORT_FORWARD_PID 2>/dev/null || true' EXIT
docs/configuration/redis-tls.md (2)

487-494: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. Since this is script output, use text or console.

-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...

498-513: Add language specifiers to remaining script output blocks.

Same issue as above - these console output examples should have a language specifier for markdownlint compliance.

-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Redis Deployment for TLS                     ║
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Argo CD Components for Redis TLS             ║

Also applies to: 516-532

test/e2e/redis_proxy_test.go (1)

120-123: Hard-coded sleep for SSE stream stabilization.

While the 5-second sleep addresses the race condition mentioned in the comment, it's a fixed delay that may be insufficient under heavy load or excessive in fast environments. Consider using a more deterministic approach if flakiness persists.

An alternative would be to wait for an initial SSE message (e.g., the current resource tree state) before proceeding, though the current approach is pragmatic for E2E tests.

test/e2e/fixture/cluster.go (1)

259-267: Cleanup doesn't explicitly close Redis connections.

CleanupRedisCachedClients clears the cache map but doesn't explicitly close the underlying Redis connections. While Go's garbage collector will eventually clean them up, explicit closure ensures immediate resource release and avoids connection pool exhaustion in long test runs.

Consider closing the Redis clients explicitly. The appstatecache.Cache wraps a cacheutil.Cache which has a redisClient. You may need to expose or track the underlying redis.Client to call Close():

// If the underlying redis.Client is accessible, close it explicitly
// For now, this may require refactoring getCacheInstance to return both
// the cache and the client, or using a wrapper struct

If the current approach works reliably in tests without connection issues, this can be deferred.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 211af17 and 6247404.

📒 Files selected for processing (26)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • Makefile
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/start-agent-autonomous.sh
  • install/helm-repo/argocd-agent-agent/values.schema.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-managed.sh
  • test/e2e/rp_test.go
  • test/run-e2e.sh
  • hack/dev-env/Procfile.e2e
  • test/e2e/README.md
🧬 Code graph analysis (6)
principal/auth.go (1)
internal/logging/logging.go (2)
  • Trace (285-287)
  • Warn (300-302)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (112-115)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

487-487: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


498-498: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


516-516: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🔇 Additional comments (35)
test/e2e/README.md (1)

21-107: Unable to verify referenced helper scripts and documentation due to repository access limitations.

The documentation structure and Redis TLS guidance are well-presented, but the verification of referenced files could not be completed in this environment. The following items require manual verification by the developer or CI system:

  • hack/dev-env/gen-redis-tls-certs.sh (executable)
  • hack/dev-env/configure-redis-tls.sh (executable)
  • hack/dev-env/configure-argocd-redis-tls.sh (executable)
  • hack/dev-env/reverse-tunnel/setup.sh (executable)
  • hack/dev-env/reverse-tunnel/README.md (exists)

Confirm these files exist and are properly executable before merging.

principal/listen.go (1)

224-230: LGTM! Clear service registration logging.

The logging statements provide good visibility into the gRPC service registration flow, making it easier to debug startup issues and verify that all services are registered successfully.

principal/tracker/tracking.go (1)

75-78: Unable to verify review comment due to repository access limitation.

The review comment cannot be verified at this time because the repository is inaccessible. To properly assess the concerns about the buffered channel change at lines 75-78, the following would need to be verified:

  1. Whether a real deadlock scenario exists between processRedisEventResponse and sendSynchronousRedisMessageToAgent
  2. Whether this change is directly related to the Redis TLS PR objective or should be split into a separate PR
  3. Whether a buffer capacity of 1 is sufficient for the actual send/receive patterns in the code
  4. Whether the buffering addresses the root cause or masks a deeper synchronization issue

Without access to the codebase to examine the sender/receiver implementations and their usage patterns, the original review comment's concerns remain unresolved.

principal/auth.go (1)

154-164: LGTM - Debug logging additions for auth interceptor.

The trace-level logging provides useful debugging information for authentication flow. The emoji prefixes add visual distinction in logs, which can be helpful during debugging sessions.

Consider documenting the emoji convention if it is used elsewhere, or removing the emojis to keep log formatting consistent across the codebase.

test/e2e/fixture/argoclient.go (1)

316-334: LGTM - Environment variable override for ArgoCD server endpoint.

The early return when ARGOCD_SERVER_ADDRESS is set provides useful flexibility for E2E tests, particularly when dynamic LoadBalancer addresses don't match certificate SANs. The fallback to K8s service lookup is preserved correctly.

test/e2e/fixture/fixture.go (3)

109-154: Increased polling timeouts for deletion operations.

The timeout increase from 60 to 120 seconds accommodates potential TLS handshake overhead and certificate validation delays during E2E tests.


229-241: Improved cleanup resilience with non-fatal warnings.

Using warnings instead of returning errors prevents cleanup failures from cascading and failing entire test suites. The DeepCopy() usage correctly avoids mutating loop variables when adjusting namespace/name.


457-470: Graceful degradation when Redis is unavailable.

The cleanup now logs a warning and continues if Redis is unavailable (e.g., port-forward died), rather than failing the cleanup. This improves test reliability.

hack/dev-env/start-agent-managed.sh (1)

37-61: LGTM - Redis TLS detection and address configuration.

The conditional TLS enablement based on certificate presence is a clean pattern. The port-forward guidance message helps developers understand the setup requirements.

hack/dev-env/start-principal.sh (3)

23-43: LGTM - Port-forward setup with cleanup trap.

The port-forward establishment with PID tracking, validation, and cleanup trap is a robust pattern for local development. The 2-second wait allows time for the connection to stabilize.


58-76: LGTM - Redis TLS certificate detection.

Properly checks for all three required files (cert, key, CA) before enabling TLS. The descriptive comments about certificate SANs help future maintainers understand the setup.


84-86: Undefined variable MTLS_ARGS referenced.

$MTLS_ARGS is used but not defined in this script. If it's intentionally optional (set externally), this is fine; otherwise, it may cause unintended behavior.

internal/argocd/cluster/cluster.go (2)

135-142: LGTM - Defensive initialization of ConnectionState.

Good defensive programming: initializing ConnectionState when the agent first connects prevents potential nil pointer issues and provides meaningful status information.


176-191: TLS configuration support added to Redis cache initialization.

The signature change to accept *tls.Config enables TLS for Redis connections. Passing nil for tlsConfig maintains backward compatibility (no TLS).

Note: Verification of caller updates could not be completed due to repository access limitations. Manual verification is required to ensure all calls to NewClusterCacheInstance have been updated to pass the tlsConfig parameter.

hack/dev-env/Procfile.e2e (1)

1-7: Port-forward configuration for TLS-enabled Redis connections.

The port-forward setup allows TLS certificate validation to work correctly since localhost is included in certificate SANs. The staggered sleep delays (3s for principal, 5s for agents) ensure port-forwards are established before components start.

Verify that MANAGED_AGENT_REDIS_ADDR and AUTONOMOUS_AGENT_REDIS_ADDR environment variables are defined in your development environment setup or sourced before running this Procfile, as they are required by lines 6-7 but not defined locally in this file.

agent/agent.go (2)

323-345: TLS configuration implementation looks correct.

The TLS configuration for the cluster cache Redis client is well-structured:

  • Uses minimum TLS 1.2 (appropriate security baseline)
  • Properly handles insecure mode with warning log (line 330) - this addresses the previous review feedback
  • Correctly loads and parses CA certificate with appropriate error handling

One minor observation: The error message at line 339 could be more consistent with the message at line 335 by using %w for error wrapping.

-			return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s", a.redisProxyMsgHandler.redisTLSCAPath)
+			return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s: no valid certificates found", a.redisProxyMsgHandler.redisTLSCAPath)

445-460: Good improvement: immediate cluster cache info update on startup.

The refactored goroutine correctly:

  • Sends an initial update immediately on startup (line 448) rather than waiting for the first ticker interval
  • Properly defers ticker.Stop() for cleanup
  • Handles context cancellation appropriately

This ensures the principal receives cluster cache info promptly after agent startup.

cmd/argocd-agent/principal.go (3)

258-288: Redis TLS configuration logic is well-structured.

The implementation correctly:

  • Validates mutual exclusivity between --redis-upstream-tls-insecure and --redis-upstream-ca-path (lines 273-275)
  • Uses the secret as the default fallback when neither insecure nor CA path is specified (lines 285-286)
  • Logs appropriate messages for each configuration path

The previous review comment suggested validating all three modes (insecure, CA path, CA secret) as mutually exclusive. However, the current behavior is actually reasonable: the secret serves as a default when no explicit configuration is provided, which is a common pattern. If you prefer explicit mutual exclusivity for all three, let me know.


419-441: CLI flags are well-defined with consistent naming.

The Redis TLS flags follow the established patterns:

  • Consistent ARGOCD_PRINCIPAL_REDIS_* environment variable naming
  • Reasonable defaults (false for CLI, argocd-redis-tls for secret names)
  • Clear descriptions for each flag

471-471: Timeout increase is reasonable for production reliability.

The 30-second timeout for secret retrieval allows for network latency and Kubernetes API server load, which is appropriate for production environments.

test/e2e/clusterinfo_test.go (1)

108-115: Timeout increases are appropriate and well-documented.

The increased timeouts (60s/2s) with explanatory comments appropriately account for port-forward latency in long test runs. This should improve test stability in CI environments.

Also applies to: 123-129

test/e2e/rp_test.go (3)

162-169: Good refactoring: centralized endpoint and secret retrieval.

Using fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret helpers:

  • Reduces code duplication across tests
  • Centralizes the logic for environment variable checks and Kubernetes fallback
  • Makes tests more maintainable when endpoint retrieval logic changes

The helper at fixture/argoclient.go:314-336 properly checks ARGOCD_SERVER_ADDRESS environment variable first, then falls back to Kubernetes service lookup.


295-306: Consistent refactoring across test functions.

The same fixture helper pattern is applied correctly here, maintaining consistency with Test_ResourceProxy_Argo.


509-510: Minor formatting change, no functional impact.

docs/configuration/redis-tls.md (2)

1-17: Comprehensive and well-structured documentation.

This documentation thoroughly covers:

  • Architecture overview with clear diagrams
  • Quick start guides for different environments (dev, E2E, production)
  • Detailed configuration options for both principal and agent
  • Troubleshooting section with practical solutions
  • Security best practices

The table of contents and section organization make it easy to navigate.


736-755: Security best practices are appropriately scoped.

The security recommendations cover essential practices:

  • Strong key sizes (4096-bit RSA)
  • Appropriate certificate validity (1 year)
  • Private key protection with RBAC
  • Certificate rotation planning
  • Clear warning against insecure options in production
hack/dev-env/gen-redis-tls-certs.sh (1)

1-10: Well-structured certificate generation script.

The script properly handles idempotency with file existence checks, uses set -e for error handling, and generates appropriate SANs for each component. The cleanup of temporary files (CSR, EXT, SRL) is good practice.

test/run-e2e.sh (2)

24-45: Good enforcement of TLS prerequisites.

The script properly validates TLS certificates exist before running tests and provides clear, actionable error messages guiding users to run the setup scripts. This aligns with the PR objective of making Redis TLS mandatory for E2E tests.


82-115: Environment variable exports are macOS-only; verify Linux CI Redis connectivity strategy.

The Redis address environment variables (ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS, MANAGED_AGENT_REDIS_ADDR, AUTONOMOUS_AGENT_REDIS_ADDR, ARGOCD_SERVER_ADDRESS) are only set when running on macOS. Confirm that Linux CI environments have a strategy for accessing Redis services—either through MetalLB LoadBalancer service IPs or by setting these environment variables explicitly for Linux as well.

test/e2e/redis_proxy_test.go (2)

210-237: Good retry handling for transient Redis connection issues.

Wrapping the ResourceTree call in an Eventually block with explicit error logging handles the EOF errors mentioned in the comment. This is a robust pattern for E2E tests dealing with TLS-enabled Redis connections that may experience transient failures.


642-653: Appropriate SSE transport configuration.

The Timeout: 0 settings are correct for SSE streams which are long-lived connections. The IdleConnTimeout: 300s helps maintain connections during test execution. The InsecureSkipVerify: true is documented in the PR as intentional for E2E tests with dynamic LoadBalancer addresses.

hack/dev-env/configure-redis-tls.sh (2)

81-121: Good practice: scaling down components before TLS transition.

Scaling down ArgoCD components before enabling Redis TLS prevents SSL handshake errors from pods attempting non-TLS connections. Storing replica counts in a ConfigMap enables proper restoration by the companion script.


199-215: Correct Redis TLS configuration.

The Redis args properly configure TLS-only mode (--port 0 disables plain TCP, --tls-port 6379 enables TLS). The --tls-auth-clients no setting means clients authenticate via password only, not mutual TLS, which is appropriate for this use case.

test/e2e/fixture/cluster.go (2)

182-201: TLS configuration with InsecureSkipVerify is appropriate for E2E tests.

Using InsecureSkipVerify: true for E2E tests is explicitly documented in the PR description as a workaround for dynamic LoadBalancer addresses that may not match certificate SANs. The MinVersion: tls.VersionTLS12 ensures a reasonable security baseline.


319-327: Good: TLS enabled by default with environment override support.

Enabling TLS by default (ManagedAgentRedisTLSEnabled = true) aligns with the PR objective. The environment variable override (MANAGED_AGENT_REDIS_ADDR) supports local development with port-forwards, which is consistent with the macOS handling in run-e2e.sh.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 4 times, most recently from 40d7b3c to 8b47b98 on December 4, 2025 at 13:18

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/rp_test.go (1)

162-169: Good refactor using fixture helpers.

Replacing direct Kubernetes lookups with fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret centralizes configuration retrieval and supports environment variable overrides (e.g., ARGOCD_SERVER_ADDRESS), improving test flexibility.

♻️ Duplicate comments (6)
docs/configuration/redis-tls.md (3)

487-495: Add language identifier to fenced code block.

This output example block should have a language identifier. Based on markdownlint feedback and past review comments, apply this fix:

-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...
 Generating CA key and certificate...
 ...
-```
+```

This resolves the MD040 linting error.

Based on learnings, past review flagged this same issue which was marked as addressed in commit 6247404, but the static analysis tool still reports it.


498-514: Add language identifier to fenced code block.

This output example block should have a language identifier:

-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Redis Deployment for TLS                     ║
 ...
-```
+```

Based on learnings, this was flagged in a past review and marked as addressed, but the linter still reports it.


516-533: Add language identifier to fenced code block.

This output example block should have a language identifier:

-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Argo CD Components for Redis TLS             ║
 ...
-```
+```

Based on learnings, this was flagged in a past review and marked as addressed, but the linter still reports it.

hack/dev-env/gen-redis-tls-certs.sh (1)

68-86: Guard against empty LOCAL_IP when emitting SAN IP entry.

If LOCAL_IP ends up empty, the extension file will contain IP.3 = , which can cause OpenSSL failures or produce an invalid certificate; only add IP.3 when a non-empty IP was detected.

-    cat > "${CREDS_DIR}/redis-proxy.ext" <<EOF
-subjectAltName = @alt_names
-[alt_names]
-DNS.1 = argocd-redis-proxy
-DNS.2 = argocd-redis-proxy.argocd
-DNS.3 = argocd-redis-proxy.argocd.svc
-DNS.4 = argocd-redis-proxy.argocd.svc.cluster.local
-DNS.5 = localhost
-DNS.6 = rathole-container-internal
-IP.1 = 127.0.0.1
-IP.2 = 127.0.0.2
-IP.3 = ${LOCAL_IP}
-EOF
+    cat > "${CREDS_DIR}/redis-proxy.ext" <<EOF
+subjectAltName = @alt_names
+[alt_names]
+DNS.1 = argocd-redis-proxy
+DNS.2 = argocd-redis-proxy.argocd
+DNS.3 = argocd-redis-proxy.argocd.svc
+DNS.4 = argocd-redis-proxy.argocd.svc.cluster.local
+DNS.5 = localhost
+DNS.6 = rathole-container-internal
+IP.1 = 127.0.0.1
+IP.2 = 127.0.0.2
+EOF
+
+    if [ -n "${LOCAL_IP}" ]; then
+        echo "IP.3 = ${LOCAL_IP}" >> "${CREDS_DIR}/redis-proxy.ext"
+    fi
hack/dev-env/configure-redis-tls.sh (1)

61-66: CA certificate validation still missing.

The validation checks for server certificate and key but not for ca.crt, which is used at Line 128 when creating the secret. This is the same issue flagged in previous reviews.

Apply this diff to add CA validation:

 # Check certificates exist
-if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ]; then
-    echo "Error: Redis TLS certificate or key not found for ${REDIS_CERT_PREFIX}"
+if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ] || [ ! -f "creds/redis-tls/ca.crt" ]; then
+    echo "Error: Redis TLS certificates not found (${REDIS_CERT_PREFIX}.crt, ${REDIS_CERT_PREFIX}.key, or ca.crt)"
     echo "Please run: ./gen-redis-tls-certs.sh"
     exit 1
 fi
cmd/argocd-agent/principal.go (1)

272-288: Incomplete mutual exclusivity validation for upstream TLS modes.

Lines 273-275 validate that --redis-upstream-tls-insecure and --redis-upstream-ca-path are mutually exclusive, but don't check whether --redis-upstream-ca-path and --redis-upstream-ca-secret-name are both specified. If both are provided, the CA-path branch at Line 281 silently takes precedence, ignoring the secret configuration without warning.

Consider validating all three modes for mutual exclusivity:

+				// Validate upstream TLS configuration - only one mode allowed
+				modesSet := 0
+				if redisUpstreamTLSInsecure {
+					modesSet++
+				}
+				if redisUpstreamTLSCAPath != "" {
+					modesSet++
+				}
+				if redisUpstreamTLSCASecretName != "" {
+					modesSet++
+				}
+				if modesSet > 1 {
+					cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
+				}
+
-				// Validate upstream TLS configuration - insecure and CA path are mutually exclusive
-				if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
-					cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
-				}
-
 				// Redis upstream TLS (for connections to principal's argocd-redis)
 				if redisUpstreamTLSInsecure {
🧹 Nitpick comments (4)
principal/resource.go (1)

39-39: Timeout extension appropriate for TLS operations.

The increase from 10 to 30 seconds accommodates longer TLS handshake and certificate loading operations introduced by Redis TLS support across the system.

Consider the existing TODO at line 38—making this timeout configurable would provide better flexibility for different deployment scenarios.

principal/redisproxy/redisproxy.go (1)

131-165: Unused parse result at line 154.

Line 154 parses the private key but discards the result. If this is validation-only, the error check suffices. Otherwise, consider removing the parse call.

Apply this diff to remove the unused parse:

 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
 
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
 		return nil, fmt.Errorf("no TLS certificate configured")
 	}
hack/dev-env/start-principal.sh (1)

23-43: Minor: adjust trap quoting to satisfy ShellCheck SC2064.

The trap currently interpolates PORT_FORWARD_PID at definition time; switching to a single-quoted trap and quoting the variable inside avoids SC2064 and is the idiomatic form while preserving behavior.

-       # Cleanup function to kill port-forward on exit
-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       # Cleanup function to kill port-forward on exit
+       trap 'kill "$PORT_FORWARD_PID" 2>/dev/null || true' EXIT

Also applies to: 58-76, 84-85

test/e2e/fixture/fixture.go (1)

109-155: Best-effort cleanup behavior change looks intentional; consider whether other paths should match.

Doubling the deletion wait loops, using DeepCopy for cross-namespace waits, and downgrading many application/AppProject cleanup failures to warnings will reduce e2e flakiness, but may leave residual resources when deletions keep failing. If you want fully consistent best-effort semantics, consider also converting the remaining repo/namespace cleanup errors to warnings; otherwise, this mixed strategy seems reasonable for tests.

Also applies to: 159-171, 218-292, 294-357, 457-470

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6247404 and 8b47b98.

📒 Files selected for processing (47)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (2 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (2 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/server.go (3 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
  • Makefile
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • internal/argocd/cluster/manager_test.go
  • install/kubernetes/principal/principal-deployment.yaml
  • principal/tracker/tracking.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/run-e2e.sh
  • install/kubernetes/principal/principal-params-cm.yaml
  • internal/argocd/cluster/informer_test.go
  • docs/getting-started/kubernetes/index.md
  • agent/outbound_test.go
  • test/e2e/fixture/argoclient.go
  • hack/dev-env/start-agent-autonomous.sh
  • cmd/argocd-agent/agent.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/rp_test.go
  • hack/dev-env/start-agent-managed.sh
  • install/kubernetes/agent/agent-deployment.yaml
  • install/kubernetes/agent/agent-params-cm.yaml
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/README.md
  • hack/dev-env/Procfile.e2e
  • install/helm-repo/argocd-agent-agent/values.yaml
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (12)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
principal/server.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
agent/inbound_redis.go (2)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
agent/options.go (2)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
agent/agent.go (2)
  • AgentOption (136-136)
  • Agent (65-117)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
  • HasConnectionStatus (60-74)
  • AgentManagedName (37-37)
  • ClusterDetails (42-56)
  • AgentAutonomousName (38-38)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

487-487: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


498-498: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


516-516: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (29)
principal/auth.go (1)

154-165: Trace logging enhances auth flow observability.

The added trace logs with visual markers help debug authentication flow during development. Trace level is appropriate for detailed debugging without impacting production logs.

test/e2e/rp_test.go (1)

295-305: Consistent fixture helper usage.

The same fixture helper pattern applied here maintains consistency across test functions and eliminates code duplication.

principal/server.go (2)

352-371: Redis proxy TLS configuration properly structured.

The TLS configuration logic correctly prioritizes configuration sources (path-based vs. in-memory) for both server TLS (incoming connections) and upstream TLS (connections to Redis). The conditional structure ensures TLS is only configured when explicitly enabled.


400-426: Cluster manager TLS configuration with robust error handling.

The TLS configuration for the cluster manager includes:

  • Proper error handling for CA certificate file operations (lines 414-421)
  • Certificate parsing validation with clear error messages
  • Security warning for insecure mode (line 408)
  • Appropriate minimum TLS version (1.2)

The CA loading from file has good error propagation that will fail server startup if certificates are misconfigured.
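The CA-loading pattern described above can be sketched as follows. This is an illustrative fragment, not the actual argocd-agent code: the function name and signature are assumptions, but the read/parse/fail-fast flow matches what the review praises.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// buildUpstreamTLSConfig reads a CA bundle from disk, validates that it
// parses as PEM certificates, and returns an error (failing startup)
// otherwise. Minimum TLS version is pinned to 1.2.
func buildUpstreamTLSConfig(caPath string) (*tls.Config, error) {
	pem, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("could not read Redis CA bundle %q: %w", caPath, err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, fmt.Errorf("no valid PEM certificates in %q", caPath)
	}
	return &tls.Config{MinVersion: tls.VersionTLS12, RootCAs: pool}, nil
}

func main() {
	_, err := buildUpstreamTLSConfig("/nonexistent/ca.crt")
	fmt.Println("startup fails on bad CA:", err != nil)
}
```

Propagating the error to the caller (rather than logging and continuing) is what makes misconfigured certificates fail the server at startup instead of at first Redis use.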

internal/argocd/cluster/cluster.go (2)

176-185: TLS configuration properly integrated into Redis cache.

The updated signature accepts an optional TLS config and correctly wires it into Redis client options. The nil-safe design allows TLS to be disabled when not needed.


135-142: Good defensive initialization of connection state.

When ConnectionState doesn't exist (agent just connected), this code properly initializes it with a Successful status and timestamp. This prevents gaps in connection tracking during cluster cache stats updates.

hack/dev-env/start-e2e.sh (1)

50-59: Static localhost addresses simplify TLS-enabled E2E setup.

Replacing dynamic LoadBalancer IP lookups with static localhost addresses (backed by port-forwards managed by Goreman) ensures TLS certificate validation works correctly, as localhost is included in the certificate SANs.

The separation of REDIS_PASSWORD assignment and export (lines 58-59) correctly addresses the previous shellcheck SC2155 warning, allowing proper error handling if the kubectl command fails.

test/e2e/README.md (1)

21-108: Comprehensive E2E workflow documentation with TLS guidance.

The expanded documentation clearly describes:

  • The multi-step setup process with Redis TLS configured automatically
  • Remote cluster considerations with reverse tunnel setup
  • Multi-terminal workflow requirements
  • Manual TLS reconfiguration procedures
  • Environment detection for local vs CI testing

This significantly improves the developer experience for running TLS-enabled E2E tests.

principal/redisproxy/redisproxy.go (2)

168-211: LGTM: TLS listener implementation is sound.

The TLS-enabled listener is correctly configured with MinVersion set to TLS12, and the branching logic cleanly separates TLS and non-TLS paths with appropriate logging.


847-908: Verify upstream TLS condition covers all intended scenarios.

The condition at line 864 requires tlsEnabled AND at least one of (CA, CAPath, Insecure) to wrap the upstream connection with TLS. Confirm this aligns with the intended behavior—specifically, whether tlsEnabled alone (without CA/CAPath/Insecure) should skip upstream TLS or raise an error.

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: LGTM: Redis TLS configuration added correctly.

The three new TLS-related keys (enabled, ca-path, and insecure) are properly documented and bound to Helm values, consistent with the TLS implementation across the codebase.

test/e2e/clusterinfo_test.go (1)

108-115: LGTM: Timeout increases accommodate TLS latency.

The timeout increases from 30s to 60s with adjusted polling intervals are appropriate for handling potential port-forward latency in long test runs, and the explanatory comments clarify the rationale.

Also applies to: 123-129, 141-142

internal/argocd/cluster/cluster_test.go (1)

36-36: LGTM: Test updated for new TLS parameter.

The nil TLS config parameter correctly aligns test call sites with the updated NewManager signature, appropriately passing nil for tests that don't exercise TLS functionality.

Also applies to: 225-225, 304-304

install/helm-repo/argocd-agent-agent/values.yaml (1)

136-162: LGTM: TLS and network policy configuration added.

The new redisTLS configuration block and networkPolicy settings are well-documented and use secure defaults (TLS enabled by default, insecure mode disabled), consistent with the broader TLS implementation.

install/helm-repo/argocd-agent-agent/README.md (1)

45-50: LGTM: Documentation updated for TLS configuration.

The documentation entries for redisTLS, networkPolicy, and tlsRootCAPath accurately reflect the corresponding values.yaml changes.

Also applies to: 68-72, 96-96

install/kubernetes/agent/agent-params-cm.yaml (1)

88-99: LGTM: Kubernetes manifest updated with Redis TLS configuration.

The three new Redis TLS configuration keys are properly documented with secure defaults and mount paths that align with the deployment configuration and Helm templates.

test/e2e/sync_test.go (1)

371-371: Verify hook name "before" matches the test fixture in test/data/pre-sync.

The hook Job name was changed from "pre-post-sync-before" to "before" at lines 371 and 466. Ensure the Job name in the test fixture corresponds to this value to maintain test data consistency.

Also applies to: 466-466

hack/dev-env/Procfile.e2e (1)

1-7: Procfile e2e Redis/server port-forwards and startup ordering look consistent.

Port mappings for Redis (6380/6381/6382) match the defaults used in the dev start scripts, and placing the port-forwards before principal/agent startup should ensure TLS-capable Redis endpoints are reachable when processes start.

agent/options.go (1)

111-133: Redis TLS AgentOptions cleanly mirror existing option pattern.

The new WithRedisTLSEnabled / WithRedisTLSCAPath / WithRedisTLSInsecure helpers follow the same style as the existing Redis options and provide straightforward wiring into redisProxyMsgHandler without changing other Agent behavior.

hack/dev-env/start-agent-managed.sh (1)

37-83: Managed agent Redis TLS and mTLS wiring look sound.

Enabling Redis TLS based on creds/redis-tls/ca.crt, defaulting the Redis address to localhost:6381 to match the port-forward, and extracting mTLS client cert/key/CA into /tmp for --tls-client-cert/--tls-client-key/--root-ca-path gives a coherent, reproducible dev/e2e setup.

internal/argocd/cluster/manager.go (1)

24-45: TLS config propagation into cluster cache is consistent with cache constructor.

Adding the *tls.Config parameter to NewManager and forwarding it to NewClusterCacheInstance cleanly wires Redis TLS into the cluster cache without altering other manager responsibilities.

Also applies to: 70-82

agent/inbound_redis.go (1)

20-24: Redis TLS client configuration in getRedisClientAndCache is robust and well-scoped.

Conditionally creating tls.Config (TLS 1.2+), warning and setting InsecureSkipVerify only when explicitly requested, and otherwise loading a CA from redisTLSCAPath (or falling back to system CAs with a warning) is a solid pattern for securing the Redis connection while keeping dev/e2e knobs available.

Also applies to: 51-55, 345-372

agent/agent.go (1)

323-349: LGTM! TLS configuration properly implemented.

The cluster cache TLS configuration correctly:

  • Creates TLS config with minimum TLS 1.2
  • Logs a warning for insecure mode (Line 330)
  • Loads and validates CA certificates with clear error messages
  • Passes the config to cluster cache initialization
hack/dev-env/configure-argocd-redis-tls.sh (1)

1-261: Well-structured TLS configuration script.

The script demonstrates good practices:

  • Idempotent operations with existence checks before patching
  • Clear user-facing messages and error handling
  • Proper scaling sequence (scale down/configure/scale up) to prevent connection errors during TLS transition
  • Context switching with cleanup on exit

Based on learnings, this is appropriate for E2E test environments.

test/e2e/redis_proxy_test.go (1)

120-123: Good resilience improvements for E2E tests.

The changes address race conditions and improve test reliability:

  • 5-second delay prevents pod deletion before Redis SUBSCRIBE is active (Lines 120-123)
  • Message draining logic ensures all available SSE messages are processed (Lines 188-208)
  • Retry logic for ResourceTree calls handles transient Redis EOF errors (Lines 211-237)
  • Buffered channel prevents message loss (Line 588)

These are appropriate enhancements for test stability.

Also applies to: 188-208, 211-237

install/kubernetes/agent/agent-deployment.yaml (1)

149-166: TLS configuration correctly integrated into agent deployment.

The manifest changes properly:

  • Add TLS environment variables from ConfigMap with optional: true
  • Mount the redis-tls-ca secret at /app/config/redis-tls
  • Mark both the volume and secret as optional to support non-TLS deployments

Also applies to: 193-211

test/e2e/fixture/cluster.go (2)

227-267: Excellent addition of Redis client caching.

The new getCachedCacheInstance and CleanupRedisCachedClients functions prevent connection leaks in E2E tests by reusing Redis clients across test operations. The mutex protection ensures thread-safety.


206-217: Appropriate timeout and pool configuration for E2E tests.

The Redis client configuration is well-tuned for port-forward latency:

  • Generous timeouts (DialTimeout: 10s, ReadTimeout: 30s)
  • Connection pooling (PoolSize: 10, retries: 3)
  • Idle connection management

These settings are suitable for test environments where network latency is higher.

principal/options.go (1)

492-548: Redis TLS options follow established patterns.

The new ServerOption functions are well-structured and consistent with existing options:

  • WithRedisTLSEnabled for feature toggle
  • WithRedisServerTLSFromPath and WithRedisServerTLSFromSecret for server TLS
  • WithRedisUpstreamTLSCAFromFile and WithRedisUpstreamTLSCAFromSecret for upstream CA
  • WithRedisUpstreamTLSInsecure for test scenarios

All use the standard option pattern and leverage existing tlsutil helpers.
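The option pattern referenced above can be sketched like this. The option names mirror the review text, but the struct and fields are simplified stand-ins, not the real principal.Server internals:

```go
package main

import "fmt"

type server struct {
	redisTLSEnabled bool
	redisTLSCAPath  string
}

// ServerOption mutates the server during construction.
type ServerOption func(*server)

func WithRedisTLSEnabled(enabled bool) ServerOption {
	return func(s *server) { s.redisTLSEnabled = enabled }
}

func WithRedisUpstreamTLSCAFromFile(path string) ServerOption {
	return func(s *server) { s.redisTLSCAPath = path }
}

// newServer applies each option in order over a zero-valued server.
func newServer(opts ...ServerOption) *server {
	s := &server{}
	for _, o := range opts {
		o(s)
	}
	return s
}

func main() {
	s := newServer(
		WithRedisTLSEnabled(true),
		WithRedisUpstreamTLSCAFromFile("/app/config/redis-tls/ca.crt"),
	)
	fmt.Println(s.redisTLSEnabled, s.redisTLSCAPath)
}
```

Because each option is an independent closure, new TLS knobs compose with existing Redis options without touching the constructor's signature.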

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 8b47b98 to 81c3402 on December 4, 2025 13:55
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
docs/configuration/redis-tls.md (1)

149-183: Tag remaining bare fenced code blocks with a language (markdownlint MD040)

There are still a few fenced blocks without a language, which markdownlint flags (MD040):

  • “How the tunnel works” ASCII diagram around line 150.
  • Script output examples for:
    • gen-redis-tls-certs.sh
    • configure-redis-tls.sh
    • configure-argocd-redis-tls.sh

To keep linters happy and clarify formatting, consider tagging them as plain text, e.g.:

-``` 
+```text
 Argo CD Server (remote vcluster) 
 ...
-``` 
+``` 

…and similarly for each of the script-output blocks.

Also applies to: 475-521

🧹 Nitpick comments (9)
principal/auth.go (1)

154-163: Trace logging in unaryAuthInterceptor is fine but consider toning down style

The added trace/Warn logs are helpful for debugging, but:

  • They partially duplicate information already logged inside authenticate, and
  • The emojis in log messages may not play well with log parsers or existing logging conventions.

Consider reusing a structured logger context (e.g., add module/client fields as elsewhere in this file) and dropping emojis to keep logs machine-friendly and consistent.

test/e2e/fixture/argoclient.go (1)

27-33: Env override for Argo CD server endpoint is useful; clarify expected format

Using ARGOCD_SERVER_ADDRESS as a fast path is handy for TLS-aware E2E runs and avoids an extra K8s call. Since NewArgoClient builds https:// URLs by treating endpoint as the Host, this env var should be host:port (no scheme).

Consider:

  • Documenting that expectation where this env var is set, and/or
  • Adding a lightweight sanity check (e.g., rejecting values starting with http:// or https://) to fail fast on misconfiguration.

Also applies to: 315-337
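The fail-fast check suggested above could look like the following. This is a hypothetical helper for illustration, not the fixture's actual code:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateEndpoint rejects values carrying a scheme and anything that is
// not a parseable host:port, so misconfiguration fails immediately.
func validateEndpoint(v string) error {
	if strings.HasPrefix(v, "http://") || strings.HasPrefix(v, "https://") {
		return fmt.Errorf("ARGOCD_SERVER_ADDRESS must be host:port without a scheme, got %q", v)
	}
	if _, _, err := net.SplitHostPort(v); err != nil {
		return fmt.Errorf("ARGOCD_SERVER_ADDRESS is not host:port: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(validateEndpoint("localhost:8080") == nil)         // valid
	fmt.Println(validateEndpoint("https://localhost:8080") != nil) // rejected
}
```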

principal/listen.go (1)

172-199: New gRPC/WebSocket and service-registration logging is functionally safe

The added Info-level logs around WebSocket enablement, server startup, and service registration improve startup visibility without changing behavior.

One nit: as in auth.go, the emoji-heavy messages (🔧, etc.) might not fit all log ingestion/alerting setups. If you want to keep logs easily greppable and machine-friendly, consider switching to plain-text messages while retaining the same structure and fields.

Also applies to: 224-230

hack/dev-env/start-agent-autonomous.sh (1)

37-83: Autonomous agent Redis TLS wiring looks correct; consider temp-file cleanup

The script correctly:

  • Enables Redis TLS when creds/redis-tls/ca.crt exists and passes --redis-tls-enabled/--redis-tls-ca-path.
  • Defaults ARGOCD_AGENT_REDIS_ADDRESS to localhost:6382 for local E2E with a clear port-forward hint.
  • Extracts client cert/key and CA from Kubernetes secrets and passes them via --tls-client-cert/--tls-client-key/--root-ca-path.

For local dev/E2E this is fine as-is. If you want to tighten things slightly, you could switch the /tmp/... paths to mktemp files and register a trap to remove them on exit, so TLS materials don’t linger longer than necessary.

hack/dev-env/configure-argocd-redis-tls.sh (1)

52-216: Patching logic is pragmatic for dev/E2E; consider surfacing failures

The pattern of:

  • Checking for existing redis-tls-ca volumes/volumeMounts and --redis-use-tls args, and
  • Applying JSON patches with ... || true

gives you an idempotent script that won’t die if the manifests drift slightly, which is good for local/E2E usage.

One trade-off is that if a future manifest change causes a patch to fail (e.g., args or volumes arrays are removed/renamed), the script will silently skip adding TLS CA mounts/flags and you’ll only see failures later when components can’t talk to Redis.

Not urgent, but for easier debugging you might consider:

  • Logging a warning when a patch fails (e.g., capture stderr/stdout and echo a “could not patch X for Redis TLS” line), or
  • Tightening the presence checks (e.g., verifying args/volumes arrays exist) so failures are more explicit.

This would keep the script resilient while making TLS misconfigurations easier to diagnose.

hack/dev-env/start-agent-managed.sh (1)

37-83: Managed agent TLS and Redis address wiring look correct

The script correctly:

  • Enables Redis TLS when the dev CA is present and passes the CA path via --redis-tls-* flags.
  • Defaults the Redis address to localhost:6381 (aligned with the Procfile port-forward) while allowing override via ARGOCD_AGENT_REDIS_ADDRESS.
  • Extracts the agent mTLS cert/key and CA from Kubernetes secrets and injects them into the agent flags.

This matches the documented E2E flow and ensures proper certificate validation over the localhost port‑forward, while still allowing non‑TLS operation in ad‑hoc dev setups.

test/e2e/fixture/fixture.go (1)

107-155: Fixture cleanup and Redis-backed cluster info reset are safer and more robust

  • The bounded deletion loops and the WaitForDeletion polling remain clear and avoid unbounded waits.
  • Switching to DeepCopy() for applications and AppProjects before mutating namespace/name prevents subtle bugs caused by reusing the range loop variable.
  • The new resetManagedAgentClusterInfo helper, invoked at the end of CleanUp, ensures the managed agent’s cluster info in Redis is reset between tests, and the choice to log (rather than fail) when Redis is unavailable is appropriate for E2E teardown.

Also applies to: 218-266, 294-357, 457-471

agent/agent.go (1)

443-460: Ensure cacheRefreshInterval is always positive before starting the cluster cache info ticker

The new goroutine that sends initial and periodic cluster cache info updates for both managed and autonomous agents is a good consolidation of behaviour. However, time.NewTicker(a.cacheRefreshInterval) will panic if cacheRefreshInterval is zero or negative, so it’s important that:

  • a.cacheRefreshInterval is always initialized to a positive duration via options or defaults before Start is called, or
  • a defensive check is added here to guard against an uninitialized value.
hack/dev-env/start-e2e.sh (1)

19-48: Consider removing unused helper function.

The getExternalLoadBalancerIP function is no longer called after switching to localhost-based addresses. While it may have future utility, removing unused code improves maintainability.

Apply this diff to remove the unused function:

-# getExternalLoadBalancerIP will set EXTERNAL_IP with the load balancer hostname from the specified Service
-getExternalLoadBalancerIP() {
-  SERVICE_NAME=$1
-
-  MAX_ATTEMPTS=120
-
-  for ((i=1; i<=MAX_ATTEMPTS; i++)); do
-    
-    echo ""
-    EXTERNAL_IP=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-    EXTERNAL_HOST=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
-
-    if [ -n "$EXTERNAL_IP" ]; then
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_IP"
-      break
-    elif [ -n "$EXTERNAL_HOST" ]; then
-      echo "External host for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_HOST"
-      EXTERNAL_IP=$EXTERNAL_HOST
-      break
-    else
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT not yet available, attempting again in 5 seconds..."
-      sleep 5
-    fi
-  done
-
-  if [ $i -gt $MAX_ATTEMPTS ]; then
-    echo "Failed to obtain external IP after $MAX_ATTEMPTS attempts."
-    exit 1
-  fi
-
-}
-
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b47b98 and 81c3402.

📒 Files selected for processing (29)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • principal/tracker/tracking.go
  • test/e2e/sync_test.go
  • test/run-e2e.sh
  • test/e2e/redis_proxy_test.go
  • principal/resource.go
  • test/e2e/clusterinfo_test.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • cmd/argocd-agent/principal.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • Makefile
  • test/e2e/rp_test.go
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/README.md
🧬 Code graph analysis (7)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
principal/listen.go (4)
internal/logging/logging.go (2)
  • Info (295-297)
  • Warn (300-302)
pkg/api/grpc/authapi/auth_grpc.pb.go (1)
  • RegisterAuthenticationServer (83-85)
pkg/api/grpc/versionapi/version_grpc.pb.go (1)
  • RegisterVersionServer (69-71)
pkg/api/grpc/eventstreamapi/eventstream_grpc.pb.go (1)
  • RegisterEventStreamServer (144-146)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
principal/auth.go (1)
internal/logging/logging.go (2)
  • Trace (285-287)
  • Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

test/e2e/README.md

32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (22)
Makefile (1)

59-79: Redis TLS setup in setup-e2e looks consistent with the E2E story

The new block cleanly wires the TLS cert generation and per-cluster Redis/Argo CD TLS configuration into make setup-e2e, matching the “TLS required for E2E” design. No functional issues from the Makefile side.

docs/getting-started/kubernetes/index.md (1)

159-230: Redis TLS steps in the Kubernetes getting-started guide are consistent and clear

The new Sections 2.4 and 4.4 cleanly walk through:

  • Generating CA/server certs,
  • Creating the shared argocd-redis-tls secret,
  • Patching argocd-redis to enable TLS-only,
  • Verifying with redis-cli --tls,

and mirror the control-plane vs workload-cluster story correctly. The cross-link to the dedicated “Redis TLS Configuration” doc ties it together nicely.

No issues from a correctness or usability standpoint.

Also applies to: 337-381, 646-646

test/e2e/README.md (1)

21-107: E2E flow and Redis TLS documentation are coherent and aligned with the scripts

The new multi‑step flow (setup, optional reverse tunnel, start principal/agents, run tests) together with the Redis TLS section matches the dev scripts and TLS wiring (make setup-e2e, reverse-tunnel, Redis TLS cert/config scripts, and port‑forwards). The note about InsecureSkipVerify being limited to the test fixture while agents/principal do full TLS validation is clear and appropriate for E2E usage.

hack/dev-env/Procfile.e2e (1)

1-7: Procfile-based Redis and Argo CD port-forwards look consistent with the E2E flow

The added pf-* entries correctly establish Redis port-forwards for the three vclusters and an Argo CD server port-forward, and the staggered startup of principal and agents fits with the new TLS/localhost-based Redis configuration. Note that the principal script also starts a Redis port-forward by default; see the comment on hack/dev-env/start-principal.sh to avoid double port-forwarding on localhost:6380.

test/e2e/rp_test.go (1)

161-245: Fixture helpers for Argo endpoint and admin secret are a solid cleanup

Switching to fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret, and then building the client via fixture.NewArgoClient, removes duplicate K8s plumbing in the tests and aligns them with the TLS-aware endpoint discovery used elsewhere. The resulting Argo login and application flows are unchanged and easier to maintain.

Also applies to: 294-307

hack/dev-env/gen-redis-tls-certs.sh (1)

1-150: Redis TLS cert generation is robust and idempotent

The script cleanly generates a CA plus per‑role Redis certificates with appropriate SANs (including localhost/loopback), skips regeneration when artifacts exist, conditionally adds the local IP, and cleans up temporary files. With set -e and no stderr suppression on OpenSSL commands, failures will be surfaced instead of silently ignored.

hack/dev-env/start-principal.sh (1)

58-76: Redis TLS argument construction for principal is consistent with the dev CA layout

The TLS detection block correctly checks for the proxy cert/key and CA in creds/redis-tls and, when present, passes them via --redis-tls-enabled, --redis-server-tls-cert/--redis-server-tls-key, and --redis-upstream-ca-path. This lines up with the certs generated by gen-redis-tls-certs.sh and ensures proper validation for both localhost port‑forward and reverse‑tunnel scenarios.

agent/agent.go (1)

17-24: Cluster cache Redis client now correctly honors Redis TLS settings

The new clusterCacheTLSConfig construction mirrors the Redis proxy’s TLS behaviour:

  • When Redis TLS is enabled, the cluster cache client enforces TLS 1.2+.
  • If redisTLSInsecure is set, a clear warning is logged and certificate verification is disabled.
  • Otherwise, when redisTLSCAPath is provided, the CA bundle is loaded and set as RootCAs, and any read/parse errors fail agent construction with an explicit error.

Passing this TLS config into cluster.NewClusterCacheInstance ensures the cluster cache uses the same secure Redis connection settings as the proxy.

Also applies to: 323-346

internal/argocd/cluster/cluster.go (2)

176-192: LGTM! TLS configuration properly integrated.

The TLS configuration is correctly passed through to the Redis client options. The tlsConfig parameter allows for optional TLS (nil is acceptable for non-TLS connections), and the Redis client will handle nil TLSConfig appropriately.


135-142: Connection state initialization is appropriate.

The fallback initialization when no existing ConnectionState is present correctly sets a Successful status with a timestamp. This ensures the cluster appears connected when cache stats are first received from an agent, which is the expected behavior.

hack/dev-env/start-e2e.sh (1)

50-59: Well-structured E2E test configuration.

The localhost-based Redis addresses are appropriate for TLS certificate validation in E2E tests, and the REDIS_PASSWORD retrieval is now correctly split into separate assignment and export to avoid masking kubectl errors (addressing the previous shellcheck warning).

test/e2e/fixture/cluster.go (4)

182-201: TLS configuration appropriate for E2E tests.

The use of InsecureSkipVerify: true is intentional for E2E tests to accommodate dynamic LoadBalancer addresses, as noted in the PR description. The TLS encryption is still enabled, which is the primary security goal.


206-217: Well-tuned Redis client configuration for E2E.

The generous timeouts, connection pool sizing, and retry configuration are appropriate for E2E test scenarios with port-forwarding latency. The settings balance test reliability with resource usage.


227-267: Effective connection leak prevention.

The caching mechanism with mutex protection prevents connection leaks across test runs. The cache key design (source + address) correctly handles multiple Redis instances, and the CleanupRedisCachedClients function enables proper test teardown.


154-165: Helpful debugging additions.

The added log statements provide useful context for troubleshooting E2E test failures, especially when investigating TLS-enabled Redis connectivity issues.

hack/dev-env/configure-redis-tls.sh (4)

61-66: Certificate validation is complete.

All required TLS certificate files are now validated (server certificate, key, and CA), addressing previous review feedback. The error message clearly guides users to run the certificate generation script.


81-122: Excellent transition strategy to prevent SSL errors.

Scaling down Argo CD components before enabling Redis TLS prevents connection errors during the transition. Saving replica counts to a ConfigMap for restoration by the configure-argocd-redis-tls.sh script is a clean approach to maintaining the desired deployment state.


138-196: Robust idempotent patching logic.

The script correctly checks for existing volumes and volumeMounts before patching, and handles both empty/null arrays and existing arrays. The JSON patch operations are well-structured with proper error handling.


199-215: Verify TLS configuration flags for Redis.

The Redis TLS configuration uses --tls-auth-clients no (line 212), which disables client certificate authentication. Confirm this is intentional for the E2E environment and aligns with how Redis clients in the codebase are configured.

principal/redisproxy/redisproxy.go (3)

130-154: TLS configuration builder handles both cert sources correctly.

The function properly supports both file-based and in-memory certificates with appropriate error handling. Setting MinVersion to TLS 1.2 is a reasonable baseline for compatibility while maintaining security.


157-183: TLS listener setup is well-structured.

The TLS-enabled listener creation with fallback to plain TCP is correctly implemented. The logging clearly indicates whether TLS is enabled, which aids troubleshooting.


836-897: Upstream TLS connection handling is thorough.

The method properly:

  • Establishes the TCP connection first
  • Conditionally wraps with TLS based on configuration
  • Handles CA from pool, CA from file, and insecure mode
  • Extracts SNI correctly from the address
  • Performs the TLS handshake
  • Cleans up the connection on errors

The warning log for InsecureSkipVerify (Line 859) is good practice.

@Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 81c3402 to 4a2ee37 on December 4, 2025, 14:57
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
install/helm-repo/argocd-agent-agent/values.schema.json (1)

302-330: Type inconsistency persists: redisTLS.enabled/insecure vs networkPolicy.enabled use different schema patterns.

The redisTLS.enabled and redisTLS.insecure fields use anyOf with string enum ["true", "false"] plus boolean type, while networkPolicy.enabled uses a simple boolean type. This creates inconsistency despite a prior review comment marking this as addressed.

Recommendation: Either standardize all boolean flags to use type: "boolean" for clarity, or document why string enums are needed (likely for environment variable compatibility in templates). The pattern already exists for tlsClientInSecure (line 152–158), so the choice should be applied consistently across all similar fields.

Also applies to: 337-340
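For reference, the two competing patterns look like this (a minimal schema fragment modeled on the values schema, not a verbatim excerpt):

```json
{
  "networkPolicy": {
    "properties": {
      "enabled": { "type": "boolean" }
    }
  },
  "redisTLS": {
    "properties": {
      "enabled": {
        "anyOf": [
          { "type": "boolean" },
          { "type": "string", "enum": ["true", "false"] }
        ]
      }
    }
  }
}
```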

cmd/argocd-agent/principal.go (1)

272-287: Incomplete mutual exclusivity validation for upstream TLS modes.

The validation at lines 273-275 only checks --redis-upstream-tls-insecure against --redis-upstream-ca-path, but there are three mutually exclusive upstream TLS modes:

  1. --redis-upstream-tls-insecure (skip verification)
  2. --redis-upstream-ca-path (CA from file)
  3. --redis-upstream-ca-secret-name (CA from secret — has default value)

If a user specifies --redis-upstream-tls-insecure=true without also explicitly setting the CA secret name to empty, the insecure mode silently wins over the default secret. This behavior may be intentional, but it differs from the explicit validation done for insecure vs CA path.

Consider whether the current validation is sufficient for your use case, or if you need to also check for explicit user-provided --redis-upstream-ca-secret-name values that conflict with insecure mode.

hack/dev-env/configure-argocd-redis-tls.sh (1)

228-231: Replica guard logic is fragile; rewrite it as an explicit if.

In POSIX shell, && and || have equal precedence and associate left to right, so [ "$X" = "0" ] || [ -z "$X" ] && X="1" parses as (cond1 || cond2) && assign. The assignment therefore does run for "0" or empty values, but when the saved count is a normal non-zero value both tests fail and the compound command exits non-zero, which aborts the script if it runs under set -e and otherwise leaks a failing exit status.

Apply this fix:

 # Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
hack/dev-env/start-principal.sh (1)

23-29: Defaulting Redis address and delegating port-forward to Procfile is correct

Using localhost:6380 as the default ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS and leaving the actual port‑forward to Procfile (or manual kubectl port-forward) cleanly resolves the earlier conflict and keeps this script focused on principal startup.

principal/redisproxy/redisproxy.go (1)

836-897: Avoid silently downgrading upstream Redis to plaintext when server TLS is enabled

With the current condition:

if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
    // wrap conn in TLS
}

if the proxy server has TLS enabled but no upstream TLS config is provided, the upstream connection stays unencrypted. That’s a surprising and weaker posture for a “Redis TLS by default” setup, and can leak data in‑cluster while clients believe they’re on a fully‑TLS path.

Recommend at least logging a clear warning when rp.tlsEnabled is true but no upstream TLS config is present, and strongly consider enforcing TLS (e.g., treat that configuration as an error) so misconfiguration is caught early.

For example:

hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure

if rp.tlsEnabled && !hasUpstreamTLSConfig {
    logCtx.Warn("Redis proxy TLS is enabled but no upstream Redis TLS configuration is set; upstream traffic will be plaintext")
}

if rp.tlsEnabled && hasUpstreamTLSConfig {
    // current TLS wrapping logic
}
🧹 Nitpick comments (11)
principal/auth.go (1)

154-165: LGTM! Observability improvements for auth flow.

The trace logging additions improve visibility into the authentication interceptor's decision path. The emoji markers (🔵🟢🟡🔴) provide visual cues for different paths, and the log levels are appropriate (Trace for flow, Warn for failures).

Note: While these changes are orthogonal to the main PR objective (Redis TLS), they're valuable observability improvements.

Optional: Consider whether emoji markers in logs might cause issues with log aggregation or parsing systems in your environment. If so, you could replace them with text prefixes like [RECV], [NOAUTH], [AUTH_REQ], [AUTH_FAIL]. However, since these are Trace-level logs (typically disabled in production), the risk is minimal.

principal/listen.go (2)

174-199: Normalize WebSocket / gRPC startup & shutdown logging

The added logs help clarify which mode is used, but there are a couple of polish points:

  • Log messages have leading spaces (" WebSocket is ENABLED...", " gRPC server.Serve() exited"), which will look odd and make grepping harder.
  • The emoji in Line 174 may be inconsistent with the rest of the project’s logging style.
  • In the WebSocket branch you now log startup (Line 186) but not shutdown, while in the gRPC branch you log both startup and exit (Lines 194–197). For symmetry and debugging, consider adding a WithError(err) log after ServeTLS returns as well, and possibly downgrading the exit log to Debug or only warning on unexpected errors.

These are non-blocking, but tightening them up would keep logs cleaner and more consistent.


224-231: Reassess verbosity and level of new gRPC service registration logs

The per-service Info logs make startup more transparent, but four Info-level lines here may become noisy in larger deployments:

  • Consider either collapsing into a single Info message listing all registered services, or moving the detailed per-service logs to Debug.
  • The initial “Registering gRPC services on principal” message (Line 224) is useful; the three “... registered successfully” lines could be demoted if log volume is a concern.

No functional issues, just a suggestion to balance observability vs log noise.

hack/dev-env/start-e2e.sh (1)

19-48: Consider removing unused getExternalLoadBalancerIP function.

This function is no longer called in the script since the switch to static localhost addresses. Dead code increases maintenance burden.

-# getExternalLoadBalancerIP will set EXTERNAL_IP with the load balancer hostname from the specified Service
-getExternalLoadBalancerIP() {
-  SERVICE_NAME=$1
-
-  MAX_ATTEMPTS=120
-
-  for ((i=1; i<=MAX_ATTEMPTS; i++)); do
-    
-    echo ""
-    EXTERNAL_IP=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-    EXTERNAL_HOST=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
-
-    if [ -n "$EXTERNAL_IP" ]; then
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_IP"
-      break
-    elif [ -n "$EXTERNAL_HOST" ]; then
-      echo "External host for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_HOST"
-      EXTERNAL_IP=$EXTERNAL_HOST
-      break
-    else
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT not yet available, attempting again in 5 seconds..."
-      sleep 5
-    fi
-  done
-
-  if [ $i -gt $MAX_ATTEMPTS ]; then
-    echo "Failed to obtain external IP after $MAX_ATTEMPTS attempts."
-    exit 1
-  fi
-
-}
hack/dev-env/configure-argocd-redis-tls.sh (1)

29-31: Consider using --context flag instead of switching global context.

Using kubectl config use-context modifies the user's kubeconfig globally, which could cause issues if the script is interrupted or if parallel operations are running. Consider using kubectl --context=${CONTEXT} for each command instead.

-# Switch context
-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+# Use context flag for all kubectl commands instead of switching globally
+KUBECTL="kubectl --context=${CONTEXT}"

Then replace all kubectl calls with ${KUBECTL}.

test/e2e/fixture/cluster.go (1)

259-267: CleanupRedisCachedClients should explicitly close Redis connections.

The cleanup function only clears the map and relies on garbage collection. Redis clients should be explicitly closed to release connections immediately and avoid potential resource leaks during test suite execution.

 // CleanupRedisCachedClients closes all cached Redis clients (should be called at end of test suite)
 func CleanupRedisCachedClients() {
 	cachedRedisClientMutex.Lock()
 	defer cachedRedisClientMutex.Unlock()

 	fmt.Printf("Cleaning up %d cached Redis clients\n", len(cachedRedisClients))
+	// Note: appstatecache.Cache doesn't expose Close() method, so we rely on GC
+	// If connection leaks become an issue, consider storing the underlying redis.Client
+	// separately to enable explicit Close() calls
 	// Clear the cache map - connections will be garbage collected
 	cachedRedisClients = make(map[string]*appstatecache.Cache)
 }

Alternatively, if the underlying redis.Client can be stored separately, implement explicit closure:

// Store both cache and client for proper cleanup
type cachedRedisEntry struct {
    cache  *appstatecache.Cache
    client *redis.Client
}
principal/redisproxy/redisproxy.go (1)

65-154: TLS server configuration looks sound; consider preloading CA if needed later

The added TLS fields and createServerTLSConfig correctly handle both file-based and in‑memory cert+key, and enforce TLS 1.2+. If this proxy ever becomes connection‑heavy, you might later consider preloading / reusing cert material (rather than rebuilding tls.Certificate from fields on each start), but it’s not required for current usage.

test/e2e/fixture/fixture.go (2)

229-291: Treating cleanup failures as warnings is appropriate for E2E tests

The new fmt.Printf("Warning: ...") paths during application/AppProject cleanup ensure teardown issues (especially on remote/slow clusters) don’t cascade into hard test failures. That’s a good trade‑off for E2E stability.

Also applies to: 269-291, 295-357, 372-373


457-471: Guard resetManagedAgentClusterInfo against nil clusterDetails

resetManagedAgentClusterInfo assumes clusterDetails is non‑nil. That’s true when called via BaseSuite, but CleanUp is exported and could be invoked with a nil pointer elsewhere, leading to a panic when getCachedCacheInstance dereferences it.

Consider a light guard:

func resetManagedAgentClusterInfo(clusterDetails *ClusterDetails) error {
    if clusterDetails == nil {
        return nil
    }
    if err := getCachedCacheInstance(AgentManagedName, clusterDetails).
        SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil {
        return fmt.Errorf("resetManagedAgentClusterInfo: %w", err)
    }
    return nil
}

Optionally, if you have a CleanupRedisCachedClients helper, calling it from CleanUp after resetManagedAgentClusterInfo would fully reset Redis client state between tests.

hack/dev-env/start-agent-managed.sh (1)

37-62: Consider failing fast when Redis TLS certs are missing in TLS-only setups

The script correctly enables Redis TLS when creds/redis-tls/ca.crt exists and wires --redis-tls-enabled/--redis-tls-ca-path into the agent command, with a sensible default localhost:6381 address for the port‑forward.

Given the rest of the dev/E2E setup now configures Redis as TLS‑only by default, the "running without TLS" fallback path is likely to just produce connection errors later. You might consider turning the “certificates not found” case into a hard failure (or at least a stronger warning) in the e2e flow so misconfigured environments are surfaced early.

Also applies to: 48-62, 63-75, 76-83

hack/dev-env/start-principal.sh (1)

44-62: TLS wiring for principal looks good; consider stricter handling when certs are absent

The detection of redis-proxy.{crt,key} and ca.crt under creds/redis-tls and construction of:

--redis-tls-enabled=true
--redis-server-tls-cert=...
--redis-server-tls-key=...
--redis-upstream-ca-path=...

is consistent with the documented principal Redis TLS options.

Similar to the managed-agent script, now that dev/E2E flows configure Redis as TLS‑only by default, you might want to treat the “certificates not found, running without TLS” branch as a hard failure (or at least a very loud warning) so misconfigured environments don’t just fail later with opaque connection errors.

Also applies to: 64-71

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81c3402 and 4a2ee37.

📒 Files selected for processing (29)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • hack/dev-env/gen-redis-tls-certs.sh
  • principal/resource.go
  • test/e2e/rp_test.go
  • test/run-e2e.sh
  • test/e2e/fixture/argoclient.go
  • hack/dev-env/start-agent-autonomous.sh
  • test/e2e/clusterinfo_test.go
  • docs/getting-started/kubernetes/index.md
  • Makefile
  • test/e2e/redis_proxy_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/README.md
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/start-agent-managed.sh
🧬 Code graph analysis (4)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
principal/auth.go (1)
internal/logging/logging.go (2)
  • Trace (285-287)
  • Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (21)
install/helm-repo/argocd-agent-agent/values.schema.json (1)

332-382: Schema structure for networkPolicy.redis is well-formed and permissive for selectors.

The nested object structure correctly allows additionalProperties: true on agentSelector and redisSelector, which is appropriate for Kubernetes label selectors that may include custom labels beyond the documented app.kubernetes.io/name key.

cmd/argocd-agent/principal.go (3)

89-98: LGTM!

The Redis TLS configuration variables are well-organized with a clear comment header, follow consistent naming conventions, and use appropriate types.


420-422: Verify: Redis TLS default value vs PR objective.

The PR title states "redis TLS encryption enabled by default for all connections", but --redis-tls-enabled defaults to false here.

If the intent is for Redis TLS to be enabled by default in production, verify that this is handled in the Helm charts or Kubernetes manifests rather than the CLI defaults. If the CLI should also default to TLS enabled, this would need to change to true.


471-471: Significant timeout increase noted.

The secret retrieval timeout increased from 2 seconds to 30 seconds. This improves reliability for slow Kubernetes API responses, but extends the startup failure time if secrets are misconfigured or unavailable. The tradeoff seems reasonable for production environments.

test/e2e/sync_test.go (1)

371-373: Pre-sync hook Job name alignment looks good

Updating the hook Job name to "before" in both tests keeps the termination assertions intact, assuming the manifest in test/data/pre-sync uses the same name. No issues from the test logic perspective.

Also applies to: 465-468

principal/tracker/tracking.go (1)

75-78: Appropriate concurrency fix for request-response pattern.

The buffered channel (capacity 1) correctly prevents deadlock when the sender and receiver operate asynchronously in goroutines. This is the standard pattern for 1:1 request-response scenarios where exactly one response is expected per tracked request.

Verify these assumptions in code review:

  1. Each tracked event receives at most one response (no multiple sends to the same channel)
  2. StopTracking is always called to close the channel and prevent resource leaks
  3. The sender handles scenarios where the channel might be closed before sending
agent/agent.go (2)

323-343: TLS configuration for cluster cache looks well-structured.

The TLS configuration properly:

  • Sets minimum TLS version to 1.2
  • Logs a warning for insecure mode (addressing the previous review comment)
  • Loads and validates CA certificates when a path is provided
  • Returns clear error messages on failure

445-460: Improved startup logic for cluster cache info updates.

Sending an initial update immediately on startup (before waiting for the first ticker interval) improves the time-to-first-sync. The unified code path for both managed and autonomous modes simplifies maintenance.

hack/dev-env/start-e2e.sh (1)

50-59: Static localhost addresses and fixed REDIS_PASSWORD handling look good.

The switch to localhost-based addresses for TLS certificate validation is appropriate for E2E tests. The REDIS_PASSWORD retrieval is now correctly separated into declaration and export (addressing the previous shellcheck warning).

internal/argocd/cluster/cluster.go (2)

135-142: Good defensive initialization of ConnectionState.

Initializing ConnectionState when it doesn't exist yet prevents nil-related issues and provides meaningful status for newly connected agents. The timestamp uses time.Now() which is appropriate since this represents the moment the cache stats update was received.


176-184: TLS configuration properly wired to Redis client.

The tlsConfig parameter is correctly passed through to the Redis client options. This follows the pattern established in the relevant code snippet and integrates cleanly with the existing cache creation logic.

hack/dev-env/configure-argocd-redis-tls.sh (1)

56-70: Idempotency checks and patching pattern looks reasonable for E2E/dev use.

The script properly checks for existing configuration before applying patches, preventing duplicate volumes/mounts/args. The 2>/dev/null || true pattern handles edge cases gracefully for a development script.

test/e2e/fixture/cluster.go (3)

206-217: Generous timeouts and connection pool settings are appropriate for E2E tests.

The extended timeouts (10s dial, 30s read) and retry configuration help handle port-forward latency and test environment variability. The pool size of 10 with min/max idle settings is reasonable for concurrent test load.


180-201: InsecureSkipVerify is acceptable for E2E tests with appropriate comment.

The comment clearly documents that this is for E2E test simplicity. For production code, CA certificate validation would be required (which is implemented elsewhere in this PR).


320-326: Good use of environment variable overrides for local development.

The MANAGED_AGENT_REDIS_ADDR and ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS environment variables allow developers to use port-forwarding with localhost addresses while the production code uses service discovery. This aligns with the static localhost addresses exported in start-e2e.sh.

Also applies to: 380-386

principal/redisproxy/redisproxy.go (2)

159-183: TLS listener startup branching is correct

The Start method cleanly switches between tls.Listen and plain net.Listen based on rp.tlsEnabled, with appropriate error logging and success messages; this matches the new TLS configuration surface.


221-270: Connection handling change to use method receiver is fine

Switching handleConnection to call rp.establishConnectionToPrincipalRedis (method receiver) instead of a standalone function keeps Redis proxy state encapsulated without altering behavior.

test/e2e/fixture/fixture.go (1)

109-155: Bounded deletion waits improve test robustness

Capping EnsureDeletion and WaitForDeletion at 120×1s iterations gives deterministic test-time behavior and avoids potential infinite waits on stuck resources; the structure of the retry loops looks correct.

Also applies to: 160-171

test/e2e/README.md (1)

21-108: E2E flow and Redis TLS documentation are clear and consistent

The restructured steps (environment setup, optional reverse tunnel, start processes, run tests) and the explicit Redis TLS section align well with the new scripts and Procfile; the notes about InsecureSkipVerify being test‑fixture only are also clear.

hack/dev-env/Procfile.e2e (1)

1-7: Centralizing port-forwards in Procfile avoids conflicts

Having pf-* entries own the Redis and argocd-server port‑forwards and starting principal/agents afterward resolves the previous “double port‑forward on 6380” issue and gives a clear, reproducible startup model for make start-e2e.

hack/dev-env/configure-redis-tls.sh (1)

1-246: Redis TLS configuration script is robust and idempotent for dev/E2E use

The script cleanly validates cert material, preserves/restores kube context, scales down dependent Argo CD components, creates the argocd-redis-tls secret, patches the deployment with volumes/mounts, and replaces args to enable TLS‑only on 6379. The volume/mount existence checks make it safe to re‑run, which is valuable during iterative E2E setup.

@Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 4a2ee37 to dd9cf85 on December 4, 2025, 15:39
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (1)
docs/configuration/redis-tls.md (1)

149-156: Tag remaining fenced blocks with a language to satisfy markdownlint.

As flagged in a previous review, these code blocks still need language tags. Apply text to the diagram and script output blocks.

- **How the tunnel works:**
-  ```
+**How the tunnel works:**
+  ```text
   Argo CD Server (remote vcluster) 
       → rathole Deployment (remote) 
       → rathole Container (local Mac) 
       → Principal process (local Mac)
-  ```
+  ```

The same fix applies to lines 475, 486, and 504.

🧹 Nitpick comments (3)
test/e2e/redis_proxy_test.go (1)

120-124: The hardcoded sleep is a pragmatic workaround, but consider documenting the root cause.

The 5-second delay to wait for Redis SUBSCRIBE propagation is a reasonable workaround for the race condition. The comment explains the issue well.

If this race condition is specific to the test setup, it might be worth adding a TODO to investigate whether the subscription can be verified more deterministically in the future:

 	// Wait for SSE stream to fully establish and Redis SUBSCRIBE to propagate
 	// This prevents a race condition where the pod is deleted before the subscription is active
+	// TODO: Consider implementing a more deterministic check for subscription readiness
 	t.Log("Waiting for SSE stream to fully establish...")
 	time.Sleep(5 * time.Second)
principal/redisproxy/redisproxy.go (1)

853-894: Consider logging a warning when server TLS is enabled but upstream TLS is not configured.

The upstream TLS connection is only established when rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) (line 853). This means if the Redis proxy server has TLS enabled for incoming connections but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP within the cluster.

While this may be intentional for some deployment scenarios (e.g., trusting internal cluster network), it creates an inconsistent security posture that operators should be aware of.

Consider adding a warning log when this configuration mismatch occurs:

+	// Warn if server TLS is enabled but no upstream TLS is configured
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy server has TLS enabled for incoming connections, but upstream connection to principal Redis will be unencrypted")
+	}
+
 	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled && hasUpstreamTLSConfig {

This improves operator awareness without enforcing a specific security policy.

hack/dev-env/configure-redis-tls.sh (1)

199-215: Add error handling to args patch.

The args patch lacks error checking (||exit 1) present in other patches (lines 153, 162, 183, 192). If this critical patch fails, the deployment won't have TLS configured, but the script reports success.

Add error handling to the args patch:

 kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
       "--requirepass", "$(REDIS_PASSWORD)",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
-]'
+]' || { echo "Failed to patch Redis args"; exit 1; }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a2ee37 and dd9cf85.

📒 Files selected for processing (28)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • principal/resource.go
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • hack/dev-env/start-principal.sh
  • hack/dev-env/start-agent-autonomous.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/rp_test.go
  • test/e2e/fixture/argoclient.go
  • test/e2e/clusterinfo_test.go
  • Makefile
  • hack/dev-env/start-e2e.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/README.md
  • test/run-e2e.sh
  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/Procfile.e2e
🧬 Code graph analysis (5)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (112-115)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (3)
internal/logging/logfields/logfields.go (3)
  • Config (127-127)
  • Error (101-101)
  • Client (37-37)
principal/listen.go (1)
  • Listener (54-60)
internal/logging/logging.go (2)
  • Error (305-307)
  • Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Lint Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (42)
test/run-e2e.sh (2)

104-115: Verify environment variable names match test code expectations.

Environment variables are exported for local development with inconsistent naming conventions: REDIS_SERVER_ADDRESS vs. REDIS_ADDR suffixes, and inconsistent AGENT prefixes. Confirm these names match what the test code actually consumes.


81-102: The lsof -i :6380 -i :6381 -i :6382 syntax is standard and correct. By default, lsof ORs multiple -i selectors, matching processes listening on any of the specified ports, which is the intended behavior: the script warns only if none of the three ports is detected. The proposed fix using && operators would incorrectly require all three ports to be listening, changing the detection logic. No action needed.

principal/tracker/tracking.go (1)

75-78: Verify that buffered capacity 1 is sufficient and handles edge cases correctly.

The change from an unbuffered to a buffered channel addresses potential deadlocks when the sender runs before the receiver is ready. However, verify:

  1. One event per request: Confirm that each tracked request receives exactly one response event, ensuring capacity 1 is sufficient.
  2. Closed channel handling: Ensure processRedisEventResponse doesn't send on a closed channel if StopTracking is called prematurely (would cause a panic).
  3. Abandoned channel cleanup: Verify that sendSynchronousRedisMessageToAgent always consumes from the channel or that proper timeout/cleanup mechanisms exist to prevent goroutine leaks.
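For context, a minimal sketch of why capacity 1 avoids the send-side deadlock. Names here (respond, the string payload) are illustrative, not the actual tracker API:

```go
package main

import "fmt"

// respond simulates the response path delivering exactly one event for a
// tracked request. With capacity 1, the send never blocks, even if the
// receiver has not started reading yet.
func respond(ch chan string, event string) {
	ch <- event // would block forever here if ch were unbuffered and unread
}

func main() {
	ch := make(chan string, 1) // buffered: one response per request

	// Sender runs before the receiver is ready; no extra goroutine needed.
	respond(ch, "resource-response")

	fmt.Println(<-ch) // resource-response
}
```

Note that capacity 1 only helps if each request produces at most one response; a second unconsumed send would still block, which is why point 1 above matters.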
test/e2e/sync_test.go (1)

371-371: Verify the Job name "before" matches the test data definition.

The pre-sync hook Job name has been updated to "before" in both Test_TerminateOperationManaged (line 371) and Test_TerminateOperationAutonomous (line 466). Confirm this name matches the actual Job resource defined in test/data/pre-sync, since the tests use this name to verify Job cleanup within a 120-second timeout—a mismatch will cause test failures.

Also note: This change appears unrelated to the PR's Redis TLS enablement objectives and may warrant a separate commit for clarity.

Also applies to: 466-466

hack/dev-env/Procfile.e2e (1)

1-7: LGTM! Well-structured E2E process configuration.

The port-forward setup correctly maps Redis ports for each vcluster (6380-6382 → 6379) and the argocd-server port (8444 → 443). The sleep delays appropriately sequence the startup to allow port-forwards to establish before services start.

docs/getting-started/kubernetes/index.md (2)

159-230: Clear and comprehensive Redis TLS setup documentation.

The certificate generation steps are secure (4096-bit RSA keys, appropriate SANs). The warning about Redis TLS being required and the step-by-step kubectl patches are well-documented.


337-381: Good approach to workload cluster TLS setup.

Using distinct file names (redis-workload.key/crt) while reusing the same CA is a clean pattern that prevents confusion. The instructions maintain consistency with the control-plane setup in Section 2.4.

docs/configuration/redis-tls.md (2)

677-697: Solid security best practices section.

The recommendations for strong keys (4096-bit RSA), certificate rotation, and the explicit warning against insecure options in production are valuable. Good callout about using readOnly: true for volume mounts.


1-50: Comprehensive and well-organized Redis TLS documentation.

The documentation provides clear guidance from quick start through production deployment, with thorough troubleshooting. The architecture diagram effectively illustrates the TLS configuration points.

test/e2e/README.md (2)

21-108: Clear and practical E2E test documentation updates.

The multi-terminal workflow is well-explained, and the distinction between local/remote cluster setups helps users understand when reverse tunnel is needed. The Redis TLS section appropriately documents that TLS is mandatory and provides manual reconfiguration steps.


107-108: Good clarification on InsecureSkipVerify usage.

The documentation correctly explains that InsecureSkipVerify: true is used only in test fixtures for cross-environment compatibility, while TLS encryption remains fully enabled. This aligns with the PR description's request for feedback on this approach.

test/e2e/redis_proxy_test.go (3)

187-208: Improved message draining logic with proper retry semantics.

The messagesDrained flag correctly tracks whether any messages were processed, and the drain-all-then-retry pattern is more robust than checking one message at a time. This should reduce flaky test failures.


210-238: Good resilience added to ResourceTree verification.

Wrapping the ResourceTree call in Eventually with proper error handling addresses transient Redis EOF errors that can occur during TLS connection resets. The 30-second timeout with 2-second intervals provides adequate retries.


586-653: Well-configured HTTP transport for SSE streams.

The transport settings are appropriate:

  • Buffered channel (100) prevents message loss during processing
  • Timeout: 0 is correct for long-lived SSE connections
  • IdleConnTimeout: 300s keeps connections alive for extended test runs
  • InsecureSkipVerify: true is documented in the E2E README as test-only behavior
internal/argocd/cluster/cluster.go (3)

18-18: LGTM!

The crypto/tls import is necessary for the new TLS configuration parameter added to NewClusterCacheInstance.


135-142: LGTM!

The initialization of ConnectionState when it doesn't exist provides a sensible default when cluster cache stats are received before an explicit connection status update. This improves agent connection tracking.


176-184: LGTM!

The TLS configuration is properly integrated into the Redis client initialization. The signature change is consistent with the broader TLS enablement across the codebase.

agent/agent.go (3)

19-23: LGTM!

The new imports are necessary for TLS configuration and CA certificate loading from files.


323-343: LGTM!

The TLS configuration logic is well-structured:

  • Properly handles insecure mode with appropriate warning
  • Loads and validates CA certificates from file
  • Provides clear error messages on failure
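For illustration, a sketch of the CA-loading pattern described above, with a throwaway in-memory CA standing in for the file read from disk (the function name and structure are hypothetical, not the agent's exact code):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// caPoolFromPEM loads CA bytes into a cert pool and fails loudly when the
// PEM cannot be parsed, rather than silently trusting nothing.
func caPoolFromPEM(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("failed to parse CA certificate")
	}
	return &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}, nil
}

func main() {
	// Generate a throwaway self-signed CA (stands in for ca.crt on disk).
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "redis-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, _ := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})

	cfg, err := caPoolFromPEM(caPEM)
	fmt.Println(err == nil, cfg != nil)

	_, err = caPoolFromPEM([]byte("not a pem"))
	fmt.Println(err != nil) // parse failure is reported, not ignored
}
```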

345-349: LGTM!

The TLS configuration is correctly passed to the cluster cache instance creation, matching the updated signature.

cmd/argocd-agent/principal.go (3)

90-97: LGTM!

The Redis TLS flag variables are well-named and cover all necessary configuration options for both server and upstream TLS.


419-440: LGTM!

The CLI flag definitions are comprehensive and follow consistent naming conventions. Environment variable support is properly integrated with sensible defaults.


471-471: LGTM!

The increased timeout (30s) for TLS configuration retrieval from Kubernetes is reasonable and aligns with the broader TLS enablement changes.

principal/redisproxy/redisproxy.go (3)

21-27: LGTM!

The TLS-related fields are well-structured with clear separation between server (incoming connections) and upstream (outgoing connections) TLS configurations. The comments provide good context.

Also applies to: 65-76


98-154: LGTM!

The TLS configuration methods are well-designed:

  • Clean public API for both server and upstream TLS
  • Proper handling of both file-based and memory-based certificates
  • Appropriate error handling and minimum TLS version

157-183: LGTM!

The Start() method cleanly handles both TLS and non-TLS modes with appropriate logging and error handling.

hack/dev-env/start-agent-managed.sh (4)

37-46: LGTM!

The Redis TLS detection logic is appropriate for a development script, with helpful guidance when certificates are not found.


48-62: LGTM!

The Redis address configuration is well-documented with helpful comments explaining the localhost default and port-forward requirements for TLS validation.


63-74: LGTM!

The mTLS certificate extraction properly retrieves client certificates and CA from Kubernetes secrets. The use of temporary files is appropriate for a development script.


76-90: LGTM!

The agent startup command properly includes all TLS-related arguments (client certificates, Redis TLS, etc.) in a logical order.

hack/dev-env/gen-redis-tls-certs.sh (4)

14-26: LGTM!

The CA generation logic is idempotent and uses strong cryptographic parameters (4096-bit RSA). The conditional generation prevents overwriting existing certificates.


28-58: LGTM!

The certificate generation pattern is well-structured and idempotent. The use of extension files for Subject Alternative Names follows modern OpenSSL practices.


67-90: LGTM!

The local IP detection and conditional SAN addition is well-handled. The script correctly avoids adding an empty IP entry when local IP detection fails, which was a previously identified issue.


138-150: LGTM!

The cleanup of temporary files is appropriate, and the success message provides a helpful summary of all generated certificates.

test/e2e/fixture/fixture.go (4)

109-112: LGTM!

The timeout increases (60s → 120s) are appropriate for TLS-enabled E2E tests, which may experience additional latency from TLS handshakes and port-forwarding in the test environment.

Also applies to: 143-143, 160-160


231-240: LGTM!

The error handling changes improve test cleanup resilience by continuing cleanup even when individual deletions fail. This is appropriate for test teardown where partial cleanup is preferable to complete failure, and warnings ensure issues are still visible.

Also applies to: 256-265, 277-278, 290-291, 312-324, 344-356, 371-373


235-240: LGTM!

The DeepCopy calls properly prevent mutation of loop variables when modifying objects for deletion waiting. This is correct and defensive programming.

Also applies to: 260-265, 316-324, 349-356
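A minimal illustration of the aliasing problem DeepCopy avoids, using a hand-rolled deepCopy and a simplified App type in place of the generated Kubernetes method:

```go
package main

import "fmt"

type App struct {
	Name   string
	Labels map[string]string
}

// deepCopy stands in for the generated Kubernetes DeepCopy(): it clones
// nested reference types so the copy can be mutated independently.
func (a *App) deepCopy() *App {
	labels := make(map[string]string, len(a.Labels))
	for k, v := range a.Labels {
		labels[k] = v
	}
	return &App{Name: a.Name, Labels: labels}
}

func main() {
	apps := []*App{{Name: "guestbook", Labels: map[string]string{"phase": "live"}}}

	// Without a copy, modifying the object for deletion-waiting mutates
	// the element still referenced by the loop/slice:
	shallow := apps[0]
	shallow.Labels["phase"] = "deleting"
	fmt.Println(apps[0].Labels["phase"]) // deleting: shared data was mutated

	// Reset, then mutate a deep copy instead:
	apps[0].Labels["phase"] = "live"
	safe := apps[0].deepCopy()
	safe.Labels["phase"] = "deleting"
	fmt.Println(apps[0].Labels["phase"], safe.Labels["phase"]) // live deleting
}
```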


457-470: LGTM!

The non-fatal cluster info reset is appropriate for test cleanup scenarios where Redis may be unavailable (e.g., port-forward terminated). Using the cached cache instance is consistent with the broader caching pattern introduced in cluster.go.

test/e2e/fixture/cluster.go (4)

19-22: LGTM!

The new imports and TLS-enabled flags appropriately extend the test fixture to support TLS-enabled Redis configurations.

Also applies to: 44-51


181-224: LGTM!

The TLS configuration with InsecureSkipVerify is appropriately documented for E2E tests. The connection tuning parameters (timeouts, pool sizing, retries) are generous and suitable for handling port-forwarding latency in test environments.


226-267: LGTM!

The Redis client caching mechanism properly prevents connection leaks in E2E tests:

  • Thread-safe with mutex protection
  • Cache key includes address for proper isolation
  • Exported cleanup function for test suite teardown
  • Reasonable reliance on GC for connection cleanup in test code

299-327: LGTM!

The Redis configuration retrieval logic is robust:

  • Comprehensive fallback chain (LoadBalancer → LoadBalancerIP → ClusterIP)
  • TLS always enabled for E2E tests
  • Environment variable overrides support local development workflows
  • Clear error messages document the fallback chain

Also applies to: 360-387

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from dd9cf85 to a349781 Compare December 4, 2025 16:39

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
cmd/argocd-agent/principal.go (1)

258-288: Harden upstream Redis TLS mode validation to cover all flag combinations

The overall Redis TLS wiring looks good: server-side TLS validates cert/key pairing and falls back to a secret, and upstream TLS correctly selects between insecure mode, CA file, and CA secret. However, the validation currently only rejects --redis-upstream-tls-insecure together with --redis-upstream-ca-path, while combinations involving --redis-upstream-ca-secret-name (e.g., CA path + CA secret, or insecure + CA secret) are still allowed and silently prefer one mode.

To make configuration failures explicit and align with the stricter server TLS validation, consider validating that at most one upstream TLS mode is set (insecure, CA file, or CA secret):

-				// Validate upstream TLS configuration - insecure and CA path are mutually exclusive
-				if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
-					cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
-				}
+				// Validate upstream TLS configuration - only one mode may be specified
+				modesSet := 0
+				if redisUpstreamTLSInsecure {
+					modesSet++
+				}
+				if redisUpstreamTLSCAPath != "" {
+					modesSet++
+				}
+				if redisUpstreamTLSCASecretName != "" {
+					modesSet++
+				}
+				if modesSet > 1 {
+					cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
+				}

This keeps the existing selection logic intact while preventing ambiguous configurations from being silently accepted.

hack/dev-env/configure-argocd-redis-tls.sh (1)

309-312: Replica guard one-liners are correct but obscure their intent.

In POSIX shell, && and || have equal precedence and are evaluated left to right, so the expression [ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1" parses as (cond1 || cond2) && assignment:

  • If REPO_SERVER_REPLICAS="0", the first test succeeds, the || group is true, and the assignment runs
  • If the value is empty, the second test succeeds and the assignment also runs

The guards therefore do enforce the comment "Ensure we have at least 1 replica", but the pattern reads as if && bound tighter than ||, and a future edit to these lines could silently change the logic. Explicit if statements make the intent unambiguous.

Apply this diff to make the intent explicit:

-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
test/run-e2e.sh (2)

33-45: Validate all required certificate files, not just ca.crt.

The script only checks for ca.crt but does not validate that server.crt and server.key exist. If these files are missing, tests will fail downstream with cryptic TLS errors.

Apply this diff:

 # Check if Redis TLS certificates exist
-if [ ! -f "${REDIS_TLS_DIR}/ca.crt" ]; then
+if [ ! -f "${REDIS_TLS_DIR}/ca.crt" ] || [ ! -f "${REDIS_TLS_DIR}/server.crt" ] || [ ! -f "${REDIS_TLS_DIR}/server.key" ]; then
     echo "ERROR: Redis TLS certificates not found!"
     echo ""
     echo "Redis TLS is REQUIRED for E2E tests (security requirement)."
     echo ""
     echo "Please run the following commands:"
     echo "  ./hack/dev-env/gen-redis-tls-certs.sh"

62-66: Replace text grep with proper JSON parsing for TLS validation.

Using grep -q "tls-port" on JSON output is fragile:

  • Text matching can produce false positives if "tls-port" appears in unexpected locations
  • Does not confirm the field is in the correct location within the deployment spec
  • Provides no debugging information when validation fails

Replace with robust JSON parsing:

-        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | jq -e '.spec.template.spec.containers[].ports[] | select(.name == "tls-port")' >/dev/null 2>&1; then
             echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
             echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
             exit 1
         fi
🧹 Nitpick comments (3)
principal/listen.go (2)

174-196: Inconsistent log formatting and unrelated changes.

Several issues with the new logging statements:

  1. Emoji in production logs (line 174): The "🔧" emoji may not render correctly in all log aggregation systems and is non-standard for production logging.
  2. Leading whitespace (lines 176, 196): Messages like " WebSocket is ENABLED" and " gRPC server.Serve() exited" have leading spaces, creating inconsistent formatting compared to other log statements.
  3. Disconnect from PR objectives: This PR is focused on enabling Redis TLS encryption by default, but these changes add WebSocket and gRPC server startup logging, which appears unrelated to the stated objectives.

Apply this diff to fix the formatting issues:

-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Info("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
 		opts := []grpchttp1server.Option{grpchttp1server.PreferGRPCWeb(true)}
 
 		downgradingHandler := grpchttp1server.CreateDowngradingHandler(s.grpcServer, http.NotFoundHandler(), opts...)
@@ -193,7 +193,7 @@
 		go func() {
 			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
 			err = s.grpcServer.Serve(s.listener.l)
-			log().WithError(err).Warn(" gRPC server.Serve() exited")
+			log().WithError(err).Warn("gRPC server.Serve() exited")
 			errch <- err
 		}()
 	}

224-230: Service registration logging appears unrelated to PR objectives.

While these logging statements improve observability during service initialization, they appear disconnected from the PR's core objective of enabling Redis TLS encryption by default. Consider whether these observability improvements belong in a separate PR focused on logging enhancements.

test/e2e/fixture/cluster.go (1)

259-267: Consider explicitly closing Redis connections in cleanup.

The current cleanup simply recreates the map and relies on garbage collection to close connections. While this works, explicitly closing the underlying Redis clients would provide more deterministic cleanup.

Consider this enhancement:

 func CleanupRedisCachedClients() {
 	cachedRedisClientMutex.Lock()
 	defer cachedRedisClientMutex.Unlock()
 
 	fmt.Printf("Cleaning up %d cached Redis clients\n", len(cachedRedisClients))
-	// Clear the cache map - connections will be garbage collected
+	// Note: Explicit close is not available on appstatecache.Cache
+	// Connections will be garbage collected when map is cleared
 	cachedRedisClients = make(map[string]*appstatecache.Cache)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd9cf85 and a349781.

📒 Files selected for processing (28)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • hack/dev-env/start-agent-autonomous.sh
🚧 Files skipped from review as they are similar to previous changes (9)
  • principal/resource.go
  • test/e2e/clusterinfo_test.go
  • principal/tracker/tracking.go
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-principal.sh
  • hack/dev-env/gen-redis-tls-certs.sh
  • test/e2e/README.md
  • hack/dev-env/configure-redis-tls.sh
  • test/e2e/sync_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-e2e.sh
  • hack/dev-env/start-agent-managed.sh
  • Makefile
  • test/run-e2e.sh
  • test/e2e/rp_test.go
🧬 Code graph analysis (5)
test/e2e/fixture/fixture.go (2)
internal/backend/interface.go (1)
  • Namespace (124-127)
test/e2e/fixture/cluster.go (2)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
principal/listen.go (1)
internal/logging/logging.go (2)
  • Info (295-297)
  • Warn (300-302)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (2)
internal/logging/logfields/logfields.go (3)
  • Config (127-127)
  • Error (101-101)
  • Client (37-37)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (30)
install/helm-repo/argocd-agent-agent/values.schema.json (3)

302-330: Well-structured Redis TLS schema with clear configuration options.

The redisTLS property cleanly encapsulates TLS-related configuration (enable/disable, CA path/secret, insecure mode). Descriptions are informative, and the anyOf pattern for enabled and insecure allows both native boolean values and string representations for broader compatibility.


332-383: NetworkPolicy schema correctly models Kubernetes policy structure.

The networkPolicy property provides appropriate control over network policies for the agent and Redis. The nested structure for selectors (agentSelector, redisSelector) properly supports Kubernetes label matching while allowing flexibility via additionalProperties: true. The scoping and descriptions are clear.


302-330:

The type inconsistency between redisTLS fields (using anyOf for string/boolean) and networkPolicy.enabled (using boolean only) was already flagged in a prior review and marked as addressed. Skipping duplicate comment; developer can investigate commit 6247404 if clarification is needed on the design rationale.

cmd/argocd-agent/principal.go (3)

90-98: Redis TLS option variables are coherent with intended configuration

The newly added Redis TLS variables are clearly named and match the later flag wiring and option usage; no issues here.


419-441: Redis TLS flags and env wiring look consistent

Flag names, env variable keys, defaults (notably the shared argocd-redis-tls secret), and help strings are all consistent with the new Redis TLS behavior; no issues from a CLI/config surface perspective.


471-471: Increasing resource proxy TLS secret fetch timeout to 30s is reasonable

Extending the timeout to 30 seconds for loading proxy TLS material from Kubernetes is a safe change and should better tolerate slow API servers without introducing new correctness risks.

test/e2e/fixture/argoclient.go (1)

316-336: LGTM! Environment variable override improves test flexibility.

The addition of the ARGOCD_SERVER_ADDRESS environment variable check before falling back to Kubernetes API queries is a good optimization for test environments. The fallback logic is preserved correctly, ensuring backward compatibility.

hack/dev-env/start-agent-managed.sh (1)

37-90: LGTM! Redis TLS and mTLS configuration properly integrated.

The script correctly:

  • Detects Redis TLS certificates and provides helpful guidance when missing
  • Sets appropriate defaults for local development with clear documentation
  • Extracts mTLS certificates from Kubernetes secrets
  • Passes all necessary TLS arguments to the agent startup

The explicit comments about port-forward requirements are particularly helpful for developers.

docs/configuration/redis-tls.md (1)

1-700: Excellent comprehensive Redis TLS documentation.

This documentation provides thorough coverage of Redis TLS configuration including:

  • Clear architecture diagrams and TLS configuration points
  • Step-by-step quick start for development/testing
  • Detailed certificate management guidance
  • Complete Kubernetes installation instructions
  • Comprehensive troubleshooting section with common issues and solutions
  • Security best practices

The documentation is well-structured with a table of contents and clear separation between development/testing and production scenarios.

test/e2e/fixture/fixture.go (2)

109-171: LGTM! Extended timeouts improve resilience for TLS-enabled Redis.

The timeout increase from 60 to 120 seconds in EnsureDeletion and WaitForDeletion is appropriate for TLS-enabled Redis connections, which may have slightly higher latency during connection establishment and teardown.


200-462: LGTM! Non-fatal cleanup warnings prevent cascading test failures.

The changes to log warnings instead of returning errors during cleanup are appropriate for handling transient issues like port-forward failures. Key improvements:

  • Uses DeepCopy() to avoid mutating loop variables (lines 235, 260, 317, 350)
  • Logs warnings for cleanup failures instead of failing the entire test
  • Gracefully handles Redis unavailability during cluster info reset (lines 457-461)

This makes the test suite more robust in environments with flaky port-forwards or temporary connectivity issues.

test/e2e/rp_test.go (1)

162-169: LGTM! Refactoring to fixture helpers improves consistency.

The refactoring to use fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret eliminates code duplication and centralizes the logic for retrieving test credentials. This aligns with the environment variable override capability added to the fixture helpers.

Also applies to: 295-305

docs/getting-started/kubernetes/index.md (2)

159-229: LGTM! Clear Redis TLS setup instructions with proper warnings.

The new section 2.4 provides comprehensive Redis TLS setup guidance:

  • Clear warning that Redis TLS is required
  • Step-by-step certificate generation with appropriate SANs
  • Deployment patching commands
  • Verification steps
  • Note about automatic TLS configuration in manifests

The instructions are well-structured and include all necessary details for setting up Redis TLS on the control plane.


337-381: LGTM! Workload cluster Redis TLS setup mirrors control plane.

Section 4.4 appropriately repeats the Redis TLS setup for workload clusters with a clear note to reuse the same CA from Step 2.4. The instructions maintain consistency with the control plane setup while properly scoping the certificate generation to the workload cluster context.

hack/dev-env/start-e2e.sh (1)

50-61: LGTM! Static localhost addresses enable TLS certificate validation.

The use of static localhost addresses with fixed ports is appropriate for E2E tests because:

  • localhost is included in the Redis TLS certificate SANs
  • Port-forwards (managed by goreman) provide stable local endpoints
  • Enables proper TLS certificate validation during tests

The Redis password retrieval correctly separates assignment from export, addressing the previous shellcheck warning.

hack/dev-env/configure-argocd-redis-tls.sh (1)

1-342: Overall script design is solid for E2E Redis TLS configuration.

The script provides comprehensive Redis TLS configuration for Argo CD components:

  • Idempotent volume and volumeMount additions with existence checks
  • Clear error messages and exit codes
  • Appropriate handling of different cluster contexts (control-plane vs agent)
  • Graceful scaling with rollout status waits

The replica guard logic issue aside, the script structure and approach are well-designed for the E2E test environment.

agent/agent.go (2)

323-343: LGTM! TLS configuration properly implemented.

The TLS config construction for the cluster cache correctly handles:

  • Warning log when InsecureSkipVerify is enabled (matching principal code)
  • CA certificate loading with clear error messages
  • Proper certificate pool validation

445-460: LGTM! Immediate startup update improves observability.

Sending cluster cache info immediately on startup (before the first ticker interval) ensures the principal receives agent state promptly, improving observability and reducing the delay in initial metrics.

internal/argocd/cluster/cluster.go (2)

175-191: LGTM! TLS integration properly implemented.

The signature change to NewClusterCacheInstance and TLS configuration wiring are correct. The TLSConfig is properly passed through to the Redis client options.
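
The wiring pattern this approves can be sketched as follows. This is an illustrative stand-in, not the repo's actual code: the function name is invented, and the `redis.Options{TLSConfig: cfg}` hand-off mentioned in the comment is assumed from go-redis's documented API.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// buildRedisTLSConfig shows the shape of the config that would be handed to
// the Redis client, e.g. via redis.Options{TLSConfig: cfg} in go-redis.
func buildRedisTLSConfig(caPEM []byte, insecure bool) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if insecure {
		// Encrypts the connection but skips certificate verification.
		cfg.InsecureSkipVerify = true
		return cfg, nil
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, fmt.Errorf("failed to parse Redis CA certificate")
		}
		cfg.RootCAs = pool
	}
	// With no CA given, crypto/tls falls back to the system cert pool.
	return cfg, nil
}

func main() {
	cfg, err := buildRedisTLSConfig(nil, true)
	fmt.Println(err == nil, cfg.InsecureSkipVerify) // → true true
}
```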


135-142: LGTM! Defensive initialization of ConnectionState.

Initializing ConnectionState when absent ensures cluster info is properly set even when the agent sends cache stats before connection status is explicitly set, preventing nil-reference issues.

test/e2e/fixture/cluster.go (2)

181-201: LGTM! InsecureSkipVerify acceptable for E2E tests.

Using InsecureSkipVerify: true in E2E tests is appropriate given the dynamic LoadBalancer addresses in test environments. The PR objectives explicitly mention this trade-off to retain TLS encryption while accommodating test infrastructure limitations.

Based on learnings, test fixtures under test/ directories do not require production-grade security hardening.


298-327: LGTM! Comprehensive address resolution with TLS enforcement.

The multi-level fallback approach (LoadBalancer → spec.loadBalancerIP → ClusterIP) handles various deployment scenarios well. TLS enforcement and environment variable overrides for local development are appropriate for E2E tests.

Also applies to: 359-387

principal/redisproxy/redisproxy.go (3)

98-128: LGTM! Clean and composable TLS configuration API.

The public setter methods provide a clear and flexible API for configuring TLS. Separating in-memory and file-based certificate configuration is appropriate, and validation is deferred to createServerTLSConfig where it's needed.


130-154: LGTM! Robust TLS configuration with proper error handling.

The createServerTLSConfig method correctly handles both file-based and in-memory certificates, with clear error messages and appropriate TLS version constraints (min TLS 1.2).
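
The dual-source pattern described above can be sketched like this. The function name and signature are illustrative stand-ins, not the actual `createServerTLSConfig`:

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// serverTLSConfig prefers certificate files when paths are set and falls
// back to in-memory PEM blocks, mirroring the behavior described above.
func serverTLSConfig(certPEM, keyPEM []byte, certFile, keyFile string) (*tls.Config, error) {
	var cert tls.Certificate
	var err error
	switch {
	case certFile != "" && keyFile != "":
		cert, err = tls.LoadX509KeyPair(certFile, keyFile)
	case len(certPEM) > 0 && len(keyPEM) > 0:
		cert, err = tls.X509KeyPair(certPEM, keyPEM)
	default:
		return nil, fmt.Errorf("redis proxy: no server certificate configured")
	}
	if err != nil {
		return nil, fmt.Errorf("redis proxy: loading server certificate: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12, // reject TLS 1.0/1.1 clients
	}, nil
}

func main() {
	_, err := serverTLSConfig(nil, nil, "", "")
	fmt.Println(err != nil) // → true
}
```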


852-894: LGTM! TLS handshake and upstream connection properly implemented.

The upstream TLS implementation correctly:

  • Handles InsecureSkipVerify with warning log
  • Supports CA certificate pool from memory or file
  • Configures SNI based on hostname
  • Performs explicit handshake with error handling

Note: A past review suggested warning when server TLS is enabled but upstream TLS is not configured (to avoid unencrypted connections within the cluster). This remains a potential enhancement but is not blocking.
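
The SNI and explicit-handshake steps praised above amount to the following sketch (function names are illustrative; the real proxy also handles CA files and in-memory pools):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
)

// upstreamTLSConfig derives ServerName from the upstream address so
// certificate hostname verification and SNI work.
func upstreamTLSConfig(upstreamAddr string, rootCAs *x509.CertPool, insecure bool) *tls.Config {
	host, _, err := net.SplitHostPort(upstreamAddr)
	if err != nil {
		host = upstreamAddr // address had no port component
	}
	return &tls.Config{
		MinVersion:         tls.VersionTLS12,
		RootCAs:            rootCAs,
		InsecureSkipVerify: insecure,
		ServerName:         host, // sent as SNI and used to verify the cert
	}
}

// wrapUpstream performs the explicit handshake so errors surface here,
// rather than on the first read/write.
func wrapUpstream(conn net.Conn, cfg *tls.Config) (net.Conn, error) {
	tlsConn := tls.Client(conn, cfg)
	if err := tlsConn.Handshake(); err != nil {
		conn.Close()
		return nil, fmt.Errorf("TLS handshake with upstream Redis failed: %w", err)
	}
	return tlsConn, nil
}

func main() {
	cfg := upstreamTLSConfig("redis.example.com:6379", nil, false)
	fmt.Println(cfg.ServerName) // → redis.example.com
}
```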

test/run-e2e.sh (1)

81-115: LGTM! macOS development support with helpful warnings.

The port-forward detection and environment variable setup provide a good developer experience for local testing. The non-blocking warning allows CI environments (with MetalLB) to proceed normally.

test/e2e/redis_proxy_test.go (4)

120-123: LGTM! Wait period reduces race condition with subscription activation.

The 5-second wait after SSE stream establishment allows Redis SUBSCRIBE commands to fully propagate before the pod is deleted, reducing race conditions in the test.

Also applies to: 326-329


188-208: LGTM! Message draining improves test reliability.

The enhanced message-handling logic drains all available messages before retrying, preventing false negatives when messages arrive in bursts. The extended 120-second timeout accommodates realistic network latency.

Also applies to: 407-427
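
The drain-before-retry idea is a standard non-blocking channel pattern; a minimal stand-in for the test helper:

```go
package main

import "fmt"

// drain empties everything currently buffered on ch without blocking, so a
// burst of Redis notifications cannot be mistaken for a missing message.
func drain(ch <-chan string) []string {
	var msgs []string
	for {
		select {
		case m := <-ch:
			msgs = append(msgs, m)
		default:
			return msgs // nothing pending right now
		}
	}
}

func main() {
	ch := make(chan string, 3)
	ch <- "a"
	ch <- "b"
	ch <- "c"
	fmt.Println(len(drain(ch)), len(drain(ch))) // → 3 0
}
```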


211-237: LGTM! Retry logic handles transient Redis connection issues.

Wrapping ResourceTree calls in Eventually blocks with logging handles transient EOF errors and Redis connection issues gracefully, improving test stability in distributed environments.

Also applies to: 430-456


642-653: LGTM! HTTP client configuration optimized for SSE streams.

The HTTP transport settings (no overall timeout, extended idle timeout, connection pooling) are appropriate for long-lived SSE streams. InsecureSkipVerify: true is acceptable for E2E tests.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from a349781 to 6b246bf on December 5, 2025 12:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (5)
hack/dev-env/configure-argocd-redis-tls.sh (1)

301-311: Replica guard works, but relies on fragile shell list semantics

The guard:

# Ensure we have at least 1 replica
[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"

is parsed as (cond1 || cond2) && assign, because the shell gives && and || equal precedence with left associativity. The assignment therefore does run for both "0" and empty values, but the construct is easy to misread as C-style cond1 || (cond2 && assign), and a small future edit could silently break the intent.

Use explicit if blocks so the normalization of both "0" and empty values to "1" is unmistakable:

-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi

This matches the comment and guarantees components are brought back to at least one replica for TLS-enabled testing.

cmd/argocd-agent/principal.go (1)

90-98: Upstream Redis TLS modes still aren’t fully mutually exclusive

Within the redisTLSEnabled block you only guard against --redis-upstream-tls-insecure being combined with --redis-upstream-ca-path:

if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
    cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
}

But there are three upstream TLS modes:

  • --redis-upstream-tls-insecure
  • --redis-upstream-ca-path
  • --redis-upstream-ca-secret-name

Users can still specify conflicting combinations such as insecure+secret or ca-path+secret; the current if/else if/else chain will silently prefer one, dropping the others.

To make configuration predictable (and consistent with the strict pairing you already enforce for server cert/key), validate that at most one mode is set before applying the options, e.g.:

if redisTLSEnabled {
    // Validate upstream TLS configuration - only one mode allowed
    modesSet := 0
    if redisUpstreamTLSInsecure {
        modesSet++
    }
    if redisUpstreamTLSCAPath != "" {
        modesSet++
    }
    if redisUpstreamTLSCASecretName != "" {
        modesSet++
    }
    if modesSet > 1 {
        cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
    }

    // existing server TLS + upstream TLS selection logic...
}

This prevents ambiguous configurations and aligns the upstream TLS behavior with the rest of the principal’s TLS validation.

Also applies to: 258-288, 419-441

hack/dev-env/start-e2e.sh (1)

50-59: Fail fast if Redis password lookup fails.

If the kubectl get secret or base64 --decode step fails, REDIS_PASSWORD will be empty but the script will still start the E2E stack, leading to confusing Redis auth failures later.

Recommend checking the command result and the value before exporting:

-REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode)
-export REDIS_PASSWORD
+REDIS_PASSWORD=$(kubectl get secret argocd-redis \
+  --context=vcluster-agent-managed \
+  -n argocd \
+  -o jsonpath='{.data.auth}' | base64 --decode)
+if [ -z "${REDIS_PASSWORD}" ]; then
+  echo "Error: Failed to retrieve Redis password from argocd-redis secret in vcluster-agent-managed/argocd" >&2
+  exit 1
+fi
+export REDIS_PASSWORD

This makes Redis auth problems surface immediately when starting E2E.

hack/dev-env/configure-redis-tls.sh (1)

198-215: Verify "$(REDIS_PASSWORD)" expansion in the Redis args patch (TLS + auth can break).

Inside the JSON patch, "$(REDIS_PASSWORD)" is single-quoted, so the shell never expands it. Kubernetes does expand $(VAR) references in container args, but only when the container defines a matching env var; the upstream argocd-redis deployment normally defines REDIS_PASSWORD from the argocd-redis secret, in which case the literal form is correct. If the patched deployment lacks that env var, Redis will literally be configured with the password $(REDIS_PASSWORD), which won't match the secret and will break all authenticated connections.

In that case, expand REDIS_PASSWORD in the shell when constructing the patch and (optionally) warn if it's unset. For example:

-# Update Redis args for TLS
-kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
+# Update Redis args for TLS
+REDIS_PASSWORD="${REDIS_PASSWORD:-}"
+if [ -z "${REDIS_PASSWORD}" ]; then
+    echo "Warning: REDIS_PASSWORD not set; Redis will be configured without a usable password value"
+fi
+
+kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
-      "--requirepass", "$(REDIS_PASSWORD)",
+      "--requirepass", "'"${REDIS_PASSWORD}"'",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
 ]'

You may also want to hard‑fail if REDIS_PASSWORD is empty to avoid silently misconfiguring Redis in dev/e2e.

principal/redisproxy/redisproxy.go (1)

836-897: Warn (or fail) when proxy TLS is enabled but upstream TLS is not, to avoid silent plaintext hops.

In establishConnectionToPrincipalRedis, upstream TLS is only used when:

if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
    // wrap with TLS
}

If tlsEnabled is true but no upstream TLS config is provided, the proxy will:

  • Terminate TLS from Argo CD on the proxy, but
  • Connect to principal Redis over plain TCP,

creating a surprising “TLS‑terminated at proxy only” hop that contradicts the PR goal of “Redis TLS encryption enabled by default for all connections”.

Consider making this mismatch explicit, e.g.:

-    // If TLS is enabled for upstream, wrap the connection with TLS
-    if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+    hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+
+    if rp.tlsEnabled && !hasUpstreamTLSConfig {
+        logCtx.Warn("Redis proxy TLS is enabled, but no upstream TLS configuration provided; connection to principal Redis will be unencrypted")
+    }
+
+    // If TLS is enabled for upstream, wrap the connection with TLS
+    if rp.tlsEnabled && hasUpstreamTLSConfig {
         tlsConfig := &tls.Config{
             MinVersion: tls.VersionTLS12,
         }
         // ... existing CA / InsecureSkipVerify logic ...

Optionally, you might also allow hasUpstreamTLSConfig to trigger TLS even when tlsEnabled is false, if you anticipate scenarios where only the proxy→Redis hop should be encrypted.

🧹 Nitpick comments (4)
hack/dev-env/start-agent-managed.sh (1)

63-74: Consider cleanup for temporary credential files.

The mTLS certificates are extracted to /tmp files but never cleaned up. While acceptable for development, consider adding a trap to remove these files on script exit:

 TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
 TLS_KEY_PATH="/tmp/agent-managed-tls.key"
 ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+
+# Cleanup temp files on exit
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
+
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \

This prevents credential accumulation in /tmp and follows security best practices.

test/e2e/fixture/fixture.go (1)

109-171: Cleanup robustness improvements are reasonable; consider minor hardening

The extended 120s deletion waits, the shift to warning-only errors in CleanUp, and the use of DeepCopy() for the principal/managed Application and AppProject waits all improve e2e stability without changing production behavior. A couple of optional follow-ups:

  • Guarding resetManagedAgentClusterInfo against a nil clusterDetails to make it safer if CleanUp is ever reused outside BaseSuite.SetupSuite.
  • If deletion timing keeps growing, factoring the “spin for up to N seconds with 1s sleep” pattern into a helper that can use context deadlines instead of manual counters.

These are non-blocking and the current changes are fine for e2e usage.

Also applies to: 218-291, 295-375, 457-471

hack/dev-env/start-agent-autonomous.sh (1)

37-47: Redis TLS and mTLS wiring in dev script looks correct; consider ephemeral key files

The script correctly:

  • Detects the Redis TLS CA and enables --redis-tls-enabled/--redis-tls-ca-path.
  • Defaults --redis-addr to a localhost port-forward.
  • Extracts agent client cert/key/CA and passes them via --tls-client-cert/--tls-client-key/--root-ca-path.

For local dev this is fine. As an optional hardening, you could write the TLS material to mktemp paths and trap a cleanup (rm) on exit to avoid leaving private keys in /tmp across runs.

Also applies to: 48-62, 63-75, 79-83
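
The mktemp + trap suggestion boils down to a few lines. The kubectl plumbing below is a placeholder; only the temp-file lifecycle is the point:

```shell
#!/usr/bin/env bash
# Sketch of ephemeral key handling for dev scripts.
set -euo pipefail

TLS_CERT_PATH="$(mktemp)"
TLS_KEY_PATH="$(mktemp)"

# Remove key material on ANY exit (success, failure, or Ctrl-C).
trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}"' EXIT

# kubectl ... get secret argocd-agent-client-tls -o jsonpath=... | base64 -d > "${TLS_CERT_PATH}"
echo "placeholder-cert" > "${TLS_CERT_PATH}"

echo "cert staged at ${TLS_CERT_PATH}"
```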

test/e2e/redis_proxy_test.go (1)

120-137: SSE stream and Redis proxy e2e reliability improvements look solid

The added 5s post-connect delay, buffered msgChan with “drain all messages” semantics, and the Eventually-wrapped ResourceTree calls with logging should significantly reduce flakes from subscription races and transient EOFs. The SSE client transport is correctly tuned for long-lived streams (no global timeout, longer idle) and using InsecureSkipVerify is acceptable here given these are TLS-only e2e tests, not production code.

Also applies to: 184-237, 326-337, 402-457, 584-670

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a349781 and 6b246bf.

📒 Files selected for processing (28)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • principal/resource.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/run-e2e.sh
  • internal/argocd/cluster/cluster.go
  • test/e2e/clusterinfo_test.go
  • principal/tracker/tracking.go
  • Makefile
  • test/e2e/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/rp_test.go
  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (4)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (2)
principal/listen.go (1)
  • Listener (54-60)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (22)
test/e2e/fixture/argoclient.go (1)

316-336: LGTM!

The environment variable check is a clean optimization that avoids unnecessary Kubernetes API calls when the server address is explicitly provided. The fallback logic is preserved, maintaining backward compatibility.

hack/dev-env/start-agent-managed.sh (3)

37-46: LGTM!

The Redis TLS detection logic is clean and provides helpful user guidance. Checking for the CA certificate presence is the right approach to determine whether TLS should be enabled.


48-61: LGTM!

The Redis address handling with sensible defaults and clear port-forward instructions is well-designed for local development workflows.


76-90: LGTM!

The startup command properly integrates the Redis TLS arguments with existing mTLS configuration. The variable expansion and flag ordering are correct.

hack/dev-env/start-principal.sh (2)

23-29: LGTM!

The Redis address defaulting is correctly implemented. As noted in the past review, this script no longer starts its own port-forward, avoiding conflicts with Procfile.e2e while providing a sensible default for TLS-friendly connections.


42-62: LGTM!

The Redis TLS configuration properly checks for all required certificate files and constructs the appropriate arguments. The inline comments about SANs (localhost, rathole-container-internal, local IP) are helpful for understanding the certificate requirements.

agent/agent.go (2)

323-343: LGTM!

The TLS configuration for cluster cache Redis is well-implemented:

  • Proper TLS 1.2 minimum version
  • Warning log for insecure mode (addresses past review feedback)
  • Clean CA certificate loading with descriptive error messages
  • Appropriate error handling

443-460: LGTM!

The updated cluster cache info logic is an improvement:

  • Immediate update on startup provides faster feedback
  • Consistent behavior for both agent modes
  • Proper cleanup with ticker.Stop()
test/e2e/fixture/cluster.go (5)

180-201: LGTM!

Using InsecureSkipVerify: true for E2E tests is acceptable to accommodate dynamic LoadBalancer addresses (as noted in the PR objectives). The TLS encryption is retained, which still provides value for testing the TLS code paths.


206-218: LGTM!

The generous timeout and connection pool settings are appropriate for E2E test environments, especially considering the port-forward latency mentioned in the comments. The retry configuration with exponential backoff is sensible.


298-327: LGTM!

The Redis address resolution with multiple fallbacks (LoadBalancer ingress → LoadBalancerIP → ClusterIP) is robust and handles various cluster configurations. The environment variable override for local development is a good addition. Setting ManagedAgentRedisTLSEnabled = true aligns with the PR objective of Redis TLS being required for E2E tests.
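
The fallback chain can be modeled in a few lines. The struct below is a simplified, illustrative stand-in for the corev1.Service fields the fixture actually reads:

```go
package main

import "fmt"

// svcView models the Service fields consulted by the fallback chain.
type svcView struct {
	IngressHost    string // .status.loadBalancer.ingress[0].hostname (or .ip)
	LoadBalancerIP string // .spec.loadBalancerIP
	ClusterIP      string // .spec.clusterIP
}

// resolveRedisAddr mirrors the described order, with an env override for
// local development (e.g. a port-forward on localhost).
func resolveRedisAddr(envOverride string, svc svcView) (string, error) {
	if envOverride != "" {
		return envOverride, nil
	}
	switch {
	case svc.IngressHost != "":
		return svc.IngressHost + ":6379", nil
	case svc.LoadBalancerIP != "":
		return svc.LoadBalancerIP + ":6379", nil
	case svc.ClusterIP != "" && svc.ClusterIP != "None":
		return svc.ClusterIP + ":6379", nil
	}
	return "", fmt.Errorf("no reachable address for Redis service")
}

func main() {
	addr, _ := resolveRedisAddr("", svcView{LoadBalancerIP: "10.0.0.5"})
	fmt.Println(addr) // → 10.0.0.5:6379
}
```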


359-387: LGTM!

The principal Redis configuration mirrors the managed agent approach with the same robust fallback chain. Consistent behavior across both configurations is good for maintainability.


226-267: Verify cleanup function is invoked at test suite end.

The Redis client caching infrastructure prevents connection leaks. Confirm that CleanupRedisCachedClients() is called in your test suite's teardown or cleanup phase to ensure cached connections are properly released.

test/e2e/sync_test.go (1)

369-378: LGTM!

The pre-sync hook Job name correction from "pre-post-sync-before" to "before" aligns the test expectations with the actual test fixture. This is a straightforward test maintenance update.

Also applies to: 464-473

docs/configuration/redis-tls.md (4)

1-68: LGTM!

The overview and architecture sections provide clear explanations of the three TLS configuration points (Redis Proxy Server TLS, Upstream Redis TLS, Agent Redis TLS) and how they fit together. The text-based architecture diagram effectively illustrates the flow.


70-247: LGTM!

The quick start and local development sections are comprehensive and practical:

  • Clear statement that Redis TLS is required for E2E tests
  • Well-structured explanations of local vcluster vs. remote vcluster setups
  • Reverse tunnel documentation addresses a real need
  • Manual testing steps align with the provided scripts

306-368: LGTM!

The configuration tables provide excellent reference documentation with all flags, environment variables, and defaults clearly documented. The distinction between Kubernetes/Helm defaults (true) and CLI defaults (false) is important and well-noted.


468-700: LGTM!

The troubleshooting section addresses real-world issues comprehensively:

  • Script output examples help users understand what to expect
  • Common problems (connection refused, certificate verification, port-forward instability) are well-covered
  • Security best practices provide appropriate guidance
  • Solutions are actionable and clear
hack/dev-env/Procfile.e2e (1)

1-7: Verify environment variables are set.

The Procfile uses environment variables MANAGED_AGENT_REDIS_ADDR and AUTONOMOUS_AGENT_REDIS_ADDR on lines 6-7 for agent startup. Ensure these are exported by the parent script (e.g., start-e2e or Makefile) before invoking this Procfile.

test/e2e/rp_test.go (1)

295-305: Fixture-based Argo endpoint and admin secret lookup looks good

Switching to fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret keeps this suite consistent with other e2e tests and centralizes TLS/endpoint logic; the updated NewArgoClient usage is sound and maintains the existing behavior.

hack/dev-env/gen-redis-tls-certs.sh (1)

14-27: Redis TLS cert generation script is sound and addresses earlier concerns

The script now:

  • Lets openssl errors surface (no 2>/dev/null) under set -e.
  • Generates appropriate SANs for control-plane, proxy, and agent Redis instances.
  • Only appends IP.3 = ${LOCAL_IP} when LOCAL_IP is non-empty, avoiding invalid SANs.

This looks correct and is appropriately idempotent for dev/e2e usage.

Also applies to: 60-91, 105-136

docs/getting-started/kubernetes/index.md (1)

159-230: Redis TLS documentation is consistent with the tooling and default behavior

The new sections:

  • Clearly mark Redis TLS as required and walk through CA/server cert generation, argocd-redis-tls secret creation, and TLS enablement on both control-plane and workload clusters.
  • Reuse the same CA for workload Redis and mirror the deployment patches used by the dev/e2e scripts.
  • Cross-link to the dedicated Redis TLS configuration doc.

This aligns with the CLI flags and helper scripts added in this PR and provides a practical, copy-pastable path for users.

Also applies to: 337-381, 646-646

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 2 times, most recently from c546df9 to 65cd0ff on December 8, 2025 13:43
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/e2e/fixture/cluster.go (1)

226-267: Cached Redis clients: lock scope and explicit cleanup; note prior concern about closing

The cached client map and mutex give you reuse across calls, but there are a couple of details to consider:

  • getCachedCacheInstance currently holds cachedRedisClientMutex while constructing a new client via getCacheInstance. It’s cheap today, but you could narrow the critical section by computing the cache key and constructing the client outside the lock, only locking around the map access/update.
  • CleanupRedisCachedClients only resets the map; it doesn’t close underlying connections, so the comment “stores Redis clients to prevent connection leaks” is a bit misleading. If appstatecache.Cache ever exposes a Close/Shutdown method, or if you can track the underlying *redis.Client alongside the cache in a small wrapper struct, it would be preferable to call Close() here before dropping references. If that’s not feasible, consider updating the comment to clarify that the fixture relies on process teardown/GC for actual connection cleanup.

This restates an earlier review note about explicit connection closing in CleanupRedisCachedClients.
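
Both suggestions — narrowing the critical section and making cleanup actually close connections — can be sketched together. Types and names are illustrative stand-ins for the real cache/redis pair:

```go
package main

import (
	"fmt"
	"sync"
)

// cachedClient keeps a Close method next to the cached value, which is what
// makes real cleanup possible.
type cachedClient struct{ closed bool }

func (c *cachedClient) Close() { c.closed = true }

var (
	mu      sync.Mutex
	clients = map[string]*cachedClient{}
)

// getCached dials outside the lock, then re-checks the map in case another
// goroutine won the race.
func getCached(key string, dial func() *cachedClient) *cachedClient {
	mu.Lock()
	if c, ok := clients[key]; ok {
		mu.Unlock()
		return c
	}
	mu.Unlock()

	c := dial() // potentially slow: TLS handshake, AUTH, etc.

	mu.Lock()
	defer mu.Unlock()
	if existing, ok := clients[key]; ok {
		c.Close() // lost the race; discard our connection
		return existing
	}
	clients[key] = c
	return c
}

// cleanupCached closes connections instead of merely dropping references.
func cleanupCached() {
	mu.Lock()
	defer mu.Unlock()
	for k, c := range clients {
		c.Close()
		delete(clients, k)
	}
}

func main() {
	a := getCached("principal", func() *cachedClient { return &cachedClient{} })
	b := getCached("principal", func() *cachedClient { return &cachedClient{} })
	fmt.Println(a == b, len(clients)) // → true 1
	cleanupCached()
	fmt.Println(a.closed, len(clients)) // → true 0
}
```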

🧹 Nitpick comments (15)
hack/dev-env/setup-vcluster-env.sh (1)

159-190: Well-structured environment-driven Redis endpoint configuration with clear scenario documentation.

The branching logic correctly handles the three deployment scenarios (in-cluster principal, CI, and local development) and substitutes the appropriate Redis endpoint for each. Comments (lines 159–170) effectively document the control flow and explain why different strategies are needed across environments.

Minor suggestion: Add error handling for IP address discovery in local development scenario.

On lines 182–186, if ipconfig getifaddr en0 or the ip r show default parsing fails, ARGO_AGENT_IPADDR will be empty, causing the sed replacement on line 189 to produce invalid configuration. While this is a dev environment and errors would surface quickly, adding a check would prevent silent misconfiguration:

if [[ "$OSTYPE" == "darwin"* ]]; then
    ARGO_AGENT_IPADDR=$(ipconfig getifaddr en0 2>/dev/null) || true
else
    ARGO_AGENT_IPADDR=$(ip r show default 2>/dev/null | sed -e 's,.*\ src\ ,,' | sed -e 's,\ metric.*$,,' | head -n 1) || true
fi

if [[ -z "$ARGO_AGENT_IPADDR" ]]; then
    echo "WARNING: Failed to resolve local IP address for Redis proxy; using fallback" >&2
    ARGO_AGENT_IPADDR="localhost"  # or handle appropriately for your use case
fi
test/e2e/fixture/argoclient.go (1)

316-334: Consider also checking for Ingress IP.

The current logic captures LoadBalancerIP first, then overrides with Hostname if present. However, if LoadBalancerIP is empty and the Ingress has an IP (not hostname), that IP won't be used.

 	argoEndpoint := srvService.Spec.LoadBalancerIP
 	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-		if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
-			argoEndpoint = hostname
+		ingress := srvService.Status.LoadBalancer.Ingress[0]
+		if ingress.Hostname != "" {
+			argoEndpoint = ingress.Hostname
+		} else if ingress.IP != "" {
+			argoEndpoint = ingress.IP
 		}
 	}
agent/agent.go (1)

323-343: Missing CA configuration when TLS is enabled but no explicit CA is provided.

When redisTLSEnabled is true but redisTLSInsecure is false and redisTLSCAPath is empty, the TLS config is created with only MinVersion set (lines 326-328) and no RootCAs. This means the system CA pool will be used by default, which may or may not be the intended behavior.

Consider adding a log message to clarify this fallback, or explicitly setting RootCAs to the system pool for clarity:

 	if a.redisProxyMsgHandler.redisTLSEnabled {
 		clusterCacheTLSConfig = &tls.Config{
 			MinVersion: tls.VersionTLS12,
 		}
 		if a.redisProxyMsgHandler.redisTLSInsecure {
 			log().Warn("INSECURE: Not verifying Redis TLS certificate for cluster cache")
 			clusterCacheTLSConfig.InsecureSkipVerify = true
 		} else if a.redisProxyMsgHandler.redisTLSCAPath != "" {
 			caCertPEM, err := os.ReadFile(a.redisProxyMsgHandler.redisTLSCAPath)
 			if err != nil {
 				return nil, fmt.Errorf("failed to read CA certificate for cluster cache: %w", err)
 			}
 			certPool := x509.NewCertPool()
 			if !certPool.AppendCertsFromPEM(caCertPEM) {
 				return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s", a.redisProxyMsgHandler.redisTLSCAPath)
 			}
 			clusterCacheTLSConfig.RootCAs = certPool
+		} else {
+			log().Debug("Using system CA pool for Redis TLS verification")
 		}
 	}
hack/dev-env/start-e2e.sh (1)

96-102: Cleanup function doesn't restore kubectl context.

Other scripts in the repo (e.g., setup-vcluster-env.sh, configure-redis-tls.sh) save and restore the initial kubectl context in their cleanup functions. This script's cleanup only stops goreman but doesn't restore context, which could leave the environment in an unexpected state if the script switches contexts during execution.

Consider adding context restoration for consistency:

+initial_context=$(kubectl config current-context)
+
 # Function to cleanup on exit
 cleanup() {
     echo "Stopping goreman..."
     kill $GOREMAN_PID 2>/dev/null || true
     wait $GOREMAN_PID 2>/dev/null || true
+    kubectl config use-context ${initial_context} 2>/dev/null || true
 }
principal/redisproxy/redisproxy.go (1)

852-894: Consider adding a warning when server TLS is enabled but upstream TLS is not configured.

When rp.tlsEnabled is true but none of the upstream TLS options are set (line 853 condition is false), the connection to principal Redis will be unencrypted. This creates an asymmetric security posture where incoming connections are encrypted but outgoing connections are not.

While this may be intentional for some deployments, a warning would help operators understand the configuration:

 // If TLS is enabled for upstream, wrap the connection with TLS
-if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+
+if rp.tlsEnabled && !hasUpstreamTLSConfig {
+    logCtx.Warn("Redis proxy server has TLS enabled, but no upstream TLS configuration provided. Connection to principal Redis will be unencrypted.")
+}
+
+if rp.tlsEnabled && hasUpstreamTLSConfig {
     tlsConfig := &tls.Config{

This helps operators identify potential security misconfigurations during deployment.

hack/dev-env/start-agent-managed.sh (1)

63-74: Consider adding cleanup of temporary TLS files on script exit.

The script extracts mTLS certificates to temp files (/tmp/agent-managed-tls.{crt,key,ca}), but does not clean them up. If this script is run repeatedly during development, temp files could accumulate. Consider adding a trap to clean up on exit:

trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT

Or verify that cleanup happens elsewhere (e.g., in goreman or make targets).

docs/getting-started/kubernetes/index.md (1)

226-229: Fix markdown code-block style issue (indented vs fenced).

Static analysis flags line 229 as using indented code block syntax when fenced syntax should be used. The block starting at line 226 should use fenced backticks (```) instead of indentation to maintain consistency and linting compliance.

principal/listen.go (2)

174-199: Tighten logging and error handling in WebSocket vs native gRPC branches

The added logs are helpful, but a few small tweaks would improve clarity and reduce surprises:

  • Line 174: Consider making this a debug log and dropping the emoji so logs stay machine‑friendly and less noisy in production.
  • Line 176: There’s a leading space in " WebSocket is ENABLED..." which will look odd in log output.
  • Lines 185–189 and 193–198: Using the outer err variable inside goroutines is safe here but non‑idiomatic; it’s clearer to use a local err inside each goroutine and keep the outer error for serveGRPC itself.
  • Line 196: log().WithError(err).Warn(" gRPC server.Serve() exited") will log error=<nil> on a graceful shutdown. It’s usually better to log at Warn only when err != nil, and otherwise log a simple Info for a clean exit. The WebSocket ServeTLS path currently doesn’t log on exit at all, only forwards the error to errch, so you might also want a symmetric exit log there.

For example, you could refactor the goroutines like this to address all of the above while keeping the errch behavior the same:

-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Debug("Checking if WebSocket is enabled")

 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
@@
-		go func() {
-			log().Info("Starting WebSocket downgrading server")
-			err = downgradingServer.ServeTLS(s.listener.l, s.options.tlsCertPath, s.options.tlsKeyPath)
-			errch <- err
-		}()
+		go func() {
+			log().Info("Starting WebSocket downgrading server")
+			err := downgradingServer.ServeTLS(s.listener.l, s.options.tlsCertPath, s.options.tlsKeyPath)
+			if err != nil {
+				log().WithError(err).Warn("WebSocket downgrading server exited with error")
+			} else {
+				log().Info("WebSocket downgrading server exited gracefully")
+			}
+			errch <- err
+		}()
 	} else {
@@
-		go func() {
-			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
-			err = s.grpcServer.Serve(s.listener.l)
-			log().WithError(err).Warn(" gRPC server.Serve() exited")
-			errch <- err
-		}()
+		go func() {
+			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
+			err := s.grpcServer.Serve(s.listener.l)
+			if err != nil {
+				log().WithError(err).Warn("gRPC server.Serve() exited with error")
+			} else {
+				log().Info("gRPC server.Serve() exited gracefully")
+			}
+			errch <- err
+		}()
 	}

224-231: Service registration logs look good; consider making them more structured

The new Info logs around service registration are useful and low‑overhead at startup. If you want to make them a bit more compact and query‑friendly, you could use a structured service field instead of separate messages:

-	log().Info("Registering gRPC services on principal")
-	authapi.RegisterAuthenticationServer(s.grpcServer, authSrv)
-	log().Info("Authentication service registered successfully")
-	versionapi.RegisterVersionServer(s.grpcServer, version.NewServer(s.authenticate))
-	log().Info("Version service registered successfully")
-	eventstreamapi.RegisterEventStreamServer(s.grpcServer, eventstream.NewServer(s.queues, s.eventWriters, metrics, s.clusterMgr, eventstream.WithNotifyOnConnect(s.notifyOnConnect)))
-	log().Info("EventStream service registered successfully")
+	log().Info("Registering gRPC services on principal")
+	authapi.RegisterAuthenticationServer(s.grpcServer, authSrv)
+	log().WithField("service", "Authentication").Info("gRPC service registered")
+	versionapi.RegisterVersionServer(s.grpcServer, version.NewServer(s.authenticate))
+	log().WithField("service", "Version").Info("gRPC service registered")
+	eventstreamapi.RegisterEventStreamServer(s.grpcServer, eventstream.NewServer(s.queues, s.eventWriters, metrics, s.clusterMgr, eventstream.WithNotifyOnConnect(s.notifyOnConnect)))
+	log().WithField("service", "EventStream").Info("gRPC service registered")

Not strictly necessary, but it can make log search/aggregation simpler if you start adding more services over time.

test/e2e/fixture/cluster.go (2)

170-218: TLS config and Redis client tuning are reasonable for E2E, scoped by flags

Wiring redis.Options.TLSConfig with MinVersion: tls.VersionTLS12 and InsecureSkipVerify: true behind the *RedisTLSEnabled booleans matches the PR intent for E2E: you get encrypted transport without needing stable LB hostnames. The extended timeouts, pool sizing, and retry/backoff settings are also sane defaults for noisy CI environments. If in the future you want to test full certificate validation, you could optionally add a test‑only env toggle to switch InsecureSkipVerify off and set RootCAs/ServerName, but the current behavior is fine for this fixture.


288-327: Redis address discovery and TLS flags are robust; consider optional TLS override for local dev

The updated getManagedAgentRedisConfig / getPrincipalRedisConfig logic to try LoadBalancer ingress, then spec.LoadBalancerIP, then ClusterIP provides a much more resilient way to find Redis, and the error messages are clear. Setting ManagedAgentRedisTLSEnabled / PrincipalRedisTLSEnabled to true by default, with env vars (MANAGED_AGENT_REDIS_ADDR, ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS) only overriding the address, aligns with the “TLS‑everywhere for tests” goal.

If you later need to support non‑TLS Redis in ad‑hoc local setups, you might add a parallel env knob (e.g. *_REDIS_TLS_DISABLED or a scheme‑based address) to flip the *RedisTLSEnabled flags, but that’s not required for the current E2E path.

Also applies to: 359-387
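The fallback order above can be sketched with plain stdlib code (the struct is a simplified, hypothetical stand-in for the fields read from a `corev1.Service`):

```go
package main

import "fmt"

// svcInfo is a simplified stand-in for the Service fields the fixture
// inspects (illustrative only, not the Kubernetes API types).
type svcInfo struct {
	IngressHost    string // status.loadBalancer.ingress[0].ip or .hostname
	LoadBalancerIP string // spec.loadBalancerIP
	ClusterIP      string // spec.clusterIP
}

// resolveRedisAddr applies the fallback order: LoadBalancer ingress,
// then spec.LoadBalancerIP, then ClusterIP.
func resolveRedisAddr(svc svcInfo, port int) (string, error) {
	for _, host := range []string{svc.IngressHost, svc.LoadBalancerIP, svc.ClusterIP} {
		if host != "" {
			return fmt.Sprintf("%s:%d", host, port), nil
		}
	}
	return "", fmt.Errorf("no reachable address found for Redis service")
}

func main() {
	addr, _ := resolveRedisAddr(svcInfo{ClusterIP: "10.0.0.5"}, 6379)
	fmt.Println(addr)
}
```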

test/e2e/redis_proxy_test.go (4)

120-124: SSE readiness sleep works but is still inherently racy

The extra 5-second sleep will reduce the race between SSE connection and Redis SUBSCRIBE, but it’s still a fixed guess and may be too short/long on some clusters. If the server reliably emits at least one initial SSE event after subscription, an optional follow-up would be to gate on “first message observed on msgChan” via require.Eventually instead of a hard Sleep, so the test waits just long enough and becomes deterministic.

Also applies to: 326-330


188-208: Channel-drain loops look correct; could be factored into a shared helper

The “drain all available SSE messages” loops correctly avoid blocking (thanks to the default branch) and ensure each Eventually tick processes the full backlog before deciding whether to retry. However, the logic is duplicated across the two tests; consider extracting a small helper like drainSSEUntilPodSeen(t *testing.T, msgChan <-chan string, podName string) bool to centralize this behavior and logging, which will simplify future changes to the drain semantics.

Also applies to: 407-427
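A sketch of such a helper, keeping the non-blocking `default` semantics of the current loops (name and signature are illustrative, not the repo's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// drainSSEUntilPodSeen drains all currently buffered SSE messages and
// reports whether any of them mentions podName. The default branch
// keeps the helper non-blocking, mirroring the loops in the tests.
func drainSSEUntilPodSeen(msgChan <-chan string, podName string) bool {
	for {
		select {
		case msg := <-msgChan:
			if strings.Contains(msg, podName) {
				return true
			}
		default:
			return false // channel drained without a match
		}
	}
}

func main() {
	ch := make(chan string, 3)
	ch <- "pod/other-pod updated"
	ch <- "pod/guestbook-abc123 created"
	fmt.Println(drainSSEUntilPodSeen(ch, "guestbook-abc123"))
}
```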


211-237: ResourceTree retry logic is solid; duplication could be reduced

Wrapping appClient.ResourceTree in requires.Eventually with explicit logging on transient errors / nil trees is a good way to handle EOFs and Redis hiccups. The almost-identical blocks in the managed vs. autonomous tests (only differing in which Application is referenced) could be pulled into a small helper (e.g., waitForPodInResourceTree) to reduce repetition and keep the Redis/Argo retry policy in one place.

Also applies to: 430-456


588-588: Buffered SSE channel and HTTP/TLS settings are appropriate for tests; consider clarifying test-only TLS behavior

Using a buffered msgChan (size 100) plus a Timeout: 0 client and ResponseHeaderTimeout: 0 is consistent with long-lived SSE streams and avoids reader backpressure in these tests. The explicit *tls.Config{InsecureSkipVerify: true} is acceptable here because this helper lives under test/e2e and you still exercise TLS on the wire, but it would be worth adding a short comment stating that this is intentionally insecure and test-only (due to self-signed certs and dynamic LoadBalancer endpoints) to discourage copy-paste into production paths. If you expect a high reconnection rate, you might also consider hoisting the http.Transport/http.Client construction outside the for loop to avoid reallocating them on every retry, though this is non-critical in test code.

Please double-check against your current Go version’s net/http documentation that Timeout: 0 plus context cancellation behaves as expected for SSE (i.e., no hidden default deadline). For example, verify locally that a hung SSE server causes the request to terminate when the context is canceled, not earlier due to client-side timeouts.

Also applies to: 642-649, 650-653, 661-663

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8812d2 and c546df9.

📒 Files selected for processing (31)
  • .github/workflows/ci.yaml (1 hunks)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • test/e2e/sync_test.go
  • principal/resource.go
  • test/run-e2e.sh
  • hack/dev-env/configure-redis-tls.sh
  • hack/dev-env/start-agent-autonomous.sh
  • test/e2e/clusterinfo_test.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • cmd/argocd-agent/agent.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • Makefile
  • hack/dev-env/Procfile.e2e
  • .github/workflows/ci.yaml
  • hack/dev-env/start-e2e.sh
  • test/e2e/README.md
  • test/e2e/rp_test.go
🧬 Code graph analysis (6)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-271)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
hack/dev-env/start-e2e.sh (2)
hack/dev-env/configure-redis-tls.sh (1)
  • cleanup (50-52)
hack/dev-env/setup-vcluster-env.sh (1)
  • cleanup (39-41)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (42)
principal/tracker/tracking.go (1)

75-78: LGTM! Valid concurrency fix.

Buffering the channel with size 1 is the correct solution for this single request/response pattern. It prevents the sender from blocking when the receiver goroutine hasn't started yet, eliminating the potential deadlock. The comments clearly explain the rationale.
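The deadlock-avoidance property is easy to demonstrate in isolation (a minimal sketch, not the tracker's actual types):

```go
package main

import "fmt"

// respond sends a single response without blocking, even if the
// receiver has not started yet: the buffer of size 1 absorbs the send.
func respond(ch chan string, msg string) {
	ch <- msg // would deadlock on an unbuffered channel with no receiver
}

func main() {
	ch := make(chan string, 1) // size 1: single request/response pattern
	respond(ch, "done")        // sender completes before any receive
	fmt.Println(<-ch)
}
```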

test/e2e/fixture/argoclient.go (1)

27-27: LGTM!

Adding the os import to support environment variable reading is appropriate for the new functionality.

hack/dev-env/Procfile.e2e (1)

1-7: LGTM!

The port-forward setup correctly exposes Redis services from each vcluster to distinct localhost ports, and the sleep delays ensure they're ready before starting the principal/agents. Using environment variables for Redis addresses aligns with the TLS configuration approach.

internal/argocd/cluster/cluster.go (2)

135-142: LGTM!

Good defensive addition. Initializing ConnectionState when it doesn't exist ensures newly connected agents are properly reflected in the cluster info, avoiding a scenario where cache stats are set but connection status appears unknown.


176-191: LGTM!

The TLS configuration is correctly wired into the Redis client options. Passing nil for tlsConfig maintains backward compatibility with non-TLS connections.

test/e2e/fixture/fixture.go (6)

109-112: LGTM!

Increasing the deletion timeout from the previous value to 120 iterations (2 minutes) provides more headroom for finalizer processing and TLS connection establishment overhead.


229-240: Good use of DeepCopy to avoid mutating loop variables.

Creating a deep copy before modifying the namespace prevents unintended side effects on the original list item. The warning-based error handling is appropriate for cleanup operations that shouldn't fail the entire test.
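The hazard this avoids can be shown with a plain struct (a sketch; `deepCopy` here mimics the generated `DeepCopy` of Kubernetes objects, and the type is illustrative):

```go
package main

import "fmt"

type namespaceMeta struct {
	Name   string
	Labels map[string]string
}

// deepCopy copies the map as well as the scalar fields, so mutating
// the copy cannot touch the original list item.
func (n *namespaceMeta) deepCopy() *namespaceMeta {
	out := &namespaceMeta{Name: n.Name, Labels: map[string]string{}}
	for k, v := range n.Labels {
		out.Labels[k] = v
	}
	return out
}

func main() {
	items := []namespaceMeta{{Name: "agent-managed", Labels: map[string]string{"keep": "yes"}}}
	for i := range items {
		ns := items[i].deepCopy() // copy before mutation, as in the fixture cleanup
		ns.Labels["cleanup"] = "pending"
		_ = ns
	}
	fmt.Println(items[0].Labels) // original list item is untouched
}
```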


254-265: Consistent pattern applied here as well.

The same deep copy and warning-based cleanup approach maintains consistency across the cleanup logic.


315-324: Deep copy for AppProject cleanup.

Correctly creates a copy before modifying name and namespace, avoiding mutation of the loop variable.


457-461: Non-fatal Redis cleanup is appropriate.

Logging a warning instead of failing when Redis is unavailable (e.g., port-forward died) ensures cleanup completes and doesn't block subsequent test runs.


465-470: Verify getCachedCacheInstance is defined and properly configured.

The function getCachedCacheInstance is referenced but cannot be verified in the available context. Ensure it is properly implemented in the fixture package and handles TLS configuration support as expected.

test/e2e/rp_test.go (3)

162-169: Good refactoring to use fixture helpers.

Consolidating endpoint and secret retrieval into reusable fixture functions improves maintainability and ensures consistent behavior across tests, especially with the new TLS configuration support.


295-304: Consistent usage of fixture helpers.

Same pattern applied here maintains consistency across the test suite.


509-510: Minor formatting change.

No functional change - the URL is now on a single line.

cmd/argocd-agent/principal.go (4)

89-98: LGTM! Clear Redis TLS configuration variables.

The TLS configuration fields are well-organized, covering server TLS (cert/key from path or secret) and upstream TLS (CA from path, secret, or insecure mode).


258-299: LGTM! Comprehensive TLS configuration with proper validation.

The mutual exclusivity validation (lines 272-286) correctly ensures only one upstream TLS mode is specified. The special handling at line 281 that excludes the default secret name from the count is appropriate—it allows users to explicitly set only insecure mode or CA path without being blocked by the default value.

The configuration flow properly mirrors the existing server cert/key validation pattern (lines 262-266).
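The validation described above reduces to counting configured modes while ignoring the default secret name (a sketch; the function, default name, and flag names are illustrative, not the actual `principal.go` code):

```go
package main

import (
	"errors"
	"fmt"
)

const defaultUpstreamCASecret = "argocd-redis-tls" // assumed default name

// validateUpstreamTLS enforces that at most one upstream TLS mode is
// set, not counting the secret name while it is still the default, so
// users can pick insecure mode or a CA path without clearing it.
func validateUpstreamTLS(caPath, caSecret string, insecure bool) error {
	modes := 0
	if caPath != "" {
		modes++
	}
	if caSecret != "" && caSecret != defaultUpstreamCASecret {
		modes++
	}
	if insecure {
		modes++
	}
	if modes > 1 {
		return errors.New("only one of upstream CA path, CA secret, or insecure mode may be set")
	}
	return nil
}

func main() {
	fmt.Println(validateUpstreamTLS("/certs/ca.crt", defaultUpstreamCASecret, false) == nil)
	fmt.Println(validateUpstreamTLS("/certs/ca.crt", "", true) != nil)
}
```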


430-451: LGTM! Well-documented CLI flags with sensible defaults.

The Redis TLS flags follow existing patterns with environment variable fallbacks. Enabling TLS by default (true at line 432) aligns with the PR objective of "TLS encryption enabled by default."


482-482: Verify the 30-second timeout is appropriate.

The timeout was increased from 2 seconds to 30 seconds. While this accommodates TLS secret retrieval which may take longer, 30 seconds is quite generous and could delay startup failures. Consider whether 10-15 seconds might be sufficient, or document why 30 seconds is needed.

agent/agent.go (1)

445-460: LGTM! Improved cluster cache info update logic.

The refactored goroutine now:

  1. Sends an initial update immediately on startup (line 448)
  2. Uses a single ticker for periodic updates
  3. Works for both managed and autonomous agent modes

This is cleaner than mode-specific goroutines and ensures timely initial synchronization.

hack/dev-env/start-e2e.sh (3)

50-59: LGTM! Clean variable setup with proper declaration.

The Redis password assignment now correctly separates declaration and assignment (lines 58-59), which properly surfaces kubectl failures. The static localhost addresses for Redis endpoints are appropriate for the TLS certificate validation during E2E tests.


104-170: LGTM! Robust readiness check with excellent diagnostics.

The Redis proxy readiness check includes:

  • Fallback from nc to bash TCP redirection (lines 109-121)
  • Progress reporting (lines 123-127)
  • Comprehensive failure diagnostics including goreman status, port checks, and log tails (lines 134-169)

This will significantly aid debugging E2E environment issues.


192-227: LGTM! Proper rollout handling with timeout.

The Argo CD component restart and rollout status checks are well-implemented with:

  • Individual rollout status checks per component
  • 90-second timeout (reasonable for component restarts)
  • Failure aggregation before exit
  • Pod status output on failure for debugging
principal/redisproxy/redisproxy.go (3)

98-128: LGTM! Clean TLS configuration API.

The setter methods provide a clear interface for configuring TLS:

  • Server TLS from certificate/key objects or file paths
  • Upstream TLS CA from pool, file path, or insecure mode

The separation of concerns makes the configuration flexible for different deployment scenarios.


130-154: LGTM! Robust TLS config creation with proper error handling.

The createServerTLSConfig method correctly:

  • Prioritizes path-based loading over in-memory certificates
  • Properly constructs tls.Certificate from raw cert/key
  • Sets minimum TLS version to 1.2
  • Returns descriptive errors on failure

156-200: LGTM! Clean TLS listener implementation.

The Start() method properly branches between TLS and non-TLS listeners with appropriate logging to indicate which mode is active.

Makefile (1)

59-79: Add error handling to TLS configuration scripts to fail fast on errors.

The TLS setup steps (lines 59-79) execute multiple scripts sequentially without error handling. If any script fails, the Makefile continues to the next step, potentially leaving the E2E environment in a partially configured state. Add || exit 1 after each script invocation to stop execution immediately if a step fails.

docs/configuration/redis-tls.md (3)

226-230: Confirm that hack/dev-env/reverse-tunnel/README.md exists.

Line 162 references hack/dev-env/reverse-tunnel/README.md for detailed reverse-tunnel setup. Ensure this documentation file is included in the PR or update the link if it's located elsewhere.


1-700: Excellent comprehensive Redis TLS documentation.

This is a well-structured, thorough guide covering overview, architecture, certificate management, configuration, Kubernetes installation, troubleshooting, and security best practices. The examples are clear and practical, and the documentation aligns well with the actual TLS implementation across the codebase. The table structures for CLI flags and environment variables are particularly helpful for users.


31-49: Resolve remaining markdownlint fenced-code-block language tags (MD040).

The documentation uses text tags for most code blocks, but the static analysis tool flags remaining bare code fences at lines 150, 475, 486, and 504. Ensure all architecture diagrams and script output sections are tagged with text to satisfy linting.

hack/dev-env/start-agent-managed.sh (1)

37-110: LGTM: Redis TLS configuration and argument passing look correct.

The Redis TLS detection, certificate extraction, address defaulting, and dual-path invocation (dist vs go run) are all properly implemented. The script provides clear user guidance when certificates are missing, and TLS arguments are consistently passed to both binary paths.

hack/dev-env/start-principal.sh (2)

23-86: LGTM: Principal TLS startup configuration is well-structured.

The Redis TLS detection, certificate checks, and dual-path argument passing are correct. The script properly handles the default Redis address (localhost:6380) and provides good comments about certificate SANs and reverse tunnel support.


44-62: Verify certificate file naming consistency between cert generation and usage scripts.

The script checks for redis-proxy.crt, redis-proxy.key, and ca.crt files (lines 46-48). Confirm that the gen-redis-tls-certs.sh script generates files with these exact names. If the naming differs between the generation and startup scripts, update one to match the other or document the intentional difference.

docs/getting-started/kubernetes/index.md (1)

159-230: Excellent Redis TLS setup instructions for Kubernetes.

The sections provide clear, step-by-step guidance for configuring Redis TLS on both control-plane and workload clusters, including certificate generation, secret creation, Redis patching, and verification. The instructions are well-organized and include helpful commands. One minor note: the patches use JSON array append syntax (- in the path), which should work correctly for idempotent re-runs when arrays already exist.

Also applies to: 337-381

hack/dev-env/gen-redis-tls-certs.sh (1)

1-150: LGTM: Certificate generation script is well-structured and idempotent.

The script properly generates Redis TLS certificates for all required components (CA, control-plane, proxy, autonomous, managed) with appropriate SANs including local IP detection, localhost, cluster DNS, and reverse-tunnel hostname. Error handling uses set -e, and temporary files are cleaned up. The idempotent checks for existing keys/certs make the script safe to re-run.

test/e2e/README.md (2)

83-107: Document the E2E_READY marker output requirement for make start-e2e.

The CI workflow (.github/workflows/ci.yaml) waits for an E2E_READY: marker in the logs from the make start-e2e step. This README should document that the start-e2e target (or its underlying script) must output this marker after all components are ready, so that CI's readiness check functions correctly.


1-137: LGTM: Clear and well-organized E2E test documentation.

The multi-terminal workflow is clearly explained with proper step numbering, and the Redis TLS requirement is prominently documented. The addition of the reverse-tunnel section for remote clusters is excellent, and the note about InsecureSkipVerify in test fixtures appropriately clarifies that TLS encryption is still enabled. The environment auto-detection guidance (local vs CI) is helpful.

hack/dev-env/configure-argocd-redis-tls.sh (3)

37-57: Verify redis.server configuration logic for control-plane vs agent clusters.

The script skips redis.server configuration for vcluster-control-plane (line 52), assuming it uses the Redis proxy. For agent clusters, it sets redis.server to argocd-redis:6379 (line 41). Verify that this logic matches the actual cluster configuration and that the control-plane's redis.server is correctly set by other means (e.g., setup-vcluster-env.sh).


59-304: Verify that the secret name argocd-redis-tls is used consistently across all setup scripts.

Lines 80, 98, 174, etc., reference the secret argocd-redis-tls with the ca.crt key. Confirm that this matches:

  1. The secret created by hack/dev-env/configure-redis-tls.sh
  2. The secret created in Kubernetes installation docs (section 2.4)
  3. Any other TLS configuration in the codebase

1-354: LGTM: Comprehensive and idempotent Argo CD TLS configuration script.

The script robustly configures Redis TLS across all Argo CD components (server, repo-server, application-controller) with proper idempotency checks for volumes, mounts, and arguments. Error handling is explicit, and replica scaling logic has been fixed to correctly ensure a minimum of 1 replica. The use of conditional component checks makes the script flexible for clusters with different deployments.

.github/workflows/ci.yaml (1)

199-214: Verify that E2E_READY: marker is generated during E2E setup.

The readiness logic now waits for an event-driven marker instead of a fixed delay, which is a solid improvement. However, this depends on start-e2e.sh (or one of its components) outputting the E2E_READY: marker to /tmp/e2e-argocd-agent.log. Confirm this marker is consistently generated as part of the E2E setup flow and that logs are properly redirected to this file.

test/e2e/fixture/cluster.go (1)

42-52: TLS flags and cached Redis usage in cluster info helpers look consistent

The added ManagedAgentRedisTLSEnabled / PrincipalRedisTLSEnabled fields and the switch to getCachedCacheInstance in GetManagedAgentClusterInfo / GetPrincipalClusterInfo are consistent with the TLS‑enabled E2E design, and the extra logging around principal lookups should make Redis connectivity issues easier to debug in tests. Ensure getClusterConfigurations is always called before these helpers so the addresses/TLS flags are populated on ClusterDetails.

test/e2e/redis_proxy_test.go (1)

184-184: Extended pod replacement windows are reasonable

Bumping the requires.Eventually wait duration for new pod creation from 30s to 60s is a pragmatic way to accommodate slower schedulers without changing semantics, and looks safe for both managed and autonomous flows.

Also applies to: 402-402

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 2 times, most recently from b199f13 to 6fb6d33 on December 8, 2025, 14:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (2)
hack/dev-env/configure-redis-tls.sh (1)

68-70: Add error handling for context switch.

If kubectl config use-context fails (e.g., context doesn't exist), the script continues and may operate on the wrong cluster. This was flagged in a previous review but not yet addressed.

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
hack/dev-env/configure-argocd-redis-tls.sh (1)

240-255: Same volumes array handling issue.

Like argocd-repo-server, argocd-application-controller assumes the volumes array exists. Apply the same defensive approach as suggested for argocd-repo-server to handle cases where the volumes array might not exist initially.

🧹 Nitpick comments (9)
docs/configuration/redis-tls.md (2)

150-156: Add language specifier to fenced code block.

This code block is still missing a language identifier, which triggers markdownlint MD040. The past review comment indicated this was addressed, but the current code still shows a bare fence.

-  ```
+  ```text
   Argo CD Server (remote vcluster) 
       → rathole Deployment (remote) 
       → rathole Container (local Mac) 
       → Principal process (local Mac)
-  ```
+  ```

475-520: Add language specifiers to remaining code blocks.

Several code blocks in the "Understanding Script Output" section are missing language specifiers (markdownlint MD040). Tag them as text for consistency:

 **gen-redis-tls-certs.sh:**
-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...
 ...
 **configure-redis-tls.sh:**
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ...
 **configure-argocd-redis-tls.sh:**
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ...
docs/getting-started/kubernetes/index.md (1)

207-211: Clarify $(REDIS_PASSWORD) is a Redis environment variable reference.

The $(REDIS_PASSWORD) syntax in the Redis args may confuse users who might think it's shell variable expansion. Consider adding a brief note that this is how Redis references its internal environment variable, or ensure the existing Argo CD Redis deployment already has this env var defined.

Consider adding a note:

!!! note "Redis Password"
    The `$(REDIS_PASSWORD)` syntax references the Redis container's environment variable, which is typically set from the `argocd-redis` secret.
test/e2e/README.md (1)

83-108: Accurate Redis TLS documentation with proper script references.

The Redis TLS section correctly documents the automatic setup and provides manual reconfiguration steps using the scripts added in this PR. The note about InsecureSkipVerify in test fixtures appropriately explains the trade-off for testing convenience.

Optional: Consider adding a comma after "SANs)" in line 107 for improved readability:

-...localhost port-forwards (which match the certificate SANs). TLS encryption is fully enabled...
+...localhost port-forwards (which match the certificate SANs), TLS encryption is fully enabled...
test/e2e/fixture/fixture.go (1)

108-172: Extended deletion timeouts are reasonable for E2E usage

Bumping the deletion/wait loops from 60 to 120 seconds (in both EnsureDeletion and WaitForDeletion) is a pragmatic way to reduce flakiness under slow CI; the polling logic and error handling remain sane. If this ever needs tuning per-suite, consider lifting the 120 into a shared constant, but it’s fine as-is.

hack/dev-env/start-e2e.sh (1)

206-266: Consider waiting for application-controller rollout as well

When Argo CD needs Redis reconfiguration, you restart argocd-server and argocd-repo-server and wait for their rollouts, but only check those two. Since the application controller also depends on Redis, you might want to add a rollout wait for argocd-application-controller too to catch early failures:

-    kubectl --context vcluster-control-plane -n argocd rollout restart statefulset argocd-application-controller 2>/dev/null || true
+    kubectl --context vcluster-control-plane -n argocd rollout restart statefulset argocd-application-controller 2>/dev/null || true
+    if ! kubectl --context vcluster-control-plane -n argocd rollout status statefulset argocd-application-controller --timeout=$ROLLOUT_TIMEOUT 2>/dev/null; then
+        echo "  ERROR: argocd-application-controller rollout timed out"
+        kubectl --context vcluster-control-plane -n argocd get pods -l app.kubernetes.io/name=argocd-application-controller
+        ROLLOUT_FAILED=true
+    fi
test/e2e/fixture/cluster.go (1)

170-223: Redis client TLS config for tests favors simplicity over strict verification

Enabling TLS with MinVersion: TLS1.2 and InsecureSkipVerify: true for principal/managed Redis in getCacheInstance matches the PR description of “TLS-on but skip verification” for E2E. This is acceptable for test harness code, but it means tests won’t catch CA/SAN misconfigurations. If you later want stricter coverage, consider:

  • Allowing an env flag to turn verification on (and wiring a CA pool), while keeping the current behavior as the default; or
  • At least logging when InsecureSkipVerify is active so it’s obvious in test logs.
principal/redisproxy/redisproxy.go (2)

136-148: Consider documenting configuration precedence.

When both file paths and in-memory certificates are configured, the file paths take precedence (lines 136-141 execute first). Consider documenting this behavior in the method comment or in the struct field comments to make the priority explicit for callers.
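One way to make the precedence explicit is to encode it in a single resolver whose doc comment states the order. A sketch with invented field names, not the actual redisproxy code:

```go
package main

import "fmt"

// proxyTLS mirrors the configuration described above. Field names are
// illustrative stand-ins for the redisproxy struct fields.
type proxyTLS struct {
	certPath, keyPath string
	inMemoryCert      bool
}

// resolve returns which TLS source would be used: file paths are checked
// first and take precedence over in-memory certificates.
func (p proxyTLS) resolve() string {
	switch {
	case p.certPath != "" && p.keyPath != "":
		return "files" // file paths win
	case p.inMemoryCert:
		return "memory"
	default:
		return "none"
	}
}

func main() {
	both := proxyTLS{certPath: "/c.crt", keyPath: "/c.key", inMemoryCert: true}
	fmt.Println(both.resolve())
}
```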


150-153: Consider TLS 1.3 as minimum version for new implementations.

The code sets MinVersion: tls.VersionTLS12. For new implementations handling sensitive data, TLS 1.3 (tls.VersionTLS13) provides stronger security guarantees and is widely supported. TLS 1.2 is acceptable but consider upgrading if compatibility allows.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c546df9 and 6fb6d33.

📒 Files selected for processing (47)
  • .github/workflows/ci.yaml (1 hunks)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (2 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/server.go (3 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (22)
  • test/e2e/fixture/argoclient.go
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml
  • test/e2e/rp_test.go
  • test/run-e2e.sh
  • internal/argocd/cluster/manager.go
  • .github/workflows/ci.yaml
  • test/e2e/clusterinfo_test.go
  • agent/inbound_redis.go
  • test/e2e/redis_proxy_test.go
  • principal/listen.go
  • hack/dev-env/start-agent-autonomous.sh
  • install/kubernetes/agent/agent-deployment.yaml
  • agent/outbound_test.go
  • install/kubernetes/principal/principal-deployment.yaml
  • hack/dev-env/gen-redis-tls-certs.sh
  • hack/dev-env/start-agent-managed.sh
  • cmd/argocd-agent/agent.go
  • install/kubernetes/agent/agent-params-cm.yaml
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • principal/tracker/tracking.go
  • install/kubernetes/principal/principal-params-cm.yaml
  • principal/resource.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • Makefile
  • hack/dev-env/start-e2e.sh
  • install/helm-repo/argocd-agent-agent/values.yaml
  • hack/dev-env/Procfile.e2e
  • test/e2e/README.md
🧬 Code graph analysis (11)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
hack/dev-env/start-e2e.sh (2)
hack/dev-env/setup-vcluster-env.sh (2)
  • apply (94-271)
  • cleanup (39-41)
hack/dev-env/configure-redis-tls.sh (1)
  • cleanup (50-52)
agent/options.go (2)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
agent/agent.go (2)
  • AgentOption (136-136)
  • Agent (65-117)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-271)
principal/redisproxy/redisproxy.go (1)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
principal/options.go (5)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/apis/auth/auth.go (2)
  • ServerOption (56-56)
  • Server (33-39)
principal/mocks/ServerOption.go (1)
  • ServerOption (11-13)
principal/server.go (1)
  • Server (72-164)
internal/tlsutil/kubernetes.go (2)
  • TLSCertFromSecret (44-66)
  • X509CertPoolFromSecret (106-128)
🪛 LanguageTool
docs/configuration/redis-tls.md

[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...

(AI_HYDRA_LEO_MISSING_COMMA)

test/e2e/README.md

[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build & cache Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (37)
install/helm-repo/argocd-agent-agent/values.yaml (1)

138-162: Verify that referenced secrets and container paths are properly mounted in deployment templates.

The configuration references the secret argocd-redis-tls and container path /app/config/redis-tls/ca.crt for Redis TLS. Ensure that:

  1. The secret argocd-redis-tls is created (either by the Helm templates or installation scripts).
  2. The Helm templates include volumeMounts to mount this secret at the specified path.
  3. The pod selectors used in networkPolicy (e.g., app.kubernetes.io/name: argocd-redis and app.kubernetes.io/name: argocd-agent-agent) match the actual pod labels in the deployments.
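The expected wiring would look roughly like the following in the agent deployment template (a hedged sketch; the mount path and secret name must match the values above, and the container name is an assumption):

```yaml
# Illustrative fragment: mounting the argocd-redis-tls secret so the CA is
# readable at /app/config/redis-tls/ca.crt. Not the actual chart template.
volumes:
  - name: redis-tls
    secret:
      secretName: argocd-redis-tls
      optional: true
containers:
  - name: argocd-agent-agent
    volumeMounts:
      - name: redis-tls
        mountPath: /app/config/redis-tls
        readOnly: true
```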
internal/argocd/cluster/manager_test.go (2)

57-57: LGTM!

The NewManager call correctly matches the updated signature with the new redisCompressionType and tlsConfig parameters. Passing nil for tlsConfig is appropriate for unit tests that don't require TLS encryption.


78-78: Consistent test initialization.

The test correctly uses the same pattern as Test_StartStop, maintaining consistency across test functions.

docs/configuration/redis-tls.md (1)

1-49: Comprehensive and well-structured TLS documentation.

The document provides excellent coverage of Redis TLS architecture, configuration points, and the relationship between principal/agent components. The architecture diagram clearly illustrates the three TLS configuration points.

hack/dev-env/configure-redis-tls.sh (2)

81-121: Good implementation of component scaling during TLS transition.

The script properly scales down ArgoCD components before enabling TLS on Redis, preventing SSL errors during the transition. Storing replica counts in a ConfigMap for later restoration is a thoughtful approach.


138-196: Well-implemented idempotency checks.

The script correctly checks for existing volumes and volume mounts before patching, making it safe to run multiple times. The handling of both empty arrays and existing arrays is thorough.

hack/dev-env/setup-vcluster-env.sh (2)

159-190: Clear environment-specific Redis configuration.

The branching logic for different environments (in-cluster, CI, local development) is well-documented and handles each scenario appropriately. The comments explaining why each configuration is needed are helpful.


182-186: Verify IP address detection robustness on Linux.

The `ip r show default` parsing may not work reliably on all Linux distributions or network configurations (e.g., multiple default routes, VPNs). Consider adding fallback logic or error handling so that IP address detection fails gracefully when the command doesn't produce the expected output.
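A hedged sketch of such fallback logic (the function name and the chosen fallbacks are suggestions, not what the script currently does):

```shell
#!/usr/bin/env bash
# detect_host_ip: try several detection methods and fail loudly if none works.
detect_host_ip() {
    local ip_addr=""
    # 1. Ask the kernel which source IP would reach a public address
    #    (no packets are sent; requires iproute2)
    ip_addr=$(ip -4 route get 1.1.1.1 2>/dev/null \
        | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1)}' \
        | head -n1)
    # 2. Fallback: first address reported by hostname -I (most Linux distros)
    if [ -z "$ip_addr" ]; then
        ip_addr=$(hostname -I 2>/dev/null | awk '{print $1}')
    fi
    if [ -z "$ip_addr" ]; then
        echo "ERROR: could not detect host IP address" >&2
        return 1
    fi
    echo "$ip_addr"
}
```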

docs/getting-started/kubernetes/index.md (3)

159-230: Comprehensive Redis TLS setup documentation.

The new section provides clear, step-by-step instructions for setting up Redis TLS on the control plane, including certificate generation, secret creation, deployment patching, and verification. The warning admonition properly emphasizes that TLS is required.


337-381: Good parallel structure for workload cluster TLS setup.

The section correctly instructs users to reuse the same CA from Step 2.4, ensuring certificate chain consistency. The commands mirror the control plane setup appropriately.


646-646: Good addition of cross-reference.

Adding the Redis TLS Configuration link to Related Documentation helps users find detailed TLS information.

hack/dev-env/configure-argocd-redis-tls.sh (3)

16-57: LGTM! Clean context-aware Redis configuration.

The script correctly differentiates between control-plane (which uses Redis proxy) and agent clusters (which connect to local Redis), with proper error handling and informative messaging.


59-158: Robust idempotent patching with proper error handling.

The configuration logic correctly handles both missing and existing volumes arrays, includes clear error messages, and ensures idempotency. The assumption that container index 0 is the main container aligns with standard Argo CD deployment structure.


306-355: Well-structured scaling and cleanup logic.

The replica guard logic correctly ensures at least 1 replica using explicit if statements (addressing the past review comment). The cleanup of the temporary ConfigMap is a good practice. Rollout status checks with timeouts provide proper feedback.
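The replica guard boils down to a small helper like this (a sketch; the script inlines the logic with if statements rather than using a function):

```shell
#!/usr/bin/env bash
# replicas_or_min_one: return the saved replica count, but never less than 1,
# so components are not left scaled to zero after the TLS transition.
replicas_or_min_one() {
    local saved="$1"
    # Treat empty, non-numeric, or zero values as "scale to 1".
    if [ -z "$saved" ] || ! [ "$saved" -ge 1 ] 2>/dev/null; then
        echo 1
    else
        echo "$saved"
    fi
}
```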

test/e2e/README.md (1)

21-82: Clear and comprehensive workflow documentation.

The step-by-step E2E test workflow is well-structured, with excellent coverage of local vs. remote cluster scenarios, reverse tunnel setup, and the distinction between port-forward and direct LoadBalancer access.

Makefile (1)

59-79: TLS configuration properly integrated with Make's default error handling.

The sequential TLS setup steps rely on Make's default behavior to stop on the first non-zero exit code, which is standard practice. Combined with set -e in the individual scripts, this provides adequate error handling.

internal/argocd/cluster/cluster_test.go (1)

31-44: Test correctly updated for new NewManager signature.

The test setup appropriately passes nil for the new tlsConfig parameter, which is suitable for test scenarios using miniredis.

agent/options.go (1)

111-133: Well-structured Redis TLS configuration options.

The new option functions follow the established pattern, include appropriate documentation, and correctly note that WithRedisTLSInsecure is for testing only. The implementation properly sets fields on the redisProxyMsgHandler.

internal/argocd/cluster/informer_test.go (1)

17-126: Consistent test updates for extended function signatures.

All test cases properly updated to pass the compression type and nil TLS config. The changes maintain test functionality while accommodating the new signature requirements.

hack/dev-env/start-principal.sh (3)

23-29: Properly delegates port-forward to external process.

The script now correctly expects an external port-forward (from Procfile.e2e or manual setup) rather than creating its own, avoiding the port conflict issue flagged in the previous review.


44-62: Robust TLS certificate validation and user guidance.

The script properly checks for required TLS certificates and provides helpful guidance when they're missing. The TLS arguments correctly cover both server-side TLS (cert/key) and upstream TLS (CA path), with appropriate SANs noted in comments.


64-86: TLS arguments consistently propagated across execution modes.

The dual execution path (pre-built binary vs. go run) ensures TLS arguments are applied regardless of the execution method, supporting both CI and local development workflows seamlessly.

internal/argocd/cluster/cluster.go (2)

135-142: Sensible ConnectionState initialization for new agent connections.

The initialization provides appropriate defaults when ConnectionState doesn't exist, preventing nil values and ensuring consistent status reporting when cache stats are first received from a newly connected agent.


176-191: Clean TLS configuration wiring into Redis client.

The TLS config is properly integrated into the Redis client options, following the standard go-redis pattern. The nullable tlsConfig parameter correctly supports both TLS and non-TLS configurations.

test/e2e/fixture/fixture.go (2)

219-375: Non-fatal cleanup errors and deep-copy usage look appropriate

Switching the various EnsureDeletion / WaitForDeletion failures to fmt.Printf warnings while continuing cleanup matches the goal of not failing tests due to residual resources, and the use of DeepCopy() when changing namespace/name on loop variables avoids subtle aliasing issues. Just be aware that leaked resources will now only show up in logs, not as hard test failures.

Would you like a small helper to aggregate and surface a summary of cleanup warnings at the end of the suite, so persistent leaks are easier to spot without failing every run?
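A minimal version of such a helper could look like this (names are invented; it would still need wiring into the fixture's cleanup paths):

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupWarnings collects non-fatal cleanup messages during a test run so a
// single summary can be printed at suite teardown instead of failing tests.
type cleanupWarnings struct {
	mu   sync.Mutex
	msgs []string
}

// Warnf records a formatted warning; safe for concurrent use.
func (c *cleanupWarnings) Warnf(format string, args ...any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.msgs = append(c.msgs, fmt.Sprintf(format, args...))
}

// Summary returns a human-readable report, or "" if nothing leaked.
func (c *cleanupWarnings) Summary() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.msgs) == 0 {
		return ""
	}
	out := fmt.Sprintf("%d cleanup warning(s):\n", len(c.msgs))
	for _, m := range c.msgs {
		out += "  - " + m + "\n"
	}
	return out
}

func main() {
	var w cleanupWarnings
	w.Warnf("failed to delete app %s", "guestbook")
	fmt.Print(w.Summary())
}
```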


487-501: Graceful handling of Redis unavailability during cleanup

Treating resetManagedAgentClusterInfo failures as a warning instead of a hard error is a good trade-off for E2E runs where Redis port-forwards may already be gone. The error wrapping in resetManagedAgentClusterInfo also gives clearer diagnostics when debugging Redis-related issues.

principal/server.go (1)

349-372: Redis proxy and cluster manager TLS wiring is consistent and robust

The new Redis TLS wiring in NewServer looks solid: server TLS is configured from either file paths or secrets, upstream TLS supports insecure mode or CA from file/pool, and the same upstream options are reused for the cluster manager via clusterMgrRedisTLSConfig with MinVersion: TLS1.2. The explicit warning for InsecureSkipVerify is also helpful. No changes needed here.

Also applies to: 400-428

agent/agent.go (1)

323-345: Cluster cache Redis TLS configuration matches principal-side behavior

Reusing the Redis TLS options for clusterCacheTLSConfig (with TLS 1.2 minimum, optional insecure mode, and CA loading from path) keeps agent-side cluster cache consistent with the proxy/upstream config. The warning when running insecure is a good touch. Looks good.

hack/dev-env/start-e2e.sh (1)

50-123: Localhost Redis endpoint wiring and in-/out-of-cluster proxy bridge look good

Using fixed localhost ports (6380/6381/6382) with goreman-managed port-forwards, plus the vcluster argocd-agent-redis-proxy Service/Endpoints bridge for out-of-cluster mode, is a clear and predictable setup for TLS-enabled Redis in E2E. The REDIS_PASSWORD export and Redis pod readiness checks are also straightforward.

cmd/argocd-agent/principal.go (2)

258-299: Redis TLS configuration and upstream mode validation are well-structured

The Redis TLS block cleanly separates server TLS (file vs secret) from upstream TLS (insecure vs CA file vs CA secret), and the modesSet mutual-exclusivity check prevents conflicting upstream modes with a clear fatal message. Skipping counting the default CA secret name avoids spurious errors while still allowing an explicit override. This is a solid configuration surface.


430-452: CLI flags and resource-proxy TLS timeout align with the new TLS surface

The new Redis TLS flags (enabled-by-default, server cert/key or secret, upstream CA path/secret, and insecure flag) match the principal.ServerOption API and default to secure behavior. Switching getResourceProxyTLSConfigFromKube to a 30s timeout avoids potential hangs on secret reads without changing semantics. No issues here.

Also applies to: 482-490

hack/dev-env/Procfile.e2e (1)

1-7: Procfile port-forwards and startup gating match the new Redis topology

The added pf-* entries and the principal/agent processes that wait for local Redis ports (6380/6381/6382) before starting line up cleanly with the localhost-based TLS endpoints configured in start-e2e.sh. This should make Redis startup ordering much more deterministic in CI.

test/e2e/fixture/cluster.go (1)

226-267: Redis cache client caching and test-focused config look fine

The cachedRedisClients map with a mutex and per-address cache key is a reasonable way to avoid reconnect churn in tests. Since appstatecache.Cache doesn't expose a close API, resetting the map in CleanupRedisCachedClients and relying on GC is an acceptable compromise for short-lived E2E runs; just ensure the suite calls this cleanup once at the end. The Redis address discovery and env overrides (MANAGED_AGENT_REDIS_ADDR, ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS) also align with the new port-forwarded TLS setup.

Also applies to: 320-387
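The caching pattern reduces to a mutex-guarded map keyed by address, sketched here with a placeholder client type (the real fixture caches appstatecache.Cache instances):

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient stands in for the cached Redis cache client type.
type fakeClient struct{ addr string }

var (
	cacheMu       sync.Mutex
	cachedClients = map[string]*fakeClient{}
)

// clientFor returns a cached client for addr, creating one on first use.
// Reusing clients avoids reconnect churn across tests.
func clientFor(addr string) *fakeClient {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	if c, ok := cachedClients[addr]; ok {
		return c
	}
	c := &fakeClient{addr: addr}
	cachedClients[addr] = c
	return c
}

// resetClients drops all cached clients; with no close API on the cache
// type, the old instances are simply left to the garbage collector.
func resetClients() {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	cachedClients = map[string]*fakeClient{}
}

func main() {
	a := clientFor("localhost:6380")
	b := clientFor("localhost:6380")
	fmt.Println(a == b) // → true: same instance reused
	resetClients()
	fmt.Println(a == clientFor("localhost:6380")) // → false: fresh after reset
}
```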

principal/options.go (1)

80-88: Redis TLS option fields and helpers integrate cleanly with existing ServerOptions

Adding Redis TLS fields into ServerOptions and exposing WithRedisTLSEnabled, WithRedisServerTLSFromPath/Secret, and WithRedisUpstreamTLSCAFromFile/Secret/Insecure follows the existing options pattern and keeps TLS concerns encapsulated. Using tlsutil for secret-based loading is consistent with other TLS helpers. This API surface looks well-designed for both CLI wiring and tests.

Also applies to: 492-548

principal/redisproxy/redisproxy.go (3)

65-128: LGTM! Well-structured TLS configuration API.

The TLS fields and setter methods are cleanly designed, allowing flexible configuration via both in-memory objects and file paths. The separation between server TLS (incoming connections) and upstream TLS (outgoing connections) is clear and appropriate.


157-200: LGTM! Clean TLS listener setup.

The Start() method correctly branches between TLS and non-TLS listener creation with appropriate error handling and logging. The distinction between modes is clear at startup.


853-894: Note: Upstream TLS mismatch concern previously flagged.

A past review comment flagged that when rp.tlsEnabled is true but no upstream TLS configuration is provided, the connection to principal Redis will be unencrypted. This creates a potential security gap where the proxy accepts encrypted connections from Argo CD but forwards traffic in plaintext to Redis.

The past review suggested adding a warning log when this mismatch occurs. Consider addressing this feedback to improve the security posture.

Based on learnings, past review comments should be referenced when similar concerns are identified.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 4 times, most recently from fec1462 to bf0d4f8 Compare December 8, 2025 15:32

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (5)
Makefile (1)

59-79: Error handling still missing between TLS configuration steps.

The sequential script calls lack error handling—if any script fails (cert generation, Redis config, or ArgoCD config), subsequent steps still execute, potentially leaving the environment partially configured. The past review suggested chaining commands with || to fail fast, but this appears unaddressed in the current code.

Apply error handling to fail fast:

 	@echo ""
 	@echo "Configuring Redis TLS (required for E2E)..."
-	./hack/dev-env/gen-redis-tls-certs.sh
+	./hack/dev-env/gen-redis-tls-certs.sh || (echo "ERROR: Certificate generation failed" && exit 1)
 	@echo ""
 	@echo "Configuring each cluster for Redis TLS (Redis + ArgoCD components together)"
 	@echo "Note: Redis and ArgoCD components are configured together per-cluster to avoid"
 	@echo "      connection errors during the transition period."
 	@echo ""
 	@echo "=== Control Plane ==="
-	./hack/dev-env/configure-redis-tls.sh vcluster-control-plane
-	./hack/dev-env/configure-argocd-redis-tls.sh vcluster-control-plane
+	./hack/dev-env/configure-redis-tls.sh vcluster-control-plane || (echo "ERROR: Redis TLS config failed for control-plane" && exit 1)
+	./hack/dev-env/configure-argocd-redis-tls.sh vcluster-control-plane || (echo "ERROR: ArgoCD TLS config failed for control-plane" && exit 1)
 	@echo ""
 	@echo "=== Agent Managed ==="
-	./hack/dev-env/configure-redis-tls.sh vcluster-agent-managed
-	./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-managed
+	./hack/dev-env/configure-redis-tls.sh vcluster-agent-managed || (echo "ERROR: Redis TLS config failed for agent-managed" && exit 1)
+	./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-managed || (echo "ERROR: ArgoCD TLS config failed for agent-managed" && exit 1)
 	@echo ""
 	@echo "=== Agent Autonomous ==="
-	./hack/dev-env/configure-redis-tls.sh vcluster-agent-autonomous
-	./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-autonomous
+	./hack/dev-env/configure-redis-tls.sh vcluster-agent-autonomous || (echo "ERROR: Redis TLS config failed for agent-autonomous" && exit 1)
+	./hack/dev-env/configure-argocd-redis-tls.sh vcluster-agent-autonomous || (echo "ERROR: ArgoCD TLS config failed for agent-autonomous" && exit 1)
hack/dev-env/configure-redis-tls.sh (2)

68-70: Verify context switch succeeded before proceeding.

If kubectl config use-context fails (context doesn't exist or kubectl error), the script continues and may operate on the wrong cluster. This is dangerous in a multi-cluster setup.

Add error checking:

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }

198-206: Fail when Redis password secret is missing.

Continuing with an empty password when the argocd-redis secret is missing will cause Argo CD components to fail with NOAUTH errors. For E2E environments, the password secret should exist before Redis TLS configuration. Fail fast to surface the missing prerequisite.

Apply this fix:

 # Get the Redis password from the secret
 REDIS_PASSWORD=$(kubectl -n ${NAMESPACE} get secret argocd-redis -o jsonpath='{.data.auth}' | base64 --decode 2>/dev/null || echo "")
 
 if [ -z "$REDIS_PASSWORD" ]; then
-    echo "Warning: Redis password not found in secret argocd-redis"
-    echo "Redis will be configured without password authentication"
-    REDIS_PASSWORD=""
+    echo "Error: Redis password not found in secret argocd-redis"
+    echo "Redis password is required for secure configuration"
+    exit 1
 fi
agent/agent.go (1)

445-460: Guard against zero cacheRefreshInterval before creating ticker.

time.NewTicker(a.cacheRefreshInterval) will panic if cacheRefreshInterval is zero or negative ("non-positive interval for NewTicker"). If no AgentOption sets this field, the goroutine will crash at runtime.

Add validation:

+	// Validate cache refresh interval
+	interval := a.cacheRefreshInterval
+	if interval <= 0 {
+		interval = 30 * time.Second
+		log().Warnf("cacheRefreshInterval not set, using default: %v", interval)
+	}
+
 	// Send initial update immediately on startup (don't wait for first ticker)
 	a.addClusterCacheInfoUpdateToQueue()
 
-	ticker := time.NewTicker(a.cacheRefreshInterval)
+	ticker := time.NewTicker(interval)
principal/redisproxy/redisproxy.go (1)

846-850: Note: Timeout concerns previously flagged.

Past reviews identified missing timeouts on the TCP dial (lines 846-850) and TLS handshake (lines 886-890). These remain valid concerns but have already been documented.

Also applies to: 886-890

🧹 Nitpick comments (1)
cmd/argocd-agent/principal.go (1)

272-286: Consider clarifying the default secret name exclusion logic.

The mutual exclusivity validation at line 281 excludes the default secret name "argocd-redis-tls" from the count. This is necessary because the flag has a default value (line 447), but the logic is subtle and could confuse future maintainers.

Consider adding a comment explaining this:

         // Only count non-default secret name to allow default value
         if redisUpstreamTLSCASecretName != "" && redisUpstreamTLSCASecretName != "argocd-redis-tls" {
+            // Note: We skip counting the default value because the flag always has a value
+            // (see line 447). This allows users to specify --redis-upstream-ca-path
+            // without explicitly clearing the default secret name.
             modesSet++
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb6d33 and bf0d4f8.

📒 Files selected for processing (29)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • hack/dev-env/setup-vcluster-env.sh
🚧 Files skipped from review as they are similar to previous changes (11)
  • principal/listen.go
  • test/e2e/rp_test.go
  • principal/resource.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • hack/dev-env/start-e2e.sh
  • test/e2e/fixture/argoclient.go
  • hack/dev-env/start-agent-autonomous.sh
  • cmd/argocd-agent/agent.go
  • hack/dev-env/start-principal.sh
  • hack/dev-env/gen-redis-tls-certs.sh
  • test/e2e/redis_proxy_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/run-e2e.sh
  • Makefile
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-agent-managed.sh
  • test/e2e/README.md
🧬 Code graph analysis (6)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (2)
  • cleanup (39-41)
  • apply (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (112-115)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
  • HasConnectionStatus (60-74)
  • AgentManagedName (37-37)
  • ClusterDetails (42-56)
  • AgentAutonomousName (38-38)
🪛 LanguageTool
docs/configuration/redis-tls.md

[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...

(AI_HYDRA_LEO_MISSING_COMMA)

test/e2e/README.md

[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

test/e2e/README.md

32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run unit tests
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (26)
principal/tracker/tracking.go (1)

75-78: Good concurrency fix—verify deadlock resolution and single-response guarantee.

Buffering the channel with size 1 is a valid pattern to decouple sender and receiver timing, preventing deadlock when the send occurs before the receive is ready. This is appropriate for the request/response tracking pattern.

Please verify:

  1. The deadlock is actually resolved with this change (e.g., through stress testing or reproduction steps).
  2. The usage pattern guarantees exactly one response per tracked request—if multiple events could be sent to the same channel, the second would block (or be lost), since the buffer size is 1.
internal/argocd/cluster/cluster.go (2)

135-142: LGTM!

Defensive initialization of ConnectionState when it doesn't exist yet is appropriate. This ensures cluster info always has a valid connection state after cache stats are updated.


176-185: LGTM!

TLS configuration is cleanly threaded through to the Redis client. The signature change is backward-compatible (callers can pass nil for non-TLS connections), and the implementation properly wires TLS into Redis options.

hack/dev-env/configure-redis-tls.sh (1)

61-66: LGTM!

Certificate validation comprehensively checks all required TLS files (CA, certificate, and key) before proceeding. This prevents kubectl secret creation from failing with unclear errors.

hack/dev-env/start-agent-managed.sh (2)

37-61: LGTM!

The script gracefully handles TLS configuration with clear user guidance. The Redis address defaults are appropriate for local development, and the port-forward requirements are well-documented. This developer-friendly approach helps prevent common setup mistakes.


63-91: LGTM!

Certificate extraction and agent invocation properly wired for TLS. The /tmp storage for certificates is acceptable for development environments, and all necessary TLS flags are passed to the agent.

test/e2e/fixture/fixture.go (3)

110-156: LGTM!

Extended timeouts (60s → 120s) appropriately accommodate TLS handshake overhead and slower cluster operations in E2E environments. This prevents spurious timeout failures while maintaining reasonable bounds.


201-492: LGTM!

Converting cleanup errors to warnings (rather than failing fast) ensures E2E test teardown completes as much as possible even when resources are partially unavailable. This is particularly appropriate when Redis connections may be unstable (e.g., port-forward died), preventing orphaned resources from accumulating across test runs.


236-267: LGTM!

Using DeepCopy() before adjusting namespaces prevents mutation of loop variables—a classic Go pitfall. This ensures each wait operation acts on an independent copy with the correct namespace.

test/e2e/README.md (1)

83-108: LGTM!

The Redis TLS documentation accurately describes the automatic setup and manual reconfiguration procedures. The scripts referenced (gen-redis-tls-certs.sh, configure-redis-tls.sh, configure-argocd-redis-tls.sh) are provided in this PR. The explanation of InsecureSkipVerify usage in test fixtures versus proper TLS validation in agents/principal is clear and appropriate.

Note: Past review comments flagged these scripts as non-existent, but they're introduced in this PR.

agent/agent.go (1)

323-349: LGTM!

TLS configuration for cluster cache follows established patterns and includes proper error handling. CA loading validates the certificate pool, and the warning log for insecure mode (Line 330) provides appropriate security awareness, matching the principal implementation.

hack/dev-env/Procfile.e2e (1)

1-7: LGTM!

Process orchestration properly coordinates TLS-enabled E2E testing. Port-forwards enable Redis TLS connections via localhost (matching certificate SANs), and staggered startup delays (3s for principal, 5s for agents) ensure proper initialization ordering. Environment variables correctly pass cluster-specific Redis addresses to agent startup scripts.

cmd/argocd-agent/principal.go (1)

430-452: LGTM! Well-structured Redis TLS flag definitions.

The flag definitions follow consistent patterns with appropriate defaults (TLS enabled by default) and clear help text. The environment variable mappings align with the codebase conventions.

test/e2e/fixture/cluster.go (3)

182-201: LGTM! Appropriate TLS configuration for E2E tests.

The use of InsecureSkipVerify: true is well-documented in the comments and aligns with the PR objectives, which note this accommodation for dynamic LoadBalancer addresses in E2E tests while preserving TLS encryption.


206-224: LGTM! Well-tuned connection pool settings for E2E tests.

The timeout and pool size configuration is appropriately tuned for E2E test conditions with port-forward latency. The comments clearly explain the rationale for increased values, making future adjustments easier.


232-257: LGTM! Effective Redis client caching for E2E tests.

The caching mechanism prevents connection leaks by reusing Redis clients across test operations. The cache key construction (source + address) is appropriate for distinguishing between different Redis instances.

test/e2e/clusterinfo_test.go (1)

108-115: LGTM! Appropriate timeout adjustments for port-forward latency.

The increased timeouts (30s→60s, 1s→2s polling) are well-justified by the inline comments explaining port-forward latency in long test runs. The consistent application across related assertions ensures reliable test behavior.

Also applies to: 123-129, 136-142

docs/configuration/redis-tls.md (1)

1-700: LGTM! Comprehensive and well-structured Redis TLS documentation.

This documentation provides excellent coverage of Redis TLS configuration across development, E2E testing, and production scenarios. The architecture diagrams, troubleshooting guidance, and security best practices sections are particularly valuable.

Note: The markdown linting issues (missing language tags on fenced code blocks at lines 150, 475, 486, 504) were already addressed in previous commits per past review comments.

docs/getting-started/kubernetes/index.md (1)

159-230: LGTM! Clear and actionable Redis TLS setup instructions.

The step-by-step Redis TLS configuration for both control plane and workload clusters is well-structured. The inclusion of verification commands and the cross-reference to the detailed Redis TLS configuration guide provides good user experience.

Also applies to: 337-381

test/run-e2e.sh (2)

62-73: LGTM! Robust TLS validation using jq.

The JSON parsing with jq (lines 64-65) properly validates both the TLS port argument and TLS volume configuration, addressing the past review concern about fragile text grep. The error messages provide clear diagnostics and next steps.


89-122: LGTM! Helpful port-forward detection for local development.

The macOS-specific port-forward detection provides clear guidance for local development scenarios while allowing CI environments with MetalLB to proceed without warnings. The environment variable setup is well-documented.

hack/dev-env/configure-argocd-redis-tls.sh (2)

316-325: LGTM! Clear replica count validation.

The explicit if statements properly ensure at least 1 replica for each component, addressing the past review concern about operator precedence in compound expressions. The logic is now clear and unambiguous.


37-57: LGTM! Appropriate redis.server configuration logic.

The conditional logic correctly sets redis.server for agent clusters while preserving the Redis proxy configuration for the control plane. The debug output helps troubleshoot configuration issues.

principal/redisproxy/redisproxy.go (3)

98-128: LGTM! Clean TLS configuration API.

The new public methods provide a clear interface for configuring Redis proxy TLS. The separation of server TLS (SetServerTLS, SetServerTLSFromPath) and upstream TLS (SetUpstreamTLSCA, SetUpstreamTLSCAPath, SetUpstreamTLSInsecure) is well-designed.


130-154: LGTM! Robust TLS config creation with proper error handling.

The createServerTLSConfig method handles both file-based and in-memory certificate sources with clear error messages. Setting MinVersion: tls.VersionTLS12 is a good security baseline.


162-183: LGTM! Clear conditional TLS listener setup.

The TLS-enabled listener setup is well-structured with appropriate logging distinguishing TLS from non-TLS modes. The error handling provides clear diagnostics.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from bf0d4f8 to e4b8ca8 Compare December 8, 2025 16:26
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)

150-156: Inconsistent timeout for autonomous agent re-connection check.

The re-connection check for the autonomous agent still uses 30*time.Second, 1*time.Second, while the managed agent re-connection check (Lines 108-115) was increased to 60*time.Second, 2*time.Second. This asymmetry may cause flaky tests in TLS-enabled environments.

Apply this diff for consistency:

 	// Verify that connection status is updated again when agent is re-connected
 	requires.Eventually(func() bool {
 		return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 			Status:     appv1.ConnectionStatusSuccessful,
 			Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 			ModifiedAt: &metav1.Time{Time: time.Now()},
 		}, clusterDetail)
-	}, 30*time.Second, 1*time.Second)
+	}, 60*time.Second, 2*time.Second)
 }
♻️ Duplicate comments (5)
principal/redisproxy/redisproxy.go (2)

845-850: Add timeout to TCP dial operation.

The net.DialTCP call has no timeout, which can cause connection attempts to hang indefinitely if the upstream Redis is unresponsive. This blocks the goroutine handling the Argo CD connection.

Apply this diff to add a dial timeout:

-	// Dial the resolved address
-	conn, err := net.DialTCP("tcp", nil, addr)
+	// Dial the resolved address with timeout
+	dialer := &net.Dialer{
+		Timeout: 30 * time.Second,
+	}
+	connTmp, err := dialer.Dial("tcp", addr.String())
 	if err != nil {
 		logCtx.WithError(err).WithField("redisAddress", rp.principalRedisAddress).Error("Connection error")
 		return nil, fmt.Errorf("unable to connect to redis '%s': %w", rp.principalRedisAddress, err)
 	}
+	conn := connTmp.(*net.TCPConn)

886-890: Add timeout to TLS handshake.

The TLS handshake has no timeout, which can cause connections to hang indefinitely if the upstream Redis TLS endpoint is unresponsive during negotiation.

Apply this diff to add a handshake timeout:

+	// Set deadline for handshake
+	if err := conn.SetDeadline(time.Now().Add(30 * time.Second)); err != nil {
+		conn.Close()
+		return nil, fmt.Errorf("failed to set handshake deadline: %w", err)
+	}
+
 	tlsConn := tls.Client(conn, tlsConfig)
 	if err := tlsConn.Handshake(); err != nil {
 		conn.Close()
 		return nil, fmt.Errorf("TLS handshake failed: %w", err)
 	}
+
+	// Clear deadline after successful handshake
+	if err := tlsConn.SetDeadline(time.Time{}); err != nil {
+		tlsConn.Close()
+		return nil, fmt.Errorf("failed to clear handshake deadline: %w", err)
+	}
hack/dev-env/configure-redis-tls.sh (1)

202-206: Empty Redis password proceeds without authentication.

This continues with an empty password when the secret is missing. While this provides dev flexibility, ArgoCD components may fail with NOAUTH errors if they expect authentication.

Consider failing fast if password authentication is required in your environment, or document this behavior clearly.

test/e2e/fixture/cluster.go (1)

259-267: Cleanup still doesn't explicitly close Redis connections.

This concern from the previous review remains unaddressed. The function only clears the map and relies on garbage collection to clean up connections. As noted in the previous review, appstatecache.Cache may not expose a Close() method, making explicit cleanup difficult without tracking the underlying redis.Client instances separately.

If explicit connection cleanup is needed, consider:

  1. Verifying whether appstatecache.Cache or the underlying Redis client can be closed
  2. Tracking *redis.Client instances alongside cache instances and closing them explicitly
  3. Documenting that garbage collection handles cleanup if explicit close isn't feasible
agent/agent.go (1)

445-460: Guard against zero cacheRefreshInterval before creating ticker.

This concern from the previous review remains unaddressed. If cacheRefreshInterval is not set by any AgentOption, time.NewTicker(a.cacheRefreshInterval) at line 450 will panic with non-positive interval for NewTicker.

Consider applying the previously suggested fix:

-    ticker := time.NewTicker(a.cacheRefreshInterval)
+    interval := a.cacheRefreshInterval
+    if interval <= 0 {
+        interval = 30 * time.Second
+    }
+    ticker := time.NewTicker(interval)

Alternatively, initialize a.cacheRefreshInterval to a sensible default in NewAgent.

🧹 Nitpick comments (5)
principal/redisproxy/redisproxy.go (2)

853-854: Consider warning when server TLS enabled without upstream TLS.

The condition if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) means if the Redis proxy server has TLS enabled but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP. This could expose sensitive data in transit within the cluster.

Consider adding a warning when this mismatch occurs:

+	// Warn if server TLS is enabled but no upstream TLS configured
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy server has TLS enabled, but no upstream TLS configuration provided. Connection to principal Redis will be unencrypted.")
+	}
+
 	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled && hasUpstreamTLSConfig {

858-877: Consider warning if CA configured but ignored due to InsecureSkipVerify.

The else if structure means if rp.upstreamTLSInsecure is true, any configured CA pool or CA path is silently ignored. While this may be intentional for test environments, it could lead to confusion.

Consider logging a warning if CA configuration is provided but ignored:

 	if rp.upstreamTLSInsecure {
 		logCtx.Warn("INSECURE: Not verifying upstream Redis TLS certificate")
 		tlsConfig.InsecureSkipVerify = true
+		if rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" {
+			logCtx.Warn("CA configuration provided but ignored due to InsecureSkipVerify=true")
+		}
hack/dev-env/configure-redis-tls.sh (1)

68-70: Context switch error handling relies on set -e.

While set -e will cause the script to exit on failure, adding explicit error handling provides clearer feedback when the context doesn't exist.

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
test/e2e/redis_proxy_test.go (2)

120-123: Sleep as a race-condition workaround.

The 5-second sleep addresses a race between SSE stream establishment and Redis subscription activation. While the sleep is pragmatic, a more robust approach would be to verify that the subscription is active before proceeding (e.g., by receiving an initial heartbeat or confirmation message).

Consider adding a mechanism to confirm the subscription is active rather than relying on a fixed delay, which may be insufficient under high load or too long in fast environments.


187-208: Duplicate drain logic between test methods.

The SSE message drain logic is nearly identical between Test_RedisProxy_ManagedAgent_Argo and Test_RedisProxy_AutonomousAgent_Argo. Consider extracting to a helper function.

func drainChannelForPod(t *testing.T, msgChan chan string, podName string) bool {
    messagesDrained := false
    for {
        select {
        case msg := <-msgChan:
            messagesDrained = true
            t.Logf("Processing SSE message (looking for pod %s)", podName)
            if strings.Contains(msg, podName) {
                t.Logf("Found new pod name in SSE stream: %s", podName)
                return true
            }
        default:
            if messagesDrained {
                t.Log("Drained all available messages, pod not found yet, will retry...")
            }
            return false
        }
    }
}

Also applies to: 406-427

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf0d4f8 and e4b8ca8.

📒 Files selected for processing (30)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • principal/resource.go
  • principal/tracker/tracking.go
  • principal/listen.go
  • hack/dev-env/setup-vcluster-env.sh
  • test/e2e/fixture/argoclient.go
  • test/run-e2e.sh
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/e2e/rp_test.go
  • hack/dev-env/start-e2e.sh
  • Makefile
  • cmd/argocd-agent/agent.go
  • hack/dev-env/configure-argocd-redis-tls.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-principal.sh
  • hack/dev-env/start-agent-managed.sh
  • test/e2e/README.md
  • hack/dev-env/Procfile.e2e
🧬 Code graph analysis (5)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (2)
  • cleanup (39-41)
  • apply (94-247)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
  • HasConnectionStatus (60-74)
  • AgentManagedName (37-37)
  • ClusterDetails (42-56)
  • AgentAutonomousName (38-38)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
cmd/argocd-agent/principal.go (4)
principal/options.go (6)
  • WithInformerSyncTimeout (439-444)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (3)
  • DurationWithDefault (168-177)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
🪛 LanguageTool
test/e2e/README.md

[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...

(AI_HYDRA_LEO_MISSING_COMMA)

docs/configuration/redis-tls.md

[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...

(AI_HYDRA_LEO_MISSING_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build & cache Go code
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (17)
internal/argocd/cluster/cluster.go (2)

135-142: Good defensive initialization of ConnectionState.

This ensures that when SetClusterCacheStats is called before SetAgentConnectionStatus (e.g., agent just connected and sends cache stats first), the ConnectionState is properly initialized rather than left empty.


176-185: TLS configuration properly wired to Redis client.

The signature change and TLSConfig wiring are appropriate. The go-redis library handles nil TLSConfig gracefully (no TLS when nil).

hack/dev-env/start-agent-autonomous.sh (1)

85-85: --insecure-tls skips server certificate verification.

This is acceptable for local development but should not be used in production. The flag is appropriately placed for the dev script.

hack/dev-env/gen-redis-tls-certs.sh (1)

1-150: Well-structured certificate generation script.

The script addresses previous review feedback (stderr handling, LOCAL_IP conditional). It's idempotent, uses appropriate validity periods, and has clear separation for each certificate type.

test/e2e/fixture/fixture.go (4)

110-113: Timeout increases appropriate for TLS environments.

Doubling the wait window from 60 to 120 seconds accommodates the additional latency from TLS handshakes and port-forward operations in the test environment.


232-241: Warning-and-continue pattern for cleanup resilience.

Changing from hard failures to warnings during cleanup improves test stability, especially in TLS environments where transient connection issues are more common. However, this may mask legitimate issues.

Ensure test logs are monitored for recurring warnings that might indicate systemic problems rather than transient issues.


317-325: Good use of DeepCopy to avoid modifying loop variables.

Using DeepCopy() before modifying the namespace/name prevents unintended side effects on the original loop variable.


487-491: Graceful handling of Redis unavailability during cleanup.

Logging a warning instead of failing when Redis is unavailable (e.g., port-forward died) prevents cleanup failures from cascading to test failures.

test/e2e/redis_proxy_test.go (3)

586-588: Buffered channel prevents message loss.

The buffer size of 100 is reasonable for preventing message loss when the consumer is temporarily slow. This aligns with the drain-and-retry pattern used in the tests.


211-237: ResourceTree retry pattern handles transient errors.

Wrapping the ResourceTree call in Eventually with proper error logging handles transient Redis connection issues (EOF errors) gracefully.


642-653: HTTP client properly configured for SSE streams.

Setting Timeout: 0 and ResponseHeaderTimeout: 0 is correct for SSE streams that are long-lived. The IdleConnTimeout: 300s helps maintain connection pools.

hack/dev-env/Procfile.e2e (1)

1-7: LGTM! Port-forward and startup configuration supports TLS setup.

The port-forward entries and updated startup commands with sleep delays properly support the TLS-enabled Redis configuration. The sleep delays ensure proper startup ordering (principal starts first, then agents), and the environment variable overrides for Redis addresses align with the TLS configuration changes.

agent/agent.go (1)

323-343: LGTM! TLS configuration correctly mirrors Redis proxy setup.

The TLS configuration for the cluster cache Redis client is well-structured:

  • Warning log added when using InsecureSkipVerify (line 330), addressing the previous review feedback
  • CA certificate loading correctly reads the file, creates a cert pool, and parses the PEM data
  • Error handling appropriately reports read and parse failures

The implementation aligns with the Redis proxy TLS logic and properly integrates with the updated NewClusterCacheInstance signature.

test/e2e/fixture/cluster.go (4)

181-201: LGTM! TLS configuration appropriate for E2E testing.

The TLS configuration correctly enables encryption for both principal and managed-agent Redis connections when the respective TLS flags are set. The use of InsecureSkipVerify: true is intentional for E2E tests (as noted in the PR description) to accommodate dynamic LoadBalancer addresses while preserving TLS encryption.


206-217: LGTM! Generous timeouts and pool settings appropriate for E2E tests.

The extended timeouts (ReadTimeout: 30s, DialTimeout: 10s) and connection pool configuration (PoolSize: 10, retry settings) are well-justified by the inline comments. These settings accommodate port-forward latency and concurrent test operations, improving E2E test stability.


226-257: LGTM! Caching mechanism prevents connection proliferation.

The caching mechanism with mutex-protected map access correctly prevents creating multiple Redis clients for the same source and address. The cache key construction properly distinguishes between principal and managed-agent clients.


298-327: LGTM! Redis address resolution with appropriate fallbacks.

Both getManagedAgentRedisConfig and getPrincipalRedisConfig implement sensible fallback logic:

  • Primary: LoadBalancer ingress (IP or Hostname)
  • Secondary: spec.LoadBalancerIP (for local vcluster development)
  • Tertiary: ClusterIP (last resort)

The environment variable overrides (MANAGED_AGENT_REDIS_ADDR, ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS) provide flexibility for local development with port-forward scenarios, while TLS is appropriately enabled by default for E2E tests.

Also applies to: 359-387

Comment on lines 63 to 74
# Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
echo "Extracting mTLS client certificates and CA from Kubernetes..."
TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
-o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
echo " mTLS client certificates and CA extracted"

⚠️ Potential issue | 🟡 Minor

Certificate extraction lacks error handling and uses world-readable paths.

The kubectl commands extract sensitive TLS credentials to /tmp without error checking. If the secrets don't exist, the script continues with empty files, causing cryptic TLS errors later.

Consider adding error handling:

 # Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
 TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
 TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
 ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
-kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
-kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
-kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
-  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
+  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}" || { echo "Failed to extract TLS cert"; exit 1; }
+kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
+  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}" || { echo "Failed to extract TLS key"; exit 1; }
+kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
+  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}" || { echo "Failed to extract CA cert"; exit 1; }
 echo " mTLS client certificates and CA extracted"
🤖 Prompt for AI Agents
In hack/dev-env/start-agent-autonomous.sh around lines 63 to 74, the kubectl
extraction of TLS certs writes sensitive data to predictable, world-readable
/tmp paths and lacks error handling; update the script to create secure temp
files (use mktemp), restrict permissions to 600 (or set umask) before writing,
check each kubectl command's exit status and verify the resulting file is
non-empty, print a clear error and exit non-zero on failure, and add a trap to
securely remove the temp files on script exit.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from e4b8ca8 to 5147959 Compare December 8, 2025 17:10
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/fixture/argoclient.go (1)

316-338: Potential silent failure when endpoint is empty.

When ARGOCD_SERVER_ADDRESS is not set and the Service has neither LoadBalancerIP nor Ingress hostname, argoEndpoint will be an empty string, yet the function returns nil error. Callers may not expect an empty string endpoint.

Consider validating before returning:

+	if argoEndpoint == "" {
+		return "", fmt.Errorf("unable to determine argocd-server endpoint: no LoadBalancerIP or Ingress hostname found")
+	}
+
 	return argoEndpoint, nil
 }
♻️ Duplicate comments (6)
hack/dev-env/configure-argocd-redis-tls.sh (1)

164-182: Inconsistent volume array handling for argocd-repo-server.

Unlike argocd-server (lines 67-108) which checks if the volumes array exists before deciding whether to create or append, argocd-repo-server directly uses "path": "/spec/template/spec/volumes/-" which will fail if the volumes array doesn't exist. The past review comment flagged this as addressed, but the current code still shows the inconsistency.

Consider applying the same defensive pattern used for argocd-server:

     if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
         echo "  Adding redis-tls-ca volume..."
+        
+        # Check if volumes array exists
+        VOLUMES_EXIST=$(kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes}' 2>/dev/null || echo "")
+        
+        if [ -z "$VOLUMES_EXIST" ] || [ "$VOLUMES_EXIST" = "null" ]; then
+            # Create volumes array with first element
+            if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+              {
+                "op": "add",
+                "path": "/spec/template/spec/volumes",
+                "value": [{ ... }]
+              }
+            ]'; then
+                echo "  ERROR: Failed to create volumes array"
+                exit 1
+            fi
+        else
             if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
test/e2e/fixture/cluster.go (1)

259-267: CleanupRedisCachedClients doesn't explicitly close connections.

This was flagged in a past review. The appstatecache.Cache wraps a Redis client that should ideally be closed explicitly. If appstatecache.Cache doesn't expose a Close() method, consider tracking the underlying redis.Client separately for proper cleanup, or verify that garbage collection handles this appropriately for test scenarios.

#!/bin/bash
# Check if appstatecache.Cache or its underlying types expose a Close method
ast-grep --pattern 'func ($RECV *Cache) Close() $_'

# Also check the redis client interface
rg -n "type.*Client.*interface" --type go -A 20 | head -50
agent/agent.go (1)

445-460: Guard against zero cacheRefreshInterval before creating ticker.

The goroutine uses time.NewTicker(a.cacheRefreshInterval) without validating that the interval is positive. If cacheRefreshInterval is not set via options, this will panic with "non-positive interval for NewTicker".

Apply a guard before creating the ticker:

 go func() {
     // Send initial update immediately on startup (don't wait for first ticker)
     a.addClusterCacheInfoUpdateToQueue()

+    interval := a.cacheRefreshInterval
+    if interval <= 0 {
+        interval = 10 * time.Second // Default to 10 seconds if not configured
+    }
-    ticker := time.NewTicker(a.cacheRefreshInterval)
+    ticker := time.NewTicker(interval)
     defer ticker.Stop()
#!/bin/bash
# Check if cacheRefreshInterval has a default value set in options or NewAgent
rg -n "cacheRefreshInterval" --type go -B 2 -A 2
cmd/argocd-agent/principal.go (1)

277-291: Clarify upstream TLS mode validation when using the default CA secret

The mutual-exclusivity check intentionally ignores the default argocd-redis-tls secret name, so combinations like:

  • --redis-upstream-ca-path=... with the default --redis-upstream-ca-secret-name
  • --redis-upstream-tls-insecure=true with the default secret

do not trip modesSet > 1 even though two “modes” are effectively configured, and the secret is silently ignored. That can be surprising for users relying on the default secret.

Consider either:

  • Counting any non‑empty redisUpstreamTLSCASecretName (including the default), or
  • Detecting whether the flag was explicitly set (via c.Flags().Changed("redis-upstream-ca-secret-name")) and only incrementing modesSet when the user actually chose it.

This would make the validation behavior match the “only one mode” promise more closely and avoid silently dropping a configured CA.

principal/redisproxy/redisproxy.go (1)

839-897: Add dial + handshake timeouts and clarify insecure upstream TLS behavior

establishConnectionToPrincipalRedis currently:

  • Uses net.DialTCP with no timeout, and
  • Performs tlsConn.Handshake() with no deadline,

so a slow or blackholed upstream Redis can block this goroutine indefinitely. In addition, when upstreamTLSInsecure is true, any configured CA (pool or path) is silently ignored.

Consider:

  • Replacing net.DialTCP with a net.Dialer (or net.DialTimeout) using a reasonable connect timeout, and
  • Setting a deadline on conn (or tlsConn) before Handshake() and clearing it afterwards, so both connect and handshake failures fail fast instead of hanging.
  • Optionally logging a warning if upstreamTLSInsecure is true while upstreamTLSCA or upstreamTLSCAPath is also set, to make it clear that CA config is being ignored.

This materially improves robustness under network issues and makes insecure mode behavior more transparent.

hack/dev-env/configure-redis-tls.sh (1)

198-207: Fail fast when Redis password is missing instead of silently configuring empty auth

If .data.auth on the argocd-redis secret is missing or empty, the script logs a warning and proceeds with:

REDIS_PASSWORD=""
"--requirepass", "'"${REDIS_PASSWORD}"'",

so Redis is configured with an empty password. This diverges from typical Argo CD expectations (components usually assume a non‑empty password when the secret exists) and can lead to confusing NOAUTH or auth mismatch errors.

Given this script is part of the dev/E2E setup path, it would be safer to:

  • Treat a missing/empty auth value as a hard error (print a clear message and exit 1), or
  • Explicitly document and require a no‑auth Redis configuration instead of silently falling back.

That keeps the TLS setup deterministic and avoids subtle runtime failures later.

🧹 Nitpick comments (7)
test/e2e/fixture/toxyproxy.go (1)

119-133: LGTM - reasonable timeout adjustment for TLS-enabled principal readiness.

The extended timeout for principal (180s) appropriately accounts for the informer sync timeout (120s) mentioned in the comment. The dynamic timeout approach is clean.

Consider extracting these timeout values as named constants if they're used elsewhere or likely to change:

const (
    defaultReadinessTimeout   = 120 * time.Second
    principalReadinessTimeout = 180 * time.Second
)
cmd/argocd-agent/agent.go (1)

184-199: Redis TLS configuration validation is well-structured.

The mutual exclusivity check between --redis-tls-insecure and --redis-tls-ca-path is appropriate. When TLS is enabled without either flag, the system CA pool will be used (via the default tls.Config behavior in agent/agent.go), which is a reasonable default.

One consideration: when TLS is enabled but neither insecure nor CA path is specified, there's no log message indicating the default behavior. Consider adding an informational log for clarity.

 			if redisTLSInsecure {
 				logrus.Warn("INSECURE: Not verifying Redis TLS certificate")
 				agentOpts = append(agentOpts, agent.WithRedisTLSInsecure(true))
 			} else if redisTLSCAPath != "" {
 				logrus.Infof("Loading Redis CA certificate from file %s", redisTLSCAPath)
 				agentOpts = append(agentOpts, agent.WithRedisTLSCAPath(redisTLSCAPath))
+			} else {
+				logrus.Info("Redis TLS enabled with system CA pool")
 			}
test/e2e/redis_proxy_test.go (1)

120-123: Sleep-based synchronization for SSE stream establishment.

The 5-second sleep is a pragmatic workaround for Redis subscription race conditions in E2E tests. While not ideal, this is acceptable for test reliability. Consider extracting this as a named constant for clarity.

+const sseStreamEstablishmentWait = 5 * time.Second
+
 // Wait for SSE stream to fully establish and Redis SUBSCRIBE to propagate
 // This prevents a race condition where the pod is deleted before the subscription is active
 t.Log("Waiting for SSE stream to fully establish...")
-time.Sleep(5 * time.Second)
+time.Sleep(sseStreamEstablishmentWait)
cmd/argocd-agent/principal.go (1)

259-261: Align informer-sync-timeout default behavior with help text

The flag is wired with a default of 0 and only applied when informerSyncTimeout > 0, while the help text says “default: 60s”. In practice this means “0 = use server default (likely 60s)”, but argocd-agent principal --help will show 0 as the CLI default.

Either:

  • Set the env default to 60s and always pass it through, or
  • Clarify in the description that 0 means “use built‑in default (60s)” instead of stating a literal 60s default.

This avoids confusing operators reading the CLI help.

Also applies to: 434-436

docs/configuration/redis-tls.md (2)

149-156: Tag remaining fenced code blocks with a language

The “How the tunnel works” block and the script output examples (gen-redis-tls-certs.sh, configure-redis-tls.sh, configure-argocd-redis-tls.sh) still use bare triple‑backtick fences, which markdownlint flags (MD040).

Recommend tagging them as plain text, e.g.:

-  ```
+  ```text
   Argo CD Server (remote vcluster) 
   …
-  ```
+  ```

and similarly for the three script output sections. This keeps content unchanged while satisfying linting.

Also applies to: 475-520


331-340: Align documented principal Redis flags/defaults with the actual CLI

In the “All Principal Redis TLS Options” table:

  • The flag is listed as --redis-addr, but the principal command actually exposes --redis-server-address (see cmd/argocd-agent/principal.go).
  • The default for --redis-tls-enabled is documented as true (Kubernetes/Helm), false (CLI), while the code uses env.BoolWithDefault("ARGOCD_PRINCIPAL_REDIS_TLS_ENABLED", true), so the CLI default is effectively true as well unless overridden.

To avoid confusing users, please:

  • Rename the documented flag to --redis-server-address (or explicitly mention both if you decide to add an alias), and
  • Update the default column for --redis-tls-enabled to reflect the actual behavior (e.g., “true (enabled by default for all deployments)” or similar).
hack/dev-env/start-agent-managed.sh (1)

37-46: Clarify when it’s acceptable to run the managed agent without Redis TLS

The script enables Redis TLS when creds/redis-tls/ca.crt exists and otherwise logs a warning and runs without TLS:

if [ -f "${SCRIPTPATH}/creds/redis-tls/ca.crt" ]; then
    …
else
    echo "Redis TLS certificates not found, running without TLS"
fi

Given the docs state that Redis TLS is required for all E2E tests, this silent fallback to plaintext could mask misconfigured dev/E2E environments.

Consider:

  • Failing fast when TLS creds are missing in E2E flows (e.g., when make setup-e2e has been run or under a guard env var), or
  • Explicitly documenting that this script allows non‑TLS Redis only for ad‑hoc local development and that E2E runs must ensure TLS creds exist.

The mTLS client cert/CA extraction and wiring into go run ... agent otherwise look solid.

Also applies to: 49-62, 63-75, 76-86

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4b8ca8 and 5147959.

📒 Files selected for processing (31)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/application_test.go (2 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (3 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • principal/resource.go
  • Makefile
  • hack/dev-env/start-agent-autonomous.sh
  • hack/dev-env/start-principal.sh
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • principal/tracker/tracking.go
  • test/e2e/clusterinfo_test.go
  • principal/listen.go
  • hack/dev-env/setup-vcluster-env.sh
  • hack/dev-env/Procfile.e2e
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/rp_test.go
  • test/run-e2e.sh
  • hack/dev-env/start-agent-managed.sh
  • test/e2e/application_test.go
  • test/e2e/README.md
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (7)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • cleanup (39-41)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (316-338)
  • GetInitialAdminSecret (303-314)
  • NewArgoClient (53-67)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
  • KubeClient (67-73)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
cmd/argocd-agent/agent.go (4)
agent/options.go (3)
  • WithRedisTLSEnabled (112-117)
  • WithRedisTLSInsecure (128-133)
  • WithRedisTLSCAPath (120-125)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
test/e2e/application_test.go (1)
test/e2e/fixture/argoclient.go (1)
  • IsArgoCDRepoServerReady (492-505)
🪛 LanguageTool
test/e2e/README.md

[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...

(AI_HYDRA_LEO_MISSING_COMMA)

docs/configuration/redis-tls.md

[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (30)
test/e2e/fixture/argoclient.go (1)

489-505: LGTM - clean readiness helper.

The function correctly checks for at least one available replica and appropriately returns false on any error, which is suitable for readiness checks in test fixtures.

test/e2e/rp_test.go (3)

158-169: LGTM - good refactoring to use centralized fixture helpers.

Replacing inline K8s API calls with fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret improves consistency and maintainability across tests.


294-306: LGTM - consistent with other test refactoring.

Same pattern applied here as in Test_ResourceProxy_Argo, using the centralized fixture helpers for endpoint and credential retrieval.


509-510: LGTM - minor formatting improvement.

No behavioral change, just cleaner request construction.

internal/argocd/cluster/cluster.go (2)

135-142: LGTM - sensible default initialization for ConnectionState.

Initializing ConnectionState with Successful status when receiving cache stats (and no prior state exists) is logically correct—receiving cache info implies the agent is connected.


176-191: LGTM - TLS configuration correctly wired to Redis client.

The tlsConfig *tls.Config parameter is properly passed to redis.Options.TLSConfig. When tlsConfig is nil, the go-redis client will use non-TLS connections, which maintains backward compatibility.

Ensure that all existing callers of NewClusterCacheInstance have been updated to pass the new tlsConfig parameter.

test/e2e/fixture/fixture.go (5)

110-113: LGTM - appropriate timeout increase for TLS-enabled environment.

Doubling the deletion timeout from 60s to 120s accommodates potential TLS handshake overhead and slower Redis connections in the new TLS-enabled infrastructure.


230-241: LGTM - improved cleanup robustness and fixed potential mutation bug.

Two important improvements:

  1. Using DeepCopy() before modifying namespace prevents mutating the loop variable
  2. Warning-based error handling prevents cleanup failures from cascading test failures

255-267: LGTM - consistent pattern with autonomous agent cleanup.

Same DeepCopy and warning-based error handling pattern applied correctly here.


310-326: LGTM - AppProject cleanup follows the same robust pattern.

DeepCopy usage for principalAppProject and warning-based error handling correctly applied.


494-500: LGTM - improved error wrapping.

Using %w for error wrapping provides better error chain for debugging when Redis cache operations fail.

cmd/argocd-agent/agent.go (1)

241-250: Redis TLS flags correctly wired with secure defaults.

The flags use sensible defaults:

  • redis-tls-enabled defaults to true (secure by default)
  • redis-tls-insecure defaults to false

This aligns with the PR objective to enable Redis TLS by default.

hack/dev-env/configure-argocd-redis-tls.sh (1)

316-325: Replica guard logic correctly implemented.

The fix from the past review comment has been properly applied using explicit if statements, ensuring both empty and "0" values are correctly handled.

test/e2e/redis_proxy_test.go (4)

186-208: SSE message draining logic is well-structured.

The drain-all-then-retry pattern correctly handles the buffered channel without blocking indefinitely. The messagesDrained flag ensures proper logging behavior. The 120-second timeout with 5-second intervals provides reasonable resilience for E2E tests.


210-237: ResourceTree retry logic handles transient Redis errors gracefully.

Wrapping the ResourceTree call in Eventually with proper nil and error checks addresses the transient EOF errors mentioned in the comments. The 30-second timeout with 2-second intervals is appropriate for this verification step.


642-653: HTTP transport configuration appropriate for SSE streams.

The transport settings are well-suited for long-lived SSE connections:

  • Timeout: 0 allows indefinite streaming
  • IdleConnTimeout: 300s keeps connections alive
  • InsecureSkipVerify: true is documented as intentional for E2E tests with dynamic LoadBalancer addresses

588-588: Buffered channel size is reasonable for E2E tests.

A buffer of 100 messages should handle typical SSE event bursts. If tests become flaky due to message loss, consider increasing this or adding overflow detection.

test/e2e/fixture/cluster.go (3)

180-201: TLS configuration for E2E tests is appropriate.

InsecureSkipVerify: true is acceptable for E2E tests where certificate validation complexity would add friction. The inline comments clearly document this is for E2E tests only.


206-217: Connection pool and timeout settings are generous for E2E stability.

The increased timeouts (30s read, 10s dial/write) and pool settings (size 10, retry backoff) help handle E2E test latency and concurrent operations. These are reasonable for test environments.


319-326: Environment variable override for local development is a good addition.

Allowing MANAGED_AGENT_REDIS_ADDR override enables local development with port-forward while defaulting to the discovered address for E2E tests.

agent/agent.go (2)

323-343: TLS configuration for cluster cache is correctly implemented.

The TLS config construction properly handles:

  • Insecure mode with warning log (line 330)
  • CA certificate loading with error handling
  • Consistency with Redis proxy TLS logic

The warning message now aligns with the principal code pattern.


19-23: New imports for TLS support are appropriate.

The added imports (crypto/tls, crypto/x509, os) are necessary for TLS configuration and CA certificate loading.

test/e2e/application_test.go (1)

3-6: Repo-server readiness gate in SetupSuite looks good

Waiting up to 120s with Require().Eventually on IsArgoCDRepoServerReady before creating the Argo client is a solid way to reduce test flakiness when the repo-server is slow to become available. No issues spotted.

Also applies to: 28-35

hack/dev-env/gen-redis-tls-certs.sh (1)

14-27: Redis TLS certificate generation script looks solid

The script cleanly generates a CA and per‑component Redis certificates with appropriate SANs, avoids suppressing OpenSSL errors, and conditionally adds the local IP to the proxy certificate. Cleanup of temporary CSR/EXT/SRL files at the end is also a nice touch. No changes needed from my side.

Also applies to: 34-58, 60-103, 105-135

test/run-e2e.sh (3)

32-45: Verify certificate validation completeness.

The validation checks only for ca.crt on the host filesystem. The past review requested validation of all three certificate files (ca.crt, server.crt, server.key). Clarify whether server.crt and server.key are:

  • Expected on the host and should be validated here, or
  • Deployed to the pod by Kubernetes and thus validated only through the deployment check on lines 62–77.

If they should be present on the host, update the validation to check all three files.


62-77: Robust TLS detection using jq.

The TLS configuration validation properly uses jq to check for both the --tls-port argument and redis-tls volume, with clear per-condition error messages. This addresses the prior concern about fragile text-based grep matching.


88-122: macOS port-forward detection and environment configuration.

The script properly detects the macOS environment, checks for required port-forwards, and sets appropriate Redis address environment variables for local development. The warning+continue approach allows for both local and CI scenarios.

hack/dev-env/start-e2e.sh (1)

50-59: Static Redis addresses and proper export handling.

The script now uses static localhost addresses with explicit ports for all Redis endpoints, which simplifies TLS certificate validation and aligns with the port-forward strategy in test/run-e2e.sh. The REDIS_PASSWORD assignment and export are properly separated (lines 58–59), addressing the shellcheck SC2155 concern about masking return values.

test/e2e/README.md (2)

21-88: Clear multi-step workflow documentation.

The README provides a well-structured, multi-step workflow covering setup, optional reverse tunnel configuration, process startup, and test execution. The optional reverse tunnel scenario (Step 1b) is clearly marked and well-explained, and the environment detection (local vs. CI) addresses platform-specific concerns. The documentation properly reflects the TLS-mandatory requirement stated on line 29.


94-105: Verify existence of referenced Redis TLS scripts in hack/dev-env/.

The README documents manual reconfiguration using three scripts:

  • ./hack/dev-env/gen-redis-tls-certs.sh
  • ./hack/dev-env/configure-redis-tls.sh
  • ./hack/dev-env/configure-argocd-redis-tls.sh

Past reviews flagged these scripts as non-existent. Confirm that these scripts are present in hack/dev-env/ and that they implement the documented behavior: certificate generation, Redis TLS configuration, and Argo CD component reconfiguration for each vcluster variant (control-plane, agent-managed, agent-autonomous).

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 5147959 to 3b0283f Compare December 8, 2025 17:31
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (9)
principal/redisproxy/redisproxy.go (4)

846-850: Add timeout to TCP dial operation.

The net.DialTCP call still lacks a timeout, which can cause the connection attempt to hang indefinitely if the upstream Redis is unresponsive. This blocks the goroutine handling the Argo CD connection.

This issue was previously flagged and remains unresolved. See the past review comment on lines 846-850 for the suggested fix using net.Dialer with a timeout.


853-853: Ensure TLS is required for upstream when server TLS is enabled.

The upstream TLS connection is only established when both rp.tlsEnabled is true AND upstream TLS configuration is provided (Line 853). If the Redis proxy server has TLS enabled but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP, potentially exposing sensitive data in transit.

This security issue was previously flagged and remains unresolved. See the past review comment on lines 853-894 for the suggested fix to enforce upstream TLS or log a warning when this configuration mismatch occurs.


858-877: InsecureSkipVerify takes precedence over CA configuration.

The if/else if structure means that when rp.upstreamTLSInsecure is true, any configured CA pool or CA path is silently ignored. While this may be intentional for test environments, it could be unexpected behavior.

This issue was previously flagged and remains unresolved. See the past review comment on lines 858-877 for the suggested warning log when CA is configured but ignored due to insecure mode.


886-890: Add timeout to TLS handshake.

The TLS handshake has no timeout, which can cause the connection to hang indefinitely if the upstream Redis TLS endpoint is unresponsive during negotiation.

This issue was previously flagged and remains unresolved. See the past review comment on lines 886-890 for the suggested fix using conn.SetDeadline() before and after the handshake.

docs/configuration/redis-tls.md (1)

149-156: Tag remaining fenced blocks with a language (text) to satisfy markdownlint

The “How the tunnel works” diagram and the three script output examples (gen-redis-tls-certs.sh, configure-redis-tls.sh, configure-argocd-redis-tls.sh) still use bare triple‑backtick fences, triggering MD040. Consider tagging them as plain text:

-  ```
+  ```text
   Argo CD Server (remote vcluster) 
   ...
-  ```
+  ```

and similarly for the script output sections around lines 475–520.

Also applies to: 475-483, 485-501, 503-520

docs/getting-started/kubernetes/index.md (1)

205-212: Fix $(REDIS_PASSWORD) in JSON patch examples (no expansion inside single quotes)

In both Redis TLS patch examples, --requirepass uses "$(REDIS_PASSWORD)" inside a single‑quoted -p='[...]' argument, so the shell never expands it and Redis ends up with the literal string "$(REDIS_PASSWORD)" as the password.

Consider either:

  • Using a clear placeholder, e.g. "--requirepass", "<redis-password>", and explaining how to obtain it from the argocd-redis secret, or
  • Showing an interpolated pattern, e.g.:
REDIS_PASSWORD="$(kubectl -n argocd get secret argocd-redis -o jsonpath='{.data.auth}' | base64 -d)"

kubectl patch deployment argocd-redis -n argocd --context <context> --type='json' -p="$(
  cat <<EOF
[
  {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": [
    "--save", "", "--appendonly", "no", "--requirepass", "$REDIS_PASSWORD",
    "--tls-port", "6379", "--port", "0",
    "--tls-cert-file", "/app/tls/tls.crt", "--tls-key-file", "/app/tls/tls.key",
    "--tls-ca-cert-file", "/app/tls/ca.crt", "--tls-auth-clients", "no"
  ]}
]
EOF
)"

and apply the same fix in both Step 2.4 and Step 4.4.

Also applies to: 372-378

test/e2e/fixture/fixture.go (1)

487-491: Minor: extra leading space in warning message.

Line 489 has a leading space in the format string: " Warning: Failed...". This is inconsistent with other warning messages that start without a leading space.

-		fmt.Printf(" Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
+		fmt.Printf("Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
agent/agent.go (1)

445-460: Guard against zero cacheRefreshInterval before creating ticker.

The goroutine uses time.NewTicker(a.cacheRefreshInterval) without ensuring the interval is > 0. If no AgentOption sets cacheRefreshInterval, this will panic at runtime with "non-positive interval for NewTicker".

 	go func() {
 		// Send initial update immediately on startup (don't wait for first ticker)
 		a.addClusterCacheInfoUpdateToQueue()

+		interval := a.cacheRefreshInterval
+		if interval <= 0 {
+			interval = 30 * time.Second // Default fallback
+		}
-		ticker := time.NewTicker(a.cacheRefreshInterval)
+		ticker := time.NewTicker(interval)
 		defer ticker.Stop()
test/e2e/fixture/cluster.go (1)

259-267: CleanupRedisCachedClients doesn't explicitly close connections.

The cleanup function only clears the map, relying on garbage collection to close connections. For proper resource management, the underlying Redis clients should be explicitly closed.

Since appstatecache.Cache doesn't expose the underlying Redis client for closing, consider either:

  1. Tracking redis.Client instances separately alongside the cache
  2. Verifying through testing that GC properly closes connections

This may be acceptable for E2E tests but is worth monitoring for connection leaks during test runs.

🧹 Nitpick comments (3)
test/e2e/fixture/argoclient.go (1)

489-513: Consider refactoring the return type for idiomatic Go.

The (bool, string) return pattern is unconventional. Idiomatic Go typically uses (bool, error) or just error to distinguish between "not ready" states and actual failures (e.g., permission errors, deployment doesn't exist).

Current behavior treats API errors the same as "deployment exists but isn't ready," which may mask actual problems in wait loops. While this might be intentional for test resilience with transient conditions, the semantic distinction would be clearer with an error type.

Consider this refactor:

-func IsArgoCDRepoServerReady(k8sClient KubeClient, namespace string) (bool, string) {
+func IsArgoCDRepoServerReady(k8sClient KubeClient, namespace string) (bool, error) {
 	ctx := context.Background()
 
 	// Try to get the repo-server deployment
 	deployment := &appsv1.Deployment{}
 	key := types.NamespacedName{Name: "argocd-repo-server", Namespace: namespace}
 	err := k8sClient.Get(ctx, key, deployment, metav1.GetOptions{})
 	if err != nil {
-		return false, fmt.Sprintf("Failed to get deployment: %v", err)
+		return false, fmt.Errorf("failed to get deployment: %w", err)
 	}
 
 	// Check if the deployment has at least one available replica
 	if deployment.Status.AvailableReplicas > 0 {
-		return true, ""
+		return true, nil
 	}
 
 	// Return diagnostic information about why it's not ready
-	return false, fmt.Sprintf("Replicas: %d/%d available, Conditions: %v",
+	return false, fmt.Errorf("not ready - replicas: %d/%d available, conditions: %v",
 		deployment.Status.AvailableReplicas,
 		deployment.Status.Replicas,
 		deployment.Status.Conditions)
 }

This preserves diagnostic information while providing clearer error semantics for callers.

test/e2e/redis_proxy_test.go (1)

120-124: SSE stream robustness changes look good; keep InsecureSkipVerify test‑only

The added wait before pod deletion, buffered SSE channel, “drain all messages then retry” logic, and Eventually wrappers around ResourceTree calls should all help eliminate race‑based flakiness in these Redis proxy tests.

The SSE client’s http.Transport uses &tls.Config{InsecureSkipVerify: true}, which is acceptable here since this code lives under test/e2e and exists purely for test connectivity to dynamically addressed endpoints. Just ensure this pattern stays confined to test code and doesn’t leak into production clients.

Also applies to: 186-209, 326-330, 406-456, 588-653

test/run-e2e.sh (1)

49-77: Redis TLS preflight checks look robust—consider documenting jq as a test prerequisite

The per‑context checks for the argocd-redis-tls secret plus --tls-port arg and redis-tls volume on the argocd-redis Deployment are a solid way to enforce Redis TLS before running e2e tests.

Since this now relies on jq for JSON inspection, it would be helpful to ensure jq is listed as a prerequisite for running make test-e2e (e.g., in contributor docs or a comment near the top of this script) so failures due to a missing jq binary are less surprising.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5147959 and 3b0283f.

📒 Files selected for processing (31)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/application_test.go (2 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (3 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
  • principal/resource.go
  • principal/tracker/tracking.go
  • test/e2e/application_test.go
  • hack/dev-env/start-agent-autonomous.sh
  • test/e2e/fixture/toxyproxy.go
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/start-principal.sh
  • cmd/argocd-agent/agent.go
  • test/e2e/rp_test.go
  • hack/dev-env/setup-vcluster-env.sh
  • test/e2e/clusterinfo_test.go
  • Makefile
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/run-e2e.sh
  • test/e2e/README.md
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-agent-managed.sh
🧬 Code graph analysis (4)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (2)
  • cleanup (39-41)
  • apply (94-247)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
  • KubeClient (67-73)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (1)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
🪛 LanguageTool
test/e2e/README.md

[uncategorized] ~107-~107: Possible missing comma found.
Context: ...host port-forwards (which match the certificate SANs). TLS encryption is fully enabled...

(AI_HYDRA_LEO_MISSING_COMMA)

docs/configuration/redis-tls.md

[duplication] ~115-~115: Possible typo: you repeated a word.
Context: ... vclusters (Recommended) - Description: vclusters run on local microk8s/k3d/kind on you...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~178-~178: Possible missing comma found.
Context: ...ey}`) - For principal's Redis proxy - Automatically includes your Mac's local I...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run unit tests
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (32)
hack/dev-env/gen-redis-tls-certs.sh (1)

1-150: Script looks solid—previous issues have been resolved.

The two previously flagged issues are now addressed:

  1. Error suppression in OpenSSL commands has been removed (lines 17, 20, 25).
  2. Empty LOCAL_IP is handled gracefully with conditional append (lines 87–90).

The script is well-structured: it's idempotent (checks for existing keys/certs before regenerating), uses set -e for error handling, cleans up temporary files, and handles both macOS and Linux for local IP detection. The graceful fallback to empty LOCAL_IP with conditional SAN entry is correct.

hack/dev-env/configure-redis-tls.sh (6)

198-206: Verify password requirement for Redis configuration.

A past review explicitly requested that the script "fail hard" when the Redis password secret is missing, citing that ArgoCD components expect authentication. However, the current code (lines 202-206) issues a warning and continues with an empty password.

Clarify the intended behavior:

  • If Redis authentication is required for E2E tests, the script should fail when the argocd-redis secret is missing.
  • If graceful degradation to unauthenticated Redis is acceptable, document this assumption and update the warning message to reflect the impact.

136-196: Volume and volumeMount patching logic is sound and idempotent.

The conditional checks (lines 139, 169) and JSON patch operations correctly handle cases where volumes/volumeMounts may or may not already exist. Re-running the script safely skips redundant patches. The logic is correct.


18-54: Error handling structure is solid.

The combination of set -e, trap-based cleanup, and explicit error checks on critical operations provides good defense-in-depth. Early validation (lines 61-76) catches issues before state mutations. The cleanup trap (line 54) ensures the initial context is restored regardless of exit path.

Also applies to: 61-76


85-96: Replica count storage is idempotent and safe.

The use of --dry-run=client -o yaml | kubectl apply correctly handles both creation and update, ensuring the pattern is re-entrant. If this step fails, the script exits and cleanup restores the initial context. Downstream scripts reading this ConfigMap should handle potential missing data gracefully.


239-253: Verification section appropriately informational.

Post-rollout verification provides helpful feedback without blocking on transient states. The earlier rollout status command (line 231) enforces correctness, while this final check (lines 239-253) is user-friendly diagnostics.


68-71: Add explicit error check for context switch.

Although set -e provides implicit safety (script exits on failure), explicit error checks with clear messages improve debuggability and document intent.

Apply this diff to add explicit error handling:

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }
test/e2e/fixture/argoclient.go (3)

27-27: LGTM! Imports are correctly added.

The new imports (os and appsv1) are properly used in the added functionality.

Also applies to: 30-30


317-320: LGTM! Good optimization to avoid unnecessary K8s API calls.

The environment variable check provides a simple override mechanism and improves test performance.


322-322: LGTM! Formatting and defensive checks improve code quality.

The added comment clarifies the fallback logic, and the hostname check is good defensive programming.

Also applies to: 330-335

principal/redisproxy/redisproxy.go (6)

21-23: LGTM!

The new imports for TLS support (crypto/tls, crypto/x509, os) are appropriate and necessary for the TLS functionality added in this file.

Also applies to: 27-27


65-75: LGTM!

The TLS configuration fields are well-structured, clearly separating server-side and upstream TLS concerns. Supporting both in-memory certificates and path-based loading provides good flexibility.


98-128: LGTM!

The TLS configuration setters provide a clean API surface. The comment on SetUpstreamTLSInsecure appropriately warns that it's for testing only.


130-154: LGTM!

The TLS configuration builder correctly handles both path-based and in-memory certificates, with appropriate error handling. Setting MinVersion to TLS 1.2 provides a good balance between security and compatibility.


157-200: LGTM!

The Start() method cleanly integrates TLS support with clear branching between TLS and plaintext modes. Error handling and logging are appropriate for both paths.


221-221: LGTM!

Converting the connection establishment to a method call enables access to the TLS configuration stored in the RedisProxy instance.

hack/dev-env/start-agent-managed.sh (1)

37-75: Redis TLS and mTLS wiring in dev agent startup looks consistent

TLS detection via creds/redis-tls/ca.crt, defaulting the Redis address to localhost:6381, and extracting client cert/CA from Kubernetes secrets into /tmp all look correct for the dev/E2E workflow and align with the documented Redis TLS setup.

Also applies to: 76-90

internal/argocd/cluster/cluster.go (1)

135-142: TLS‑enabled cluster cache wiring and connection state initialization look correct

Passing *tls.Config into redis.Options.TLSConfig in NewClusterCacheInstance is the right way to enable Redis TLS for the cluster cache, and the logic in SetClusterCacheStats to initialize ConnectionState when none exists avoids empty status for newly reporting agents while preserving any existing state.

Also applies to: 175-191
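The CA-pool pattern described here can be sketched with the standard library alone; the self-signed certificate below is generated on the fly purely so the example is self-contained:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// buildTLSConfig parses a PEM CA bundle into a cert pool and enforces
// TLS 1.2 as the minimum version; the resulting *tls.Config is what gets
// assigned to the Redis client options' TLSConfig field.
func buildTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("failed to parse CA certificate")
	}
	return &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}, nil
}

func main() {
	// Generate a throwaway self-signed CA for illustration only.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	cfg, err := buildTLSConfig(caPEM)
	fmt.Println(err == nil && cfg.MinVersion == tls.VersionTLS12)
}
```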

hack/dev-env/start-e2e.sh (1)

50-59: Localhost address exports and Redis password wiring align with TLS setup

Exporting the principal, agent, and Argo CD server addresses as localhost with fixed ports (6380/6381/6382/8444) matches the documented certificate SANs and simplifies the dev/e2e environment. Fetching REDIS_PASSWORD from the managed agent’s argocd-redis secret once and exporting it is also a clean way to keep the Redis auth in sync with the cluster.

test/e2e/fixture/fixture.go (4)

110-113: Timeout increases look reasonable for TLS-enabled Redis.

The increased timeouts from 60s to 120s accommodate the additional latency that TLS handshakes and encrypted operations may introduce, especially during cleanup operations. This is a sensible adjustment for the TLS-enabled environment.

Also applies to: 143-144, 161-161


232-241: Good resilience improvement: continue cleanup despite individual failures.

Converting hard errors to warnings during cleanup prevents a single failing deletion from blocking the entire cleanup process. This is especially useful in TLS-enabled environments where transient connection issues may occur.

Also applies to: 255-266, 276-279, 288-292


236-238: Correct use of DeepCopy to avoid mutating loop variables.

Using DeepCopy() before modifying namespace ensures the original loop variable isn't mutated, which could cause subtle bugs in subsequent iterations. This is a proper fix.

Also applies to: 261-263, 317-321, 350-353


497-499: LGTM: Proper error wrapping and cache instance switching.

Using getCachedCacheInstance aligns with the caching strategy in cluster.go, and wrapping the error with %w enables proper error chain inspection.

agent/agent.go (2)

323-343: LGTM: TLS configuration for cluster cache is well-implemented.

The TLS setup properly:

  • Sets minimum TLS version to 1.2
  • Logs a warning when using insecure mode (line 330)
  • Loads and validates CA certificates from the filesystem
  • Returns clear error messages for failure cases

345-349: Correct integration with updated NewClusterCacheInstance signature.

The TLS config is properly passed to the cluster cache constructor, maintaining consistency with the Redis proxy's TLS configuration.

hack/dev-env/Procfile.e2e (2)

1-7: LGTM: Procfile properly sets up port-forwards and process dependencies.

The configuration correctly:

  • Sets up Redis port-forwards on distinct ports (6380-6382) to avoid conflicts
  • Uses appropriate delays to ensure port-forwards are established before starting dependent processes
  • Passes Redis addresses via environment variables for flexibility

6-7: Ensure environment variables are set before running goreman.

The agents depend on the $MANAGED_AGENT_REDIS_ADDR and $AUTONOMOUS_AGENT_REDIS_ADDR environment variables. Verify these are exported by start-e2e.sh or documented in the README for manual execution.
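For manual runs, the exports might look like this; the localhost ports are assumptions based on the documented port-forwards (6380-6382), not verified values:

```shell
# Export the Redis addresses the agents expect before starting goreman.
# Defaults below are assumed to match the Procfile's port-forwards.
export MANAGED_AGENT_REDIS_ADDR="${MANAGED_AGENT_REDIS_ADDR:-localhost:6381}"
export AUTONOMOUS_AGENT_REDIS_ADDR="${AUTONOMOUS_AGENT_REDIS_ADDR:-localhost:6382}"
echo "managed=$MANAGED_AGENT_REDIS_ADDR autonomous=$AUTONOMOUS_AGENT_REDIS_ADDR"
```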

test/e2e/fixture/cluster.go (4)

180-201: Appropriate use of InsecureSkipVerify for E2E tests.

Using InsecureSkipVerify: true in test fixtures is acceptable given the documented rationale in README.md. The tests need to connect via dynamic addresses (port-forwards, LoadBalancer IPs) that may not match certificate SANs.


206-217: Good defensive configuration for E2E test stability.

The generous timeouts and connection pool settings appropriately handle:

  • Port-forward latency (10s dial, 30s read)
  • Concurrent test operations (pool size 10)
  • Connection lifecycle management (idle timeouts, retries)

232-257: LGTM: Redis client caching prevents connection leaks.

The caching mechanism correctly:

  • Uses a mutex for thread-safe access
  • Creates cache keys based on source and address
  • Reuses existing clients instead of creating new ones per call

308-327: Robust fallback chain for Redis address resolution.

The address resolution order (LoadBalancer ingress → spec.loadBalancerIP → ClusterIP) with environment variable override provides flexibility for:

  • CI environments with MetalLB (LoadBalancer)
  • Local development (port-forwards via env override)
  • Fallback scenarios

Also applies to: 369-387

test/e2e/README.md (1)

83-105: Confirm that all three Redis TLS configuration scripts are included in this PR.

The manual reconfiguration section references three scripts:

  • ./hack/dev-env/gen-redis-tls-certs.sh
  • ./hack/dev-env/configure-redis-tls.sh
  • ./hack/dev-env/configure-argocd-redis-tls.sh

These scripts were previously flagged as missing. Ensure they are committed as part of this PR, or remove this section if they are not available.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 2 times, most recently from 97e15ae to 291dd51 on December 9, 2025 13:00
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)

150-156: Inconsistent timeout in re-connect assertion.

This assertion uses 30s/1s while the earlier checks in Test_ClusterInfo_Autonomous (lines 124-129, 136-142) use the increased 60s/2s timeouts. For consistency and to avoid flaky tests, consider aligning this timeout with the others.

Apply this diff:

 	requires.Eventually(func() bool {
 		return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 			Status:     appv1.ConnectionStatusSuccessful,
 			Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 			ModifiedAt: &metav1.Time{Time: time.Now()},
 		}, clusterDetail)
-	}, 30*time.Second, 1*time.Second)
+	}, 60*time.Second, 2*time.Second)
 }
♻️ Duplicate comments (6)
hack/dev-env/start-agent-autonomous.sh (1)

63-74: Add error handling for certificate extraction.

The kubectl commands extract TLS credentials without error checking. If secrets are missing or extraction fails, the script continues with empty files, causing TLS failures at runtime.

Apply this diff to add error handling:

 # Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
 TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
 TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
 ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
+  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}" || { echo "Failed to extract TLS cert"; exit 1; }
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
+  -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}" || { echo "Failed to extract TLS key"; exit 1; }
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
-  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+  -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}" || { echo "Failed to extract CA cert"; exit 1; }
 echo "✅ mTLS client certificates and CA extracted"
hack/dev-env/configure-redis-tls.sh (1)

68-76: Verify context switch succeeded before proceeding.

Line 70 switches the kubectl context without checking for errors. If the context doesn't exist or the switch fails, subsequent kubectl commands may target the wrong cluster, risking unintended configuration changes.

Apply this diff to add error handling:

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { 
+    echo "Error: Failed to switch to context ${CONTEXT}"
+    echo "Please verify the context exists: kubectl config get-contexts"
+    exit 1
+}
 
 # Check Redis Deployment exists
 if ! kubectl get deployment argocd-redis -n ${NAMESPACE} &>/dev/null; then
test/e2e/fixture/cluster.go (1)

309-317: CleanupRedisCachedClients doesn't explicitly close connections.

The cleanup function only clears the map, relying on garbage collection. The appstatecache.Cache wraps a Redis client that ideally should be explicitly closed for deterministic resource cleanup in tests.

This was flagged in a previous review. If appstatecache.Cache doesn't expose a Close() method, this may be acceptable, but worth tracking as technical debt.

cmd/argocd-agent/principal.go (2)

285-291: Validation still allows conflicting upstream TLS modes when using the default secret name.

The mutual exclusivity check excludes the default secret name "argocd-redis-tls" from the mode count (line 286-287). This means a user can specify --redis-upstream-ca-path=/some/path while --redis-upstream-ca-secret-name remains at its default, and the validation won't catch this conflict. The if-else chain will silently prefer the CA path.

This was flagged in a previous review. Consider either:

  1. Counting all non-empty values regardless of default, or
  2. Tracking whether the flag was explicitly set vs. using the default

434-436: informer-sync-timeout help text is misleading.

The help text says "(0 = use default of 60s)" but the flag's actual default via env.DurationWithDefault is 0. The description should clarify whether 0 means "use internal default" or if there's no timeout.

This was flagged in a previous review. The help text should accurately reflect the behavior.

test/e2e/fixture/fixture.go (1)

487-491: Minor: extra leading space in warning message still present.

Line 489 still has a leading space in the format string: " Warning: Failed...". This was flagged in a previous review.

-		fmt.Printf(" Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
+		fmt.Printf("Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
🧹 Nitpick comments (6)
docs/configuration/redis-tls.md (1)

150-150: Optional: tag remaining fenced blocks to satisfy markdownlint.

A few fenced code blocks still lack language tags (triggering MD040). Consider tagging them as text:

  • Line 150 ("How the tunnel works" architecture block)
  • Lines 475-520 (script output examples)

Example:

-```
+```text
 Argo CD Server (remote vcluster)
 ...

This is a low-priority linting issue; the content is already clear.

Also applies to: 475-520

hack/dev-env/configure-argocd-redis-tls.sh (2)

164-182: Consider defensive volume array handling for consistency.

The argocd-repo-server configuration (lines 167-182) directly appends to /spec/template/spec/volumes/- without first checking if the volumes array exists. While this may work in practice (if repo-server always has pre-existing volumes), it's inconsistent with the defensive pattern used for argocd-server (lines 68-108) that handles the case where the array might not exist.

For consistency and robustness, consider applying the same defensive check used for argocd-server. This ensures the script handles edge cases uniformly across all components.


237-255: Consider defensive volume array handling for StatefulSet.

Similar to repo-server, the argocd-application-controller configuration directly appends to the volumes array without checking if it exists. While this may work in practice, the defensive pattern from argocd-server (lines 68-108) would make the script more robust and consistent across all components.

test/e2e/fixture/cluster.go (1)

183-256: Consider extracting TLS configuration into a helper function to reduce duplication.

The TLS configuration logic for PrincipalName (lines 185-216) and AgentManagedName (lines 225-256) is nearly identical. This duplication increases maintenance burden.

Extract a helper function:

func buildRedisTLSConfig(enabled bool, caPath string) *tls.Config {
    if !enabled {
        return nil
    }
    tlsConfig := &tls.Config{
        MinVersion: tls.VersionTLS12,
    }
    if caPath != "" {
        if _, err := os.Stat(caPath); err == nil {
            caCertPEM, err := os.ReadFile(caPath)
            if err != nil {
                panic(fmt.Sprintf("failed to read Redis CA certificate: %v", err))
            }
            certPool := x509.NewCertPool()
            if !certPool.AppendCertsFromPEM(caCertPEM) {
                panic(fmt.Sprintf("failed to parse Redis CA certificate from %s", caPath))
            }
            tlsConfig.RootCAs = certPool
        } else {
            fmt.Printf("Warning: Redis CA certificate not found at %s, skipping verification\n", caPath)
            tlsConfig.InsecureSkipVerify = true
        }
    } else {
        tlsConfig.InsecureSkipVerify = true
    }
    return tlsConfig
}
test/e2e/fixture/argoclient.go (1)

330-335: LoadBalancer IP fallback may miss Ingress IP.

The logic sets argoEndpoint = srvService.Spec.LoadBalancerIP first, then only overwrites with hostname from Ingress. If Ingress[0].IP is set (not hostname), it won't be used. Consider checking both IP and Hostname from Ingress:

 	argoEndpoint := srvService.Spec.LoadBalancerIP
 	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-		if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
+		ingress := srvService.Status.LoadBalancer.Ingress[0]
+		if ingress.IP != "" {
+			argoEndpoint = ingress.IP
+		} else if ingress.Hostname != "" {
-			argoEndpoint = hostname
+			argoEndpoint = ingress.Hostname
 		}
 	}
test/e2e/redis_proxy_test.go (1)

120-124: Using time.Sleep for synchronization is fragile.

While the 5-second sleep helps mitigate a race condition between SSE stream establishment and Redis SUBSCRIBE propagation, this approach is timing-dependent and may still be flaky under load or in slower environments.

Consider implementing a more deterministic synchronization mechanism, such as waiting for a specific initial SSE message or heartbeat that confirms the subscription is active.

Also applies to: 326-329

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0283f and 97e15ae.

📒 Files selected for processing (32)
  • Makefile (1 hunks)
  • agent/agent.go (4 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/agent/configuration.md (1 hunks)
  • docs/configuration/agent/pki-certificates.md (1 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/setup-vcluster-env.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/application_test.go (2 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (3 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (12 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • docs/configuration/agent/configuration.md
  • docs/configuration/agent/pki-certificates.md
🚧 Files skipped from review as they are similar to previous changes (10)
  • hack/dev-env/setup-vcluster-env.sh
  • hack/dev-env/gen-redis-tls-certs.sh
  • hack/dev-env/start-principal.sh
  • test/e2e/rp_test.go
  • test/e2e/application_test.go
  • test/e2e/README.md
  • hack/dev-env/Procfile.e2e
  • Makefile
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/e2e/fixture/toxyproxy.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-autonomous.sh
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/start-agent-managed.sh
  • test/run-e2e.sh
🧬 Code graph analysis (6)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
  • KubeClient (67-73)
cmd/argocd-agent/agent.go (4)
agent/options.go (3)
  • WithRedisTLSEnabled (112-117)
  • WithRedisTLSInsecure (128-133)
  • WithRedisTLSCAPath (120-125)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/agent.go (3)
internal/manager/manager.go (1)
  • NewDeletionTracker (253-257)
internal/cache/resource_cache.go (1)
  • NewSourceCache (32-38)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (112-115)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (3)
  • HasConnectionStatus (63-77)
  • AgentManagedName (38-38)
  • ClusterDetails (43-59)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

157-157: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


178-178: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


202-202: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


211-211: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


228-228: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


250-250: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


271-271: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


285-285: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


292-292: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


299-299: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


338-338: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


367-367: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/getting-started/kubernetes/index.md

178-178: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


202-202: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


211-211: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


228-228: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


367-367: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and push image
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Lint Go code
  • GitHub Check: Analyze (go)
🔇 Additional comments (36)
docs/getting-started/kubernetes/index.md (2)

204-216: ✓ Redis password variable expansion properly fixed.

The previous review flagged that $(REDIS_PASSWORD) would not expand within single-quoted patches. You've correctly addressed this by:

  • Extracting the password into a shell variable first (lines 205–206, 376–377)
  • Using double-quoted -p="..." syntax for variable interpolation (lines 209, 380)
  • Employing ${REDIS_PASSWORD} syntax inside the JSON patch (lines 211, 382)
  • Adding clarifying comments (lines 208, 379)

The fix prevents subtle authentication misconfiguration and makes the instructions accurate for users.

Also applies to: 375-387


166-228: Static analysis warnings appear to be false positives.

The markdownlint alerts about missing language specifiers are incorrect; all code blocks in section 2.4 have explicit bash language specified (lines 166, 197, 221). The flagged lines (178, 202, 211, 228) are interior lines within bash blocks, and the linter is likely confused by heredoc syntax or JSON patch escaping. No code block syntax corrections are needed.

principal/tracker/tracking.go (1)

75-78: Buffered channel prevents deadlock—verify broader synchronization.

The change to a buffered channel (capacity 1) is appropriate for async event delivery between goroutines and prevents blocking when the sender and receiver are not synchronized.

Verify the following:

  1. Proper synchronization exists to prevent send-on-closed-channel panics. Since StopTracking() closes the channel (Line 90), ensure that processRedisEventResponse (or any sender) cannot attempt a send after the channel is closed.
  2. Buffer size of 1 is sufficient—confirm that only one event is sent per tracked request and no events are lost.
#!/bin/bash
# Verify synchronization between channel send and close operations

# Find all sends to evCh channel (tracked request wrappers)
echo "=== Finding sends to tracked event channels ==="
ast-grep --pattern 'evCh <- $_'

echo ""
echo "=== Finding processRedisEventResponse function (sender) ==="
ast-grep --pattern $'func processRedisEventResponse($$$) {
  $$$
}'

echo ""
echo "=== Finding calls to StopTracking (closes channel) ==="
rg -n -C3 'StopTracking\('

echo ""
echo "=== Finding sendSynchronousRedisMessageToAgent function (receiver) ==="
ast-grep --pattern $'func sendSynchronousRedisMessageToAgent($$$) {
  $$$
}'
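For reviewers less familiar with the pattern, here is a minimal, self-contained sketch of a capacity-1 response channel with a close-guard. Names (`trackedEvent`, `trySend`, `stop`) are illustrative, not the repository's actual types:

```go
package main

import "fmt"

// trackedEvent mimics a per-request wrapper: a response channel with
// capacity 1, so a sender never blocks when the receiver has timed out.
type trackedEvent struct {
	evCh chan string
	done chan struct{} // closed by stop(); checked before every send
}

func newTrackedEvent() *trackedEvent {
	return &trackedEvent{evCh: make(chan string, 1), done: make(chan struct{})}
}

// trySend delivers a response unless tracking has stopped. In concurrent
// code, stop() and trySend() must additionally be serialized (e.g. with a
// mutex), because a close between the check and the send would still panic.
func (t *trackedEvent) trySend(msg string) bool {
	select {
	case <-t.done:
		return false // channel is (or is about to be) closed; never send
	default:
	}
	select {
	case t.evCh <- msg:
		return true
	default:
		return false // buffer full: only one event per tracked request expected
	}
}

func (t *trackedEvent) stop() {
	close(t.done)
	close(t.evCh)
}

func main() {
	t := newTrackedEvent()
	fmt.Println("sent:", t.trySend("response-1"))
	fmt.Println("received:", <-t.evCh)
	t.stop()
	fmt.Println("sent after stop:", t.trySend("response-2"))
}
```

The sketch only demonstrates the deterministic, single-goroutine case; the real code still needs the send/close synchronization questioned above.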
principal/resource.go (1)

39-39: LGTM—timeout increase appropriate for TLS overhead.

Tripling the timeout from 10s to 30s is reasonable given the additional latency introduced by TLS handshakes in Redis connections.

test/run-e2e.sh (1)

124-124: LGTM—timeout increase appropriate for TLS-enabled tests.

Doubling the E2E test timeout from 30m to 60m is reasonable given the additional overhead from TLS handshakes and certificate validation across multiple vclusters.

hack/dev-env/start-e2e.sh (1)

50-61: LGTM—static addresses simplify TLS certificate validation.

The shift to static localhost addresses for Redis endpoints (6380, 6381, 6382) is a good simplification. It eliminates dynamic IP detection complexity and ensures TLS certificates can include localhost in their SANs, making local development and E2E testing more reliable.

test/e2e/fixture/cluster.go (2)

370-374: Hardcoded CA certificate path is test-specific.

The path hack/dev-env/creds/redis-tls/ca.crt is hardcoded for E2E tests. This is acceptable for test fixtures but consider adding a comment explaining this is intentional for the dev environment setup.

The hardcoded path aligns with the dev-env scripts mentioned in the PR objectives.


261-267: Generous timeouts are appropriate for E2E tests with TLS overhead.

The extended timeouts (DialTimeout: 10s, ReadTimeout: 30s) appropriately account for TLS handshake latency and port-forward operations in E2E test environments.

cmd/argocd-agent/principal.go (2)

263-275: Redis TLS server certificate configuration is well-validated.

The validation ensures both cert and key are provided together (lines 270-271), and gracefully falls back to loading from a Kubernetes secret when paths aren't specified. This follows the same pattern used for gRPC TLS configuration.


438-459: Redis TLS enabled by default is a good security posture.

Enabling TLS by default (env.BoolWithDefault("ARGOCD_PRINCIPAL_REDIS_TLS_ENABLED", true)) aligns with the PR objective and security best practices. The flags provide appropriate flexibility for different deployment scenarios.

test/e2e/clusterinfo_test.go (1)

108-115: Timeout increases are appropriate for TLS-enabled E2E tests.

The increased timeouts (60s/2s) account for additional latency from TLS handshakes and potential port-forward delays. The inline comments explaining the rationale are helpful.

test/e2e/fixture/argoclient.go (2)

316-338: Environment variable override for ArgoCD server endpoint improves local development experience.

Checking ARGOCD_SERVER_ADDRESS first avoids unnecessary K8s API calls and provides flexibility for local testing. The fallback logic is preserved for cluster deployments.


489-513: IsArgoCDRepoServerReady helper is well-implemented.

The function provides useful diagnostics when the repo-server isn't ready, including replica counts and conditions. This aids debugging E2E test failures.

internal/argocd/cluster/cluster.go (2)

176-191: TLS configuration cleanly integrated into Redis client initialization.

The signature change to accept *tls.Config is well-designed - callers can pass nil when TLS is not required, and the config is directly assigned to redis.Options.TLSConfig. This maintains backward compatibility while enabling TLS support.


135-142: Initializing ConnectionState on first cache stats update improves UX.

When SetClusterCacheStats is called but no ConnectionState exists yet (agent just connected), initializing it with a successful status ensures the connection info is populated promptly rather than waiting for a separate connection status update.

cmd/argocd-agent/agent.go (3)

184-199: Redis TLS configuration logic is correctly implemented.

The validation ensures mutual exclusivity between --redis-tls-insecure and --redis-tls-ca-path, and the configuration is only applied when TLS is enabled. The warning for insecure mode is appropriate.


241-250: Redis TLS enabled by default aligns with security objectives.

The default true for ARGOCD_AGENT_REDIS_TLS_ENABLED ensures TLS encryption is used by default, matching the PR objective and the principal's configuration.


184-199: Agent lacks secret-based CA loading option available in principal.

The principal supports loading Redis upstream CA from a Kubernetes secret (--redis-upstream-ca-secret-name), but the agent only supports file-based CA (--redis-tls-ca-path). This asymmetry may be intentional (agent runs in a different context), but worth verifying whether secret-based CA loading should be added for feature parity in Kubernetes deployments.

test/e2e/redis_proxy_test.go (4)

588-588: Buffered channel size of 100 looks reasonable.

The buffered channel helps prevent message loss during SSE stream processing. The size of 100 provides adequate headroom for burst scenarios while the consumer drains messages.


642-653: HTTP transport configuration improvements for SSE streams.

The transport settings are appropriate for long-lived SSE connections:

  • IdleConnTimeout: 300s keeps connections alive
  • ResponseHeaderTimeout: 0 and client Timeout: 0 are correct for SSE streams that may take time to produce events
  • InsecureSkipVerify: true is acceptable in E2E tests per PR description

188-208: Drain-and-retry logic is well-structured.

The message draining approach correctly processes all available messages before returning false to retry, preventing missed messages due to timing issues. The logging provides good visibility into test progress.

Also applies to: 407-427


210-237: ResourceTree retry logic handles transient Redis connection issues.

The Eventually wrapper with error handling for EOF and nil results provides resilience against transient connection issues during TLS-enabled Redis operations. The 30-second timeout with 2-second intervals is appropriate.

Also applies to: 430-456

agent/agent.go (3)

141-146: Default initialization addresses potential ticker panic.

Setting cacheRefreshInterval: 30 * time.Second as a default in the Agent struct initialization prevents the time.NewTicker panic that could occur with a zero duration. This addresses the previous review concern.


324-344: TLS configuration for cluster cache is well-implemented.

The TLS setup correctly:

  • Sets MinVersion: tls.VersionTLS12
  • Logs a warning when using insecure mode (line 331)
  • Properly reads and parses CA certificate from path
  • Returns descriptive errors for CA loading failures

446-461: Unified cluster cache info update goroutine is cleaner.

The refactored goroutine sends an immediate update on startup and then uses a single ticker for periodic updates. This consolidates the previous mode-specific logic and ensures both managed and autonomous agents send cluster cache info.

test/e2e/fixture/fixture.go (3)

110-113: Extended timeouts for deletion operations are appropriate.

Increasing the deletion wait timeouts from 60s to 120s accommodates TLS handshake overhead and potential Redis connection delays in the TLS-enabled environment.

Also applies to: 144-144, 161-161


236-241: DeepCopy pattern prevents loop variable mutation.

Using DeepCopy() before modifying namespace/name ensures the original loop variable isn't mutated, which could cause subtle bugs in subsequent iterations or when the list is reused.

Also applies to: 261-266
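A minimal illustration of why the clone matters (`app` and `DeepCopy` stand in for the generated client-go types):

```go
package main

import "fmt"

// app stands in for a Kubernetes object from a List call; DeepCopy returns
// an independent clone, mirroring the generated client-go method.
type app struct{ Namespace, Name string }

func (a *app) DeepCopy() *app { c := *a; return &c }

func main() {
	list := []app{{Namespace: "argocd", Name: "guestbook"}}
	for i := range list {
		item := list[i].DeepCopy() // mutate the clone, not the list entry
		item.Namespace = "agent-managed"
	}
	fmt.Println(list[0].Namespace) // original list entry is untouched
}
```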


232-233: Cleanup now logs warnings instead of failing tests.

Converting cleanup errors to warnings with fmt.Printf and continuing execution improves test resilience. This prevents cascading test failures when non-critical cleanup operations fail (e.g., due to transient Redis unavailability).

Also applies to: 240-241, 257-258, 265-266, 278-279, 291-292, 312-314, 323-325, 345-347, 355-357, 372-374

principal/redisproxy/redisproxy.go (8)

98-128: TLS configuration setters are well-designed.

The setter methods provide a clean API for configuring TLS:

  • Separation between in-memory cert/key and file paths
  • Upstream TLS can use CA pool, CA path, or insecure mode
  • Clear method naming indicates purpose

130-154: Server TLS configuration handles both cert sources correctly.

createServerTLSConfig properly prioritizes file paths over in-memory certificates and sets MinVersion: tls.VersionTLS12. The error messages are descriptive.


162-183: TLS listener initialization is correct.

The Start() method properly branches between TLS and non-TLS listeners, with appropriate logging for each mode.


846-854: TCP dial timeout prevents indefinite hangs.

Using net.Dialer with a 30-second timeout addresses the previous review concern about the connection attempt hanging indefinitely.


861-863: Security warning for unencrypted upstream connection.

The warning when server TLS is enabled but upstream TLS is not configured helps operators identify potential security gaps. The message clearly indicates the data exposure risk.


871-877: InsecureSkipVerify warns when CA config is ignored.

The additional warning when CA configuration is provided but ignored due to InsecureSkipVerify=true addresses the previous review concern about silent configuration conflicts.


903-919: TLS handshake timeout prevents indefinite hangs.

Setting a deadline before the handshake and clearing it afterward addresses the previous review concern. The 30-second timeout is consistent with the dial timeout.


896-901: SNI hostname extraction is correct.

Extracting the hostname from the address for SNI configuration ensures proper TLS server name verification when connecting to upstream Redis.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
agent/agent.go (1)

108-109: Guard against zero/negative cacheRefreshInterval before creating ticker

NewAgent now initializes cacheRefreshInterval to 30 seconds, which is good, but AgentOptions can still overwrite it with 0 or a negative value. In that case, time.NewTicker(a.cacheRefreshInterval) will panic at runtime (“non-positive interval for NewTicker”).

To keep this safe even when options misconfigure the interval, validate it at ticker creation time:

-    go func() {
-        // Send initial update immediately on startup (don't wait for first ticker)
-        a.addClusterCacheInfoUpdateToQueue()
-
-        ticker := time.NewTicker(a.cacheRefreshInterval)
+    go func() {
+        // Send initial update immediately on startup (don't wait for first ticker)
+        a.addClusterCacheInfoUpdateToQueue()
+
+        interval := a.cacheRefreshInterval
+        if interval <= 0 {
+            interval = 30 * time.Second
+        }
+        ticker := time.NewTicker(interval)
         defer ticker.Stop()
         for {
             select {
             case <-ticker.C:
                 a.addClusterCacheInfoUpdateToQueue()
             case <-a.context.Done():
                 return
             }
         }
     }()

Optionally factor 30 * time.Second into a const defaultCacheRefreshInterval to avoid duplication with the constructor.

Also applies to: 141-146, 447-461

test/e2e/fixture/argoclient.go (1)

316-338: Fix LoadBalancer endpoint fallback to handle IP-only ingress

The new env override is great, but the Kubernetes fallback currently does:

argoEndpoint := srvService.Spec.LoadBalancerIP
if len(srvService.Status.LoadBalancer.Ingress) > 0 {
    if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
        argoEndpoint = hostname
    }
}

On many providers (e.g., bare-metal + MetalLB, some cloud setups), Ingress[0].IP is populated while Spec.LoadBalancerIP and Ingress[0].Hostname are empty. In that case, this function now returns an empty endpoint and callers will fail.

Consider handling both hostname and IP, and erroring explicitly if still empty:

 func GetArgoCDServerEndpoint(k8sClient KubeClient) (string, error) {
     // Check environment variable first (avoids unnecessary K8s API call)
     if envAddr := os.Getenv("ARGOCD_SERVER_ADDRESS"); envAddr != "" {
         return envAddr, nil
     }

-    // Fall back to querying K8s service
+    // Fall back to querying K8s service
     srvService := &corev1.Service{}
     err := k8sClient.Get(context.Background(),
         types.NamespacedName{Namespace: "argocd", Name: "argocd-server"}, srvService, metav1.GetOptions{})
     if err != nil {
         return "", err
     }

-    argoEndpoint := srvService.Spec.LoadBalancerIP
-    if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-        if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
-            argoEndpoint = hostname
-        }
-    }
+    argoEndpoint := srvService.Spec.LoadBalancerIP
+    if len(srvService.Status.LoadBalancer.Ingress) > 0 {
+        ingress := srvService.Status.LoadBalancer.Ingress[0]
+        if ingress.Hostname != "" {
+            argoEndpoint = ingress.Hostname
+        } else if ingress.IP != "" {
+            argoEndpoint = ingress.IP
+        }
+    }
+    if argoEndpoint == "" {
+        return "", fmt.Errorf("argocd-server service has no LoadBalancer IP or hostname")
+    }

     return argoEndpoint, nil
 }

This restores compatibility with IP-only LoadBalancers while keeping the env override behavior.

♻️ Duplicate comments (6)
hack/dev-env/start-agent-autonomous.sh (1)

63-75: Use secure temp files and cleanup for extracted TLS keys

The script currently writes the client cert, key, and CA to predictable /tmp/agent-autonomous-*.{crt,key} paths. On most systems those files will be world-readable by default (depending on umask), and they aren’t removed after the process exits. Given these are long-lived TLS credentials, it’s better to:

  • Use mktemp to create unique temp files.
  • Restrict permissions to 600 (or rely on a tighter umask).
  • Add a trap to delete them on exit.

For example:

-echo "Extracting mTLS client certificates and CA from Kubernetes..."
-TLS_CERT_PATH="/tmp/agent-autonomous-tls.crt"
-TLS_KEY_PATH="/tmp/agent-autonomous-tls.key"
-ROOT_CA_PATH="/tmp/agent-autonomous-ca.crt"
+echo "Extracting mTLS client certificates and CA from Kubernetes..."
+TLS_CERT_PATH="$(mktemp /tmp/agent-autonomous-tls.crt.XXXXXX)"
+TLS_KEY_PATH="$(mktemp /tmp/agent-autonomous-tls.key.XXXXXX)"
+ROOT_CA_PATH="$(mktemp /tmp/agent-autonomous-ca.crt.XXXXXX)"
+chmod 600 "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
 kubectl --context vcluster-agent-autonomous -n argocd get secret argocd-agent-ca \
   -o jsonpath='{.data.ca\.crt}' | base64 -d > "${ROOT_CA_PATH}"

This keeps the dev script behavior the same while tightening handling of sensitive material.

Also applies to: 79-81

hack/dev-env/start-agent-managed.sh (1)

63-75: Harden temp file handling for extracted TLS keys

As in the autonomous script, the managed-agent script writes the client cert, key, and CA to predictable /tmp/agent-managed-*.{crt,key} files and leaves them behind. To better protect these credentials, especially on shared dev boxes, it’s worth:

  • Using mktemp to generate unique paths.
  • Locking down permissions (e.g., chmod 600).
  • Cleaning them up with a trap on exit.

For example:

 echo "Extracting mTLS client certificates and CA from Kubernetes..."
-TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
-TLS_KEY_PATH="/tmp/agent-managed-tls.key"
-ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+TLS_CERT_PATH="$(mktemp /tmp/agent-managed-tls.crt.XXXXXX)"
+TLS_KEY_PATH="$(mktemp /tmp/agent-managed-tls.key.XXXXXX)"
+ROOT_CA_PATH="$(mktemp /tmp/agent-managed-ca.crt.XXXXXX)"
+chmod 600 "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT

The rest of the extraction logic can remain the same.

Also applies to: 79-81

docs/configuration/redis-tls.md (1)

149-155: Tag remaining fenced code blocks with a language to satisfy markdownlint (MD040)

markdownlint is still flagging a few bare code fences here (“How the tunnel works” and the three script-output examples). To quiet MD040 and keep formatting explicit, please tag them as plain text, e.g.:

-**How the tunnel works:**
-  ```
+**How the tunnel works:**
+  ```text
   Argo CD Server (remote vcluster) 
       → rathole Deployment (remote) 
       → rathole Container (local Mac) 
       → Principal process (local Mac)

And similarly for the script output sections:

-**gen-redis-tls-certs.sh:**
-```
+**gen-redis-tls-certs.sh:**
+```text
 ...
-**configure-redis-tls.sh:**
-```
+**configure-redis-tls.sh:**
+```text
 ...
-**configure-argocd-redis-tls.sh:**
-```
+**configure-argocd-redis-tls.sh:**
+```text
 ...

This keeps content unchanged while making the markdown linter happy.

Also applies to: 475-483, 485-501, 503-520

hack/dev-env/configure-argocd-redis-tls.sh (2)

29-31: Optional: add explicit error handling for context switch

set -e will stop the script if kubectl config use-context ${CONTEXT} fails, but the user only sees a generic kubectl error. Wrapping it with a short, explicit message would make failures clearer:

-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+echo "Switching to context: ${CONTEXT}"
+kubectl config use-context "${CONTEXT}" || {
+  echo "Error: Failed to switch to context ${CONTEXT}" >&2
+  exit 1
+}

This keeps the safety while improving debuggability.


160-183: Harden repo-server and app-controller volume patches like argocd-server

For argocd-repo-server and argocd-application-controller, the JSON patches assume /spec/template/spec/volumes already exists and append with "/volumes/-". This works with current upstream Argo CD manifests but will fail if those Deployments/StatefulSets are ever created without a volumes array.

To mirror the more defensive pattern used for argocd-server, consider checking for the existence of /spec/template/spec/volumes and creating it when missing before appending:

-    if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
-        echo "  Adding redis-tls-ca volume..."
-        if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
-          {
-            "op": "add",
-            "path": "/spec/template/spec/volumes/-",
-            "value": {
-              "name": "redis-tls-ca",
-              "secret": {
-                "secretName": "argocd-redis-tls",
-                "items": [{"key": "ca.crt", "path": "ca.crt"}]
-              }
-            }
-          }
-        ]'; then
+    if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
+        echo "  Adding redis-tls-ca volume..."
+
+        VOLUMES_EXIST=$(kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes}' 2>/dev/null || echo "")
+        if [ -z "$VOLUMES_EXIST" ] || [ "$VOLUMES_EXIST" = "null" ]; then
+            # Create volumes array with first element
+            if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+              {
+                "op": "add",
+                "path": "/spec/template/spec/volumes",
+                "value": [{
+                  "name": "redis-tls-ca",
+                  "secret": {
+                    "secretName": "argocd-redis-tls",
+                    "items": [{"key": "ca.crt", "path": "ca.crt"}]
+                  }
+                }]
+              }
+            ]'; then
+                echo "  ERROR: Failed to create volumes array and add redis-tls-ca volume to argocd-repo-server"
+                exit 1
+            fi
+        else
+            # Append to existing volumes array
+            if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+              {
+                "op": "add",
+                "path": "/spec/template/spec/volumes/-",
+                "value": {
+                  "name": "redis-tls-ca",
+                  "secret": {
+                    "secretName": "argocd-redis-tls",
+                    "items": [{"key": "ca.crt", "path": "ca.crt"}]
+                  }
+                }
+              }
+            ]'; then
+                echo "  ERROR: Failed to add redis-tls-ca volume to argocd-repo-server"
+                exit 1
+            fi
+        fi

Apply the same pattern to the StatefulSet block for argocd-application-controller to keep behavior consistent and robust.

Also applies to: 237-252

hack/dev-env/configure-redis-tls.sh (1)

68-71: Optional: improve error message on context switch

As with the other script, set -e will abort if kubectl config use-context ${CONTEXT} fails, but a short explicit message would make failures easier to diagnose:

-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+echo "Switching to context: ${CONTEXT}"
+kubectl config use-context "${CONTEXT}" || {
+  echo "Error: Failed to switch to context ${CONTEXT}" >&2
+  exit 1
+}

Functionality is already safe; this is just UX polish.

🧹 Nitpick comments (6)
test/e2e/fixture/toxyproxy.go (1)

119-124: Dynamic readiness timeout for principal avoids informer-sync flakes

Using a 120s default and extending to 180s for compName == "principal" is a pragmatic way to account for the principal’s longer informer sync time and should reduce test flakiness. If you ever see drift with ARGOCD_PRINCIPAL_INFORMER_SYNC_TIMEOUT, consider deriving this timeout from that value instead of hard-coding, but it’s fine as-is.

Also applies to: 126-134

test/run-e2e.sh (1)

89-122: macOS port-forward detection is reasonable as a soft check

The lsof-based check and the explanatory warning around make start-e2e provide a helpful signal for local development without blocking CI. It only guarantees that at least one of the required forwards (6380, 6381, 6382) is running rather than all three, but since it’s advisory and not fatal, that trade-off is fine for now.

test/e2e/fixture/fixture.go (1)

487-501: Optional: guard against nil clusterDetails in cluster-info reset

resetManagedAgentClusterInfo assumes clusterDetails is non-nil when calling getCachedCacheInstance(AgentManagedName, clusterDetails). Today that’s true for the existing BaseSuite usage, but a future caller could accidentally pass nil and trigger a panic during cleanup.

A small defensive check would make this safer:

func resetManagedAgentClusterInfo(clusterDetails *ClusterDetails) error {
-    // Reset cluster info in redis cache
-    if err := getCachedCacheInstance(AgentManagedName, clusterDetails).SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil {
+    if clusterDetails == nil {
+        return fmt.Errorf("resetManagedAgentClusterInfo: clusterDetails is nil")
+    }
+    // Reset cluster info in redis cache
+    if err := getCachedCacheInstance(AgentManagedName, clusterDetails).SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil {
         return fmt.Errorf("resetManagedAgentClusterInfo: %w", err)
     }
     return nil
}

Not required for current tests, but it future-proofs the helper.

test/e2e/fixture/cluster.go (1)

43-60: Redis TLS wiring, timeouts, and address resolution look solid; consider logging for implicit InsecureSkipVerify.

The overall shape here looks good: TLS is enabled by default for both principal and managed-agent Redis in E2E, with a CA-path override, sane tls.Config{MinVersion: tls.VersionTLS12}, generous dial/read/write timeouts, and clear LoadBalancer → spec.LoadBalancerIPClusterIP fallbacks plus env overrides for local runs. This should significantly reduce flakiness in tests.

One small ergonomics improvement: when no CA path is specified you silently set InsecureSkipVerify = true (Lines 210-213, 251-253), whereas the “file missing” case logs a warning. For misconfigurations, an explicit warning in the “no CA path” branch would make it much easier to spot that certificate verification isn’t happening, without changing behavior.

For example:

-		} else {
-				// No CA path specified, skip verification
-				tlsConfig.InsecureSkipVerify = true
-		}
+		} else {
+			// No CA path specified, skip verification
+			tlsConfig.InsecureSkipVerify = true
+			fmt.Printf("Warning: Principal Redis CA certificate path not specified, skipping verification\n")
+		}
...
-		} else {
-			// No CA path specified, skip verification
-			tlsConfig.InsecureSkipVerify = true
-		}
+		} else {
+			// No CA path specified, skip verification
+			tlsConfig.InsecureSkipVerify = true
+			fmt.Printf("Warning: Managed agent Redis CA certificate path not specified, skipping verification\n")
+		}

Given this is test-only wiring, this remains a low‑risk, nice‑to‑have for operator visibility rather than a functional change.

Also applies to: 173-217, 224-267, 338-400, 402-463

cmd/argocd-agent/principal.go (1)

90-99: Redis TLS CLI wiring and informer sync timeout semantics are consistent; only minor UX nits possible.

The new Redis TLS surface looks coherent:

  • redis-tls-enabled is on by default and fed through WithRedisTLSEnabled, with server TLS sourced either from explicit cert/key paths or a secret, with proper paired‑flag validation.
  • Upstream TLS “modes” (insecure, CA file, CA secret) are mutually exclusive for explicit configurations via the modesSet count, while still allowing the default secret name to act as a fallback when no mode is chosen, which is a sensible behavior.
  • The upstream wiring (WithRedisUpstreamTLSInsecure, WithRedisUpstreamTLSCAFromFile, WithRedisUpstreamTLSCAFromSecret) lines up with that validation, so you won’t silently drop user‑specified upstream TLS settings.

informer-sync-timeout now clearly documents 0 = use default of 60s and is only applied when > 0, which matches the help text and avoids surprising behavior for existing installs.

The increased 30s timeout in getResourceProxyTLSConfigFromKube is also a good call for slow or loaded clusters.

If you want to polish further, a small optional improvement would be to log a warning when redis-tls-enabled=false but any of the TLS‑specific flags (server cert/key, upstream CA path/secret, insecure) are set, to surface misconfigurations that are currently silently ignored.

Also applies to: 259-305, 434-459, 490-510

principal/redisproxy/redisproxy.go (1)

65-76: RedisProxy server & upstream TLS implementation is robust; only minor configurability tweaks are optional.

The new TLS support in RedisProxy looks well‑structured:

  • Server‑side TLS:

    • SetTLSEnabled, SetServerTLS, and SetServerTLSFromPath cleanly separate concerns between enabling TLS and configuring certificate sources.
    • createServerTLSConfig correctly prefers explicit paths when present, falls back to in‑memory *x509.Certificate/crypto.PrivateKey, and enforces MinVersion: tls.VersionTLS12.
  • Upstream TLS:

    • establishConnectionToPrincipalRedis now uses a net.Dialer with a 30s timeout and adds a 30s deadline around the TLS handshake, which should prevent the previous “hang forever on connect/handshake” failure mode.
    • hasUpstreamTLSConfig plus the rp.tlsEnabled gate ensures you only wrap the upstream connection when both sides expect TLS, and the explicit Warn when server TLS is on but upstream TLS isn’t configured is a good safety net.
    • CA handling from either an in‑memory pool or a file is correct, and the warning when CA config is present but ignored due to InsecureSkipVerify=true is helpful from a security‑visibility standpoint.
    • SNI ServerName derived via net.SplitHostPort from principalRedisAddress is appropriate for hostname‑based certs while still working with IP‑SAN certs.

Overall, this is a nice, self‑contained TLS upgrade of the proxy. If, in the future, you decide you need “upstream TLS even when the proxy listens in plaintext” for phased migrations, you could relax the if rp.tlsEnabled && hasUpstreamTLSConfig condition to key only on hasUpstreamTLSConfig and control the two halves independently, but that’s a design choice rather than a requirement for this PR.

Also applies to: 98-155, 159-183, 836-926

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97e15ae and 291dd51.

📒 Files selected for processing (31)
  • Makefile (1 hunks)
  • agent/agent.go (4 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/agent/configuration.md (1 hunks)
  • docs/configuration/agent/pki-certificates.md (1 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/application_test.go (2 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (3 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (12 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/configuration/agent/configuration.md
🚧 Files skipped from review as they are similar to previous changes (11)
  • principal/resource.go
  • Makefile
  • principal/tracker/tracking.go
  • hack/dev-env/gen-redis-tls-certs.sh
  • test/e2e/clusterinfo_test.go
  • test/e2e/application_test.go
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/Procfile.e2e
  • test/e2e/rp_test.go
  • cmd/argocd-agent/agent.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-managed.sh
  • docs/configuration/agent/pki-certificates.md
  • hack/dev-env/start-agent-autonomous.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/run-e2e.sh
  • test/e2e/README.md
  • hack/dev-env/start-principal.sh
🧬 Code graph analysis (6)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (2)
internal/cache/resource_cache.go (1)
  • NewSourceCache (32-38)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
test/e2e/fixture/argoclient.go (1)
test/e2e/fixture/kubeclient.go (1)
  • KubeClient (67-73)
principal/redisproxy/redisproxy.go (4)
internal/logging/logfields/logfields.go (2)
  • Config (127-127)
  • Error (101-101)
principal/listen.go (1)
  • Listener (54-60)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
internal/env/env.go (1)
  • String (96-107)
test/e2e/fixture/fixture.go (3)
internal/logging/logfields/logfields.go (1)
  • Name (59-59)
internal/backend/interface.go (1)
  • Namespace (124-127)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (43-59)
  • AgentManagedName (38-38)
  • AgentClusterServerURL (40-40)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

233-233: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Analyze (go)
🔇 Additional comments (23)
docs/configuration/agent/pki-certificates.md (1)

568-576: Clarified default certificate validation wording looks good

Stating explicitly that certificate validation is the default behavior matches the CLI/env defaults and improves guidance without changing semantics. No further changes needed.

internal/argocd/cluster/cluster.go (2)

129-142: ConnectionState initialization on first cache stats update is reasonable

Preserving an existing ConnectionState while initializing it to ConnectionStatusSuccessful when it was previously zeroed gives clusters a sensible “connected” state once they start reporting stats, without overwriting prior status. This behavior is consistent with the rest of the manager logic.


175-185: Redis cluster cache now honors TLS configuration—verify updated call sites

Accepting a *tls.Config in NewClusterCacheInstance and wiring it through to redis.Options{ TLSConfig: tlsConfig } is the correct way to enable TLS for the go-redis client while allowing nil to mean "no TLS". Please double-check that:

  • All callers of NewClusterCacheInstance have been updated to pass the new tlsConfig parameter.
  • Callers pass nil when Redis TLS is disabled, and a correctly populated *tls.Config (including CA roots / InsecureSkipVerify as appropriate) when enabled.
docs/configuration/redis-tls.md (1)

1-701: Comprehensive Redis TLS documentation aligns with the new default behavior

This document clearly explains that Redis TLS is enabled by default, how principal/upstream/agent TLS fit together, and how the dev scripts and Kubernetes manifests interact. The flag/env/ConfigMap examples match the described behavior and give a practical path from dev/E2E to production.

docs/getting-started/kubernetes/index.md (1)

159-229: Redis TLS setup steps are consistent and fix the previous password interpolation issue

The new Redis TLS sections for control-plane (Step 2.4) and workload cluster (Step 4.4) are clear and aligned: you generate CA/server certs, create the argocd-redis-tls secret, patch the deployment for TLS, and verify with redis-cli --tls. Reading REDIS_PASSWORD from the existing argocd-redis secret and using a double-quoted JSON patch correctly interpolates the password instead of treating it as a literal. The cross-links to the dedicated Redis TLS configuration doc tie the getting-started flow into the more detailed reference nicely.

Also applies to: 341-390, 655-655
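The quoting fix described above boils down to expanding the password variable inside a double-quoted JSON patch. The snippet below demonstrates only the quoting behavior with a placeholder password; the patch body is illustrative, and in the docs the variable is read from the argocd-redis secret via kubectl/jsonpath.

```shell
# Placeholder; the docs read this from the argocd-redis secret instead:
#   kubectl -n argocd get secret argocd-redis -o jsonpath='{.data.auth}' | base64 -d
REDIS_PASSWORD='s3cr3t'

# Double quotes let the shell interpolate $REDIS_PASSWORD into the patch.
# With single quotes, kubectl would receive the literal text $REDIS_PASSWORD.
PATCH="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/args\", \"value\": [\"--requirepass\", \"$REDIS_PASSWORD\", \"--tls-port\", \"6379\", \"--port\", \"0\"]}]"
echo "$PATCH"
```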

hack/dev-env/start-agent-autonomous.sh (1)

37-46: Redis TLS and Redis address defaults for autonomous agent are wired correctly

Detecting the local Redis TLS CA under creds/redis-tls, building --redis-tls-enabled/--redis-tls-ca-path args, and defaulting ARGOCD_AGENT_REDIS_ADDRESS to localhost:6382 (with explicit port-forward guidance) keeps the autonomous agent E2E startup behavior consistent with the Redis TLS docs and the managed-agent script. Passing $REDIS_TLS_ARGS and $REDIS_ADDRESS_ARG into the agent invocation preserves flexibility for overriding via env while maintaining secure defaults.

Also applies to: 48-62, 79-83

hack/dev-env/start-principal.sh (1)

23-29: Principal dev script now cleanly expects Redis port-forward and wires Redis TLS correctly

Defaulting ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS to localhost:6380 (and relying on an external port-forward) avoids the prior double port-forward problem, while still matching the SANs used in the Redis TLS certificates. The new ARGOCD_PRINCIPAL_INFORMER_SYNC_TIMEOUT default of 120s lines up with the extended readiness timeout in tests, and the REDIS_TLS_ARGS block correctly enables Redis TLS and passes server cert, key, and upstream CA path into the principal. The overall startup flow looks consistent with the Redis TLS docs and E2E expectations.

Also applies to: 42-44, 47-65, 73-74

hack/dev-env/start-agent-managed.sh (1)

37-46: Managed agent Redis TLS and Redis address defaults are consistent with the autonomous script

Checking for the Redis TLS CA under creds/redis-tls, enabling --redis-tls-enabled/--redis-tls-ca-path when present, and defaulting ARGOCD_AGENT_REDIS_ADDRESS to localhost:6381 (with clear port-forward instructions) align this script with both the Redis TLS docs and the autonomous-agent startup. Injecting $REDIS_TLS_ARGS and $REDIS_ADDRESS_ARG into the agent invocation gives secure-by-default behavior while allowing overrides via env.

Also applies to: 48-62, 79-83

hack/dev-env/configure-argocd-redis-tls.sh (1)

310-347: Replica restoration and cleanup flow looks solid

Reading replica counts from argocd-redis-tls-replicas, enforcing a minimum of 1 for each component, scaling back up only if the resources exist, and finally deleting the temporary ConfigMap matches the intended “scale down for TLS cutover, then restore” flow. No issues here.

hack/dev-env/configure-redis-tls.sh (1)

61-66: Redis TLS configuration script is robust

Nice job on:

  • Validating all required cert files, including ca.crt.
  • Capturing and persisting replica counts before scaling down Argo CD components.
  • Safely adding redis-tls volumes and mounts even when the arrays are initially missing.
  • Reading the Redis password from the argocd-redis secret and failing fast with a clear message if it’s missing.
  • Replacing the Redis args with a TLS-only configuration and guarding the patch with explicit error handling.

This should give very predictable TLS cutovers in dev/e2e.

Also applies to: 123-131, 135-193, 198-229

test/e2e/redis_proxy_test.go (3)

105-124: SSE establishment wait effectively removes subscribe race

Adding a short delay after the SSE stream is established before manipulating pods is a pragmatic way to avoid the “delete before SUBSCRIBE active” race that intermittently broke the tests. The 5-second sleep is reasonable given the overall 5-minute Eventually window.


186-208: Buffered channel + draining loops make SSE verification resilient

The combination of:

  • A buffered msgChan (make(chan string, 100)), and
  • The inner loops that drain all currently available SSE messages before returning false to Eventually

greatly reduces the chance of missing the pod-name event due to bursty traffic or timing. The non-blocking select { case msg := <-msgChan ... default: ... } pattern keeps the Eventually closures fast and avoids deadlocks.

Looks good for stabilizing these Redis proxy tests.

Also applies to: 406-427, 588-589


210-237: ResourceTree retries and SSE HTTP client tuning are appropriate

Wrapping the ResourceTree calls in Requires.Eventually with logging on errors and nil trees is a good way to cope with transient Redis/SSE issues while still asserting that the new pod eventually appears.

The dedicated HTTP transport with IdleConnTimeout, disabled compression, and no overall timeout is appropriate for long-lived SSE connections in tests; using InsecureSkipVerify: true here is acceptable given this is e2e-only code and the surrounding README calls out the TLS model.

No changes needed.

Also applies to: 430-456, 642-653

agent/agent.go (2)

141-146: Good initialization of Agent internals and cache refresh default

Initializing version, deletions, sourceCache, and a sane default cacheRefreshInterval (30s) directly in NewAgent makes the Agent more self-contained and predictable. It also sets a clear baseline for the periodic cluster cache info updates.


324-344: Cluster cache TLS config correctly mirrors Redis proxy settings

The new clusterCacheTLSConfig wiring:

  • Enables TLS only when redisTLSEnabled is true.
  • Logs a clear warning when redisTLSInsecure is set and flips InsecureSkipVerify accordingly.
  • Loads and validates the CA from redisTLSCAPath into a CertPool and assigns it to RootCAs.

Passing this TLS config into cluster.NewClusterCacheInstance ensures the cluster cache talks to Redis with the same security posture as the proxy. Error handling on CA read/parse and cache creation is appropriate.

Also applies to: 346-351

test/e2e/fixture/argoclient.go (1)

489-513: Repo-server readiness helper is simple and useful

IsArgoCDRepoServerReady’s “available replica > 0” check with a diagnostic string on failure is a good fit for e2e polling. Using types.NamespacedName and returning both a bool and message makes it easy for tests to log context without special-casing API errors.

test/run-e2e.sh (2)

24-45: Redis TLS preflight checks are clear and effective

Enforcing presence of creds/redis-tls/ca.crt with explicit instructions, plus validating per-context state (argocd-redis-tls secret, --tls-port arg, and redis-tls volume) before running tests, gives very good feedback when TLS setup is incomplete.

This should prevent most of the confusing “Redis not TLS” failures in e2e.

Also applies to: 49-77


124-124: E2E invocation with race detector and long timeout looks appropriate

Running go test with -race and a 60-minute timeout against the e2e package is aligned with the complexity and duration of these flows. No issues.

test/e2e/fixture/fixture.go (3)

79-98: Extended AppProject sync timeout is appropriate

Bumping the AppProject sync Eventually window to 120 seconds gives more breathing room for cross-cluster propagation under TLS and heavier environments, without impacting test logic. Looks good.


108-172: Deletion helpers are clearer and more robust

The explicit 120-iteration loops in EnsureDeletion and WaitForDeletion with 1-second sleeps make the timing behavior obvious, and the second pass that strips finalizers after the first timeout is a pragmatic way to deal with stuck resources in tests.

No functional issues here.


201-267: Cleanup now favors best-effort behavior, which is appropriate for e2e

Switching many fatal cleanup errors to warnings (while still logging which Application/AppProject failed and in which cluster) makes suites more resilient to transient issues without hiding the problems.

The use of DeepCopy for principal/managed App[Project] objects before WaitForDeletion also avoids mutating loop variables, which is safer. Overall, this is a solid improvement for test stability.

Also applies to: 295-358, 377-487

test/e2e/README.md (1)

21-105: E2E README now accurately reflects TLS-enabled workflow

The stepwise instructions (setup, optional reverse tunnel, start principal/agents, then run tests) combined with the dedicated “Redis TLS” section and concrete script invocations match the new tooling you’ve added (gen-redis-tls-certs.sh, configure-redis-tls.sh, configure-argocd-redis-tls.sh). This should significantly reduce confusion when running TLS-enabled e2e locally or in CI.

test/e2e/fixture/cluster.go (1)

276-317: Redis cache client reuse & cleanup behavior is acceptable for E2E scope.

The cached map[string]*appstatecache.Cache guarded by cachedRedisClientMutex avoids repeatedly constructing Redis clients across calls, and CleanupRedisCachedClients correctly resets the map under the lock. Given that appstatecache.Cache doesn’t expose an explicit close and this is only used in the E2E fixture, relying on process‑lifetime GC for the underlying connections is a reasonable trade‑off; the new debug print makes the lifecycle visible during test teardown.

Signed-off-by: Rizwana777 <[email protected]>
@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 2 times, most recently from 06673ef to ada2bb0 Compare December 10, 2025 10:24

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (6)
hack/dev-env/configure-argocd-redis-tls.sh (3)

29-31: Verify context switch succeeded before proceeding.

The script switches kubectl context without error handling. If the switch fails, subsequent operations may target the wrong cluster, potentially misconfiguring production Argo CD components.

Apply this diff:

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { 
+    echo "Error: Failed to switch to context ${CONTEXT}"
+    echo "Please verify the context exists: kubectl config get-contexts"
+    exit 1
+}

167-182: Inconsistent handling of missing volumes array.

The argocd-repo-server configuration assumes the volumes array exists (lines 167-182), while argocd-server handles the case where it might not exist (lines 68-88). This inconsistency could cause the script to fail if argocd-repo-server has no pre-existing volumes array.

Consider applying the same defensive approach used for argocd-server - check if the volumes array exists before attempting to append to it, and create it if necessary.


237-258: Same inconsistency exists for argocd-application-controller.

The StatefulSet configuration also directly appends to volumes without checking if the array exists, unlike the defensive handling in argocd-server. Apply the same pattern for consistency.

test/e2e/fixture/fixture.go (1)

487-491: Minor: extra leading space in warning message.

Line 489 has a leading space in the format string: " Warning: Failed...". This is inconsistent with other warning messages that start without a leading space.

-		fmt.Printf(" Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
+		fmt.Printf("Warning: Failed to reset managed agent cluster info (Redis unavailable?): %v\n", err)
hack/dev-env/start-agent-autonomous.sh (1)

37-75: Harden temporary handling of mTLS certs/keys extracted to /tmp

The Redis TLS wiring and default --redis-addr behaviour look good, but mTLS credentials are still extracted to predictable /tmp/agent-autonomous-*.{crt,key} paths without tightening permissions or cleaning them up:

  • Paths are static and may be world‑readable depending on umask.
  • Files are never removed after the agent exits.

Given these are client TLS keys, even in dev/e2e it’s safer to:

  • Use mktemp for each of TLS_CERT_PATH, TLS_KEY_PATH, and ROOT_CA_PATH.
  • Immediately chmod 600 the files (or set a restrictive umask before writing).
  • Add a trap to rm -f the temp files on exit.

For example:

TLS_CERT_PATH="$(mktemp /tmp/agent-autonomous-tls.crt.XXXXXX)"
TLS_KEY_PATH="$(mktemp /tmp/agent-autonomous-tls.key.XXXXXX)"
ROOT_CA_PATH="$(mktemp /tmp/agent-autonomous-ca.crt.XXXXXX)"
chmod 600 "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"
trap 'rm -f "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"' EXIT

Error handling is already decent due to set -e -o pipefail; this change would mainly tighten the security story for local runs.

Also applies to: 79-83

hack/dev-env/start-agent-managed.sh (1)

37-75: Use secure temp files for extracted mTLS credentials

As in the autonomous script, mTLS certs/keys/CA here are written to fixed /tmp/agent-managed-*.{crt,key} paths with default permissions and no cleanup. That’s workable for local e2e, but stronger hygiene is easy:

  • Allocate each path via mktemp instead of a static filename.
  • Restrict permissions (chmod 600).
  • Register a trap to delete the files on script exit.

Pattern example:

TLS_CERT_PATH="$(mktemp /tmp/agent-managed-tls.crt.XXXXXX)"
TLS_KEY_PATH="$(mktemp /tmp/agent-managed-tls.key.XXXXXX)"
ROOT_CA_PATH="$(mktemp /tmp/agent-managed-ca.crt.XXXXXX)"
chmod 600 "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"
trap 'rm -f "$TLS_CERT_PATH" "$TLS_KEY_PATH" "$ROOT_CA_PATH"' EXIT

This keeps the nice Redis TLS integration while avoiding leaving long‑lived, guessable TLS key files under /tmp.

Also applies to: 79-83

🧹 Nitpick comments (8)
test/e2e/fixture/toxyproxy.go (1)

119-124: Dynamic timeout logic is sound; consider centralizing per-component config

The new timeout handling correctly preserves the 120s default and extends the principal’s window to 180s to cover informer sync, which should reduce flakiness while keeping other components unchanged. As a small cleanup, you could centralize both healthzAddr and timeout selection in a single switch or helper that takes compName to keep these settings co-located and avoid drift if principal timings change again later.

principal/options.go (1)

80-88: Redis TLS options on the principal are coherent with the rest of the TLS surface

The added ServerOptions fields and WithRedis* helpers cleanly separate server‑side TLS (proxy listener) from upstream TLS (CA / insecure), and the secret‑based variants reuse tlsutil as expected. Only minor thought: WithRedisUpstreamTLSCAFromFile currently just stores the path and defers reading/validation to connection time; if this ever shows up as a hot path, you could mirror WithTLSRootCaFromFile and eagerly build a CertPool once during option application.

Also applies to: 492-548

principal/redisproxy/redisproxy.go (1)

836-926: Upstream TLS dial logic is correct, with a couple of low‑risk refinements to consider

Functionally this method looks good: you now have a dial timeout, SNI set from principalRedisAddress, optional CA loading from memory or path, a distinct insecure mode with loud logging, and a bounded TLS handshake via deadlines.

Two small, non‑blocking tweaks you might want to consider:

  1. Avoid the concrete *net.TCPConn assertion

You don’t seem to use any TCP‑specific methods:

connTmp, err := dialer.Dial("tcp", addr.String())
if err != nil {
    // ...
}
conn := connTmp.(*net.TCPConn)

You can keep conn as a net.Conn and drop the assertion to avoid a potential panic if the implementation ever ceases to return *net.TCPConn:

- connTmp, err := dialer.Dial("tcp", addr.String())
+ conn, err := dialer.Dial("tcp", addr.String())
  if err != nil {
      // ...
  }
- conn := connTmp.(*net.TCPConn)
  2. Optional: cache the CA pool when using upstreamTLSCAPath

Right now os.ReadFile(rp.upstreamTLSCAPath) + x509.NewCertPool() runs on every new upstream connection. If connection churn is high, you might want to build and store the CertPool once (e.g., when applying options) and reuse it, similar to how WithRedisUpstreamTLSCAFromSecret sets redisUpstreamTLSCA directly.

Neither of these is a correctness blocker; the current implementation should behave as intended.

test/e2e/fixture/argoclient.go (1)

27-27: Env override for Argo CD server endpoint is helpful; just ensure expected format is clear

Letting GetArgoCDServerEndpoint short‑circuit on ARGOCD_SERVER_ADDRESS is a nice way to avoid K8s API calls in constrained environments and to support custom endpoints.

One thing to keep in mind: the ArgoRestClient constructs URLs via url.URL{Scheme: "https", Host: c.endpoint}, so ARGOCD_SERVER_ADDRESS should be a bare host (or host:port), not a full URL with scheme. If that’s not already documented where this env var is introduced, it’s worth calling out to avoid confusing “https://…” values.

Also applies to: 387-403

hack/dev-env/configure-redis-tls.sh (1)

37-46: Script flow and error handling look good; one minor redundant branch

Overall TLS setup (cert checks, secret creation, volume patches, and arg updates) is sound and nicely idempotent.

The second CA-based check at Lines 39–46 is now effectively redundant because Lines 61–66 already hard-fail if ca.crt (and the cert/key pair) are missing, so the else path (“running without TLS”) is unreachable. You can safely drop that branch or merge the messages into the initial cert check to simplify the control flow.

Also applies to: 61-66

docs/configuration/redis-tls.md (1)

114-121: Clean up tab characters flagged by markdownlint (MD010)

markdownlint is still reporting MD010 “no-hard-tabs” around these lines. There are likely literal tab characters in the bullet/paragraph indentation here even though they render fine.

Replacing the tabs with spaces in this section (and any similar spots) will satisfy MD010 without changing rendered output.

install/helm-repo/argocd-agent-agent/values.schema.json (1)

302-383: Consider documenting the type flexibility for Redis TLS boolean fields.

The schema allows redisTLS.enabled and redisTLS.insecure to accept both boolean and string types via anyOf, while networkPolicy.enabled accepts only boolean. This inconsistency might confuse users who expect uniform boolean handling across the chart.

If the string support is needed for environment variable compatibility (e.g., Kubernetes ConfigMap values), consider adding this rationale to the field descriptions:

"enabled": {
  "anyOf": [
    { "type": "string", "enum": ["true", "false"] },
    { "type": "boolean" }
  ],
  "description": "Enable TLS for Redis connections (can be boolean or string for ConfigMap compatibility)"
}

Otherwise, consider standardizing all boolean flags to use the same type validation pattern.

cmd/argocd-agent/principal.go (1)

277-291: Validation logic for default secret name may be confusing.

The mutual exclusivity check excludes the default secret name "argocd-redis-tls" from the mode count (lines 286-287). This means:

  • If a user specifies --redis-upstream-ca-path=/some/path and doesn't specify --redis-upstream-ca-secret-name, the validation passes (modesSet=1) even though the secret name has the default value
  • The if-else chain at lines 294-303 prioritizes the path, so the default secret is ignored

While this works correctly in practice, it's unintuitive. Users might expect that:

  1. Not specifying --redis-upstream-ca-secret-name means "don't use a secret"
  2. The default secret is only used when no other mode is specified

Consider either:

  • Removing the default value from the flag (empty string means "not specified")
  • Adding a comment explaining why the default is excluded from validation
  • Checking if the flag was explicitly set by the user (not just using the default)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 291dd51 and 06673ef.

📒 Files selected for processing (48)
  • Makefile (1 hunks)
  • agent/agent.go (4 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/agent/configuration.md (1 hunks)
  • docs/configuration/agent/pki-certificates.md (1 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (2 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/server.go (3 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/application_test.go (2 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (3 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (12 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (17)
  • principal/tracker/tracking.go
  • test/run-e2e.sh
  • internal/argocd/cluster/manager.go
  • Makefile
  • test/e2e/rp_test.go
  • cmd/argocd-agent/agent.go
  • internal/argocd/cluster/manager_test.go
  • docs/configuration/agent/pki-certificates.md
  • hack/dev-env/start-e2e.sh
  • principal/resource.go
  • install/kubernetes/principal/principal-deployment.yaml
  • docs/configuration/agent/configuration.md
  • internal/argocd/cluster/cluster.go
  • hack/dev-env/start-principal.sh
  • agent/inbound_redis.go
  • install/kubernetes/principal/principal-params-cm.yaml
  • install/kubernetes/agent/agent-params-cm.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-autonomous.sh
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/application_test.go
  • test/e2e/README.md
  • hack/dev-env/start-agent-managed.sh
  • install/kubernetes/agent/agent-deployment.yaml
  • hack/dev-env/Procfile.e2e
  • install/helm-repo/argocd-agent-agent/values.yaml
🧬 Code graph analysis (14)
agent/outbound_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
test/e2e/application_test.go (1)
test/e2e/fixture/argoclient.go (1)
  • IsArgoCDRepoServerReady (562-583)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/options.go (2)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
agent/agent.go (2)
  • AgentOption (139-139)
  • Agent (65-120)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (43-59)
  • AgentManagedName (38-38)
  • AgentClusterServerURL (40-40)
test/e2e/clusterinfo_test.go (2)
test/e2e/fixture/cluster.go (4)
  • HasConnectionStatus (63-77)
  • AgentManagedName (38-38)
  • ClusterDetails (43-59)
  • AgentAutonomousName (39-39)
internal/logging/logfields/logfields.go (2)
  • Status (130-130)
  • Message (132-132)
cmd/argocd-agent/principal.go (3)
principal/options.go (6)
  • WithInformerSyncTimeout (439-444)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
internal/env/env.go (3)
  • DurationWithDefault (168-177)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
principal/server.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
test/e2e/redis_proxy_test.go (1)
internal/logging/logfields/logfields.go (5)
  • Name (59-59)
  • Kind (58-58)
  • Config (127-127)
  • Client (37-37)
  • Timeout (78-78)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/options.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/server.go (1)
  • Server (73-166)
internal/tlsutil/kubernetes.go (2)
  • TLSCertFromSecret (44-66)
  • X509CertPoolFromSecret (106-128)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

115-115: Hard tabs
Column: 1

(MD010, no-hard-tabs)


123-123: Hard tabs
Column: 1

(MD010, no-hard-tabs)


132-132: Hard tabs
Column: 1

(MD010, no-hard-tabs)

docs/getting-started/kubernetes/index.md

233-233: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🔇 Additional comments (37)
agent/outbound_test.go (1)

464-464: LGTM! API update correctly applied.

The addition of the nil parameter aligns with the extended cluster.NewManager signature that now accepts a TLS config. Passing nil is appropriate here since this test uses an in-memory miniredis instance that doesn't require TLS encryption.

internal/argocd/cluster/informer_test.go (1)

19-19: LGTM!

Test calls correctly updated to pass the new compression type parameter (cacheutil.RedisCompressionGZip) and nil for the TLS config, matching the expanded NewManager signature.

Also applies to: 33-33, 50-50, 87-87, 115-115

internal/argocd/cluster/cluster_test.go (1)

36-36: LGTM!

Tests correctly updated to pass nil for the new tlsConfig parameter in NewManager calls.

Also applies to: 225-225

principal/server.go (2)

354-373: LGTM!

Redis proxy TLS configuration is well-structured with proper separation between server-side TLS (incoming from Argo CD) and upstream TLS (outgoing to argocd-redis). The conditional logic correctly handles both path-based and direct certificate configuration.


402-427: LGTM! Solid TLS configuration with appropriate logging.

The cluster manager TLS setup properly:

  • Creates TLS config with MinVersion TLS 1.2
  • Logs a warning when InsecureSkipVerify is enabled (line 410)
  • Loads and validates CA certificates from disk with clear error messages
  • Handles the CA pool from both direct provision and file path
hack/dev-env/configure-argocd-redis-tls.sh (1)

316-325: LGTM! Replica guard logic correctly implemented.

The explicit if statements properly ensure at least 1 replica for each component, handling both empty and "0" cases correctly. This addresses the shell operator precedence issue that could have occurred with compound conditions.

test/e2e/fixture/fixture.go (2)

97-97: LGTM! Extended timeouts appropriate for TLS.

The increased timeouts (to 120 seconds) account for the additional overhead of TLS handshakes and certificate validation during test setup and teardown.

Also applies to: 110-110, 113-113, 144-144, 161-161


236-241: LGTM! Proper use of DeepCopy prevents loop variable mutation.

Creating copies via DeepCopy() before modifying namespace/name fields ensures the original loop variables aren't mutated, which is correct and prevents subtle bugs in cross-cluster deletion checks.

Also applies to: 261-266, 318-325, 351-357

agent/agent.go (2)

328-348: LGTM! TLS configuration properly constructed with appropriate warnings.

The cluster cache TLS setup correctly:

  • Creates TLS config with MinVersion TLS 1.2
  • Logs an "INSECURE" warning when certificate verification is skipped (line 335)
  • Loads and validates CA certificates from disk with clear error messages
  • Handles both insecure mode and CA-based verification

149-149: LGTM! Default interval prevents ticker panic.

Setting cacheRefreshInterval to 30 seconds by default (line 149) ensures time.NewTicker never receives a zero or negative duration, which would panic. The unified goroutine (lines 450-465) performs an immediate initial update before entering the ticker loop, which is good for startup behavior.

Also applies to: 450-465

test/e2e/fixture/cluster.go (4)

184-216: LGTM! TLS configuration with graceful CA fallback.

The TLS setup correctly:

  • Creates TLS config with MinVersion TLS 1.2
  • Loads CA certificates from disk with proper error handling
  • Falls back to InsecureSkipVerify if CA file doesn't exist (with warning)
  • Applies the same pattern for both Principal and ManagedAgent

The fallback to insecure mode is appropriate for test backward compatibility, though the warning message makes the degraded security posture clear.

Also applies to: 224-256


261-267: LGTM! Generous timeouts for E2E port-forward latency.

The extended timeouts (30s read, 10s dial/write) with retry backoff are appropriate for E2E tests that may use port-forward or run in resource-constrained environments.


283-317: Client caching prevents connection leaks.

The cached client approach (with mutex protection) ensures Redis connections are reused across test assertions rather than creating new connections for every query. The CleanupRedisCachedClients() function clears the cache map.

Note: As flagged in a previous review, appstatecache.Cache may not expose a Close() method for explicit connection cleanup. If connections need explicit closure, you may need to track the underlying redis.Client instances separately.

Consider verifying whether appstatecache.Cache or its underlying Redis client exposes a Close() method. If explicit cleanup is needed, you may want to track the raw redis.Client alongside the cache and close it in CleanupRedisCachedClients().



348-368: LGTM! Robust address resolution with multiple fallbacks.

The address resolution logic tries:

  1. LoadBalancer ingress (IP or hostname)
  2. spec.LoadBalancerIP (for local vcluster)
  3. spec.ClusterIP (last resort)

This covers various deployment scenarios and provides a clear error message if all methods fail.

Also applies to: 412-432
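The three-step fallback can be modeled with a stripped-down stand-in for `corev1.Service` (real code reads these fields from the Kubernetes API; the struct and function here are hypothetical):

```go
package main

import "fmt"

// serviceInfo carries only the fields the fallback chain consults.
type serviceInfo struct {
	IngressIP, IngressHostname string // status.loadBalancer.ingress
	LoadBalancerIP             string // spec.loadBalancerIP
	ClusterIP                  string // spec.clusterIP
}

// resolveRedisHost applies the fallback order: LoadBalancer ingress first,
// then spec.loadBalancerIP, then spec.clusterIP, and a clear error otherwise.
func resolveRedisHost(svc serviceInfo) (string, error) {
	switch {
	case svc.IngressIP != "":
		return svc.IngressIP, nil
	case svc.IngressHostname != "":
		return svc.IngressHostname, nil
	case svc.LoadBalancerIP != "":
		return svc.LoadBalancerIP, nil
	case svc.ClusterIP != "" && svc.ClusterIP != "None":
		return svc.ClusterIP, nil
	}
	return "", fmt.Errorf("service has no LoadBalancer ingress, loadBalancerIP, or clusterIP")
}

func main() {
	host, err := resolveRedisHost(serviceInfo{ClusterIP: "10.96.0.15"})
	if err != nil {
		panic(err)
	}
	fmt.Println("redis host:", host)
}
```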

test/e2e/clusterinfo_test.go (1)

108-115: Timeout increases for connection status checks look reasonable

Bumping these Eventually timeouts / intervals (with clear comments) is a pragmatic way to absorb extra latency from port‑forward/TLS in long e2e runs; no logic concerns from my side.

Also applies to: 123-129, 142-142

principal/redisproxy/redisproxy.go (1)

65-75: Server‑side Redis proxy TLS wiring looks solid

The new TLS fields and setters on RedisProxy, plus createServerTLSConfig and the Start() branching into tls.Listen, are all consistent and give you clear separation between plaintext and TLS modes with explicit logging. Enforcing TLS ≥1.2 is also a good default for an internal proxy.

Also applies to: 98-154, 159-183

test/e2e/application_test.go (1)

5-5: Repo‑server readiness gate before application tests is a good addition

Waiting on IsArgoCDRepoServerReady with a bounded 180s/5s poll and logging status deltas should cut down on timing‑related flakes when creating applications that rely on repo‑server, without affecting core test logic.

Also applies to: 28-41

install/helm-repo/argocd-agent-agent/values.yaml (1)

136-152: Secure‑by‑default Redis TLS + NetworkPolicy values are reasonable but do require matching cluster setup

The new defaults (tlsRootCAPath, redisTLS.*, and networkPolicy.*) align with the goal of having Redis TLS and restricted network access enabled out of the box. That said, these defaults assume:

  • A argocd-redis-tls secret with ca.crt exists and is mounted at /app/config/redis-tls.
  • Redis and agent workloads are labeled with the selectors used in the networkPolicy section.

Installations that don’t meet those assumptions will need to either provision the secret/labels or override these values. It’d be worth making sure the chart/docs call out these expectations clearly.

Also applies to: 153-163

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: Redis TLS parameters are wired cleanly into the agent ConfigMap

The new agent.redis.tls.* keys mirror the redisTLS values and follow the existing pattern of stringified booleans in the params ConfigMap, so they should drop into the CLI/env parsing on the agent side without surprises.

test/e2e/fixture/argoclient.go (1)

30-30: Repo‑server readiness helper is straightforward and gives useful diagnostics

IsArgoCDRepoServerReady’s check on AvailableReplicas > 0 is a simple, robust readiness signal for the repo‑server, and returning a human‑readable message with replica counts and conditions makes the higher‑level tests’ logs much easier to interpret when readiness fails. No changes needed here.

Also applies to: 559-583
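The readiness signal can be modeled in a few lines; `deploymentStatus` here stands in for the relevant fields of `appsv1.DeploymentStatus`, and the message format is illustrative:

```go
package main

import "fmt"

// deploymentStatus mirrors the two fields the readiness check consults.
type deploymentStatus struct {
	Replicas          int32
	AvailableReplicas int32
}

// repoServerReady treats at least one available replica as ready and always
// returns a human-readable message with the replica counts.
func repoServerReady(s deploymentStatus) (bool, string) {
	if s.AvailableReplicas > 0 {
		return true, fmt.Sprintf("repo-server ready: %d/%d replicas available", s.AvailableReplicas, s.Replicas)
	}
	return false, fmt.Sprintf("repo-server not ready: 0/%d replicas available", s.Replicas)
}

func main() {
	ok, msg := repoServerReady(deploymentStatus{Replicas: 1, AvailableReplicas: 1})
	fmt.Println(ok, msg)
}
```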

agent/options.go (1)

111-133: Agent Redis TLS options are consistent with other AgentOption helpers

WithRedisTLSEnabled, WithRedisTLSCAPath, and WithRedisTLSInsecure follow the same pattern as the existing Redis options and line up with the new Helm/params keys, so the agent can now be configured cleanly for Redis TLS just like the principal. Looks good.

docs/getting-started/kubernetes/index.md (1)

159-234: Redis TLS setup steps are clear and consistent with the TLS docs

The new Sections 2.4 and 4.4 do a good job of:

  • Generating a CA and per‑cluster Redis server certs with appropriate SANs,
  • Creating argocd-redis-tls secrets on both control-plane and workload clusters, and
  • Patching Redis arguments with a pattern that correctly expands REDIS_PASSWORD in a double‑quoted JSON patch.

The final “Related Documentation” link back to Redis TLS Configuration also helps keep the duplication under control. I don’t see any functional issues here.

Also applies to: 341-390, 655-655

hack/dev-env/gen-redis-tls-certs.sh (1)

14-26: Redis TLS cert generation script looks solid and idempotent

This script cleanly covers:

  • Idempotent CA + per‑cluster cert generation (control-plane, proxy, autonomous, managed),
  • Reasonable SANs for k8s DNS, localhost, and optional local IP,
  • Cleanup of temporary CSR/extension/serial files.

With the prior fixes (no 2>/dev/null suppression and conditional LOCAL_IP SAN), it’s in good shape for dev/e2e purposes.

Also applies to: 28-135
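The script builds the CA and per-cluster certs with openssl; for illustration, the same shape (CA, server cert with SANs for the in-cluster DNS name and localhost, chain verification) can be produced with Go's `crypto/x509`. The service DNS name below is an assumption, not taken from the script:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// genAndVerify creates a throwaway CA, signs a Redis server certificate with
// DNS and IP SANs, and verifies the chain for the "localhost" SAN.
func genAndVerify() error {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "redis-dev-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return err
	}

	srvKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	srvTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "argocd-redis"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		// SANs: assumed in-cluster service name, localhost, loopback IP
		DNSNames:    []string{"argocd-redis.argocd.svc.cluster.local", "localhost"},
		IPAddresses: []net.IP{net.ParseIP("127.0.0.1")},
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	srvDER, err := x509.CreateCertificate(rand.Reader, srvTmpl, caCert, &srvKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	srvCert, err := x509.ParseCertificate(srvDER)
	if err != nil {
		return err
	}

	roots := x509.NewCertPool()
	roots.AddCert(caCert)
	_, err = srvCert.Verify(x509.VerifyOptions{Roots: roots, DNSName: "localhost"})
	return err
}

func main() {
	if err := genAndVerify(); err != nil {
		panic(err)
	}
	fmt.Println("server certificate chains to CA and matches SAN localhost")
}
```

A client that pins the generated CA and connects by any listed SAN will verify successfully; connecting via a dynamic LoadBalancer address not in the SAN list will not, which is exactly why the E2E tests fall back to `InsecureSkipVerify`.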

test/e2e/redis_proxy_test.go (1)

120-238: Improved SSE handling and retries should significantly reduce Redis-proxy test flakiness

The combination of:

  • A short post‑connect delay before mutating pods,
  • Buffered SSE channel + “drain all available messages” loops, and
  • Retried ResourceTree calls with logging on transient errors/nil trees,

is a pragmatic way to address timing/race issues in these e2e flows without overcomplicating the tests. The explicit log messages will also make diagnosing future flakes easier.

Given this code is confined to the e2e test package and uses TLS only for test traffic, I’m comfortable with the InsecureSkipVerify transport here.

Also applies to: 326-456, 588-665

hack/dev-env/Procfile.e2e (1)

1-7: LGTM! Process orchestration properly configured for TLS-enabled E2E environment.

The port-forward mappings and startup sequences are well-structured:

  • Redis port-forwards correctly target each vcluster (control-plane:6380, managed:6381, autonomous:6382)
  • Startup delays ensure proper initialization order (principal starts at 3s, agents at 5s)
  • Agent processes include the required REDIS_ADDRESS environment variables for TLS-enabled Redis connections
test/e2e/README.md (4)

21-29: LGTM! Clear documentation of mandatory Redis TLS requirement.

The documentation properly emphasizes that Redis TLS is required and automatically configured, with a helpful reference to the detailed Redis TLS section below.


31-53: LGTM! Excellent documentation of reverse tunnel setup for remote clusters.

The conditional flow for remote vs. local clusters is clearly explained, including:

  • When the reverse tunnel is needed (remote clusters only)
  • What the setup script does
  • How to keep the tunnel running

55-82: LGTM! Clear step-by-step workflow for running E2E tests.

The multi-terminal workflow is well-documented, including:

  • Port-forward requirements
  • Principal and agent process management
  • Conditional tunnel usage
  • Automatic connection method detection (local vs. CI)

83-105: Verify that the referenced TLS configuration scripts exist and are executable.

The documentation references three scripts for manual Redis TLS reconfiguration:

  • ./hack/dev-env/gen-redis-tls-certs.sh
  • ./hack/dev-env/configure-redis-tls.sh
  • ./hack/dev-env/configure-argocd-redis-tls.sh

Please confirm these scripts exist in the repository and are executable. If these scripts were added in a recent commit (such as 3b0283f), verify that file permissions were correctly set during the commit.

install/kubernetes/agent/agent-deployment.yaml (2)

149-166: LGTM! Redis TLS environment variables properly configured.

The three new environment variables for Redis TLS configuration are correctly wired:

  • ARGOCD_AGENT_REDIS_TLS_ENABLED - enables/disables TLS
  • ARGOCD_AGENT_REDIS_TLS_CA_PATH - path to CA certificate
  • ARGOCD_AGENT_REDIS_TLS_INSECURE - skip verification flag (dev/test only)

All variables use optional: true to allow graceful degradation if the ConfigMap keys are not present.


193-195: LGTM! Redis TLS CA volume properly configured with security best practices.

The volume mount and volume definition follow Kubernetes security best practices:

  • Mount is readOnly: true (prevents accidental modification)
  • Secret reference uses optional: true (allows deployment without TLS secret)
  • CA certificate properly mapped from argocd-redis-tls secret to /app/config/redis-tls/ca.crt

Also applies to: 205-211

cmd/argocd-agent/principal.go (6)

259-261: LGTM! Informer sync timeout properly wired with conditional application.

The timeout is only applied when explicitly set (> 0), which allows the internal default (60s) to be used when not specified. This aligns with the flag description at line 436.


265-275: LGTM! Redis server TLS configuration properly validated.

The validation ensures both cert and key are provided together or neither is provided, preventing partial TLS configuration. The fallback to Kubernetes secret is appropriate.


294-303: LGTM! Upstream TLS configuration priority is clear and well-logged.

The if-else chain provides a clear priority order (insecure > CA file > CA secret) with appropriate warning messages for insecure mode.


434-436: LGTM! Informer sync timeout flag properly documented.

The flag description clearly explains the behavior: 0 uses the internal default of 60s, and users can increase it for slow environments. This matches the implementation at lines 259-261.


438-459: LGTM! Redis TLS flags comprehensively cover all configuration scenarios.

The flags provide flexible TLS configuration with:

  • Global TLS enable/disable flag (default: true)
  • Server TLS cert/key from file or Kubernetes secret
  • Upstream CA from file or Kubernetes secret
  • Insecure mode for development/testing (properly marked as INSECURE)

Default values are reasonable (TLS enabled by default, using argocd-redis-tls secret).


490-490: Verify the timeout increase from 2s to 30s is necessary.

The timeout for fetching TLS configuration from Kubernetes was increased from 2 seconds to 30 seconds. While this provides more tolerance for slow clusters or cold starts, it's a significant increase (15x).

Please confirm this timeout is needed based on observed failures in your environment. If the increase is necessary, consider:

  • Whether 30s is the right value (perhaps 10s would be sufficient?)
  • Whether this affects the startup time or user experience
  • Whether similar timeouts elsewhere in the codebase should be adjusted

Based on learnings, this aligns with the PR's goal of accommodating slow environments (e.g., the informer sync timeout increase), but verification ensures the value is appropriate.

Comment on lines +136 to +153
          - name: ARGOCD_AGENT_REDIS_TLS_ENABLED
            valueFrom:
              configMapKeyRef:
                name: argocd-agent-params
                key: agent.redis.tls.enabled
                optional: true
          - name: ARGOCD_AGENT_REDIS_TLS_CA_PATH
            valueFrom:
              configMapKeyRef:
                name: argocd-agent-params
                key: agent.redis.tls.ca-path
                optional: true
          - name: ARGOCD_AGENT_REDIS_TLS_INSECURE
            valueFrom:
              configMapKeyRef:
                name: argocd-agent-params
                key: agent.redis.tls.insecure
                optional: true

⚠️ Potential issue | 🟠 Major

Wire Redis TLS env vars to the templated ConfigMap name and mount the CA volume

Two issues here will prevent Redis TLS from working reliably:

  1. Hard‑coded ConfigMap name for new env vars

Lines 136–153 use name: argocd-agent-params instead of the existing {{ include "argocd-agent-agent.paramsConfigMapName" . }} helper used everywhere else. This breaks deployments that override the params ConfigMap name via values.

You likely want:

-          - name: ARGOCD_AGENT_REDIS_TLS_ENABLED
-            valueFrom:
-              configMapKeyRef:
-                name: argocd-agent-params
+          - name: ARGOCD_AGENT_REDIS_TLS_ENABLED
+            valueFrom:
+              configMapKeyRef:
+                name: {{ include "argocd-agent-agent.paramsConfigMapName" . }}
                 key: agent.redis.tls.enabled
                 optional: true
-          - name: ARGOCD_AGENT_REDIS_TLS_CA_PATH
-            valueFrom:
-              configMapKeyRef:
-                name: argocd-agent-params
+          - name: ARGOCD_AGENT_REDIS_TLS_CA_PATH
+            valueFrom:
+              configMapKeyRef:
+                name: {{ include "argocd-agent-agent.paramsConfigMapName" . }}
                 key: agent.redis.tls.ca-path
                 optional: true
-          - name: ARGOCD_AGENT_REDIS_TLS_INSECURE
-            valueFrom:
-              configMapKeyRef:
-                name: argocd-agent-params
+          - name: ARGOCD_AGENT_REDIS_TLS_INSECURE
+            valueFrom:
+              configMapKeyRef:
+                name: {{ include "argocd-agent-agent.paramsConfigMapName" . }}
                 key: agent.redis.tls.insecure
                 optional: true
  1. Volume is defined but never mounted into the container

You add the redis-tls-ca volume under volumes: (Lines 253–260), but the only volumeMounts entry is still userpass-passwd (Lines 229–232). Without a mount, the agent process won’t see the CA file at the path configured in agent.redis.tls.ca-path (e.g., /app/config/redis-tls/ca.crt).

Consider adding a mount, e.g.:

           volumeMounts:
             - name: userpass-passwd
               mountPath: /app/config/creds
+{{- if .Values.redisTLS.secretName }}
+            - name: redis-tls-ca
+              mountPath: /app/config/redis-tls
+              readOnly: true
+{{- end }}

(and keep agent.redis.tls.ca-path consistent with this directory, e.g. /app/config/redis-tls/ca.crt).

These two fixes are needed for TLS-enabled Redis connections to actually succeed in-cluster.

Also applies to: 229-260

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (7)
hack/dev-env/configure-redis-tls.sh (1)

68-70: Add error handling for context switch.

If kubectl config use-context fails, subsequent kubectl operations may target the wrong cluster. The past review comment flagged this, but error handling appears to still be missing.

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+    echo "Error: Failed to switch to context ${CONTEXT}"
+    exit 1
+}
docs/configuration/redis-tls.md (1)

415-432: Documentation shows non-expanded $(REDIS_PASSWORD) in kubectl patch.

The example uses single-quoted -p='[...]' which prevents shell expansion of $(REDIS_PASSWORD). Users copying this will configure Redis with the literal string $(REDIS_PASSWORD) as the password. This was flagged in a past review but not addressed.

Update the example to show proper password retrieval and interpolation:

+# First, get the Redis password from the secret
+REDIS_PASSWORD=$(kubectl -n argocd get secret argocd-redis -o jsonpath='{.data.auth}' | base64 --decode)
+
 # Update Redis args for TLS
-kubectl patch deployment argocd-redis -n argocd --type='json' -p='[
+kubectl patch deployment argocd-redis -n argocd --type='json' -p="[
   {
     \"op\": \"replace\",
     \"path\": \"/spec/template/spec/containers/0/args\",
     \"value\": [
       \"--save\", \"\",
       \"--appendonly\", \"no\",
-      \"--requirepass\", \"$(REDIS_PASSWORD)\",
+      \"--requirepass\", \"${REDIS_PASSWORD}\",
       ...
     ]
   }
-]'
+]"
hack/dev-env/configure-argocd-redis-tls.sh (2)

29-31: Add error handling for context switch.

Same issue as in configure-redis-tls.sh: if kubectl config use-context fails, subsequent operations may target the wrong cluster. The past review flagged this but it wasn't addressed.

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+    echo "Error: Failed to switch to context ${CONTEXT}"
+    exit 1
+}

164-182: Inconsistent handling of missing volumes array for repo-server.

The argocd-server configuration (lines 68-108) defensively handles the case where the volumes array doesn't exist, but argocd-repo-server directly appends using /spec/template/spec/volumes/- which will fail if the array is missing. The past review flagged this inconsistency.

Apply the same defensive pattern used for argocd-server:

     if ! kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes[?(@.name=="redis-tls-ca")]}' | grep -q "redis-tls-ca"; then
         echo "  Adding redis-tls-ca volume..."
+        
+        # Check if volumes array exists
+        VOLUMES_EXIST=$(kubectl get deployment argocd-repo-server -n ${NAMESPACE} -o jsonpath='{.spec.template.spec.volumes}' 2>/dev/null || echo "")
+        
+        if [ -z "$VOLUMES_EXIST" ] || [ "$VOLUMES_EXIST" = "null" ]; then
+            # Create volumes array with first element
+            if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
+              {
+                "op": "add",
+                "path": "/spec/template/spec/volumes",
+                "value": [{"name": "redis-tls-ca", "secret": {"secretName": "argocd-redis-tls", "items": [{"key": "ca.crt", "path": "ca.crt"}]}}]
+              }
+            ]'; then
+                echo "  ERROR: Failed to create volumes array for argocd-repo-server"
+                exit 1
+            fi
+        else
+            # Append to existing volumes array (existing code)
             if ! kubectl -n ${NAMESPACE} patch deployment argocd-repo-server --type=json -p '[
hack/dev-env/start-agent-managed.sh (1)

63-74: Certificate extraction lacks error handling.

The kubectl commands extract TLS credentials to temporary files without checking for errors. If the secrets don't exist or extraction fails, the script continues with empty or corrupt files, causing cryptic TLS errors when the agent starts.

This issue was previously flagged in earlier review comments.

hack/dev-env/start-agent-autonomous.sh (1)

63-74: Certificate extraction lacks error handling.

The kubectl commands extract TLS credentials without error checking, which can cause cryptic failures if secrets don't exist.

This issue was previously flagged and applies identically to the managed agent script.

test/e2e/fixture/cluster.go (1)

309-317: CleanupRedisCachedClients does not actually close Redis clients (behaviour vs comment mismatch)

The comment says this “closes all cached Redis clients”, but the implementation only resets the cachedRedisClients map and relies on GC / process exit to clean up connections. This was already raised previously; reiterating with a concrete suggestion.

If you want real connection cleanup between tests, you’ll need a way to call Close() on the underlying *redis.Clients created in getCacheInstance. For example:

  • Change the cache to store a small struct:
type redisCachedClient struct {
    cache  *appstatecache.Cache
    client *redis.Client
}

var (
    cachedRedisClients     = make(map[string]redisCachedClient)
    cachedRedisClientMutex sync.Mutex
)
  • Have getCachedCacheInstance populate both fields (by refactoring getCacheInstance or adding a helper that returns both cache and client).
  • Then CleanupRedisCachedClients can iterate the map, call client.Close() for each, and finally reset the map.

If you intentionally rely on process teardown and don’t want to plumb through *redis.Client, at least consider updating the comment to describe that this only clears the cache map, not active TCP connections.
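A runnable version of the struct-based suggestion above, with a closable stub standing in for `*redis.Client` so the lifecycle is easy to demonstrate:

```go
package main

import (
	"fmt"
	"sync"
)

// stubClient stands in for *redis.Client; only Close matters here.
type stubClient struct{ closed bool }

func (c *stubClient) Close() error { c.closed = true; return nil }

type cachedEntry struct {
	client *stubClient // in the fixture, this would be the *redis.Client
}

var (
	cachedClients = make(map[string]*cachedEntry)
	cacheMu       sync.Mutex
)

// getCachedClient reuses an existing entry for the key or creates one.
func getCachedClient(key string) *cachedEntry {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	if e, ok := cachedClients[key]; ok {
		return e
	}
	e := &cachedEntry{client: &stubClient{}}
	cachedClients[key] = e
	return e
}

// cleanupCachedClients closes every cached connection before resetting the
// map, matching what the fixture's comment promises.
func cleanupCachedClients() {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	for _, e := range cachedClients {
		_ = e.client.Close()
	}
	cachedClients = make(map[string]*cachedEntry)
}

func main() {
	a := getCachedClient("principal:localhost:6380")
	b := getCachedClient("principal:localhost:6380")
	fmt.Println("reused:", a == b)
	cleanupCachedClients()
	fmt.Println("closed:", a.client.closed)
}
```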

🧹 Nitpick comments (9)
test/e2e/fixture/toxyproxy.go (1)

119-134: Principal-specific readiness timeout logic is reasonable; consider avoiding duplicated magic numbers (optional).

The new timeout handling with a longer window for compName == "principal" aligns with the informer sync behavior and should help reduce flakes. Non‑principal components still use the previous 120s behavior, which keeps semantics stable.

If there is (or ends up being) a shared constant or configuration for the principal informer sync timeout elsewhere, consider wiring this code to that single source instead of hard-coding 120 * time.Second here to avoid drift in future changes. This is non‑blocking and can be deferred.

test/e2e/redis_proxy_test.go (2)

120-124: SSE “settling” sleep is pragmatic but could be made condition‑based

The extra 5s wait after establishing the SSE stream should help avoid the subscription race you described and seems reasonable for now. Longer term, consider replacing the fixed sleep with a condition‑based wait (e.g., wait until at least one initial SSE/resource-tree update is observed, with a timeout) so test duration isn’t tied to an arbitrary constant.

Also applies to: 326-330


588-588: Buffered SSE channel and HTTP transport tuning are reasonable; consider a few bounds and test‑only guardrails

The buffered msgChan plus the tuned http.Transport/http.Client (keep‑alives, idle timeout, no overall timeout for SSE) are aligned with long‑lived SSE streams and should reduce connection churn and message loss in these e2e tests.

A few non‑blocking considerations:

  • With Timeout: 0 and ResponseHeaderTimeout: 0, if the endpoint is misconfigured/unreachable and the context lacks a deadline, client.Do can block for a long time. If suite.Ctx doesn’t already enforce a global test timeout, consider using a context with a finite deadline for the SSE stream creation path.
  • InsecureSkipVerify: true is understandable here given dynamically provisioned endpoints and test scope. It’d be good to keep this clearly isolated to e2e (which you’re doing) and maybe add a short comment/TODO about tightening it when CI has a stable CA / hostname story.

Overall, these changes look appropriate for the current test environment.

Also applies to: 643-653, 661-663

hack/dev-env/gen-redis-tls-certs.sh (1)

68-72: Linux IP detection may fail silently on some systems.

The ip r show default command may not output the expected format on all Linux distributions or network configurations (e.g., multiple default routes, missing src field). Consider adding a fallback or validation.

     else
-        LOCAL_IP=$(ip r show default 2>/dev/null | sed -e 's,.*\ src\ ,,' | sed -e 's,\ metric.*$,,' | head -n 1 || echo "")
+        LOCAL_IP=$(ip r show default 2>/dev/null | grep -oP 'src \K[\d.]+' | head -n 1 || \
+                   hostname -I 2>/dev/null | awk '{print $1}' || echo "")
     fi
hack/dev-env/configure-redis-tls.sh (1)

116-118: Suppressed errors during pod termination wait may hide issues.

The 2>/dev/null || true pattern suppresses all errors from kubectl wait. While this allows the script to continue if pods don't exist, it also hides legitimate errors (e.g., API server connectivity issues).

Consider logging a message when the wait command fails:

-kubectl wait --for=delete pod -l app.kubernetes.io/name=argocd-repo-server -n ${NAMESPACE} --timeout=60s 2>/dev/null || true
+kubectl wait --for=delete pod -l app.kubernetes.io/name=argocd-repo-server -n ${NAMESPACE} --timeout=60s 2>/dev/null || echo "  (no pods to wait for or wait timed out)"
principal/redisproxy/redisproxy.go (1)

130-154: Consider validation for mutually exclusive certificate configuration.

The method allows both file-based and in-memory certificate configuration to be set simultaneously, with file-based taking precedence (lines 136-145). While this works, it could lead to confusion if both are configured. Consider adding validation at configuration time to ensure only one method is used, or document this precedence behavior clearly.

Additionally, consider upgrading the minimum TLS version to 1.3 for enhanced security:

 	return &tls.Config{
 		Certificates: []tls.Certificate{cert},
-		MinVersion:   tls.VersionTLS12,
+		MinVersion:   tls.VersionTLS13,
 	}, nil
test/e2e/fixture/cluster.go (3)

276-307: Cached Redis clients: keying and lifecycle considerations

The global cachedRedisClients map keyed by "<source>:<addr>" with a mutex gives you a simple and thread-safe cache and should avoid repeated client creation during E2E runs. Two follow-ups to consider:

  • The cache key ignores password/TLS settings. If a test ever changes credentials or TLS parameters while keeping the same address, you’ll silently reuse an old client. For current E2E usage this is probably fine, but worth keeping in mind if the fixture is extended.
  • If you decide to explicitly close clients (see comment on CleanupRedisCachedClients), you’ll likely want to change the map value to a small struct that also carries the underlying *redis.Client, or maintain a parallel map keyed the same way.

Given this is test-only code, these are more about future-proofing than immediate correctness.
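One way to future-proof the keying is to fold credentials and TLS parameters into the key itself, so a changed password or CA yields a distinct cached client. A minimal sketch (the function name and parameter set are hypothetical):

```go
package main

import "fmt"

// redisClientKey includes password and TLS parameters alongside source and
// address, so no stale client is reused when either changes. Names are
// illustrative, not the fixture's actual key format.
func redisClientKey(source, addr, password string, tlsEnabled bool, caPath string) string {
	return fmt.Sprintf("%s:%s:%t:%s:%s", source, addr, tlsEnabled, caPath, password)
}

func main() {
	a := redisClientKey("principal", "localhost:6380", "pw1", true, "/certs/ca.crt")
	b := redisClientKey("principal", "localhost:6380", "pw2", true, "/certs/ca.crt")
	fmt.Println(a != b) // differing credentials produce distinct cache entries
}
```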


348-381: Managed-agent Redis address discovery and TLS defaults look solid

The address resolution order (LoadBalancer ingress → spec.loadBalancerIP → ClusterIP) plus a clear error when none is available is sensible for E2E environments. Always enabling TLS for managed-agent Redis and wiring the CA path, with a final override of the address via MANAGED_AGENT_REDIS_ADDR, nicely matches the “TLS by default, easy local override” goal.

Only minor note: the override assumes the env value already includes the port; that’s fine, but it may be worth documenting in test setup docs / comments if not already done elsewhere.
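The fallback order — LoadBalancer ingress, then spec.loadBalancerIP, then ClusterIP — can be sketched as follows (the serviceView struct is a hypothetical stand-in for a corev1.Service, not the fixture's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// serviceView captures only the fields consulted during address discovery.
type serviceView struct {
	ingressHost    string // status.loadBalancer.ingress[0] hostname or IP
	loadBalancerIP string // spec.loadBalancerIP
	clusterIP      string // spec.clusterIP
}

// resolveRedisHost applies the fallback order, returning a clear error
// when no reachable address exists.
func resolveRedisHost(svc serviceView) (string, error) {
	switch {
	case svc.ingressHost != "":
		return svc.ingressHost, nil
	case svc.loadBalancerIP != "":
		return svc.loadBalancerIP, nil
	case svc.clusterIP != "" && svc.clusterIP != "None": // "None" = headless service
		return svc.clusterIP, nil
	}
	return "", errors.New("no reachable address found for Redis service")
}

func main() {
	host, _ := resolveRedisHost(serviceView{clusterIP: "10.96.0.12"})
	fmt.Println(host)
}
```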


412-445: Principal Redis address discovery and TLS wiring consistent with managed-agent path

The principal-side getPrincipalRedisConfig mirrors the managed-agent logic: same address resolution strategy, TLS enabled by default, CA path wired, and an env override (ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS) applied last. This symmetry makes the fixture predictable and easier to reason about.

No functional issues spotted here; just ensure any test documentation mentions the expected format of the override env var (host:port).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06673ef and ada2bb0.

📒 Files selected for processing (30)
  • Makefile (1 hunks)
  • agent/agent.go (4 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/agent/configuration.md (1 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/application_test.go (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (12 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/application_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • principal/resource.go
  • test/e2e/fixture/argoclient.go
  • test/e2e/clusterinfo_test.go
  • docs/getting-started/kubernetes/index.md
  • test/e2e/rp_test.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • principal/tracker/tracking.go
  • test/run-e2e.sh
  • docs/configuration/agent/configuration.md
  • test/e2e/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/start-agent-autonomous.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/start-principal.sh
  • hack/dev-env/Procfile.e2e
  • Makefile
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (6)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (43-59)
  • AgentManagedName (38-38)
  • AgentClusterServerURL (40-40)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
cmd/argocd-agent/agent.go (2)
agent/options.go (3)
  • WithRedisTLSEnabled (112-117)
  • WithRedisTLSInsecure (128-133)
  • WithRedisTLSCAPath (120-125)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (1)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Analyze (go)
🔇 Additional comments (46)
test/e2e/fixture/fixture.go (9)

97-97: LGTM: Timeout increase improves test reliability.

The doubled timeout (60→120 seconds) for AppProject synchronization is appropriate given the TLS overhead introduced in this PR and aligns with other timeout increases throughout the file.


110-156: LGTM: Consistent timeout increases for resource deletion.

The wait loop increases (lines 113 and 144) from 60 to 120 iterations provide adequate time for finalizer-based cleanup in TLS-enabled environments. The two-phase approach (wait for finalizers, then force-remove if needed) remains sound.


161-172: LGTM: Timeout increase aligned with other wait functions.

The wait loop increase (60→120 iterations) in WaitForDeletion is consistent with the changes to EnsureDeletion and appropriate for TLS-enabled environments.


232-266: LGTM: DeepCopy prevents mutation and warning-based cleanup improves resilience.

The use of DeepCopy() at lines 236 and 261 correctly prevents mutation of loop variables when adjusting namespace/name for cross-cluster deletion waits. The warning-based error handling (instead of early returns) ensures cleanup continues even when individual resources fail, which is appropriate for test teardown logic.


278-292: LGTM: Consistent warning-based cleanup for remaining applications.

The warning-based error handling for remaining applications (lines 278-279, 291-292) is consistent with the approach used earlier in the cleanup flow and ensures maximum cleanup coverage.


312-325: LGTM: Correct DeepCopy usage with proper name transformation.

The DeepCopy() at line 318 with the "agent-autonomous-" name prefix (line 319) correctly maps autonomous agent AppProjects to their principal-side counterparts. The warning-based error handling with explanatory comments improves clarity.


345-374: LGTM: Proper DeepCopy usage and consistent warning handling.

The DeepCopy() at line 351 correctly prevents loop variable mutation. The namespace adjustment to "argocd" for the managed agent is appropriate, and the warning-based error handling maintains consistency with the rest of the cleanup logic.


487-491: LGTM: Non-fatal Redis reset improves cleanup resilience.

Treating Redis reset failures as warnings (line 489) rather than fatal errors is appropriate for test cleanup, especially when the Redis connection may be unavailable due to environment issues (e.g., port-forward termination). The explanatory message provides helpful context.


497-498: Verify the function rename is correct.

The function call changed from getCacheInstance to getCachedCacheInstance at line 497. Ensure that getCachedCacheInstance exists with the expected signature and that no remaining references to the old getCacheInstance function are left elsewhere in the codebase. The error wrapping with %w at line 498 follows best practices for error chain preservation.

test/e2e/redis_proxy_test.go (2)

184-184: Extended pod‑replacement Eventually window looks appropriate

Bumping the pod‑creation Eventually timeout to 60s with a 5s interval is a sensible adjustment given TLS + Redis + cluster variability; the values still look bounded and won’t excessively slow failures.

Also applies to: 402-402


211-237: ResourceTree Eventually with transient error handling looks solid

Wrapping the post‑deletion ResourceTree check in requires.Eventually with explicit handling for non‑nil errors and nil trees is a good improvement. It should help tolerate transient EOF/Redis/SSE issues while still failing deterministically if the new pod never appears in the tree. The logging around retries is also useful for diagnosing flakes.

Also applies to: 430-456

hack/dev-env/gen-redis-tls-certs.sh (2)

1-26: LGTM - Certificate generation structure is sound.

The script properly uses set -e for error handling, generates a 4096-bit RSA CA with 10-year validity, and is idempotent by checking for existing files before regeneration.


106-136: LGTM - Agent certificate loop is well-structured.

The loop generates certificates for both autonomous and managed agents with appropriate SANs. The idempotent checks for existing files are correct.

hack/dev-env/configure-redis-tls.sh (2)

61-66: LGTM - Certificate validation is comprehensive.

The validation now correctly checks for the server certificate, key, and CA certificate as suggested in past reviews.


198-229: LGTM - Redis password handling is correct.

The script properly retrieves the Redis password from the secret, fails fast if missing, and correctly interpolates it into the JSON patch using shell variable expansion with proper quoting.

hack/dev-env/configure-argocd-redis-tls.sh (1)

316-325: LGTM - Replica guard logic correctly uses explicit if statements.

The replica validation now properly handles both empty and "0" values, ensuring at least 1 replica is scaled up. This addresses the past review concern about shell operator precedence.

Makefile (1)

59-79: LGTM - Redis TLS setup sequence is well-organized.

The setup follows a logical per-cluster pattern: certificate generation → Redis TLS → ArgoCD TLS for each vcluster. Make's default behavior will stop on first script failure, and the scripts use set -e internally.

docs/configuration/redis-tls.md (2)

1-49: LGTM - Documentation overview and architecture are clear.

The introduction, architecture diagram, and TLS configuration points are well-documented and provide a clear understanding of the Redis TLS setup.


329-340: LGTM - Principal options table is accurate.

The flag names, environment variables, and defaults are correctly documented, matching the implementation in the codebase.

cmd/argocd-agent/agent.go (3)

184-199: LGTM - Redis TLS configuration logic is well-structured.

The mutual exclusion check prevents conflicting --redis-tls-insecure and --redis-tls-ca-path options. The security warning for insecure mode is appropriate.


241-250: LGTM - Redis TLS flags with secure defaults.

TLS is enabled by default (true), insecure mode is disabled by default (false), and the environment variable naming follows the established ARGOCD_AGENT_* convention.


73-77: LGTM - Redis TLS variable declarations.

The new TLS configuration variables are properly scoped within the command function alongside other configuration options.

hack/dev-env/start-agent-managed.sh (3)

37-46: LGTM!

The Redis TLS certificate detection logic is clear and appropriate for the dev/e2e environment. The messaging guides developers to generate certificates when needed.


48-61: LGTM!

The Redis address configuration appropriately defaults to localhost:6381 for local development, with clear documentation about port-forward requirements. This approach supports TLS certificate validation since localhost is included in the certificate SANs.


76-89: LGTM!

The agent startup command cleanly integrates TLS-related arguments through variable injection. The ordering and structure are appropriate for the managed agent mode.

hack/dev-env/start-agent-autonomous.sh (2)

37-61: LGTM!

The Redis TLS detection and address configuration mirror the managed agent script with appropriate port differentiation (6382 for autonomous vs 6381 for managed). This supports running multiple agents locally with distinct port-forwards.


76-91: LGTM!

The autonomous agent startup command properly integrates TLS arguments with mode-specific configuration (autonomous mode with distinct metrics/healthz ports).

hack/dev-env/start-principal.sh (3)

23-28: LGTM!

The Redis address configuration appropriately relies on external port-forward management (via Procfile.e2e or manual setup), avoiding the port conflict that was addressed in previous review iterations.


42-43: LGTM!

Setting a longer informer sync timeout (120s) for E2E tests is appropriate for CI environments where cluster startup and informer synchronization may be slower.


47-65: LGTM!

The Redis TLS configuration is thorough, checking for all required files (cert, key, and CA) and properly configuring both server TLS (for incoming connections from Argo CD) and upstream CA (for connections to Redis). The documentation of SANs is helpful for understanding the certificate requirements.

agent/agent.go (3)

328-348: LGTM!

The cluster cache TLS configuration is well-structured, with appropriate handling of insecure mode (with warning), CA certificate loading, and error propagation. The TLS 1.2 minimum version is a secure default.


350-354: LGTM!

The cluster cache initialization cleanly integrates the TLS configuration, with appropriate error handling and assignment to the agent's clusterCache field.


448-465: LGTM!

The cluster cache refresh logic is well-implemented with an immediate startup update followed by periodic updates via ticker. Both managed and autonomous agents appropriately send cluster cache info updates, and context cancellation is properly handled.

internal/argocd/cluster/cluster.go (2)

175-192: LGTM!

The signature change to NewClusterCacheInstance cleanly adds TLS configuration support. Since this is an internal package, the breaking change is acceptable. The TLS config is properly wired into the Redis client options.


135-141: LGTM!

Initializing the ConnectionState when it doesn't exist yet is appropriate for handling the initial agent connection. The default values (Successful status, descriptive message, current timestamp) are reasonable.

hack/dev-env/start-e2e.sh (2)

50-56: LGTM!

The Redis address configuration uses localhost with distinct ports for each component, which enables TLS certificate validation (localhost is included in the certificate SANs) while supporting multiple concurrent agents. This aligns with the port-forward setup in Procfile.e2e.


58-59: LGTM!

The Redis password retrieval properly separates the assignment from the export, addressing the shellcheck warning that was raised in previous review comments.

hack/dev-env/Procfile.e2e (1)

1-7: LGTM!

The Procfile.e2e cleanly orchestrates the E2E test environment:

  • Port-forward entries provide Redis and Argo CD server access with distinct ports for each component
  • Sleep delays ensure port-forwards establish before components start (3s for principal, 5s for agents)
  • Environment variable passing enables per-agent Redis address configuration

This structure supports running multiple agents with TLS-enabled Redis connections in the E2E test environment.

cmd/argocd-agent/principal.go (5)

259-261: LGTM!

The informer sync timeout is conditionally applied only when greater than zero, allowing the 0 default to use the internal default while supporting explicit override for slow environments. This addresses the previous comment about clarifying default semantics.


263-275: LGTM!

The Redis server TLS configuration properly handles both path-based and secret-based modes with validation ensuring cert and key are provided together. The logging clearly indicates which source is being used.


277-304: LGTM!

The upstream TLS validation correctly ensures mutual exclusivity between the three modes (insecure, CA from file, CA from secret). The exclusion of the default secret name from the validation count is appropriate because it allows the default to be used when no other mode is explicitly configured, while still catching conflicts when users explicitly set multiple modes.


434-459: LGTM!

The CLI flags for informer sync timeout and Redis TLS configuration are well-documented with clear descriptions. Redis TLS is appropriately enabled by default for security, and the default secret name is consistent across related flags.


490-490: LGTM!

Increasing the timeout from 2s to 30s for fetching resource proxy TLS configuration from Kubernetes is appropriate. The original timeout was tight and could cause spurious failures in busy clusters or CI environments.

principal/redisproxy/redisproxy.go (1)

157-200: LGTM!

The TLS-enabled and plaintext listener setup is well-structured with clear conditional logic and appropriate logging for both modes.

test/e2e/fixture/cluster.go (2)

128-141: Use of cached Redis cache instances in Get*ClusterInfo looks good

Switching GetManagedAgentClusterInfo and GetPrincipalClusterInfo to use getCachedCacheInstance reduces repeated client creation and matches the new cached client infrastructure. The additional debug logging around principal lookups and cache misses is also helpful for E2E failures.

Also applies to: 157-171


173-216: Verify TLS error handling and code duplication patterns in cluster.go

The review suggests two improvements to the TLS configuration code in getCacheInstance:

  • Restrict os.Stat error handling to explicitly check os.IsNotExist(err) rather than treating all errors as "file not found", and fail fast on permission or transient errors
  • Extract duplicated TLS setup logic (for principal and managed-agent branches) into a helper function to reduce duplication and keep both call sites in sync

The suggestions are sound best practices, but require verification that:

  • The referenced lines 224-256 and 261-267 contain substantially identical TLS configuration code
  • The current error handling genuinely treats all os.Stat errors as "missing file"
  • These improvements align with the test fixture's actual implementation
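The suggested narrowing of the os.Stat error handling might look like this (function name hypothetical):

```go
package main

import (
	"fmt"
	"os"
)

// caFileExists treats only "not exist" as "no CA configured"; permission
// or transient I/O errors are surfaced so callers fail fast instead of
// silently falling back to a non-TLS configuration.
func caFileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if os.IsNotExist(err) {
		return false, nil
	}
	return false, fmt.Errorf("stat %s: %w", path, err)
}

func main() {
	ok, err := caFileExists("/nonexistent/redis-ca.crt")
	fmt.Println(ok, err)
}
```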

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from ada2bb0 to bd9c416 on December 10, 2025 13:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
hack/dev-env/start-agent-managed.sh (1)

63-74: Verify error handling for certificate extraction.

A previous review requested adding error handling to the kubectl certificate extraction commands (lines 68-73), and the review was marked as addressed. However, the current code still lacks explicit error checks. Please verify that proper error handling is in place or consider adding it to prevent silent failures:

 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
-  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
+  -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}" || { echo "Failed to extract TLS cert"; exit 1; }
test/e2e/fixture/cluster.go (1)

173-317: Consider factoring TLS configuration and revisiting cached client cleanup.

Two small, non-blocking improvements to keep this fixture maintainable:

  • The TLS wiring for principal and managed-agent Redis in getCacheInstance is nearly identical (min version, CA loading with optional InsecureSkipVerify). Pulling that into a small helper like buildTLSConfig(enabled bool, caPath string, who string) *tls.Config would avoid drift if you later tweak verification behavior for one side.

  • CleanupRedisCachedClients currently just clears the cachedRedisClients map and relies on GC to close underlying connections. If appstatecache.Cache ever exposes a Close() or if you decide to track the underlying *redis.Client alongside the cache, this would be the natural place to explicitly close them before resetting the map.

Neither is urgent, but both would make future TLS changes and resource management a bit safer.
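A possible shape for the suggested helper, assuming it only needs the enabled flag, CA path, and an insecure toggle (the signature is hypothetical, not the fixture's actual API):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// buildTLSConfig centralizes the duplicated TLS wiring: one place to set
// the minimum version, load the CA, and toggle verification for both the
// principal and managed-agent branches.
func buildTLSConfig(enabled bool, caPath string, insecure bool) (*tls.Config, error) {
	if !enabled {
		return nil, nil // plaintext connection
	}
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if insecure {
		cfg.InsecureSkipVerify = true
		return cfg, nil
	}
	if caPath != "" {
		pem, err := os.ReadFile(caPath)
		if err != nil {
			return nil, fmt.Errorf("read CA %s: %w", caPath, err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no certificates parsed from %s", caPath)
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

func main() {
	cfg, err := buildTLSConfig(true, "", true)
	fmt.Println(cfg.InsecureSkipVerify, err)
}
```

With a single helper, a later change to verification behavior (for example, raising MinVersion) applies to both call sites automatically.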

🧹 Nitpick comments (6)
test/e2e/fixture/toxyproxy.go (1)

119-134: Dynamic principal readiness timeout looks good; consider centralizing the 120s constant

The new timeout logic (120s by default, 180s for principal) aligns with the comment about informer sync and should help reduce principal readiness flakes without impacting other components.

As a minor improvement only if convenient, consider sourcing the 120 * time.Second value from a shared constant or config (if one already exists for the principal informer sync timeout), so this check automatically tracks future changes to that timeout instead of relying on a duplicated magic number and comment.

test/e2e/logs_test.go (2)

118-120: Consider polling for readiness instead of a hard sleep.

The 15-second sleep addresses timing issues but makes the test unconditionally slower. The comment suggests potential test isolation problems ("recover from previous test state"). Consider polling for a specific readiness condition (e.g., checking if the log streaming endpoint responds, or verifying agent connectivity) instead of an arbitrary delay.

If a polling target is unclear in the E2E environment, you could combine a shorter initial delay with a readiness check:

-	// Wait for log streaming proxy to be ready (especially when running after other tests)
-	// The managed agent needs more time to recover from previous test state
-	time.Sleep(15 * time.Second)
+	// Wait for log streaming proxy to be ready with exponential backoff
+	backoff := 1 * time.Second
+	for i := 0; i < 5; i++ {
+		// Quick health check: attempt to fetch fresh app metadata
+		testApp := &v1alpha1.Application{}
+		if err := suite.PrincipalClient.Get(suite.Ctx, types.NamespacedName{Namespace: "agent-managed", Name: appName}, testApp, metav1.GetOptions{}); err == nil {
+			break
+		}
+		time.Sleep(backoff)
+		backoff *= 2
+	}

243-245: Hard sleep suggests connection cleanup issues; consider investigating proper teardown.

The 5-second sleep to allow log streaming connections to close suggests that TLS-enabled connections may not be closing promptly. This could indicate missing context cancellation, improper defer statements, or connection pool cleanup issues in the log streaming implementation.

Consider investigating the log streaming connection lifecycle to ensure proper cleanup. If connections aren't closing immediately:

  1. Verify that contexts passed to log streaming are properly canceled.
  2. Check if Redis TLS connections have appropriate timeouts and are being closed in defer statements.
  3. Consider if connection pooling in TLS mode requires explicit drain/close calls.

If a sleep is necessary for E2E stability, at least reduce it and add a comment explaining the specific resource being awaited:

-	// Allow time for log streaming connections to fully close before next test
-	time.Sleep(5 * time.Second)
+	// Brief delay to allow Redis TLS connections to fully close
+	// TODO: Investigate proper connection cleanup to eliminate this sleep
+	time.Sleep(2 * time.Second)
test/e2e/fixture/argoclient.go (1)

387-409: Add LoadBalancer ingress IP as a fallback in GetArgoCDServerEndpoint.

If spec.LoadBalancerIP is empty and the service has only an ingress IP (no hostname), argoEndpoint will stay empty and callers will fail. Consider also falling back to Ingress[0].IP:

-	argoEndpoint := srvService.Spec.LoadBalancerIP
-	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
-		if hostname := srvService.Status.LoadBalancer.Ingress[0].Hostname; hostname != "" {
-			argoEndpoint = hostname
-		}
-	}
+	argoEndpoint := srvService.Spec.LoadBalancerIP
+	if len(srvService.Status.LoadBalancer.Ingress) > 0 {
+		ingress := srvService.Status.LoadBalancer.Ingress[0]
+		if ingress.Hostname != "" {
+			argoEndpoint = ingress.Hostname
+		} else if ingress.IP != "" {
+			argoEndpoint = ingress.IP
+		}
+	}

This keeps the env override behavior while making the K8s-based path more robust.

hack/dev-env/configure-redis-tls.sh (1)

68-71: Consider surfacing a clear error when kubectl config use-context fails.

With set -e, a bad context will stop the script but without an explicit message. A small guard improves UX:

 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || {
+    echo "Error: Failed to switch to context ${CONTEXT}" >&2
+    exit 1
+}

This makes it obvious why the script exited when the context is misconfigured.

test/run-e2e.sh (1)

88-121: Tighten macOS port-forward detection to verify all three ports.

Using a single lsof -i :6380 -i :6381 -i :6382 only guarantees that at least one of the ports is open. You can have a misconfigured start-e2e (e.g., 6380 only) and still pass this check.

Consider checking each port individually:

-if [[ "$OSTYPE" == "darwin"* ]]; then
-    if ! lsof -i :6380 -i :6381 -i :6382 >/dev/null 2>&1; then
+if [[ "$OSTYPE" == "darwin"* ]]; then
+    missing=0
+    for port in 6380 6381 6382; do
+        if ! lsof -i ":${port}" >/dev/null 2>&1; then
+            echo "  Port-forward for localhost:${port} not detected"
+            missing=1
+        fi
+    done
+
+    if [[ $missing -ne 0 ]]; then
         echo ""
         echo " WARNING: Port-forwards not detected!"
@@
-        sleep 3
-    else
-        echo "✓ Port-forwards detected (localhost:6380, 6381, 6382)"
-        echo ""
-    fi
+        sleep 3
+    else
+        echo "✓ Port-forwards detected (localhost:6380, 6381, 6382)"
+        echo ""
+    fi

This makes the warning trigger whenever any of the three Redis forwards is missing.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ada2bb0 and bd9c416.

📒 Files selected for processing (30)
  • Makefile (1 hunks)
  • agent/agent.go (4 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/agent/configuration.md (1 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (3 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (12 hunks)
  • test/e2e/fixture/toxyproxy.go (1 hunks)
  • test/e2e/logs_test.go (3 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
  • hack/dev-env/start-agent-autonomous.sh
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/e2e/rp_test.go
  • hack/dev-env/start-principal.sh
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/gen-redis-tls-certs.sh
  • internal/argocd/cluster/cluster.go
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/Procfile.e2e
  • Makefile
  • docs/configuration/agent/configuration.md
  • test/e2e/redis_proxy_test.go
  • test/e2e/clusterinfo_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-managed.sh
  • test/run-e2e.sh
  • test/e2e/README.md
🧬 Code graph analysis (5)
test/e2e/fixture/fixture.go (3)
internal/logging/logfields/logfields.go (1)
  • Name (59-59)
internal/backend/interface.go (1)
  • Namespace (124-127)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (43-59)
  • AgentManagedName (38-38)
  • AgentClusterServerURL (40-40)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (115-118)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
cmd/argocd-agent/agent.go (4)
agent/options.go (3)
  • WithRedisTLSEnabled (112-117)
  • WithRedisTLSInsecure (128-133)
  • WithRedisTLSCAPath (120-125)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
principal/redisproxy/redisproxy.go (2)
principal/listen.go (1)
  • Listener (55-61)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/getting-started/kubernetes/index.md

233-233: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and push image
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build & cache Go code
  • GitHub Check: Analyze (go)
🔇 Additional comments (17)
test/e2e/logs_test.go (2)

240-240: Verify the need for 2x timeout increase in autonomous mode.

The timeout increase from 30s to 60s (and interval from 1s to 2s) indicates TLS overhead in autonomous mode as well, though less severe than managed mode (2x vs 3x). Since autonomous agents have a simpler connectivity model (direct connection without proxy), this degradation suggests Redis TLS handshake or connection setup delays are affecting both agent types.

Consider gathering metrics on log fetch latency before and after TLS enablement to quantify the performance impact and determine if further optimization is needed.


125-139: Fetching fresh app metadata is a sound improvement; timeout increases warrant investigation.

The logic to fetch fresh application data before requesting logs ensures current metadata and resource versions are used, reducing stale-data issues.

However, the timeout increase from 30s to 90s and polling interval from 1s to 3s represents a significant 3x change. If this was added due to TLS-related overhead, consider investigating the root cause:

  1. Whether managed agent's Redis TLS connection setup is adding unexpected latency.
  2. If there are connection pooling or keep-alive issues with TLS-enabled Redis.
  3. Whether certificate validation is causing delays.

Rather than scaling timeouts indefinitely, understanding the performance impact of the TLS changes would help ensure the solution is robust for production use.

principal/resource.go (1)

42-42: LGTM!

The timeout increase from 10s to 30s is appropriate given the addition of TLS handshakes and potentially higher latency Redis operations in the resource request path. This aligns with timeout values used elsewhere in the TLS implementation.

principal/tracker/tracking.go (1)

75-78: LGTM!

The change from an unbuffered to a buffered channel (capacity 1) is appropriate for preventing potential deadlocks in the request-response pattern. The inline comment clearly documents the rationale, which is helpful for future maintainers.

cmd/argocd-agent/agent.go (1)

184-199: LGTM!

The Redis TLS configuration logic is well-structured with proper validation ensuring mutual exclusivity between insecure mode and CA-based validation. The warning logs for insecure mode are appropriate security reminders.

agent/agent.go (2)

328-354: LGTM!

The TLS configuration for the cluster cache is properly implemented with appropriate security controls:

  • Minimum TLS version enforcement (TLS 1.2)
  • Warning log when insecure mode is used
  • Proper CA certificate loading and validation
  • Clear error propagation

The code correctly mirrors the TLS configuration pattern used elsewhere in the codebase.


450-465: LGTM!

The cluster cache info update goroutine is well-structured:

  • Performs an immediate update on startup before waiting for the first tick (line 453)
  • Uses the validated cacheRefreshInterval with a sensible default (30s)
  • Properly respects context cancellation for clean shutdown

This pattern ensures the principal receives initial cluster state promptly rather than waiting for the first interval.

principal/redisproxy/redisproxy.go (3)

98-154: LGTM!

The TLS configuration API is well-designed with clear separation of concerns:

  • Server TLS configuration (cert/key for incoming connections)
  • Upstream TLS configuration (CA for outgoing connections to Redis)
  • Support for both in-memory certificates and file-based loading

The createServerTLSConfig helper properly handles both configuration sources and enforces TLS 1.2 minimum.


162-183: LGTM!

The dual-mode startup (TLS vs plaintext) is clearly implemented with appropriate logging at each path. The TLS configuration is created and applied before starting the listener, ensuring security is enforced from the first connection.


836-926: LGTM! Excellent TLS implementation.

The upstream connection establishment is comprehensive and addresses all previous review concerns:

Security improvements:

  • Dial timeout prevents indefinite hangs (line 847)
  • TLS handshake deadline prevents stalled connections (line 904)
  • Security warning when server TLS enabled but upstream TLS not configured (lines 858-862)
  • Warning when CA configuration ignored due to insecure mode (lines 874-877)

Design improvements:

  • Upstream TLS decoupled from server TLS (line 866) - allows independent configuration
  • Proper SNI hostname extraction for TLS (lines 896-901)
  • Deadline cleared after successful handshake (lines 916-919)
  • Support for CA from pool, file path, or insecure mode

The implementation correctly handles all TLS scenarios while maintaining clear error messages and security warnings.

cmd/argocd-agent/principal.go (3)

259-261: LGTM!

The informer sync timeout configuration is cleanly implemented, only applying the option when a non-zero timeout is explicitly set. This allows the server to use its internal default (60s) when not specified.


263-304: Redis TLS configuration is well-structured.

The Redis TLS setup properly handles:

  • Server TLS for incoming connections from Argo CD (cert/key from file or secret)
  • Upstream TLS for connections to Redis (CA from file, secret, or insecure mode)
  • Appropriate warning logs for insecure mode
  • Input validation for cert/key pairs

The configuration options provide good flexibility for different deployment scenarios.


277-291: Clarify validation logic for default secret name.

The mutual exclusivity validation on lines 286-287 excludes the default secret name "argocd-redis-tls" from the mode count. This allows users to specify --redis-upstream-ca-path while the secret name defaults to "argocd-redis-tls", and validation passes with modesSet=1.

This creates an inconsistency between the validation logic and the error message: the message states "Only one mode can be specified," but the code treats the default secret as non-exclusive. Clarify whether:

  • The default secret should count toward the mode limit (making --redis-upstream-ca-path + default secret mutually exclusive), OR
  • The default secret is intentionally a fallback that doesn't count as a mode (in which case, update the error message to reflect this)
test/e2e/README.md (1)

83-105: Redis TLS section and script references look consistent.

The Redis TLS section now accurately reflects the new dev-env scripts (gen-redis-tls-certs.sh, configure-redis-tls.sh, configure-argocd-redis-tls.sh) and the requirement that TLS be enabled for all E2E runs. No changes needed here.

docs/getting-started/kubernetes/index.md (1)

159-234: Redis TLS getting-started steps are technically sound and aligned with the code.

The new Redis TLS sections (2.4 and 4.4) correctly:

  • Generate a CA and server certs with appropriate SANs,
  • Create the argocd-redis-tls secret with tls.crt, tls.key, and ca.crt,
  • Patch the argocd-redis deployment to use TLS-only on port 6379, and
  • Reuse the same CA across control-plane and workload clusters.

The REDIS_PASSWORD handling in the JSON patch is now shell-expanded correctly. The added “Redis TLS Configuration” link at the bottom ties this doc into the deeper configuration guide. No further changes required.

Also applies to: 341-390, 655-655

test/e2e/fixture/fixture.go (1)

108-172: Improved cleanup robustness and safer object handling look good.

  • Extending EnsureDeletion/WaitForDeletion to 120s and stripping finalizers on timeout makes test teardown more resilient to slow clusters.
  • Using DeepCopy() for Applications and AppProjects before tweaking namespace/name avoids mutating loop variables and is a nice safety improvement.
  • The new resetManagedAgentClusterInfo call is correctly treated as best-effort, so transient Redis/port-forward issues don’t cause test failures.

No changes needed here.

Also applies to: 230-267, 295-375, 487-500

@Rizwana777 force-pushed the issue-8091-redis-tls-config branch from bd9c416 to c6242e3 on December 10, 2025 at 14:03
Development

Successfully merging this pull request may close these issues.

Redis proxy should support TLS (inbound and outbound)