fix: separate ServiceAccount for router workloads#433
fix: separate ServiceAccount for router workloads#433ambient-code[bot] wants to merge 6 commits intomainfrom
Conversation
The controller and router previously shared the same `controller-manager` ServiceAccount, giving the router unnecessary cluster-wide secrets CRUD access. This creates a dedicated `router-sa` ServiceAccount with no RBAC bindings and `automountServiceAccountToken: false`, following the principle of least privilege. Fixes #351 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Important Review skippedBot user detected. To trigger a single review, invoke the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Deploy Preview for jumpstarter-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
| kind: Role | ||
| metadata: | ||
| labels: | ||
| app.kubernetes.io/name: jumpstarter-router |
There was a problem hiding this comment.
do not modify any helm, it has been deprecated.
Helm charts are deprecated; only the operator deployment path should be modified for the separate router ServiceAccount. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Status UpdateReview feedback addressed
CI failures analysisThe following CI jobs failed:
The Helm chart revert should resolve the helm-path failures. The operator-path changes (separate |
The router process needs Kubernetes API access at startup to load its configuration from a ConfigMap (via ctrl.GetConfigOrDie() and LoadRouterConfiguration). Setting AutomountServiceAccountToken: false on both the ServiceAccount and pod spec prevented the router from authenticating, causing the pod to crash and never become ready (180s timeout in CI). Changes: - Remove AutomountServiceAccountToken: false from router ServiceAccount and pod spec so the token is mounted - Add a minimal router Role granting read-only access to configmaps and secrets (the only resources the router needs) - Add a RoleBinding to bind the router Role to the router ServiceAccount This maintains the security goal of separating the router SA from the controller SA while granting only the minimum permissions needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause of CI failureThe router pod ( The router's Fix
This maintains the security goal of separating the router SA from the controller SA (the router SA has far fewer permissions than the controller SA) while providing the minimum permissions the router actually needs. |
CI Fix: Router ServiceAccount needs K8s API accessRoot cause: The router process calls Fix (commit 93ce272):
This maintains the security separation goal — the router SA has far fewer permissions than the controller SA — while granting the minimum permissions the router actually needs to function. |
| APIGroups: []string{""}, | ||
| Resources: []string{"secrets"}, | ||
| Verbs: []string{"get", "list", "watch"}, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
does the router really need to get/list/watch secrets? why?
| Resources: []string{"secrets"}, | ||
| Verbs: []string{"get", "list", "watch"}, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Good catch — you're right, it doesn't. Looking at LoadRouterConfiguration in controller/internal/config/config.go, the router only calls client.Get() on a ConfigMap (jumpstarter-controller), and LoadGrpcConfiguration just parses the in-memory config struct without any K8s API calls. The router never reads secrets.
I'll remove the secrets permission from the router Role so it only has get/list/watch on configmaps.
The router only reads a ConfigMap via LoadRouterConfiguration() and does not access any secrets. Remove the secrets PolicyRule from the router Role per review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review Status UpdateAll review feedback has been addressed:
The PR is ready for review. The operator deployment path now creates a separate |
| existingRouterRoleBinding.Labels = desiredRouterRoleBinding.Labels | ||
| existingRouterRoleBinding.Annotations = desiredRouterRoleBinding.Annotations | ||
| existingRouterRoleBinding.Subjects = desiredRouterRoleBinding.Subjects | ||
| existingRouterRoleBinding.RoleRef = desiredRouterRoleBinding.RoleRef |
There was a problem hiding this comment.
Potential issue: assigning to immutable RoleRef in the update path
Kubernetes considers RoleRef immutable after a RoleBinding is created. If it ever differs, the API server will reject the update. The safe pattern is to delete and recreate the RoleBinding when RoleRef changes.
I see this mirrors the existing controller RoleBinding pattern at lines 175-176, so it's a pre-existing concern. In practice the RoleRef is deterministic and won't change between reconciliations, so this won't cause runtime failures. But if you ever refactor the reconciliation logic, switching to delete-and-recreate for RoleRef mismatches would be the more robust approach.
Non-blocking -- just something to keep in mind.
AI-generated, human reviewed
There was a problem hiding this comment.
Acknowledged. As you noted, the RoleRef is deterministic here so it won't cause runtime issues. The delete-and-recreate pattern for RoleRef mismatches is worth considering in a future reconciliation refactor.
There was a problem hiding this comment.
Respectfully pushing back on implementing the delete-and-recreate pattern in this PR. Here is the reasoning:
-
The RoleRef is deterministic and hardcoded -- it is always
{name}-router-rolewith kindRoleand apiGrouprbac.authorization.k8s.io. It literally cannot differ between reconciliations, so the immutability constraint can never be triggered at runtime. -
This is a pre-existing pattern -- the controller RoleBinding at lines 175-176 uses the exact same
CreateOrUpdateapproach with direct RoleRef assignment. Fixing only the router side would be inconsistent; fixing both would expand this PR scope significantly. -
Your original comment said non-blocking -- you noted the RoleRef is deterministic and will not change between reconciliations, so this will not cause runtime failures. The risk is theoretical only.
-
Scope creep -- adding delete-and-recreate logic introduces a fundamentally different reconciliation pattern (delete + create vs. create-or-update), requires careful error handling for the gap between delete and create, and should be tested in its own PR with proper attention.
I would prefer to keep this PR focused on the ServiceAccount separation and address the reconciliation pattern improvement (for both controller and router RoleBindings) in a dedicated follow-up.
There was a problem hiding this comment.
Implemented in 4a87d67. Replaced the CreateOrUpdate pattern for both controller and router RoleBindings with a dedicated reconcileRoleBinding helper that:
- Creates the RoleBinding if it does not exist
- Detects RoleRef mismatches and uses delete-and-recreate (since RoleRef is immutable)
- Updates labels, annotations, and subjects in-place when only those change
This applies to both the controller RoleBinding and the router RoleBinding consistently.
| } | ||
|
|
||
| // createRouterServiceAccount creates a service account for the router with minimal RBAC permissions | ||
| func (r *JumpstarterReconciler) createRouterServiceAccount(jumpstarter *operatorv1alpha1.Jumpstarter) *corev1.ServiceAccount { |
There was a problem hiding this comment.
Suggestion: consider adding unit tests for the new router RBAC functions
The three new functions (createRouterServiceAccount, createRouterRole, createRouterRoleBinding) and their reconciliation logic don't have any unit test coverage. The existing controller RBAC also lacks unit tests, so this is a pre-existing gap, but this PR widens it.
E2e tests provide integration-level validation, but targeted unit tests would catch regressions more quickly -- for example, verifying the SA name format, the exact policy rules on the Role, and the RoleBinding subject/roleRef wiring. Would be worth adding at least basic assertions for the builder functions.
AI-generated, human reviewed
There was a problem hiding this comment.
Agreed that unit test coverage for RBAC builder functions would be valuable, but as you noted this is a pre-existing gap. Adding unit tests here would significantly expand the scope of this patch beyond the focused ServiceAccount separation fix. I'd prefer to handle that in a dedicated follow-up PR that adds test coverage for both the existing controller RBAC functions and the new router ones together.
| "namespace", existingSA.Namespace, | ||
| "operation", op) | ||
|
|
||
| // Router ServiceAccount (zero RBAC, no token automount) |
There was a problem hiding this comment.
Nit: this comment says "zero RBAC, no token automount" but the router SA actually does have RBAC (via the Role and RoleBinding created below) and AutomountServiceAccountToken is not explicitly disabled. Looks like a leftover from an earlier iteration. Something like // Router ServiceAccount or // Router ServiceAccount (uses dedicated minimal Role) would be more accurate.
AI-generated, human reviewed
There was a problem hiding this comment.
Fixed in f1a546e -- updated to // Router ServiceAccount (uses dedicated minimal Role).
| } | ||
| } | ||
|
|
||
| // createRouterServiceAccount creates a service account for the router with minimal RBAC permissions |
There was a problem hiding this comment.
Nit: this comment says "creates a service account for the router with minimal RBAC permissions" but the ServiceAccount object itself doesn't carry any RBAC permissions -- those come from the separate Role and RoleBinding. Consider something like creates a dedicated service account for router workloads.
AI-generated, human reviewed
There was a problem hiding this comment.
Fixed in f1a546e -- updated to creates a dedicated service account for router workloads.
| existingRouterSA.Name = desiredRouterSA.Name | ||
| existingRouterSA.Namespace = desiredRouterSA.Namespace | ||
|
|
||
| op, err = controllerutil.CreateOrUpdate(ctx, r.Client, existingRouterSA, func() error { |
There was a problem hiding this comment.
Nit: the controller SA block above (line 22) has an explicit comment explaining why SetControllerReference is not called (to prevent garbage collection on CR deletion). This router SA block intentionally follows the same pattern, but the rationale isn't stated. A brief comment like the one on line 22 would make the intent clear to future readers.
AI-generated, human reviewed
There was a problem hiding this comment.
Fixed in f1a546e -- added the same SetControllerReference rationale comment that the controller SA block has.
| "namespace", existingRouterRoleBinding.Namespace, | ||
| "operation", op) | ||
|
|
||
| return nil |
There was a problem hiding this comment.
Observation: the reconciliation blocks for the router SA, Role, and RoleBinding are nearly identical to their controller counterparts (~180 lines of duplicated structure). This is a pre-existing pattern that this PR extends, so not something to block on. But if you ever revisit this area, a generic reconciliation helper (using Go generics or a shared closure) could reduce the duplication significantly and ensure any future fix only needs to be applied once.
AI-generated, human reviewed
There was a problem hiding this comment.
Agreed. A generic reconciliation helper using Go generics would clean this up nicely. Keeping it consistent with the existing pattern for now to minimize review surface, but worth tackling in a dedicated refactor.
| func (r *JumpstarterReconciler) createRouterServiceAccount(jumpstarter *operatorv1alpha1.Jumpstarter) *corev1.ServiceAccount { | ||
| return &corev1.ServiceAccount{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: fmt.Sprintf("%s-router-sa", jumpstarter.Name), |
There was a problem hiding this comment.
Nit: the controller SA is named {name}-controller-manager (no -sa suffix) while the router SA is {name}-router-sa (with -sa suffix). Minor naming inconsistency -- not a functional issue, but worth noting for consistency if the naming conventions are ever standardized.
AI-generated, human reviewed
There was a problem hiding this comment.
Good catch on the naming inconsistency. The -controller-manager name is pre-existing and likely inherited from kubebuilder scaffolding conventions. Renaming it for consistency would require coordinating changes across the controller deployment and any existing installations (migration concern). I'll keep -router-sa for now since it's clear and descriptive, but a naming standardization pass could be a good follow-up.
- Fix misleading "zero RBAC, no token automount" comment to "uses dedicated minimal Role" - Add missing comment explaining why SetControllerReference is not called on router SA - Fix createRouterServiceAccount doc comment to not reference RBAC permissions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressing review feedback from @raballewPushed f1a546e to fix the three comment nits:
Items acknowledged but deferred:
CI is all green (tests, e2e-test-operator, deploy-kind for both operator and helm paths all passing). |
Kubernetes considers RoleRef immutable after a RoleBinding is created. Replace the CreateOrUpdate pattern for RoleBindings with a dedicated reconcileRoleBinding helper that detects RoleRef changes and uses delete-and-recreate instead of in-place update. This applies to both the controller and router RoleBindings. Addresses review feedback from @raballew. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
router-saServiceAccount with minimal RBAC permissions for router workloads in the operator deployment path onlycontroller-managerSA with full RBAC (secrets CRUD, CRD access, leader election)Fixes #351
Changes
Operator (only deployment path modified)
rbac.go: AddedcreateRouterServiceAccount()andcreateRouterRole()with reconciliation logicrbac.go: AddedreconcileRoleBinding()helper that handles immutableRoleRefvia delete-and-recreate pattern (applies to both controller and router RoleBindings)jumpstarter_controller.go: UpdatedcreateRouterDeployment()to use{name}-router-saget/list/watchonconfigmapsonly (no secrets access)Helm chart
Review feedback addressed
secretspermission from router Role (commit 8de3816) - router only reads ConfigMapsRoleRef(commit 4a87d67) - per @raballew's reviewTest plan
router-sawith minimal RBACcontroller-managerSA🤖 Generated with Claude Code