|
| 1 | +# MCPGroup CRD for Kubernetes Operator |
| 2 | + |
| 3 | +## Problem Statement |
| 4 | + |
| 5 | +The CLI supports runtime groups for organizing MCP servers, but the Kubernetes operator has no equivalent. The Virtual MCP Server feature (PR #2106) requires groups to discover and aggregate backend servers. Without groups, Kubernetes users cannot use Virtual MCP Server or organize their servers logically. |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | +- Add MCPGroup support to Kubernetes matching CLI runtime group behavior |
| 10 | +- Enable Virtual MCP Server to discover servers in a group |
| 11 | +- Maintain API consistency between CLI and Kubernetes |
| 12 | +- Keep implementation simple and predictable |
| 13 | + |
| 14 | +## Non-Goals |
| 15 | + |
| 16 | +- Registry groups (CLI-only feature) |
| 17 | +- Cross-namespace groups |
| 18 | +- Multi-group membership per server |
| 19 | +- Client configuration management (not applicable in Kubernetes) |
| 20 | + |
| 21 | +## Design |
| 22 | + |
| 23 | +### Design Decision: MCPGroup CRD vs Labels/Annotations |
| 24 | + |
| 25 | +**Question:** Could we use labels/annotations on MCPServer instead of creating an MCPGroup CRD? |
| 26 | + |
| 27 | +**Answer:** We need MCPGroup as a first-class construct for several reasons: |
| 28 | + |
| 29 | +1. **Meta-MCP and Virtual MCP requirements**: These features need to aggregate multiple MCP servers. They need a way to: |
| 30 | + - Discover which servers belong to a group |
| 31 | + - Reference groups in their configuration |
| 32 | + - Watch for group membership changes |
| 33 | + |
| 34 | +2. **Seamless CLI-to-Kubernetes transition**: The CLI has an explicit Group concept that workloads belong to. Users migrating from CLI to Kubernetes expect the same mental model and API patterns. |
| 35 | + |
| 36 | +3. **Growing ecosystem of constructs**: As we build more features on top of ToolHive (meta-mcp, virtual MCP, future aggregation patterns), we need a consistent way to represent server collections. |
| 37 | + |
| 38 | +4. **Group as an explicit concept**: Labels are meant for flexible, ad-hoc categorization. Groups are a core organizational concept in ToolHive's architecture, deserving explicit representation. |
| 39 | + |
| 40 | +While labels could technically provide grouping, they lack: |
| 41 | +- Discoverability (no list of available groups without scanning all servers) |
| 42 | +- A place for group-level metadata or status |
| 43 | +- Explicit lifecycle management |
| 44 | +- Ability to validate references before use |
| 45 | + |
| 46 | +**Conclusion:** MCPGroup CRD provides the foundation for meta-mcp, virtual MCP, and future aggregation features while maintaining consistency with CLI semantics. |
| 47 | + |
| 48 | +### MCPGroup CRD |
| 49 | + |
| 50 | +Simple CRD for grouping servers: |
| 51 | + |
| 52 | +```yaml |
| 53 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 54 | +kind: MCPGroup |
| 55 | +metadata: |
| 56 | + name: engineering-team |
| 57 | + namespace: default |
| 58 | +spec: |
| 59 | + # Optional human-readable description |
| 60 | + description: "Engineering team MCP servers" |
| 61 | + |
| 62 | +status: |
| 63 | + # Number of servers in this group |
| 64 | + serverCount: 3 |
| 65 | + |
| 66 | + # List of server names for quick reference |
| 67 | + servers: |
| 68 | + - github-server |
| 69 | + - jira-server |
| 70 | + - slack-server |
| 71 | + |
| 72 | + phase: Ready |
| 73 | + conditions: |
| 74 | + - type: Ready |
| 75 | + status: "True" |
| 76 | + lastTransitionTime: "2025-10-15T10:30:00Z" |
| 77 | +``` |
| 78 | + |
| 79 | +### MCPServer Spec Addition |
| 80 | + |
| 81 | +Add an explicit `groupRef` field to MCPServer: |
| 82 | + |
| 83 | +```yaml |
| 84 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 85 | +kind: MCPServer |
| 86 | +metadata: |
| 87 | + name: github-server |
| 88 | + namespace: default |
| 89 | +spec: |
| 90 | + # Existing fields... |
| 91 | + image: ghcr.io/stackloklabs/github-server:latest |
| 92 | + |
| 93 | + # New: explicit group membership |
| 94 | + groupRef: engineering-team |
| 95 | +``` |
| 96 | + |
| 97 | +**Rationale for explicit groupRef field:** |
| 98 | +- Matches CLI behavior (workload has `Group` field) |
| 99 | +- Follows Kubernetes naming conventions for references (`groupRef` instead of `group`) |
| 100 | +- Simple and predictable |
| 101 | +- Easy to query: `list MCPServers where spec.groupRef = X` |
| 102 | +- Unambiguous membership: each server references at most one group |
| 103 | +- API consistency with CLI |
| 104 | + |
| 105 | +### API Consistency |
| 106 | + |
| 107 | +CLI runtime groups store membership on the workload: |
| 108 | +```go |
| 109 | +type Workload struct { |
| 110 | + Name string |
| 111 | + Group string // Explicit group membership |
| 112 | +} |
| 113 | +``` |
| 114 | + |
| 115 | +Kubernetes should match this pattern: |
| 116 | +```go |
| 117 | +type MCPServerSpec struct { |
| 118 | + // Existing fields... |
| 119 | + |
| 120 | + // GroupRef is the name of the MCPGroup this server belongs to |
| 121 | + // +optional |
| 122 | + GroupRef string `json:"groupRef,omitempty"` |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +### Controller Behavior |
| 127 | + |
| 128 | +**MCPGroup Controller:** |
| 129 | +- Watches MCPGroup and MCPServer resources |
| 130 | +- Updates `status.servers` list when servers join/leave group |
| 131 | +- Updates `status.serverCount` |
| 132 | +- Validates referenced group exists when MCPServer is created |
| 133 | + |
| 134 | +**MCPServer Controller:** |
| 135 | +- Existing reconciliation logic |
| 136 | +- Validates `spec.groupRef` references an existing MCPGroup (if specified) |
| 137 | +- Adds condition if group reference is invalid |
| 138 | + |
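
The status bookkeeping described for the MCPGroup controller can be sketched as a pure function. This is a minimal illustration, not the operator's implementation: `MCPServer` and `GroupStatus` are simplified stand-ins for the real CRD types, `ComputeGroupStatus` is a hypothetical name, and sorting the names is an assumption made here to keep status updates deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// MCPServer is a simplified stand-in for the real CRD type (assumption).
type MCPServer struct {
	Name      string
	Namespace string
	GroupRef  string // mirrors spec.groupRef; empty means no group
}

// GroupStatus mirrors the status.servers and status.serverCount fields.
type GroupStatus struct {
	Servers     []string
	ServerCount int32
}

// ComputeGroupStatus derives one group's status from the servers currently
// visible. Only servers in the group's own namespace are counted, because
// cross-namespace groups are a non-goal.
func ComputeGroupStatus(namespace, group string, servers []MCPServer) GroupStatus {
	names := []string{}
	for _, s := range servers {
		if s.Namespace == namespace && s.GroupRef == group {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names) // assumption: stable order avoids spurious updates
	return GroupStatus{Servers: names, ServerCount: int32(len(names))}
}

func main() {
	servers := []MCPServer{
		{Name: "slack-server", Namespace: "default", GroupRef: "engineering-team"},
		{Name: "github-server", Namespace: "default", GroupRef: "engineering-team"},
		{Name: "standalone-server", Namespace: "default"},
		{Name: "jira-server", Namespace: "other", GroupRef: "engineering-team"},
	}
	st := ComputeGroupStatus("default", "engineering-team", servers)
	fmt.Println(st.ServerCount, st.Servers)
}
```

In a real controller the server list would come from the API server or the controller cache on each reconcile, and the result would be written to the MCPGroup status subresource.
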
| 139 | +### Discovery API |
| 140 | + |
| 141 | +Virtual MCP (and other features) can discover servers in a group: |
| 142 | + |
| 143 | +```go |
| 144 | +// List all MCPServers in a group |
| 145 | +servers, err := clientset.McpV1alpha1().MCPServers(namespace).List(ctx, metav1.ListOptions{ |
| 146 | + FieldSelector: "spec.groupRef=engineering-team", |
| 147 | +}) |
| 148 | +``` |
| 149 | + |
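
Lookups by `spec.groupRef` are typically served from an index keyed by the field value. The sketch below models that idea with a plain map so the behavior is easy to see. It does not use the Kubernetes client libraries, and `GroupIndex`, `Add`, and `ServersIn` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// groupKey identifies a group within a namespace.
type groupKey struct {
	Namespace string
	Group     string
}

// GroupIndex maps a (namespace, groupRef) pair to member server names.
type GroupIndex struct {
	members map[groupKey][]string
}

// NewGroupIndex returns an empty index.
func NewGroupIndex() *GroupIndex {
	return &GroupIndex{members: make(map[groupKey][]string)}
}

// Add records a server under its group. Servers with an empty groupRef
// are standalone and are not indexed.
func (i *GroupIndex) Add(namespace, name, groupRef string) {
	if groupRef == "" {
		return
	}
	k := groupKey{Namespace: namespace, Group: groupRef}
	i.members[k] = append(i.members[k], name)
}

// ServersIn returns the servers in a group, in insertion order.
// An unknown group yields a nil (empty) slice.
func (i *GroupIndex) ServersIn(namespace, group string) []string {
	return i.members[groupKey{Namespace: namespace, Group: group}]
}

func main() {
	idx := NewGroupIndex()
	idx.Add("default", "github-server", "engineering-team")
	idx.Add("default", "jira-server", "engineering-team")
	idx.Add("default", "standalone-server", "")
	fmt.Println(idx.ServersIn("default", "engineering-team"))
}
```

Keying by namespace as well as group name keeps two groups with the same name in different namespaces separate, which matches the namespace-scoped design.
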
| 150 | +## Implementation |
| 151 | + |
| 152 | +### Phase 1: Core CRD |
| 153 | +1. Add `GroupRef` field to MCPServer spec |
| 154 | +2. Create MCPGroup CRD types |
| 155 | +3. Implement MCPGroup controller |
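| 156 | +4. Add field selector support for group queries (for example, by declaring `spec.groupRef` under the CRD's `selectableFields`, or by registering a field index in the controller cache) |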
| 157 | +5. Update CRD manifests and documentation |
| 158 | + |
| 159 | +### Phase 2: Integration |
| 160 | +1. Virtual MCP integration with groups |
| 161 | +2. kubectl plugin support |
| 162 | + |
| 163 | +## Examples |
| 164 | + |
| 165 | +### Standalone MCPServer (No Group) |
| 166 | + |
| 167 | +MCPServers can run without belonging to a group: |
| 168 | + |
| 169 | +```yaml |
| 170 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 171 | +kind: MCPServer |
| 172 | +metadata: |
| 173 | + name: standalone-server |
| 174 | + namespace: default |
| 175 | +spec: |
| 176 | + image: ghcr.io/stackloklabs/filesystem:latest |
| 177 | + # No groupRef - server runs independently |
| 178 | +``` |
| 179 | + |
| 180 | +### MCPServer with Group Membership |
| 181 | + |
| 182 | +```yaml |
| 183 | +# Create group |
| 184 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 185 | +kind: MCPGroup |
| 186 | +metadata: |
| 187 | + name: engineering-team |
| 188 | + namespace: default |
| 189 | +spec: |
| 190 | + description: "Engineering team servers" |
| 191 | +--- |
| 192 | +# Create servers in group |
| 193 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 194 | +kind: MCPServer |
| 195 | +metadata: |
| 196 | + name: github-server |
| 197 | +spec: |
| 198 | + image: ghcr.io/stackloklabs/github:latest |
| 199 | + groupRef: engineering-team |
| 200 | +--- |
| 201 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 202 | +kind: MCPServer |
| 203 | +metadata: |
| 204 | + name: jira-server |
| 205 | +spec: |
| 206 | + image: ghcr.io/company/jira:latest |
| 207 | + groupRef: engineering-team |
| 208 | +``` |
| 209 | +
|
| 210 | +### Virtual MCP Usage |
| 211 | +
|
| 212 | +```yaml |
| 213 | +# Virtual MCP references the group |
| 214 | +# NOTE: This is an example of future MCPVirtualServer API (not yet implemented) |
| 215 | +apiVersion: mcp.toolhive.stacklok.io/v1alpha1 |
| 216 | +kind: MCPVirtualServer |
| 217 | +metadata: |
| 218 | + name: engineering-virtual |
| 219 | +spec: |
| 220 | + # References existing group |
| 221 | + groupRef: engineering-team |
| 222 | + |
| 223 | + # Virtual MCP configuration |
| 224 | + aggregation: |
| 225 | + conflictResolution: prefix |
| 226 | +``` |
| 227 | +
|
| 228 | +### Querying Servers in Group |
| 229 | +
|
| 230 | +```bash |
| 231 | +# List all servers in a group |
| 232 | +kubectl get mcpservers -n default --field-selector spec.groupRef=engineering-team |
| 233 | + |
| 234 | +# Check group status |
| 235 | +kubectl get mcpgroup engineering-team -n default -o jsonpath='{.status.servers}' |
| 236 | +``` |
| 237 | + |
| 238 | +## Migration from CLI |
| 239 | + |
| 240 | +CLI groups and Kubernetes groups follow the same model but are stored independently and are not synchronized: |
| 241 | +- **CLI groups**: Local runtime groups (stored in the `.toolhive/` directory) |
| 242 | +- **K8s groups**: Namespace-scoped MCPGroup resources (stored in etcd) |
| 243 | + |
| 244 | +**Key differences from CLI:** |
| 245 | +- In CLI: All servers must belong to a group (defaults to "default" group if not specified) |
| 246 | +- In K8s: Servers can optionally belong to a group (`spec.groupRef` is optional) |
| 247 | + |
| 248 | +There is no automatic migration: users manually create MCPGroup resources and set `spec.groupRef` on their MCPServers. |
| 249 | + |
| 250 | +## Type Definitions |
| 251 | + |
| 252 | +```go |
| 253 | +// MCPGroupSpec defines the desired state of MCPGroup |
| 254 | +type MCPGroupSpec struct { |
| 255 | + // Description provides human-readable context |
| 256 | + // +optional |
| 257 | + Description string `json:"description,omitempty"` |
| 258 | +} |
| 259 | + |
| 260 | +// MCPGroupStatus defines observed state |
| 261 | +type MCPGroupStatus struct { |
| 262 | + // Phase indicates current state |
| 263 | + // +optional |
| 264 | + Phase MCPGroupPhase `json:"phase,omitempty"` |
| 265 | + |
| 266 | + // Servers lists server names in this group |
| 267 | + // +optional |
| 268 | + Servers []string `json:"servers,omitempty"` |
| 269 | + |
| 270 | + // ServerCount is the number of servers (int32 per Kubernetes API conventions) |
| 271 | + // +optional |
| 272 | + ServerCount int32 `json:"serverCount,omitempty"` |
| 273 | + |
| 274 | + // Conditions represent observations |
| 275 | + // +optional |
| 276 | + Conditions []metav1.Condition `json:"conditions,omitempty"` |
| 277 | +} |
| 278 | + |
| 279 | +type MCPGroupPhase string |
| 280 | + |
| 281 | +const ( |
| 282 | + MCPGroupPhaseReady MCPGroupPhase = "Ready" |
| 283 | +) |
| 284 | + |
| 285 | +// Add to MCPServerSpec |
| 286 | +type MCPServerSpec struct { |
| 287 | + // Existing fields... |
| 288 | + |
| 289 | + // GroupRef is the MCPGroup this server belongs to |
| 290 | + // Must reference an existing MCPGroup in the same namespace |
| 291 | + // +optional |
| 292 | + GroupRef string `json:"groupRef,omitempty"` |
| 293 | +} |
| 294 | +``` |
| 295 | + |
| 296 | +## Open Questions |
| 297 | + |
| 298 | +1. **Should groupRef be immutable after creation?** |
| 299 | + - Recommendation: Allow changes; this makes it easier for users to correct mistakes |
| 300 | + |
| 301 | +2. **What happens if group is deleted?** |
| 302 | + - Recommendation: Servers continue running, and `spec.groupRef` becomes a dangling reference |
| 303 | + - Controller will log errors and add conditions to affected MCPServer resources |
| 304 | + |
| 305 | +3. **Should we validate group exists on MCPServer create?** |
| 306 | + - Recommendation: Yes, via controller reconciliation |
| 307 | + - Controller validates groupRef and adds status conditions if invalid |
| 308 | + - No webhook needed - keep implementation simple |
| 309 | + |
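
The recommendations for questions 2 and 3 (validate during reconciliation, report through conditions, never stop the server) can be sketched as follows. This is an illustration under stated assumptions: `Condition` is a simplified stand-in for `metav1.Condition`, and the condition type `GroupRefValidated` and the reasons `GroupFound` and `GroupNotFound` are hypothetical names that the proposal does not define.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption).
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// ValidateGroupRef returns the condition to record on an MCPServer.
// existingGroups holds the MCPGroup names found in the server's namespace.
// The returned bool is false when no condition applies (no groupRef set).
func ValidateGroupRef(groupRef string, existingGroups map[string]bool) (Condition, bool) {
	if groupRef == "" {
		return Condition{}, false // standalone server, nothing to validate
	}
	cond := Condition{Type: "GroupRefValidated"} // hypothetical condition type
	if existingGroups[groupRef] {
		cond.Status = "True"
		cond.Reason = "GroupFound"
		cond.Message = fmt.Sprintf("MCPGroup %q exists", groupRef)
	} else {
		// Dangling reference: report it, but leave the server running.
		cond.Status = "False"
		cond.Reason = "GroupNotFound"
		cond.Message = fmt.Sprintf("MCPGroup %q not found in namespace", groupRef)
	}
	return cond, true
}

func main() {
	groups := map[string]bool{"engineering-team": true}
	for _, ref := range []string{"engineering-team", "deleted-team", ""} {
		cond, ok := ValidateGroupRef(ref, groups)
		fmt.Println(ok, cond.Status, cond.Reason)
	}
}
```

Because the check runs on every reconcile rather than in an admission webhook, a server created before its group simply moves from `False` to `True` once the group appears.
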
| 310 | +## Future Enhancements |
| 311 | + |
| 312 | +- Group-level policies and authorization |
| 313 | +- Cross-namespace groups (with security review) |
| 314 | +- Group quotas and resource limits |
| 315 | + |
| 316 | +## Testing |
| 317 | + |
| 318 | +- **Unit**: Group validation, status updates |
| 319 | +- **Integration (envtest)**: Controller reconciliation, field selectors |
| 320 | +- **E2E (Chainsaw)**: Complete group lifecycle, Virtual MCP integration |