Webhook Middleware Phase 5: Kubernetes CRD and controller integration #3401

@JAORMX

Overview

Implement a Kubernetes CRD (`MCPWebhookConfig`) and controller support for configuring webhook middleware. This enables declarative webhook configuration for MCP servers running in Kubernetes clusters.

RFC: https://github.com/stacklok/toolhive-rfcs/blob/main/rfcs/THV-0017-dynamic-webhook-middleware.md

Depends on: Phase 2 (Validating webhook), Phase 3 (Mutating webhook)

Files to Create

| File | Purpose |
|------|---------|
| `cmd/thv-operator/api/v1alpha1/mcpwebhookconfig_types.go` | CRD type definitions |
| `cmd/thv-operator/controllers/mcpwebhookconfig_controller.go` | Controller implementation |
| `cmd/thv-operator/pkg/controllerutil/webhook.go` | Webhook config resolution helpers |
| `config/crd/bases/toolhive.stacklok.dev_mcpwebhookconfigs.yaml` | CRD manifest (generated) |

Files to Modify

| File | Changes |
|------|---------|
| `cmd/thv-operator/api/v1alpha1/mcpserver_types.go` | Add `WebhookConfigRef` field |
| `cmd/thv-operator/controllers/mcpserver_controller.go` | Handle webhook config resolution, add watcher |
| `cmd/thv-operator/controllers/mcpserver_runconfig.go` | Add webhook config to RunConfig builder |

CRD Definition

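```go
// cmd/thv-operator/api/v1alpha1/mcpwebhookconfig_types.go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MCPWebhookConfigSpec defines the webhooks applied to MCP requests.
type MCPWebhookConfigSpec struct {
	// Validating webhooks called to approve/deny requests
	// +optional
	Validating []WebhookSpec `json:"validating,omitempty"`

	// Mutating webhooks called to transform requests
	// +optional
	Mutating []WebhookSpec `json:"mutating,omitempty"`
}

// FailurePolicy defines how errors calling a webhook are handled.
type FailurePolicy string

const (
	// FailurePolicyFail rejects the request when the webhook call fails.
	FailurePolicyFail FailurePolicy = "Fail"
	// FailurePolicyIgnore lets the request proceed when the webhook call fails.
	FailurePolicyIgnore FailurePolicy = "Ignore"
)

type WebhookSpec struct {
	// Name is a unique identifier for this webhook
	Name string `json:"name"`

	// URL is the webhook endpoint (must be HTTPS)
	// +kubebuilder:validation:Pattern=`^https://`
	URL string `json:"url"`

	// Timeout for webhook calls (default: 10s, max: 30s)
	// +optional
	Timeout *metav1.Duration `json:"timeout,omitempty"`

	// FailurePolicy defines behavior on webhook errors
	// +kubebuilder:validation:Enum=Fail;Ignore
	// +kubebuilder:default=Fail
	// +optional
	FailurePolicy FailurePolicy `json:"failurePolicy,omitempty"`

	// TLSConfig for webhook connection
	// +optional
	TLSConfig *WebhookTLSConfig `json:"tlsConfig,omitempty"`

	// HMACSecretRef references a secret containing HMAC signing key
	// +optional
	HMACSecretRef *SecretKeyRef `json:"hmacSecretRef,omitempty"`
}

type WebhookTLSConfig struct {
	// CASecretRef references a secret containing CA certificate
	// +optional
	CASecretRef *SecretKeyRef `json:"caSecretRef,omitempty"`

	// ClientCertSecretRef references a secret containing client cert for mTLS
	// +optional
	ClientCertSecretRef *SecretKeyRef `json:"clientCertSecretRef,omitempty"`

	// InsecureSkipVerify disables certificate verification (NOT for production)
	// +optional
	InsecureSkipVerify bool `json:"insecureSkipVerify,omitempty"`
}

type SecretKeyRef struct {
	// Name of the secret
	Name string `json:"name"`
	// Key within the secret
	Key string `json:"key"`
}

// MCPWebhookConfigStatus defines the observed state of MCPWebhookConfig.
type MCPWebhookConfigStatus struct {
	// ConfigHash is a hash of the spec for change detection
	// +optional
	ConfigHash string `json:"configHash,omitempty"`

	// ReferencingServers lists MCPServers using this config
	// +optional
	ReferencingServers []string `json:"referencingServers,omitempty"`

	// ObservedGeneration is the last observed generation
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// Conditions represent the latest available observations
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// MCPWebhookConfig is the Schema for the mcpwebhookconfigs API.
type MCPWebhookConfig struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MCPWebhookConfigSpec   `json:"spec,omitempty"`
	Status MCPWebhookConfigStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// MCPWebhookConfigList contains a list of MCPWebhookConfig.
type MCPWebhookConfigList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []MCPWebhookConfig `json:"items"`
}

func init() {
	// Assumes the package's existing SchemeBuilder (groupversion_info.go).
	SchemeBuilder.Register(&MCPWebhookConfig{}, &MCPWebhookConfigList{})
}
```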
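The field comments state rules that the schema markers alone do not fully enforce: timeouts default to 10s and may not exceed 30s, the failure policy defaults to `Fail`, URLs must be HTTPS, and names must be unique. A minimal sketch of a defaulting/validation helper the controller could run, assuming over-limit timeouts are rejected rather than clamped and that names must be unique across both the validating and mutating lists. `validateWebhooks` and the simplified `webhookSpec` (using `time.Duration` instead of `metav1.Duration`) are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

const (
	defaultWebhookTimeout = 10 * time.Second
	maxWebhookTimeout     = 30 * time.Second
)

// webhookSpec is a simplified, hypothetical mirror of the WebhookSpec CRD type.
type webhookSpec struct {
	Name          string
	URL           string
	Timeout       *time.Duration // nil means "use the default"
	FailurePolicy string         // "" means "use the default"
}

// applyDefaults fills in the documented defaults (10s timeout, Fail policy).
func applyDefaults(w *webhookSpec) {
	if w.Timeout == nil {
		d := defaultWebhookTimeout
		w.Timeout = &d
	}
	if w.FailurePolicy == "" {
		w.FailurePolicy = "Fail"
	}
}

// validateWebhooks defaults every webhook in place and returns the first rule
// violation. Names must be unique across both lists (an assumption).
func validateWebhooks(validating, mutating []webhookSpec) error {
	seen := map[string]bool{}
	for _, list := range [][]webhookSpec{validating, mutating} {
		for i := range list {
			w := &list[i] // points into the caller's slice, so defaults stick
			applyDefaults(w)
			if w.Name == "" {
				return fmt.Errorf("webhook name must not be empty")
			}
			if seen[w.Name] {
				return fmt.Errorf("duplicate webhook name %q", w.Name)
			}
			seen[w.Name] = true
			u, err := url.Parse(w.URL)
			if err != nil || u.Scheme != "https" || u.Host == "" {
				return fmt.Errorf("webhook %q: URL must be an absolute https:// URL", w.Name)
			}
			if *w.Timeout <= 0 || *w.Timeout > maxWebhookTimeout {
				return fmt.Errorf("webhook %q: timeout %s must be in (0, %s]", w.Name, *w.Timeout, maxWebhookTimeout)
			}
			if w.FailurePolicy != "Fail" && w.FailurePolicy != "Ignore" {
				return fmt.Errorf("webhook %q: unknown failurePolicy %q", w.Name, w.FailurePolicy)
			}
		}
	}
	return nil
}
```

Rejecting an over-limit timeout, instead of silently clamping it, makes the misconfiguration visible (for example in a status condition) rather than changing behavior behind the user's back.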

Example CRD Instance

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPWebhookConfig
metadata:
  name: company-webhooks
  namespace: mcp-servers
spec:
  validating:
    - name: policy-check
      url: https://policy.company.com/validate
      timeout: 5s
      failurePolicy: Fail
      tlsConfig:
        caSecretRef:
          name: webhook-ca
          key: ca.crt
      hmacSecretRef:
        name: webhook-secrets
        key: policy-hmac

  mutating:
    - name: request-enricher
      url: https://enricher.company.com/mutate
      timeout: 3s
      failurePolicy: Ignore
```
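To check that this instance maps onto the spec types, the sketch below decodes a JSON rendering of the same spec into simplified mirror structs that reuse the CRD's `json` tags (YAML manifests are decoded through those same tags, as `sigs.k8s.io/yaml` does). The `duration` type imitates how `metav1.Duration` parses strings such as `5s` with `time.ParseDuration`; all type names here are hypothetical stand-ins, and fields not used by the example are omitted.

```go
package main

import (
	"encoding/json"
	"time"
)

// duration mimics metav1.Duration: a JSON string such as "5s" parsed by time.ParseDuration.
type duration struct{ time.Duration }

func (d *duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	d.Duration = parsed
	return nil
}

type secretKeyRef struct {
	Name string `json:"name"`
	Key  string `json:"key"`
}

type webhookTLSConfig struct {
	CASecretRef *secretKeyRef `json:"caSecretRef,omitempty"`
}

type webhookSpec struct {
	Name          string            `json:"name"`
	URL           string            `json:"url"`
	Timeout       *duration         `json:"timeout,omitempty"`
	FailurePolicy string            `json:"failurePolicy,omitempty"`
	TLSConfig     *webhookTLSConfig `json:"tlsConfig,omitempty"`
	HMACSecretRef *secretKeyRef     `json:"hmacSecretRef,omitempty"`
}

type webhookConfigSpec struct {
	Validating []webhookSpec `json:"validating,omitempty"`
	Mutating   []webhookSpec `json:"mutating,omitempty"`
}

// exampleSpec is the spec of the company-webhooks example, rendered as JSON.
const exampleSpec = `{
  "validating": [{
    "name": "policy-check",
    "url": "https://policy.company.com/validate",
    "timeout": "5s",
    "failurePolicy": "Fail",
    "tlsConfig": {"caSecretRef": {"name": "webhook-ca", "key": "ca.crt"}},
    "hmacSecretRef": {"name": "webhook-secrets", "key": "policy-hmac"}
  }],
  "mutating": [{
    "name": "request-enricher",
    "url": "https://enricher.company.com/mutate",
    "timeout": "3s",
    "failurePolicy": "Ignore"
  }]
}`

// decodeExample unmarshals exampleSpec into the mirror types.
func decodeExample() (webhookConfigSpec, error) {
	var spec webhookConfigSpec
	err := json.Unmarshal([]byte(exampleSpec), &spec)
	return spec, err
}
```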

MCPServer Reference

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: my-mcp-server
  namespace: mcp-servers # must match the MCPWebhookConfig's namespace
spec:
  # ... other fields ...
  webhookConfigRef:
    name: company-webhooks
```

```go
// In mcpserver_types.go
type MCPServerSpec struct {
	// ... existing fields ...

	// WebhookConfigRef references an MCPWebhookConfig for webhook middleware
	// +optional
	WebhookConfigRef *WebhookConfigRef `json:"webhookConfigRef,omitempty"`
}

// WebhookConfigRef identifies an MCPWebhookConfig by name.
type WebhookConfigRef struct {
	// Name of the MCPWebhookConfig resource in the same namespace as the MCPServer
	Name string `json:"name"`
}
```
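Because `WebhookConfigRef` carries only a name, a reference can only resolve to an MCPWebhookConfig in the MCPServer's own namespace. Both controllers need the inverse lookup: which MCPServers reference a given config (for `status.referencingServers`, the finalizer, and the watch). A minimal sketch using a hypothetical flattened `server` type in place of the real `MCPServer`:

```go
package main

import "sort"

// server is a hypothetical, flattened view of an MCPServer.
type server struct {
	Namespace        string
	Name             string
	WebhookConfigRef *string // nil when the server uses no webhook config
}

// referencingServers returns the sorted names of MCPServers in namespace ns
// whose webhookConfigRef points at configName. Sorting keeps the status list
// stable, so repeated reconciles do not produce spurious status updates.
func referencingServers(servers []server, ns, configName string) []string {
	var names []string
	for _, s := range servers {
		if s.Namespace == ns && s.WebhookConfigRef != nil && *s.WebhookConfigRef == configName {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names)
	return names
}
```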

Controller Implementation

Follow the pattern established by MCPExternalAuthConfig:

  1. Finalizer: Block removal of the config while MCPServers still reference it
  2. Hash calculation: Detect spec changes
  3. Status updates: Track referencing servers
  4. MCPServer reconciliation trigger: Re-reconcile referencing MCPServers when the config changes
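For the hash calculation step, one common approach (assumed here, not prescribed by the RFC) is a SHA-256 over the JSON encoding of the spec. `encoding/json` emits struct fields in declaration order and sorts map keys, so equal specs always hash identically, while reordering webhooks in a list does change the hash. The `webhookConfigSpec` type below is a simplified, hypothetical stand-in for `MCPWebhookConfigSpec`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

type webhookSpec struct {
	Name          string `json:"name"`
	URL           string `json:"url"`
	Timeout       string `json:"timeout,omitempty"`
	FailurePolicy string `json:"failurePolicy,omitempty"`
}

type webhookConfigSpec struct {
	Validating []webhookSpec `json:"validating,omitempty"`
	Mutating   []webhookSpec `json:"mutating,omitempty"`
}

// configHash returns the hex-encoded SHA-256 of the JSON-encoded spec. An
// unchanged spec yields an unchanged hash, so the status update (and any
// downstream MCPServer reconciliation) can be skipped.
func configHash(spec webhookConfigSpec) (string, error) {
	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}
```

The hash covers only the spec: rotating the contents of a referenced Secret (an HMAC key or CA) leaves it unchanged, so picking up rotated keys would need a separate mechanism, such as also watching the referenced Secrets.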
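```go
// cmd/thv-operator/controllers/mcpwebhookconfig_controller.go

const webhookConfigFinalizer = "toolhive.stacklok.dev/webhookconfig-finalizer"

// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpwebhookconfigs,verbs=get;list;watch;update;patch
// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpwebhookconfigs/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpwebhookconfigs/finalizers,verbs=update

func (r *MCPWebhookConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch MCPWebhookConfig (ignore NotFound: it has already been deleted)
	// 2. Handle deletion (finalizer logic): keep the finalizer while any MCPServer
	//    references the config, remove it once none do; add it to live objects
	// 3. Calculate config hash
	// 4. Find referencing MCPServers
	// 5. Update status (ConfigHash, ReferencingServers, ObservedGeneration) if changed
	// 6. Trigger reconciliation for affected MCPServers
	return ctrl.Result{}, nil // placeholder until the steps above are implemented
}
```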
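The deletion handling reduces to a small decision table. Note that a finalizer does not make `kubectl delete` fail: the API server sets `metadata.deletionTimestamp` and leaves the object terminating until the finalizer is removed. A sketch of that decision, with hypothetical names (`decideFinalizer`, `finalizerAction`), where `beingDeleted` stands for a non-nil deletion timestamp:

```go
package main

// finalizerAction is what the reconciler should do with webhookConfigFinalizer.
type finalizerAction string

const (
	actionAddFinalizer    finalizerAction = "AddFinalizer"    // live object missing the finalizer
	actionNone            finalizerAction = "None"            // nothing to change
	actionWaitForRefs     finalizerAction = "WaitForRefs"     // being deleted but still referenced
	actionRemoveFinalizer finalizerAction = "RemoveFinalizer" // being deleted, no references left
)

// decideFinalizer maps the object's state to a finalizer action.
func decideFinalizer(beingDeleted, hasFinalizer bool, referencingServers int) finalizerAction {
	if !beingDeleted {
		if hasFinalizer {
			return actionNone
		}
		return actionAddFinalizer
	}
	if !hasFinalizer {
		// Our finalizer is already gone; the API server completes the deletion.
		return actionNone
	}
	if referencingServers > 0 {
		return actionWaitForRefs
	}
	return actionRemoveFinalizer
}
```

On `WaitForRefs`, the reconciler has to be re-triggered once the last reference disappears, either by requeueing or by also watching MCPServers, otherwise the config stays terminating indefinitely.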

Watch Setup

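```go
// In mcpserver_controller.go SetupWithManager (assumes r embeds client.Client)
webhookConfigHandler := handler.EnqueueRequestsFromMapFunc(
	func(ctx context.Context, obj client.Object) []reconcile.Request {
		// Only MCPServers in the config's own namespace can reference it.
		var servers mcpv1alpha1.MCPServerList
		if err := r.List(ctx, &servers, client.InNamespace(obj.GetNamespace())); err != nil {
			return nil // listing failed; nothing to enqueue
		}
		var requests []reconcile.Request
		for _, s := range servers.Items {
			if s.Spec.WebhookConfigRef != nil && s.Spec.WebhookConfigRef.Name == obj.GetName() {
				requests = append(requests, reconcile.Request{
					NamespacedName: types.NamespacedName{Namespace: s.Namespace, Name: s.Name},
				})
			}
		}
		return requests
	},
)

return ctrl.NewControllerManagedBy(mgr).
	For(&mcpv1alpha1.MCPServer{}).
	Watches(&mcpv1alpha1.MCPWebhookConfig{}, webhookConfigHandler).
	Complete(r)
```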

Config Resolution

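```go
// cmd/thv-operator/pkg/controllerutil/webhook.go

// AddWebhookConfigOptions resolves the MCPWebhookConfig referenced by an
// MCPServer and appends the resulting webhook settings to the RunConfig options.
func AddWebhookConfigOptions(
	ctx context.Context,
	c client.Client,
	namespace string,
	webhookConfigRef *mcpv1alpha1.WebhookConfigRef,
	options *[]runner.RunConfigBuilderOption,
) error {
	if webhookConfigRef == nil {
		return nil // no webhook middleware configured
	}
	// 1. Fetch MCPWebhookConfig webhookConfigRef.Name from namespace
	// 2. Resolve secret references (HMAC, TLS certs) from Secrets in namespace;
	//    return an error if a Secret or key is missing
	// 3. Convert to runner.WebhookConfig
	// 4. Append to *options
	return nil
}
```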
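The secret-resolution step is where most configuration errors surface (missing Secret, missing key). A sketch under stated assumptions: a hypothetical `secretGetter` interface stands in for reading `corev1.Secret` data through the controller-runtime client, Secrets are looked up in the MCPServer's namespace, the CA reference is flattened out of `TLSConfig` for brevity, and `resolvedWebhook` is a hypothetical stand-in for `runner.WebhookConfig`, whose real fields are not specified in this issue.

```go
package main

import "fmt"

type secretKeyRef struct{ Name, Key string }

// webhookSpec is a simplified, hypothetical mirror of the CRD's WebhookSpec.
type webhookSpec struct {
	Name          string
	URL           string
	HMACSecretRef *secretKeyRef
	CASecretRef   *secretKeyRef // flattened from TLSConfig for brevity
}

// secretGetter returns a Secret's data map (hypothetical abstraction over client.Get).
type secretGetter interface {
	SecretData(namespace, name string) (map[string][]byte, error)
}

// resolvedWebhook is a hypothetical stand-in for runner.WebhookConfig.
type resolvedWebhook struct {
	Name    string
	URL     string
	HMACKey []byte
	CAPEM   []byte
}

// resolveKey reads one key from a Secret; a nil ref resolves to nil.
func resolveKey(g secretGetter, ns string, ref *secretKeyRef) ([]byte, error) {
	if ref == nil {
		return nil, nil
	}
	data, err := g.SecretData(ns, ref.Name)
	if err != nil {
		return nil, fmt.Errorf("secret %s/%s: %w", ns, ref.Name, err)
	}
	v, ok := data[ref.Key]
	if !ok {
		return nil, fmt.Errorf("secret %s/%s has no key %q", ns, ref.Name, ref.Key)
	}
	return v, nil
}

// resolveWebhook resolves every secret reference of one webhook.
func resolveWebhook(g secretGetter, ns string, w webhookSpec) (resolvedWebhook, error) {
	hmacKey, err := resolveKey(g, ns, w.HMACSecretRef)
	if err != nil {
		return resolvedWebhook{}, fmt.Errorf("webhook %q: hmacSecretRef: %w", w.Name, err)
	}
	ca, err := resolveKey(g, ns, w.CASecretRef)
	if err != nil {
		return resolvedWebhook{}, fmt.Errorf("webhook %q: caSecretRef: %w", w.Name, err)
	}
	return resolvedWebhook{Name: w.Name, URL: w.URL, HMACKey: hmacKey, CAPEM: ca}, nil
}

// mapSecrets is an in-memory secretGetter keyed by "namespace/name", useful in unit tests.
type mapSecrets map[string]map[string][]byte

func (m mapSecrets) SecretData(namespace, name string) (map[string][]byte, error) {
	data, ok := m[namespace+"/"+name]
	if !ok {
		return nil, fmt.Errorf("not found")
	}
	return data, nil
}
```

Returning an error instead of silently dropping the webhook lets the MCPServer controller report the misconfiguration rather than running without the intended policy check.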

Tests

  • Controller unit tests with fake client
  • Integration tests with envtest
  • E2E tests with Chainsaw:
    • Create MCPWebhookConfig
    • Create MCPServer referencing it
    • Verify webhook config applied
    • Update MCPWebhookConfig, verify reconciliation
    • Delete MCPWebhookConfig while referenced; verify it stays terminating (finalizer present) until no MCPServer references it

Acceptance Criteria

  • CRD defined and generated (`make manifests`)
  • Controller implements the reconciliation loop
  • Hash-based change detection working
  • Finalizer keeps a referenced MCPWebhookConfig from being removed until no MCPServer references it
  • MCPServer reconciliation triggered on config changes
  • Secret references resolved correctly
  • Unit tests with >80% coverage
  • Integration tests with envtest
  • E2E tests with Chainsaw
  • Code passes `task lint` and `task test`
