Skip to content

feat: Support foreground deletion for Sandbox CRs to wait for pod termination #729

@kon-angelo

Description

@kon-angelo

Problem Statement

When deleting a Sandbox, openshell sandbox delete (and the gRPC DeleteSandbox RPC) returns immediately after the Kubernetes API accepts the delete request. The underlying pod may still be running and terminating. This is because:

  1. SandboxClient::delete() uses DeleteParams::default(), which sends no propagationPolicy — Kubernetes defaults to Background deletion.
  2. The gRPC handler returns DeleteSandboxResponse as soon as the K8s API acknowledges the delete, without polling for actual resource removal.
  3. No ownerReferences or finalizers are set by OpenShell on the Sandbox CR.

This means callers that need to know the sandbox (and its pod) are fully gone must implement their own polling. Currently only the Python SDK provides wait_deleted() as a workaround.

Affected code paths:

  • crates/openshell-server/src/sandbox/mod.rs:257-301SandboxClient::delete() with DeleteParams::default()
  • crates/openshell-server/src/grpc.rs:587-688delete_sandbox gRPC handler returns immediately
  • crates/openshell-cli/src/run.rs:2825-2879 — CLI sandbox_delete returns immediately after gRPC response

Proposed Design

Option A: Foreground propagation policy

Set propagation_policy: Some(PropagationPolicy::Foreground) in DeleteParams when deleting the Sandbox CR. This tells the Kubernetes API server to block until all dependent resources (pods, services) are garbage-collected via ownerReferences.

Prerequisite: Verify that the agent-sandbox-controller sets ownerReferences on child Pods pointing back to the Sandbox CR. If it does not, foreground propagation alone will not block on pod deletion.

Option B: Server-side poll after delete

After issuing the delete, the gRPC handler polls (or watches) the Sandbox resource until it returns 404 Not Found, then responds to the client. This is similar to what the Python SDK's wait_deleted() does today but moved server-side so all clients benefit.

Option C: Expose a wait parameter on the RPC

Add an optional wait field to DeleteSandboxRequest (or a separate DeleteSandboxAndWait RPC). When set, the server waits for full removal before responding. When unset, behavior remains fire-and-forget for callers that don't need synchronous deletion.

Recommendation

Option C gives the most flexibility — existing behavior is preserved by default, and callers that need synchronous deletion opt in. The server-side implementation can use either foreground propagation (Option A) or polling (Option B) under the hood.

Alternatives Considered

  • Client-side polling only (status quo for Python SDK): Works but forces every client to reimplement the same logic. The Rust CLI and TUI currently have no wait support at all.
  • Finalizers on the Sandbox CR: OpenShell could add a finalizer and only remove it after confirming the pod is gone. This is more invasive and couples OpenShell more tightly to the controller lifecycle.

Agent Investigation

Investigation of the codebase confirmed:

Component Waits for pod deletion?
CLI (openshell sandbox delete) No
CLI (--no-keep cleanup) No
TUI delete action No
gRPC server handler No
Python SDK SandboxClient.delete() No
Python SDK SandboxClient.wait_deleted() Yes — polls every 1s, 60s timeout
Python SDK Sandbox.__exit__() Yes — via wait_deleted()
  • DeleteParams::default() sends no propagationPolicy → Kubernetes defaults to Background
  • No ownerReferences or finalizers are set by OpenShell on the Sandbox CR (sandbox/mod.rs:197-202)
  • The async watcher (spawn_sandbox_watcher) eventually picks up the Deleted event and cleans up the store, but this is decoupled from the delete response
  • A safety-net reconciler runs every 60s to catch orphaned store records

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions