Skip to content

Commit 782bd95

Browse files
authored
refactor(sandbox): move secrets to supervisor placeholders (#192)
1 parent 992ffce commit 782bd95

File tree

11 files changed

+839
-164
lines changed

11 files changed

+839
-164
lines changed

architecture/sandbox-providers.md

Lines changed: 57 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -11,10 +11,12 @@ providers centralize that concern: a user configures a provider once, and any sa
1111
needs that external service can reference it.
1212

1313
At sandbox creation time, providers are validated and associated with the sandbox. The
14-
sandbox supervisor then fetches credentials at runtime and injects them as environment
15-
variables into every child process it spawns. Access is enforced through the sandbox
16-
policy — the policy decides which outbound requests are allowed or denied based on
17-
the providers attached to that sandbox.
14+
sandbox supervisor then fetches credentials at runtime, keeps the real secret values in
15+
supervisor-only memory, and injects placeholder environment variables into every child
16+
process it spawns. When outbound traffic is allowed through the sandbox proxy, the
17+
supervisor rewrites those placeholders back to the real secret values before forwarding.
18+
Access is enforced through the sandbox policy — the policy decides which outbound
19+
requests are allowed or denied based on the providers attached to that sandbox.
1820

1921
Core goals:
2022

@@ -57,7 +59,8 @@ The gRPC surface is defined in `proto/navigator.proto`:
5759
- persistence using `object_type = "provider"`.
5860
- `crates/navigator-sandbox`
5961
- sandbox supervisor fetches provider credentials via gRPC at startup,
60-
- injects credentials as environment variables into entrypoint and SSH child processes.
62+
- injects placeholder env vars into entrypoint and SSH child processes,
63+
- resolves placeholders back to real secrets in the outbound proxy path.
6164

6265
## Provider Plugins
6366

@@ -242,12 +245,18 @@ In `run_sandbox()` (`crates/navigator-sandbox/src/lib.rs`):
242245
2. fetches provider credentials via gRPC (`GetSandboxProviderEnvironment`),
243246
3. if the fetch fails, continues with an empty map (graceful degradation with a warning).
244247

245-
The returned `provider_env` `HashMap<String, String>` is then threaded to both the
246-
entrypoint process spawner and the SSH server.
248+
The returned `provider_env` `HashMap<String, String>` is immediately transformed into:
249+
250+
- a child-visible env map with placeholder values such as
251+
`openshell:resolve:env:ANTHROPIC_API_KEY`, and
252+
- a supervisor-only in-memory registry mapping each placeholder back to its real secret.
253+
254+
The placeholder env map is threaded to the entrypoint process spawner and SSH server.
255+
The registry is threaded to the proxy so it can rewrite outbound headers.
247256

248257
### Child Process Environment Variable Injection
249258

250-
Provider credentials are injected into child processes in two places, covering all
259+
Provider placeholders are injected into child processes in two places, covering all
251260
process spawning paths inside the sandbox:
252261

253262
**1. Entrypoint process** (`crates/navigator-sandbox/src/process.rs`):
@@ -257,14 +266,16 @@ let mut cmd = Command::new(program);
257266
cmd.args(args)
258267
.env("OPENSHELL_SANDBOX", "1");
259268

260-
// Set provider environment variables (credentials fetched at runtime).
269+
// Set provider environment variables (supervisor-managed placeholders).
261270
for (key, value) in provider_env {
262271
cmd.env(key, value);
263272
}
264273
```
265274

266275
This uses `tokio::process::Command`. The `.env()` call adds each variable to the child's
267-
inherited environment without clearing it.
276+
inherited environment without clearing it. The spawn path also explicitly removes
277+
`NEMOCLAW_SSH_HANDSHAKE_SECRET` so the handshake secret does not leak into the agent
278+
entrypoint process.
268279

269280
After provider env vars, proxy env vars (`HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, etc.)
270281
are also set when `NetworkMode` is `Proxy`. The child is then launched with namespace
@@ -281,14 +292,30 @@ cmd.env("OPENSHELL_SANDBOX", "1")
281292
.env("USER", "sandbox")
282293
.env("TERM", term);
283294

284-
// Set provider environment variables (credentials fetched at runtime).
295+
// Set provider environment variables (supervisor-managed placeholders).
285296
for (key, value) in provider_env {
286297
cmd.env(key, value);
287298
}
288299
```
289300

290301
This uses `std::process::Command`. The `SshHandler` holds the `provider_env` map and
291-
passes it to `spawn_pty_shell()` for each new shell or exec request.
302+
passes it to `spawn_pty_shell()` for each new shell or exec request. SSH child processes
303+
start from `env_clear()`, so the handshake secret is not present there.
304+
305+
### Proxy-Time Secret Resolution
306+
307+
When a sandboxed tool uses one of these placeholder env vars to populate an outbound HTTP
308+
header (for example `Authorization: Bearer openshell:resolve:env:ANTHROPIC_API_KEY`), the
309+
sandbox proxy rewrites the placeholder to the real secret value immediately before the
310+
request is forwarded upstream.
311+
312+
This applies to:
313+
314+
- forward-proxy HTTP requests, and
315+
- L7-inspected REST requests inside CONNECT tunnels.
316+
317+
The real secret value remains in supervisor memory only; it is not re-injected into the
318+
child process environment.
292319

293320
### End-to-End Flow
294321

@@ -309,13 +336,17 @@ CLI: openshell sandbox create -- claude
309336
K8s: pod starts navigator-sandbox binary
310337
+-- OPENSHELL_SANDBOX_ID and OPENSHELL_ENDPOINT set in pod env
311338
|
312-
Sandbox supervisor: run_sandbox()
313-
+-- Fetches policy via gRPC
314-
+-- Fetches provider env via gRPC
315-
| +-- Gateway resolves: "claude" -> credentials -> {ANTHROPIC_API_KEY: "sk-..."}
316-
+-- Spawns entrypoint: cmd.env("ANTHROPIC_API_KEY", "sk-...")
317-
+-- SSH server holds provider_env
318-
+-- Each SSH shell: cmd.env("ANTHROPIC_API_KEY", "sk-...")
339+
Sandbox supervisor: run_sandbox()
340+
+-- Fetches policy via gRPC
341+
+-- Fetches provider env via gRPC
342+
| +-- Gateway resolves: "claude" -> credentials -> {ANTHROPIC_API_KEY: "sk-..."}
343+
+-- Builds placeholder registry
344+
| +-- child env: {ANTHROPIC_API_KEY: "openshell:resolve:env:ANTHROPIC_API_KEY"}
345+
| +-- supervisor registry: {"openshell:resolve:env:ANTHROPIC_API_KEY": "sk-..."}
346+
+-- Spawns entrypoint with placeholder env
347+
+-- SSH server holds placeholder env
348+
| +-- Each SSH shell: cmd.env("ANTHROPIC_API_KEY", "openshell:resolve:env:ANTHROPIC_API_KEY")
349+
+-- Proxy rewrites outbound auth header placeholders -> real secrets
319350
```
320351

321352
## Persistence and Validation
@@ -336,6 +367,10 @@ Providers are stored with `object_type = "provider"` in the shared object store.
336367
- CLI displays only non-sensitive summaries (counts/key names where relevant).
337368
- Credentials are never persisted in the sandbox spec — they exist only in the
338369
provider store and are fetched at runtime by the sandbox supervisor.
370+
- Child processes never receive the raw provider secret values; they only receive
371+
placeholders, and the supervisor resolves those placeholders during outbound proxying.
372+
- `NEMOCLAW_SSH_HANDSHAKE_SECRET` is required by the supervisor/SSH server path but is
373+
explicitly kept out of spawned sandbox child-process environments.
339374

340375
## Test Strategy
341376

@@ -344,3 +379,6 @@ Providers are stored with `object_type = "provider"` in the shared object store.
344379
- Mocked discovery context tests cover env and path-based behavior.
345380
- CLI and gateway integration tests validate end-to-end RPC compatibility.
346381
- `resolve_provider_environment` unit tests in `crates/navigator-server/src/grpc.rs`.
382+
- sandbox unit tests validate placeholder generation and header rewriting.
383+
- E2E sandbox tests verify placeholders are visible in child env, outbound proxy traffic
384+
is rewritten with the real secret, and the SSH handshake secret is absent from exec env.

crates/navigator-sandbox/src/l7/relay.rs

Lines changed: 13 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -8,10 +8,10 @@
88
//! and either forwards or denies the request.
99
1010
use crate::l7::provider::L7Provider;
11-
use crate::l7::rest::RestProvider;
1211
use crate::l7::{EnforcementMode, L7EndpointConfig, L7Protocol, L7RequestInfo};
12+
use crate::secrets::SecretResolver;
1313
use miette::{IntoDiagnostic, Result, miette};
14-
use std::sync::Mutex;
14+
use std::sync::{Arc, Mutex};
1515
use tokio::io::{AsyncRead, AsyncWrite};
1616
use tracing::{debug, info, warn};
1717

@@ -29,6 +29,8 @@ pub struct L7EvalContext {
2929
pub ancestors: Vec<String>,
3030
/// Cmdline paths.
3131
pub cmdline_paths: Vec<String>,
32+
/// Supervisor-only placeholder resolver for outbound headers.
33+
pub(crate) secret_resolver: Option<Arc<SecretResolver>>,
3234
}
3335

3436
/// Run protocol-aware L7 inspection on a tunnel.
@@ -78,11 +80,9 @@ where
7880
C: AsyncRead + AsyncWrite + Unpin + Send,
7981
U: AsyncRead + AsyncWrite + Unpin + Send,
8082
{
81-
let provider = RestProvider;
82-
8383
loop {
8484
// Parse one HTTP request from client
85-
let req = match provider.parse_request(client).await {
85+
let req = match crate::l7::rest::RestProvider.parse_request(client).await {
8686
Ok(Some(req)) => req,
8787
Ok(None) => return Ok(()), // Client closed connection
8888
Err(e) => {
@@ -134,7 +134,13 @@ where
134134

135135
if allowed || config.enforcement == EnforcementMode::Audit {
136136
// Forward request to upstream and relay response
137-
let reusable = provider.relay(&req, client, upstream).await?;
137+
let reusable = crate::l7::rest::relay_http_request_with_resolver(
138+
&req,
139+
client,
140+
upstream,
141+
ctx.secret_resolver.as_deref(),
142+
)
143+
.await?;
138144
if !reusable {
139145
debug!(
140146
host = %ctx.host,
@@ -145,7 +151,7 @@ where
145151
}
146152
} else {
147153
// Enforce mode: deny with 403 and close connection
148-
provider
154+
crate::l7::rest::RestProvider
149155
.deny(&req, &ctx.policy_name, &reason, client)
150156
.await?;
151157
return Ok(());

0 commit comments

Comments
 (0)