feat: add runtime llm facade #54

Merged
eetoc merged 6 commits into main from feature/runtime-llm-facade
Jun 24, 2026
eetoc commented Jun 22, 2026

Collaborator

Summary

Adds a daemon-side Runtime LLM Facade so guest runtimes no longer receive upstream provider keys directly. The daemon now owns provider/model resolution, session-scoped facade tokens, key/header injection, and OpenAI/Anthropic protocol conversion through ai-api-protocol-bridge.

What changed

  • Add daemon-side LLM provider/model config, facade token storage, and runtime LLM facade routes.
  • Support OpenAI Responses, OpenAI Chat Completions, and Anthropic Messages inbound facade APIs.
  • Bridge OpenAI-family and Anthropic-family requests/responses, including SSE streams.
  • Inject daemon-managed facade env aliases into runtime sessions and agent runs, including LLM_API_KEY, LLM_API_ENDPOINT, provider-specific key aliases, and facade base URLs.
  • Filter real LLM provider keys from user-supplied runtime env while preserving unrelated secret=true env vars.
  • Generate per-session Codex config dynamically instead of hardcoding upstream provider settings in assets.
  • Document runtime LLM facade behavior and supported environment variables.
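
The env filtering described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `EnvVar` type, the `filterProviderKeys` helper, and the two denylist entries are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvVar is a hypothetical shape for one user-supplied runtime env entry.
type EnvVar struct {
	Name   string
	Value  string
	Secret bool
}

// llmProviderKeyDenylist is an assumed set of real provider key names.
// The actual denylist shared by the driver and facade layers may differ.
var llmProviderKeyDenylist = map[string]struct{}{
	"OPENAI_API_KEY":    {},
	"ANTHROPIC_API_KEY": {},
}

// filterProviderKeys drops entries whose name is on the denylist and keeps
// every other entry, including unrelated variables marked Secret.
func filterProviderKeys(in []EnvVar) []EnvVar {
	out := make([]EnvVar, 0, len(in))
	for _, v := range in {
		if _, denied := llmProviderKeyDenylist[strings.ToUpper(v.Name)]; denied {
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	env := []EnvVar{
		{Name: "OPENAI_API_KEY", Value: "sk-real", Secret: true},
		{Name: "DB_PASSWORD", Value: "hunter2", Secret: true},
		{Name: "LOG_LEVEL", Value: "debug"},
	}
	for _, v := range filterProviderKeys(env) {
		fmt.Println(v.Name, v.Secret)
	}
}
```

The filter matches on variable name only, so a `secret=true` flag by itself never causes an entry to be dropped.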

Token lifecycle & resolution hardening

  • Scope per-agent-run facade tokens to the run: mint at run start and delete on completion, so live tokens never accumulate over the lifetime of a long-running session.
  • Prune revoked/expired token rows on session stop/reconcile as a crash backstop, keeping the token table bounded.
  • Carry run_id in the facade token scope for auditability.
  • Resolve the Anthropic auth header (x-api-key vs Authorization: Bearer) from the same env source the provider key is resolved from, so a session-scoped provider never mixes a key from one scope with a header decided by another.
  • Skip the redundant env/default provider bootstrap on the facade hot path when the requested provider already exists.
  • Internal cleanup: unify the OpenAI/Anthropic env-provider bootstrap behind a shared source-major lookup, share the LLM provider-key denylist between the driver and facade layers, and drop dead code.

Compatibility

Breaking for some Docker setups. Because provider keys are no longer passed through to guest runtimes, Codex/Claude reach their LLM upstream through the daemon facade and need a guest-reachable daemon URL. The bundled docker-compose.yml / docker-compose.deploy.yml default AGENT_COMPOSE_RUNTIME_BASE_URL=http://agent-compose:7410. A daemon run directly on a host with the Docker driver and an HTTP_LISTEN=127.0.0.1:... bind must set AGENT_COMPOSE_RUNTIME_BASE_URL to a host-reachable IP/name and port (e.g. http://host.docker.internal:7410); otherwise facade config is skipped and agent runs have no working LLM credentials.
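
A preflight check for this situation might look like the sketch below. It is an assumption-laden illustration, not daemon code: `looksGuestReachable` is a hypothetical helper that only flags hosts which are obviously unreachable from inside a container (loopback, `localhost`, or an unspecified address).

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// looksGuestReachable reports false when the base URL points at a host a
// guest container cannot reach (loopback, localhost, or 0.0.0.0/::).
// It performs no DNS lookup, so a true result is only a plausibility check.
func looksGuestReachable(raw string) (bool, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return false, err
	}
	host := u.Hostname()
	if host == "" {
		return false, fmt.Errorf("no host in base URL %q", raw)
	}
	if strings.EqualFold(host, "localhost") {
		return false, nil
	}
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsUnspecified()) {
		return false, nil
	}
	return true, nil
}

func main() {
	for _, raw := range []string{
		"http://127.0.0.1:7410",
		"http://agent-compose:7410",
		"http://host.docker.internal:7410",
	} {
		ok, err := looksGuestReachable(raw)
		fmt.Println(raw, ok, err)
	}
}
```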

Related issues

Fixes #27

Related but not closed: #14, #20, #22, #31.

Validation

Rebased onto latest main (incl. #61 agent system prompt convention); both feature sets coexist.

  • go test ./pkg/agentcompose ./pkg/driver (full packages, all green)
  • Targeted: go test ./pkg/agentcompose -run 'TestRuntimeLLM|TestEnsureSessionLLMFacadeConfig|TestEnsureSessionAnthropicEnvProviderAuthUsesSessionEnvOnly|TestRevokeLLMFacadeTokensForSessionPrunesDeadRows|TestDeleteLLMFacadeToken|TestResolveRuntimeLLMTargetByExistingProviderID|TestManagedRuntimeEnvMapKeepsFacadeKeyAliases|TestCreateSessionFiltersLLMProviderKeysFromPersistedEnv|TestAnthropicProvider|TestProviderForwardHeaders'
  • go build ./..., go vet ./pkg/agentcompose ./pkg/driver, gofmt clean

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
eetoc force-pushed the feature/runtime-llm-facade branch from 17a6d61 to 3273e6d June 23, 2026 23:44
eetoc and others added 5 commits June 24, 2026 09:00
ensureSessionClaudeLLMFacadeConfig tolerated a missing model whenever any
Anthropic key was present, including keys that only live in per-session env.
Request-time provider resolution runs without session env, so such a token can
never resolve a provider and every runtime request fails. Restrict the
tolerance to daemon-level keys (global/os/config), which can bootstrap a
provider from the request's model at call time; a session-only key without a
model now fails fast at config time instead of injecting an unbound token.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Remove the now-unused (*LLMClient).resolveAPIKey, orphaned when Generate
  switched to provider-target header resolution (staticcheck U1000).
- Drop the redundant http.CanonicalHeaderKey wrap in providerForwardHeaders;
  http.Header.Set canonicalizes its key internally (staticcheck S1035).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Session provider keys live only in the non-persisted Session.ProviderEnvItems
and in the llm_provider row written at creation; the session env persisted to
the store has the keys filtered out. After stop/resume, resolution saw only the
key-filtered env, so it skipped the session-env provider (selected only with an
explicit id) and could even overwrite its key with the empty env, leaving the
session without working LLM credentials.

Pin the persisted session-env provider id during resolution when the env can no
longer supply a key for the family, so the durable llm_provider row (the
intended authority for the key) is reused instead of skipped or clobbered. An
env that still carries a key keeps re-bootstrapping, so key rotation is
unaffected. Keeps raw keys out of the persisted session surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
eetoc commented Jun 24, 2026

Collaborator Author

Regression update for the runtime LLM facade PR:

Validated:

  • Docker Compose service starts successfully.
  • Codex runtime facade works with the OpenAI Responses API path.
  • Claude runtime facade works with the Anthropic Messages API path after the follow-up fixes.
  • Guest runtimes receive ac_* facade tokens instead of raw provider credentials.
  • Provider family selection is now constrained by agent family: Codex selects OpenAI-family providers, Claude selects Anthropic-family providers.
  • The Claude path now handles the SDK's x-api-key auth behavior and injects ANTHROPIC_MODEL / CLAUDE_MODEL so the requested model matches the facade token scope.

Compatibility finding:

  • Runtime Codex facade is not currently compatible with chat_completions upstreams.
  • Codex SDK rejects wire_api = "chat_completions"; it currently expects responses.
  • Keeping Codex guest-facing wire API as responses and bridging to an OpenAI chat upstream also fails because the protocol bridge does not support openai_responses -> openai_chat yet.

Result:

  • responses + Claude/Anthropic paths are validated.
  • chat_completions for runtime Codex facade remains unsupported until responses -> chat_completions bridging or another Codex-compatible strategy is added.
  • Gemini was not included in this regression scope.

Merge stance:

  • OK to merge if the intended scope is Codex Responses + Claude Anthropic facade support.
  • Not a full compatibility sign-off for runtime Codex with chat_completions or Gemini.

eetoc merged commit 0534e2c into main Jun 24, 2026
11 checks passed