This document explains how Yantra talks to LLMs — the provider abstraction, how each API differs, and how the reliable wrapper handles failures.
There are three major LLM APIs, and they all work differently:
| Aspect | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| Endpoint | Chat Completions | Messages | GenerateContent |
| Message format | {role, content, tool_calls} | {role, content: [blocks]} | {role, parts: [parts]} |
| System message | Regular message with role "system" | Separate system parameter | SystemInstruction config |
| Tool calls | tool_calls array on message | tool_use content block | FunctionCall part |
| Tool results | Message with role "tool" | tool_result content block | FunctionResponse part |
| Token counting | usage.prompt_tokens | usage.input_tokens | UsageMetadata.PromptTokenCount |
Without an abstraction, the runtime would need if openai ... else if anthropic ... else if gemini everywhere. That's unmaintainable.
type Provider interface {
Complete(ctx context.Context, c *Context) (*Response, error)
Stream(ctx context.Context, c *Context) <-chan StreamItem
ProviderID() ProviderID
ModelID() ModelID
MaxContextTokens() int
}
Context (not to be confused with Go's context.Context) is the conversation:
type Context struct {
Messages []Message
Tools []FunctionDecl
Metadata map[string]string
}
Response is what comes back:
type Response struct {
Message Message // The LLM's reply
FinishReason string // "stop", "tool_calls", etc.
Usage Usage // Token counts
}
Every provider converts Context into its API format, makes the HTTP call, and converts the response back. The runtime never sees API-specific types.
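To make the contract concrete, here is a minimal sketch of a type that satisfies the interface. It is not one of Yantra's providers: `echoProvider` and `runEcho` are made up for illustration, and the shapes of `Message`, `FunctionDecl`, `Usage`, `StreamItem`, `ProviderID`, and `ModelID` are assumptions, since this document names those types without showing them.

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical stand-ins for types the document names but does not define.
type (
	ProviderID string
	ModelID    string
)

type Message struct{ Role, Content string }
type FunctionDecl struct{ Name string }
type Usage struct{ InputTokens, OutputTokens int }
type StreamItem struct {
	Text string
	Done bool
}

// Context and Response mirror the structs shown above.
type Context struct {
	Messages []Message
	Tools    []FunctionDecl
	Metadata map[string]string
}

type Response struct {
	Message      Message
	FinishReason string
	Usage        Usage
}

type Provider interface {
	Complete(ctx context.Context, c *Context) (*Response, error)
	Stream(ctx context.Context, c *Context) <-chan StreamItem
	ProviderID() ProviderID
	ModelID() ModelID
	MaxContextTokens() int
}

// echoProvider is a made-up provider that repeats the last message back.
type echoProvider struct{}

func (echoProvider) Complete(ctx context.Context, c *Context) (*Response, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if len(c.Messages) == 0 {
		return nil, fmt.Errorf("echo: empty conversation")
	}
	last := c.Messages[len(c.Messages)-1]
	return &Response{
		Message:      Message{Role: "assistant", Content: last.Content},
		FinishReason: "stop",
	}, nil
}

func (p echoProvider) Stream(ctx context.Context, c *Context) <-chan StreamItem {
	ch := make(chan StreamItem, 2) // buffered so the goroutine never blocks
	go func() {
		defer close(ch)
		if resp, err := p.Complete(ctx, c); err == nil {
			ch <- StreamItem{Text: resp.Message.Content}
		}
		ch <- StreamItem{Done: true}
	}()
	return ch
}

func (echoProvider) ProviderID() ProviderID { return "echo" }
func (echoProvider) ModelID() ModelID       { return "echo-1" }
func (echoProvider) MaxContextTokens() int  { return 1024 }

// runEcho sends one user message through the interface, as the runtime would.
func runEcho(p Provider, text string) (string, error) {
	resp, err := p.Complete(context.Background(), &Context{
		Messages: []Message{{Role: "user", Content: text}},
	})
	if err != nil {
		return "", err
	}
	return resp.Message.Content, nil
}

func main() {
	fmt.Println(runEcho(echoProvider{}, "hello"))
}
```

The caller in `runEcho` depends only on `Provider`, which is the property the abstraction is meant to guarantee.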
SDK: github.com/openai/openai-go/v3
Message conversion (convertMessagesOpenAI):
- system → OpenAI system message
- user → OpenAI user message
- assistant → OpenAI assistant message (if it has ToolCalls, they're converted to OpenAI's tool_calls format)
- tool → OpenAI tool message with tool_call_id
Tool conversion (convertToolsOpenAI):
FunctionDecl → OpenAI ChatCompletionTool{
Type: "function",
Function: {Name, Description, Parameters}
}
OpenAI's format is the simplest — Parameters goes straight through as JSON Schema.
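The pass-through can be sketched without the SDK by modelling the wire format directly. This is an illustration, not Yantra's convertToolsOpenAI: the fields of `FunctionDecl` are assumed, and `openAITool`, `openAIFunction`, `convertTools`, and `toolsJSON` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FunctionDecl is an assumed shape for Yantra's tool declaration.
type FunctionDecl struct {
	Name        string
	Description string
	Parameters  json.RawMessage // JSON Schema, kept as raw bytes
}

// openAITool mirrors one entry of the "tools" array in a Chat Completions request.
type openAITool struct {
	Type     string         `json:"type"`
	Function openAIFunction `json:"function"`
}

type openAIFunction struct {
	Name        string          `json:"name"`
	Description string          `json:"description,omitempty"`
	Parameters  json.RawMessage `json:"parameters,omitempty"`
}

// convertTools wraps each declaration; the schema is copied through untouched.
func convertTools(decls []FunctionDecl) []openAITool {
	out := make([]openAITool, 0, len(decls))
	for _, d := range decls {
		out = append(out, openAITool{
			Type: "function",
			Function: openAIFunction{
				Name:        d.Name,
				Description: d.Description,
				Parameters:  d.Parameters,
			},
		})
	}
	return out
}

// toolsJSON renders the converted tools as they would appear on the wire.
func toolsJSON(decls []FunctionDecl) (string, error) {
	b, err := json.Marshal(convertTools(decls))
	return string(b), err
}

func main() {
	out, err := toolsJSON([]FunctionDecl{{
		Name:        "read_file",
		Description: "Read a file",
		Parameters:  json.RawMessage(`{"type":"object","properties":{"path":{"type":"string"}}}`),
	}})
	fmt.Println(out, err)
}
```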
Streaming:
OpenAI streams ChatCompletionChunk objects. Each chunk's choices (ChatCompletionChunkChoice) can contain:
- Delta.Content: text fragment → StreamText
- Delta.ToolCalls: incremental tool call data → StreamToolCallDelta
- Final chunk with Usage → StreamDone
Default context window: 128,000 tokens
SDK: github.com/anthropics/anthropic-sdk-go
Key difference: Anthropic uses content blocks, not flat fields.
An Anthropic message's content is an array of typed blocks:
[
{"type": "text", "text": "Let me read that file."},
{"type": "tool_use", "id": "call_123", "name": "read_file", "input": {"path": "main.go"}}
]
Message conversion (convertMessagesAnthropic):
- System messages are extracted and merged into Anthropic's separate system parameter
- user → Anthropic user message with text content block
- assistant → Text block + tool_use blocks (if tool calls present)
- tool → tool_result content block inside a user message, referencing the tool_use ID
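The four rules above can be sketched against the JSON wire format rather than the SDK types. This is not Yantra's convertMessagesAnthropic. The `Message` and `ToolCall` fields are assumptions, `block` and `anthropicMessage` are hypothetical helper types, and joining system messages with a blank line is a guess at what "merged" means.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Assumed shapes for Yantra's provider-neutral message types.
type ToolCall struct {
	ID, Name string
	Args     json.RawMessage
}

type Message struct {
	Role       string // "system", "user", "assistant", or "tool"
	Content    string
	ToolCalls  []ToolCall
	ToolCallID string
}

// block covers the three content block types used here.
type block struct {
	Type      string          `json:"type"`
	Text      string          `json:"text,omitempty"`
	ID        string          `json:"id,omitempty"`
	Name      string          `json:"name,omitempty"`
	Input     json.RawMessage `json:"input,omitempty"`
	ToolUseID string          `json:"tool_use_id,omitempty"`
	Content   string          `json:"content,omitempty"`
}

type anthropicMessage struct {
	Role    string  `json:"role"`
	Content []block `json:"content"`
}

// convertMessages returns the merged system prompt and the remaining messages.
func convertMessages(msgs []Message) (system string, out []anthropicMessage) {
	var sys []string
	for _, m := range msgs {
		switch m.Role {
		case "system":
			sys = append(sys, m.Content)
		case "user":
			out = append(out, anthropicMessage{Role: "user",
				Content: []block{{Type: "text", Text: m.Content}}})
		case "assistant":
			var blocks []block
			if m.Content != "" {
				blocks = append(blocks, block{Type: "text", Text: m.Content})
			}
			for _, tc := range m.ToolCalls {
				args := tc.Args
				if len(args) == 0 {
					args = json.RawMessage(`{}`) // tool_use input must be an object
				}
				blocks = append(blocks, block{Type: "tool_use", ID: tc.ID, Name: tc.Name, Input: args})
			}
			out = append(out, anthropicMessage{Role: "assistant", Content: blocks})
		case "tool":
			// Tool results travel inside a user message.
			out = append(out, anthropicMessage{Role: "user",
				Content: []block{{Type: "tool_result", ToolUseID: m.ToolCallID, Content: m.Content}}})
		}
	}
	return strings.Join(sys, "\n\n"), out
}

func main() {
	system, out := convertMessages([]Message{
		{Role: "system", Content: "Be brief."},
		{Role: "user", Content: "Read main.go"},
		{Role: "assistant", Content: "Let me read that file.", ToolCalls: []ToolCall{
			{ID: "call_123", Name: "read_file", Args: json.RawMessage(`{"path":"main.go"}`)}}},
		{Role: "tool", ToolCallID: "call_123", Content: "package main"},
	})
	b, _ := json.MarshalIndent(out, "", "  ")
	fmt.Printf("system: %q\n%s\n", system, b)
}
```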
Tool conversion (convertToolsAnthropic):
FunctionDecl → Anthropic Tool{
Name, Description,
InputSchema: Parameters (as JSON Schema)
}
Streaming: Anthropic streams events:
- ContentBlockDelta with TextDelta → StreamText
- ContentBlockDelta with InputJSONDelta → StreamToolCallDelta (tool arguments arrive as JSON fragments)
- MessageDelta with Usage → StreamDone
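Because tool arguments arrive as fragments, a consumer cannot parse them until the whole call has been received. The sketch below shows one way to buffer and then decode them. The `StreamItem` fields (`Kind`, `CallID`, `ArgsJSON`) and the function `collectToolArgs` are assumptions for illustration, since the document only names the item kinds.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type StreamKind int

const (
	StreamText StreamKind = iota
	StreamToolCallDelta
	StreamDone
)

// StreamItem is an assumed shape; only the kind names come from the document.
type StreamItem struct {
	Kind     StreamKind
	Text     string // set for StreamText
	CallID   string // set for StreamToolCallDelta
	ArgsJSON string // one fragment of the tool arguments
}

// collectToolArgs buffers argument fragments per call ID and decodes each
// buffer only after the channel is closed, when the JSON is complete.
func collectToolArgs(items <-chan StreamItem) (map[string]map[string]any, error) {
	bufs := map[string]*strings.Builder{}
	for it := range items {
		if it.Kind != StreamToolCallDelta {
			continue
		}
		b, ok := bufs[it.CallID]
		if !ok {
			b = &strings.Builder{}
			bufs[it.CallID] = b
		}
		b.WriteString(it.ArgsJSON)
	}
	out := make(map[string]map[string]any, len(bufs))
	for id, b := range bufs {
		var args map[string]any
		if err := json.Unmarshal([]byte(b.String()), &args); err != nil {
			return nil, fmt.Errorf("tool call %s: %w", id, err)
		}
		out[id] = args
	}
	return out, nil
}

func main() {
	ch := make(chan StreamItem, 4)
	ch <- StreamItem{Kind: StreamText, Text: "Let me read that file."}
	ch <- StreamItem{Kind: StreamToolCallDelta, CallID: "call_123", ArgsJSON: `{"path": "ma`}
	ch <- StreamItem{Kind: StreamToolCallDelta, CallID: "call_123", ArgsJSON: `in.go"}`}
	ch <- StreamItem{Kind: StreamDone}
	close(ch)

	calls, err := collectToolArgs(ch)
	fmt.Println(calls, err)
}
```

Neither fragment in `main` is valid JSON on its own, which is why decoding is deferred until the stream ends.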
Default context window: 200,000 tokens
SDK: google.golang.org/genai
Key difference: Gemini uses Content with Part arrays and has its own schema format.
Message conversion (convertMessagesGemini):
- System messages go into SystemInstruction (a config field, not a message)
- user → Content with role "user" and Text part
- assistant → Content with role "model" (Gemini calls it "model", not "assistant")
- tool → Content with role "tool" and FunctionResponse part
Tool conversion — the tricky part:
Gemini doesn't accept raw JSON Schema. It has its own genai.Schema struct:
type Schema struct {
Type Type
Properties map[string]*Schema
Required []string
Description string
Enum []string
Items *Schema
}
So Yantra has jsonSchemaToGeminiSchema(), a recursive converter that walks the JSON Schema and builds Gemini's native schema objects. It handles nested objects, arrays, enums, and required fields.
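A recursive converter of this kind might look roughly like the sketch below. It is not Yantra's jsonSchemaToGeminiSchema: it uses a local `Schema` type that copies the fields listed above instead of importing the SDK, and the uppercase type names ("OBJECT", "STRING") are an assumption about how the target represents types. `convertSchema`, `stringList`, and `parseSchema` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Local stand-in with the same fields as the struct shown above.
type Type string

type Schema struct {
	Type        Type
	Properties  map[string]*Schema
	Required    []string
	Description string
	Enum        []string
	Items       *Schema
}

// stringList extracts the string elements of a decoded JSON array.
func stringList(v any) []string {
	items, _ := v.([]any)
	var out []string
	for _, it := range items {
		if s, ok := it.(string); ok {
			out = append(out, s)
		}
	}
	return out
}

// convertSchema walks a decoded JSON Schema object and recurses into
// "properties" (nested objects) and "items" (array element schemas).
func convertSchema(js map[string]any) *Schema {
	s := &Schema{
		Enum:     stringList(js["enum"]),
		Required: stringList(js["required"]),
	}
	if t, ok := js["type"].(string); ok {
		s.Type = Type(strings.ToUpper(t)) // assumed naming convention
	}
	if d, ok := js["description"].(string); ok {
		s.Description = d
	}
	if props, ok := js["properties"].(map[string]any); ok {
		s.Properties = make(map[string]*Schema, len(props))
		for name, raw := range props {
			if child, ok := raw.(map[string]any); ok {
				s.Properties[name] = convertSchema(child)
			}
		}
	}
	if items, ok := js["items"].(map[string]any); ok {
		s.Items = convertSchema(items)
	}
	return s
}

// parseSchema decodes raw JSON Schema text and converts it.
func parseSchema(raw string) (*Schema, error) {
	var js map[string]any
	if err := json.Unmarshal([]byte(raw), &js); err != nil {
		return nil, err
	}
	return convertSchema(js), nil
}

func main() {
	s, err := parseSchema(`{
		"type": "object",
		"required": ["path"],
		"properties": {
			"path": {"type": "string", "description": "File to read"},
			"modes": {"type": "array", "items": {"type": "string", "enum": ["text", "binary"]}}
		}
	}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Type, s.Required, s.Properties["modes"].Items.Enum)
}
```

JSON Schema keywords outside this field list (for example `oneOf` or `$ref`) are silently dropped by this sketch. A real converter would need to decide whether to reject or approximate them.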
Streaming:
Gemini streams GenerateContentResponse objects. Each response's Candidates[0].Content.Parts can contain:
- Text parts → StreamText
- FunctionCall parts → StreamToolCallDelta
- Final response with UsageMetadata → StreamDone
Default context window: 1,000,000 tokens (the largest of the three providers)
func Build(name string, entry ProviderRegistryEntry, model string) (Provider, error)
The factory is the only place that knows about concrete provider types. It:
- Resolves the API key from environment variables
- Routes to the right constructor based on ProviderType
- Returns the provider behind the Provider interface
API key resolution chain:
1. Check entry.APIKeyEnv (explicit override in config)
2. Check provider-specific default:
- OpenAI → OPENAI_API_KEY
- Anthropic → ANTHROPIC_API_KEY
- Gemini → GEMINI_API_KEY
3. Check generic API_KEY fallback
4. Error if nothing found
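The chain can be expressed as an ordered list of candidate variable names. The sketch below is an illustration under assumptions, not the factory's actual code. `resolveAPIKey`, `mapLookup`, and the injected `lookup` function are hypothetical, and it assumes that an unset or empty variable falls through to the next step.

```go
package main

import (
	"fmt"
	"os"
)

// Provider-specific defaults, as listed in step 2.
var defaultKeyEnv = map[string]string{
	"openai":    "OPENAI_API_KEY",
	"anthropic": "ANTHROPIC_API_KEY",
	"gemini":    "GEMINI_API_KEY",
}

// resolveAPIKey tries the explicit override, then the provider default, then
// the generic API_KEY. lookup is injected so the chain can be tested without
// touching the real environment; pass os.LookupEnv in normal use.
func resolveAPIKey(providerType, apiKeyEnv string, lookup func(string) (string, bool)) (string, error) {
	candidates := []string{apiKeyEnv, defaultKeyEnv[providerType], "API_KEY"}
	for _, name := range candidates {
		if name == "" {
			continue // no override configured, or unknown provider type
		}
		if v, ok := lookup(name); ok && v != "" {
			return v, nil
		}
	}
	return "", fmt.Errorf("no API key found for provider type %q", providerType)
}

// mapLookup adapts a map to the lookup signature (handy for tests).
func mapLookup(env map[string]string) func(string) (string, bool) {
	return func(name string) (string, bool) {
		v, ok := env[name]
		return v, ok
	}
}

func main() {
	key, err := resolveAPIKey("openai", "", os.LookupEnv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved a key of length", len(key))
}
```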
Convenience wrapper:
func BuildFromConfig(cfg *YantraConfig) (Provider, error)
Pulls cfg.Selection.Provider and cfg.Selection.Model, looks up the provider registry entry, and calls Build.
reliable := NewReliableProvider(inner, DefaultReliableConfig())
The ReliableProvider decorates any Provider with automatic retries.
The isRetryable function checks several conditions:
// Retryable conditions:
- ProviderError with Retryable: true
- HTTP 429 (rate limited)
- HTTP 5xx (server error)
- Connection refused/reset/timeout
- Unexpected EOF
Non-retryable:
- HTTP 400 (bad request — your fault, retrying won't help)
- HTTP 401/403 (auth error)
- Context cancelled (user cancelled)
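One way to encode these rules with the standard library's error helpers is sketched below. It is not Yantra's isRetryable. The `ProviderError` fields other than `Retryable` are assumptions, and treating any `net.Error` timeout as retryable is one interpretation of "connection timeout".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"syscall"
)

// ProviderError is an assumed shape; only the Retryable flag is named above.
type ProviderError struct {
	StatusCode int
	Retryable  bool
	Err        error
}

func (e *ProviderError) Error() string {
	return fmt.Sprintf("provider error (status %d): %v", e.StatusCode, e.Err)
}

func (e *ProviderError) Unwrap() error { return e.Err }

func isRetryable(err error) bool {
	if err == nil {
		return false
	}
	// A cancelled context means the caller gave up; never retry.
	if errors.Is(err, context.Canceled) {
		return false
	}
	var pe *ProviderError
	if errors.As(err, &pe) {
		switch {
		case pe.Retryable:
			return true
		case pe.StatusCode == 429, pe.StatusCode >= 500 && pe.StatusCode <= 599:
			return true
		case pe.StatusCode >= 400:
			return false // 400, 401, 403 and other client errors
		}
		// No HTTP status recorded: fall through to the network checks.
	}
	if errors.Is(err, io.ErrUnexpectedEOF) ||
		errors.Is(err, syscall.ECONNREFUSED) ||
		errors.Is(err, syscall.ECONNRESET) {
		return true
	}
	// Note: context.DeadlineExceeded also reports Timeout() == true.
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

func main() {
	fmt.Println("429:", isRetryable(&ProviderError{StatusCode: 429}))
	fmt.Println("401:", isRetryable(&ProviderError{StatusCode: 401}))
	fmt.Println("EOF:", isRetryable(fmt.Errorf("read body: %w", io.ErrUnexpectedEOF)))
	fmt.Println("cancelled:", isRetryable(context.Canceled))
}
```

Using `errors.Is` and `errors.As` instead of string matching means the checks still work when a provider wraps the underlying error with extra context.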
Attempt 1: immediate
Attempt 2: 250ms (± jitter)
Attempt 3: 500ms (± jitter)
Attempt 4: 1000ms (± jitter) [if max_attempts > 3]
...
Cap: 2000ms
Exponential backoff: Each wait doubles from the base (250ms).
Jitter: A random component (±50%) prevents a thundering herd. If 100 requests all hit a rate limit at the same time, you don't want them all retrying at exactly 250ms, because they would all hit the limit again. Jitter spreads them out.
Cap: Backoff never exceeds 2 seconds. Waiting longer than that usually means the problem isn't transient.
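The schedule can be reproduced with two small functions, sketched here under stated assumptions. The names `backoffBase` and `applyJitter` are hypothetical, the random input is passed in so the arithmetic stays deterministic, and applying the 2-second cap after jitter (not only before) is a guess at the intended behavior.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 250 * time.Millisecond
	maxDelay  = 2 * time.Second
)

// backoffBase returns the un-jittered wait before a 1-based attempt:
// 0 for the first attempt, then 250ms, 500ms, 1s, and 2s from then on.
func backoffBase(attempt int) time.Duration {
	if attempt <= 1 {
		return 0
	}
	d := baseDelay
	for i := 2; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// applyJitter scales d by a factor in [0.5, 1.5) given u in [0, 1),
// which is the ±50% spread, and then re-applies the cap.
func applyJitter(d time.Duration, u float64) time.Duration {
	j := time.Duration(float64(d) * (0.5 + u))
	if j > maxDelay {
		return maxDelay
	}
	return j
}

func main() {
	for attempt := 1; attempt <= 6; attempt++ {
		base := backoffBase(attempt)
		fmt.Printf("attempt %d: base %v, jittered %v\n",
			attempt, base, applyJitter(base, rand.Float64()))
	}
}
```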
For Stream(), retries only happen during connection setup — before the first StreamItem arrives. Once streaming starts, a failure mid-stream is not retried because:
- The LLM has already started generating (retrying would restart generation)
- Partial results may have already been shown to the user
- The context may have changed (tool results appended)
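The setup-only retry rule can be sketched by peeking at the first item of each attempt. This is an illustration, not ReliableProvider's implementation. The `Err` field on `StreamItem`, the `openFunc` type, and the helpers `flakyOpen` and `collect` are assumptions, and the retryability check and backoff sleep are left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// StreamItem is an assumed shape; reporting failures via Err is an assumption.
type StreamItem struct {
	Text string
	Err  error
}

// openFunc starts one streaming attempt against the inner provider.
type openFunc func(ctx context.Context) <-chan StreamItem

func streamWithSetupRetry(ctx context.Context, open openFunc, maxAttempts int) <-chan StreamItem {
	out := make(chan StreamItem)
	go func() {
		defer close(out)
		for attempt := 1; ; attempt++ {
			in := open(ctx)
			first, ok := <-in
			if !ok {
				return // stream closed without producing anything
			}
			if first.Err != nil && attempt < maxAttempts {
				for range in { // drain so the failed producer can exit
				}
				continue // nothing reached the caller yet, so retrying is safe
			}
			// Streaming has started (or attempts are exhausted): forward every
			// item as-is, including a mid-stream error, without retrying.
			for item, more := first, true; more; item, more = <-in {
				select {
				case out <- item:
				case <-ctx.Done():
					return
				}
			}
			return
		}
	}()
	return out
}

// flakyOpen fails setup the first `failures` times, then streams "a", "b".
func flakyOpen(failures int, calls *int) openFunc {
	return func(ctx context.Context) <-chan StreamItem {
		*calls++
		ch := make(chan StreamItem, 2)
		if *calls <= failures {
			ch <- StreamItem{Err: errors.New("connection reset")}
		} else {
			ch <- StreamItem{Text: "a"}
			ch <- StreamItem{Text: "b"}
		}
		close(ch)
		return ch
	}
}

// collect drains the stream and returns the texts plus the last error seen.
func collect(open openFunc, maxAttempts int) (texts []string, err error) {
	for item := range streamWithSetupRetry(context.Background(), open, maxAttempts) {
		if item.Err != nil {
			err = item.Err
			continue
		}
		texts = append(texts, item.Text)
	}
	return texts, err
}

func main() {
	calls := 0
	texts, err := collect(flakyOpen(2, &calls), 3)
	fmt.Println(texts, err, "opens:", calls)
}
```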
Provider configuration lives in yantra.toml:
[selection]
provider = "openai"
model = "gpt-4o"
[providers.registry.openai]
provider_type = "openai"
api_key_env = "OPENAI_API_KEY"
max_context_tokens = 128000
[providers.registry.anthropic]
provider_type = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"
max_context_tokens = 200000
[providers.registry.gemini]
provider_type = "gemini"
api_key_env = "GEMINI_API_KEY"
max_context_tokens = 1000000
Switching providers is a one-line change:
provider = "anthropic"
model = "claude-sonnet-4-20250514"
Custom endpoints (for proxies, Azure OpenAI, or self-hosted models):
[providers.registry.local]
provider_type = "openai"
base_url = "http://localhost:8080/v1"
api_key_env = "LOCAL_API_KEY"
Any OpenAI-compatible API works with the OpenAI provider type: just set the base_url.