From 87f6b1c8e3f8bafd8c5a5ee3e0d2290e43e43f90 Mon Sep 17 00:00:00 2001 From: Djordje Lukic Date: Fri, 5 Jun 2026 06:02:00 +0000 Subject: [PATCH 1/2] docs: add thinking/reasoning guide and expand provider thinking docs - Add docs/guides/thinking/index.md: comprehensive cross-provider guide covering OpenAI (effort levels, adaptive, token floor), Anthropic (token budget, adaptive/effort-based, interleaved thinking, thinking display, task budget), Gemini 2.5 (token), Gemini 3 (level-based), Bedrock Claude (token + effort mapping, interleaved thinking), and xAI/Mistral. - Expand AWS Bedrock docs: add Thinking Budget and Interleaved Thinking sections with effort-to-token mapping table and beta header behaviour. - Expand OpenAI docs: replace one-liner with effort-level table, hidden reasoning token warning, and adaptive thinking example. - Register thinking guide in nav.yml under Guides. --- docs/_data/nav.yml | 2 + docs/guides/thinking/index.md | 328 ++++++++++++++++++++++++++++++++ docs/providers/bedrock/index.md | 52 ++++- docs/providers/openai/index.md | 37 +++- 4 files changed, 415 insertions(+), 4 deletions(-) create mode 100644 docs/guides/thinking/index.md diff --git a/docs/_data/nav.yml b/docs/_data/nav.yml index ce59a8490..5256bb387 100644 --- a/docs/_data/nav.yml +++ b/docs/_data/nav.yml @@ -144,6 +144,8 @@ items: - title: Tips & Best Practices url: /guides/tips/ + - title: Thinking / Reasoning + url: /guides/thinking/ - title: Managing Secrets url: /guides/secrets/ - title: Go SDK diff --git a/docs/guides/thinking/index.md b/docs/guides/thinking/index.md new file mode 100644 index 000000000..d8bb6e5fb --- /dev/null +++ b/docs/guides/thinking/index.md @@ -0,0 +1,328 @@ +--- +title: "Thinking / Reasoning" +description: "Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock." +permalink: /guides/thinking/ +--- + +# Thinking / Reasoning + +_Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock._ + +## What Is Thinking? + +Several modern models support an extended reasoning phase that happens before they produce visible output. During this phase the model plans, evaluates options, and works through the problem — internally, not shown in the response by default. This typically improves accuracy on complex tasks like coding, math, and multi-step planning, at the cost of higher token usage and latency. + +docker-agent exposes this through a single `thinking_budget` field on any named model. The value format differs slightly by provider, but the semantics are the same: higher effort means more thorough reasoning. + +
+
Think tool vs. thinking budget +
+

The think tool is a scratchpad for models that lack native reasoning. If your model supports thinking_budget, you do not need the think tool.

+
+ +## Quick Reference + +| Provider | Format | Values | Default | +| -------------- | ---------- | ------------------------------------------------------------ | ------------ | +| OpenAI | string | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/` | `medium` | +| Anthropic | int or str | 1024–32768 tokens, or `adaptive`, `low`–`max`, `none` | off | +| Gemini 2.5 | int | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768) | `-1` (dynamic)| +| Gemini 3 | string | `minimal`, `low`, `medium`, `high` | model-dependent | +| AWS Bedrock | int or str | 1024–32768 tokens (`minimal`–`max` mapped to tokens) | off | +| xAI / Mistral | string | `minimal`, `low`, `medium`, `high`, `xhigh`, `none` | off | + +## OpenAI + +OpenAI reasoning models (o-series, gpt-5, gpt-5-mini) use a string effort level that maps to their `reasoning_effort` API parameter. + +```yaml +models: + gpt-thinker: + provider: openai + model: gpt-5-mini + thinking_budget: high # minimal | low | medium | high | xhigh +``` + +**Effort levels:** + +| Level | Description | +| --------- | -------------------------------------------------------- | +| `none` | Disable thinking (alias for `0`). Minimum output floor still applies. | +| `minimal` | Fastest; lightest reasoning pass. | +| `low` | Quick reasoning for straightforward tasks. | +| `medium` | Balanced default. | +| `high` | More thorough; recommended for complex tasks. | +| `xhigh` | Near-maximum effort; slower but most accurate. | + +
+
Tokens and max_tokens +
+

OpenAI reasoning models always reason internally — even with thinking_budget: none there are hidden reasoning tokens that count against max_tokens. docker-agent automatically raises the output-token floor so hidden reasoning cannot starve visible text output.

+
+ +### Adaptive thinking (OpenAI) + +Use `adaptive/` to let the model decide when to apply extended reasoning and scale effort dynamically: + +```yaml +models: + gpt-adaptive: + provider: openai + model: gpt-5-mini + thinking_budget: adaptive/medium # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max +``` + +## Anthropic + +Anthropic Claude supports two thinking modes: a **token budget** (older models) and **adaptive / effort-based** thinking (newer models). + +### Token budget (Claude 4 and earlier) + +Set an explicit number of thinking tokens (1024–32768). This must be less than `max_tokens`: + +```yaml +models: + claude-thinker: + provider: anthropic + model: claude-sonnet-4-5 + thinking_budget: 16384 # tokens reserved for internal reasoning +``` + +docker-agent auto-adjusts `max_tokens` when you set a thinking budget but leave `max_tokens` at its default. If you set `max_tokens` explicitly, it must be greater than `thinking_budget`. + +### Adaptive thinking (Claude Opus 4.6+) + +Newer Claude models support adaptive thinking, where the model decides how much to think. Use `adaptive` or pair it with an effort level: + +```yaml +models: + claude-adaptive: + provider: anthropic + model: claude-opus-4-6 + thinking_budget: adaptive # model decides effort + + claude-adaptive-low: + provider: anthropic + model: claude-opus-4-6 + thinking_budget: low # adaptive with low effort: low | medium | high | max +``` + +**Adaptive effort levels:** + +| Level | Description | +| --------- | ------------------------------------------------- | +| `low` | Minimal thinking; fastest adaptive mode. | +| `medium` | Moderate effort. | +| `high` | Thorough reasoning; default for `adaptive`. | +| `max` | Maximum effort. | + +### Disabling thinking + +```yaml +thinking_budget: none # or 0 +``` + +### Interleaved thinking + +By default, thinking is non-interleaved (the model reasons once before answering). Enable interleaved thinking to allow reasoning during tool calls — useful for complex agentic tasks: + +```yaml +models: + claude-interleaved: + provider: anthropic + model: claude-sonnet-4-5 + thinking_budget: 16384 + provider_opts: + interleaved_thinking: true +``` + +
+
Temperature and top_p +
+

When extended thinking is enabled, Anthropic requires temperature=1.0. docker-agent automatically suppresses any temperature or top_p settings you have configured — they are silently ignored while thinking is active.

+
+ +### Thinking display + +Claude Opus 4.7 hides thinking content by default. Use `thinking_display` in `provider_opts` to control what you receive: + +```yaml +models: + opus-47: + provider: anthropic + model: claude-opus-4-7 + thinking_budget: adaptive + provider_opts: + thinking_display: summarized # summarized | display | omitted +``` + +| Value | Behavior | +| ------------ | ------------------------------------------------------------------------------------- | +| `summarized` | Thinking blocks returned with a text summary (default for Claude 4 models pre-4.7). | +| `display` | Full thinking blocks returned for display. | +| `omitted` | Thinking blocks hidden — only the signature is returned (default for Opus 4.7). | + +Full thinking tokens are billed regardless of `thinking_display`. + +### Task budget (Anthropic) + +`task_budget` caps total tokens across an entire multi-step agentic task (thinking + tool calls + output combined): + +```yaml +models: + opus-bounded: + provider: anthropic + model: claude-opus-4-7 + thinking_budget: adaptive + task_budget: 128000 # total token ceiling for the whole task +``` + +See the [Anthropic provider page]({{ '/providers/anthropic/#task-budget' | relative_url }}) for details. + +## Google Gemini + +Gemini 2.5 and Gemini 3 use different formats. + +### Gemini 2.5 (token budget) + +```yaml +models: + gemini-off: + provider: google + model: gemini-2.5-flash + thinking_budget: 0 # disable thinking + + gemini-dynamic: + provider: google + model: gemini-2.5-flash + thinking_budget: -1 # let the model decide (default) + + gemini-fixed: + provider: google + model: gemini-2.5-flash + thinking_budget: 8192 # fixed token budget (max 24576 for Flash, 32768 for Pro) +``` + +### Gemini 3 (level-based) + +```yaml +models: + gemini3-flash: + provider: google + model: gemini-3-flash + thinking_budget: medium # minimal | low | medium | high + + gemini3-pro: + provider: google + model: gemini-3-pro + thinking_budget: high # low | high (Pro supports fewer levels) +``` + +## AWS Bedrock (Claude) + +Bedrock Claude uses a token budget like Anthropic, but only supports integer token values. String effort levels (`minimal`–`max`) are mapped automatically: + +| Effort level | Token budget | +| ------------ | ------------ | +| `minimal` | 1,024 | +| `low` | 2,048 | +| `medium` | 8,192 | +| `high` | 16,384 | +| `xhigh`/`max`| 32,768 | + +```yaml +models: + bedrock-claude-thinker: + provider: amazon-bedrock + model: global.anthropic.claude-sonnet-4-5-20250929-v1:0 + thinking_budget: 8192 # or use an effort level: medium + provider_opts: + region: us-east-1 + + bedrock-claude-interleaved: + provider: amazon-bedrock + model: global.anthropic.claude-sonnet-4-5-20250929-v1:0 + thinking_budget: high + provider_opts: + region: us-east-1 + interleaved_thinking: true +``` + +
+
Bedrock thinking requirements +
+

Bedrock Claude requires thinking_budget to be ≥ 1024 and less than max_tokens. docker-agent logs a warning and ignores the budget if either condition is violated. Interleaved thinking requires the interleaved-thinking-2025-05-14 beta header, which docker-agent adds automatically when interleaved_thinking: true is set.

+
+ +## xAI (Grok) and Mistral + +Both providers use the OpenAI-compatible API and accept the same effort strings: + +```yaml +models: + grok-thinker: + provider: xai + model: grok-3 + thinking_budget: high # minimal | low | medium | high | xhigh | none + + mistral-thinker: + provider: mistral + model: mistral-large-latest + thinking_budget: medium # minimal | low | medium | high | xhigh | none +``` + +Thinking is **off by default** for these providers. Set `thinking_budget` explicitly to enable it. + +## Disabling Thinking + +Use `none` or `0` to disable thinking on any provider: + +```yaml +models: + fast-model: + provider: openai + model: gpt-5-mini + thinking_budget: none + + gemini-no-think: + provider: google + model: gemini-2.5-flash + thinking_budget: 0 +``` + +## Choosing an Effort Level + +| Task complexity | Recommended level | +| -------------------------------- | ----------------------- | +| Simple factual Q&A | `none` / `minimal` | +| General-purpose chat | `low` / `medium` | +| Coding, debugging, analysis | `medium` / `high` | +| Complex reasoning, planning | `high` / `xhigh` | +| Research, difficult math/logic | `xhigh` / `max` | +| Long agentic tasks (Anthropic) | `adaptive` | + +## Sharing Thinking Config Across Models + +Define a provider with a default `thinking_budget` and all models that reference it inherit it: + +```yaml +providers: + deep-anthropic: + provider: anthropic + thinking_budget: high + max_tokens: 32768 + +models: + claude-smart: + provider: deep-anthropic + model: claude-sonnet-4-5 # inherits thinking_budget: high + + claude-faster: + provider: deep-anthropic + model: claude-haiku-4-5 + thinking_budget: low # overrides to low +``` + +## Full Example + +See [`examples/thinking_budget.yaml`](https://github.com/docker/docker-agent/blob/main/examples/thinking_budget.yaml) for a runnable config covering all providers. diff --git a/docs/providers/bedrock/index.md b/docs/providers/bedrock/index.md index 9a0beba81..1f9e61e2a 100644 --- a/docs/providers/bedrock/index.md +++ b/docs/providers/bedrock/index.md @@ -80,7 +80,7 @@ models: | `role_session_name` | string | docker-agent-bedrock-session | Session name for assumed role | | `external_id` | string | — | External ID for role assumption | | `endpoint_url` | string | — | Custom endpoint (VPC/testing) | -| `interleaved_thinking` | bool | true | Reasoning during tool calls (Claude) | +| `interleaved_thinking` | bool | false | Allow reasoning between tool calls (Claude); adds the required beta header automatically | | `disable_prompt_caching` | bool | false | Disable automatic prompt caching | ## Inference Profiles @@ -101,6 +101,56 @@ Use inference profile prefixes for optimal routing: +## Thinking Budget (Claude on Bedrock) + +Bedrock Claude models support extended thinking — an internal reasoning phase before the model produces its response. Set `thinking_budget` to a token count (1024–32768) or an effort level string that maps automatically: + +| Effort level | Token budget | +| ------------ | ------------ | +| `minimal` | 1,024 | +| `low` | 2,048 | +| `medium` | 8,192 | +| `high` | 16,384 | +| `xhigh`/`max`| 32,768 | + +```yaml +models: + bedrock-claude-thinking: + provider: amazon-bedrock + model: global.anthropic.claude-sonnet-4-5-20250929-v1:0 + thinking_budget: 8192 # tokens, or use an effort string like "medium" + max_tokens: 16384 # must be > thinking_budget + provider_opts: + region: us-east-1 +``` + +`thinking_budget` must be ≥ 1024 and less than `max_tokens`. Values outside this range are logged as a warning and ignored. + +
+
Temperature and top_p +
+

Bedrock Claude suppresses temperature and top_p while extended thinking is active — Anthropic requires temperature=1.0 internally.

+
+ +## Interleaved Thinking (Claude on Bedrock) + +Interleaved thinking lets the model reason between tool calls, not just at the start. This is useful for complex agentic tasks. Enable it alongside a thinking budget: + +```yaml +models: + bedrock-claude-interleaved: + provider: amazon-bedrock + model: global.anthropic.claude-sonnet-4-5-20250929-v1:0 + thinking_budget: high + provider_opts: + region: us-east-1 + interleaved_thinking: true +``` + +docker-agent automatically adds the `interleaved-thinking-2025-05-14` beta header when `interleaved_thinking: true` is set. If you set `interleaved_thinking: false` while a thinking budget is active, a warning is logged because the budget may be ignored by Bedrock without the beta header. + +See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a full cross-provider overview. + ## Prompt Caching Automatically enabled for supported models to reduce latency and costs. System prompts, tool definitions, and recent messages are cached with a 5-minute TTL. diff --git a/docs/providers/openai/index.md b/docs/providers/openai/index.md index 4c69a85b2..fbadfd282 100644 --- a/docs/providers/openai/index.md +++ b/docs/providers/openai/index.md @@ -49,16 +49,47 @@ Find more model names at [modelnames.ai](https://modelnames.ai/) or in the [offi ## Thinking Budget -OpenAI uses effort level strings: +OpenAI reasoning models (o-series, gpt-5, gpt-5-mini) support extended thinking through the `reasoning_effort` API parameter. Set `thinking_budget` to control the effort level: ```yaml models: - gpt-thinking: + gpt-thinker: provider: openai model: gpt-5-mini - thinking_budget: low # minimal | low | medium (default) | high | xhigh | max | none | adaptive/ + thinking_budget: high # minimal | low | medium | high | xhigh ``` +**Effort levels:** + +| Level | Description | +| --------- | -------------------------------------------------------- | +| `none` | Disable thinking (a minimum output floor still applies). | +| `minimal` | Fastest; lightest reasoning pass. | +| `low` | Quick reasoning for straightforward tasks. | +| `medium` | Balanced default. | +| `high` | More thorough; recommended for complex tasks. | +| `xhigh` | Near-maximum effort; slower but most accurate. | + +
+
Hidden reasoning tokens +
+

OpenAI reasoning models always produce hidden reasoning tokens that count against max_tokens — even with thinking_budget: none. docker-agent automatically raises the output-token floor so reasoning cannot starve visible text output.

+
+ +### Adaptive Thinking + +Use `adaptive/` to let the model scale effort dynamically based on task complexity: + +```yaml +models: + gpt-adaptive: + provider: openai + model: gpt-5-mini + thinking_budget: adaptive/medium # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max +``` + +See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a cross-provider overview. +
Custom endpoints
From afe921b8dccb1c9c8240763bf084e3fa0c674c64 Mon Sep 17 00:00:00 2001 From: Djordje Lukic Date: Fri, 5 Jun 2026 06:17:47 +0000 Subject: [PATCH 2/2] docs: clarify that 'max' is only valid as adaptive/max for OpenAI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a note in both the thinking guide and OpenAI provider page that bare 'thinking_budget: max' is not accepted by OpenAI — max is only valid inside the adaptive/ prefix (adaptive/max). ForOpenAI() in pkg/effort/effort.go confirms: only minimal/low/medium/high/xhigh are accepted as standalone levels. --- docs/guides/thinking/index.md | 4 +++- docs/providers/openai/index.md | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/guides/thinking/index.md b/docs/guides/thinking/index.md index d8bb6e5fb..244848b06 100644 --- a/docs/guides/thinking/index.md +++ b/docs/guides/thinking/index.md @@ -24,7 +24,7 @@ docker-agent exposes this through a single `thinking_budget` field on any named | Provider | Format | Values | Default | | -------------- | ---------- | ------------------------------------------------------------ | ------------ | -| OpenAI | string | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/` | `medium` | +| OpenAI | string | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/` (`max` only via `adaptive/max`) | `medium` | | Anthropic | int or str | 1024–32768 tokens, or `adaptive`, `low`–`max`, `none` | off | | Gemini 2.5 | int | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768) | `-1` (dynamic)| | Gemini 3 | string | `minimal`, `low`, `medium`, `high` | model-dependent | @@ -72,6 +72,8 @@ models: thinking_budget: adaptive/medium # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max ``` +> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI. + ## Anthropic Anthropic Claude supports two thinking modes: a **token budget** (older models) and **adaptive / effort-based** thinking (newer models). diff --git a/docs/providers/openai/index.md b/docs/providers/openai/index.md index fbadfd282..79d589d95 100644 --- a/docs/providers/openai/index.md +++ b/docs/providers/openai/index.md @@ -88,6 +88,8 @@ models: thinking_budget: adaptive/medium # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max ``` +> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI. + See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a cross-provider overview.