From 87f6b1c8e3f8bafd8c5a5ee3e0d2290e43e43f90 Mon Sep 17 00:00:00 2001
From: Djordje Lukic <djordje.lukic@docker.com>
Date: Fri, 5 Jun 2026 06:02:00 +0000
Subject: [PATCH 1/2] docs: add thinking/reasoning guide and expand provider
 thinking docs

- Add docs/guides/thinking/index.md: comprehensive cross-provider guide
  covering OpenAI (effort levels, adaptive, token floor), Anthropic
  (token budget, adaptive/effort-based, interleaved thinking, thinking
  display, task budget), Gemini 2.5 (token), Gemini 3 (level-based),
  Bedrock Claude (token + effort mapping, interleaved thinking), and
  xAI/Mistral.
- Expand AWS Bedrock docs: add Thinking Budget and Interleaved Thinking
  sections with effort-to-token mapping table and beta header behaviour.
- Expand OpenAI docs: replace one-liner with effort-level table, hidden
  reasoning token warning, and adaptive thinking example.
- Register thinking guide in nav.yml under Guides.
---
 docs/_data/nav.yml              |   2 +
 docs/guides/thinking/index.md   | 328 ++++++++++++++++++++++++++++++++
 docs/providers/bedrock/index.md |  52 ++++-
 docs/providers/openai/index.md  |  37 +++-
 4 files changed, 415 insertions(+), 4 deletions(-)
 create mode 100644 docs/guides/thinking/index.md
diff --git a/docs/_data/nav.yml b/docs/_data/nav.yml
index ce59a8490..5256bb387 100644
--- a/docs/_data/nav.yml
+++ b/docs/_data/nav.yml
@@ -144,6 +144,8 @@
   items:
     - title: Tips & Best Practices
       url: /guides/tips/
+    - title: Thinking / Reasoning
+      url: /guides/thinking/
     - title: Managing Secrets
       url: /guides/secrets/
     - title: Go SDK
diff --git a/docs/guides/thinking/index.md b/docs/guides/thinking/index.md
new file mode 100644
index 000000000..d8bb6e5fb
--- /dev/null
+++ b/docs/guides/thinking/index.md
@@ -0,0 +1,328 @@
+---
+title: "Thinking / Reasoning"
+description: "Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock."
+permalink: /guides/thinking/
+---
+
+# Thinking / Reasoning
+
+_Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock._
+
+## What Is Thinking?
+
+Several modern models support an extended reasoning phase that happens before they produce visible output. During this phase the model plans, evaluates options, and works through the problem — internally, not shown in the response by default. This typically improves accuracy on complex tasks like coding, math, and multi-step planning, at the cost of higher token usage and latency.
+
+docker-agent exposes this through a single `thinking_budget` field on any named model. The value format differs slightly by provider, but the semantics are the same: higher effort means more thorough reasoning.
+
+<div class="callout callout-info" markdown="1">
+<div class="callout-title">Think tool vs. thinking budget
+</div>
+  <p>The <a href="{{ '/tools/think/' | relative_url }}">think tool</a> is a scratchpad for models that lack native reasoning. If your model supports <code>thinking_budget</code>, you do not need the think tool.</p>
+</div>
+
+## Quick Reference
+
+| Provider       | Format     | Values                                                       | Default      |
+| -------------- | ---------- | ------------------------------------------------------------ | ------------ |
+| OpenAI         | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/<level>` | `medium`     |
+| Anthropic      | int or str | 1024–32768 tokens, or `adaptive`, `low`–`max`, `none`        | off          |
+| Gemini 2.5     | int        | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768) | `-1` (dynamic)|
+| Gemini 3       | string     | `minimal`, `low`, `medium`, `high`                           | model-dependent |
+| AWS Bedrock    | int or str | 1024–32768 tokens (`minimal`–`max` mapped to tokens)         | off          |
+| xAI / Mistral  | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`          | off          |
+
+## OpenAI
+
+OpenAI reasoning models (o-series, gpt-5, gpt-5-mini) use a string effort level that maps to their `reasoning_effort` API parameter.
+
+```yaml
+models:
+  gpt-thinker:
+    provider: openai
+    model: gpt-5-mini
+    thinking_budget: high   # minimal | low | medium | high | xhigh
+```
+
+**Effort levels:**
+
+| Level     | Description                                              |
+| --------- | -------------------------------------------------------- |
+| `none`    | Disable thinking (alias for `0`). Minimum output floor still applies. |
+| `minimal` | Fastest; lightest reasoning pass.                        |
+| `low`     | Quick reasoning for straightforward tasks.               |
+| `medium`  | Balanced default.                                        |
+| `high`    | More thorough; recommended for complex tasks.            |
+| `xhigh`   | Near-maximum effort; slower but most accurate.           |
+
+<div class="callout callout-warning" markdown="1">
+<div class="callout-title">Tokens and max_tokens
+</div>
+  <p>OpenAI reasoning models always reason internally — even with <code>thinking_budget: none</code> there are hidden reasoning tokens that count against <code>max_tokens</code>. docker-agent automatically raises the output-token floor so hidden reasoning cannot starve visible text output.</p>
+</div>
+
+### Adaptive thinking (OpenAI)
+
+Use `adaptive/<level>` to let the model decide when to apply extended reasoning and scale effort dynamically:
+
+```yaml
+models:
+  gpt-adaptive:
+    provider: openai
+    model: gpt-5-mini
+    thinking_budget: adaptive/medium  # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
+```
+
+## Anthropic
+
+Anthropic Claude supports two thinking modes: a **token budget** (older models) and **adaptive / effort-based** thinking (newer models).
+
+### Token budget (Claude 4 and earlier)
+
+Set an explicit number of thinking tokens (1024–32768). This must be less than `max_tokens`:
+
+```yaml
+models:
+  claude-thinker:
+    provider: anthropic
+    model: claude-sonnet-4-5
+    thinking_budget: 16384   # tokens reserved for internal reasoning
+```
+
+docker-agent auto-adjusts `max_tokens` when you set a thinking budget but leave `max_tokens` at its default. If you set `max_tokens` explicitly, it must be greater than `thinking_budget`.
+
+### Adaptive thinking (Claude Opus 4.6+)
+
+Newer Claude models support adaptive thinking, where the model decides how much to think. Use `adaptive` or pair it with an effort level:
+
+```yaml
+models:
+  claude-adaptive:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: adaptive          # model decides effort
+
+  claude-adaptive-low:
+    provider: anthropic
+    model: claude-opus-4-6
+    thinking_budget: low               # adaptive with low effort: low | medium | high | max
+```
+
+**Adaptive effort levels:**
+
+| Level     | Description                                       |
+| --------- | ------------------------------------------------- |
+| `low`     | Minimal thinking; fastest adaptive mode.          |
+| `medium`  | Moderate effort.                                  |
+| `high`    | Thorough reasoning; default for `adaptive`.       |
+| `max`     | Maximum effort.                                   |
+
+### Disabling thinking
+
+```yaml
+thinking_budget: none   # or 0
+```
+
+### Interleaved thinking
+
+By default, thinking is non-interleaved (the model reasons once before answering). Enable interleaved thinking to allow reasoning during tool calls — useful for complex agentic tasks:
+
+```yaml
+models:
+  claude-interleaved:
+    provider: anthropic
+    model: claude-sonnet-4-5
+    thinking_budget: 16384
+    provider_opts:
+      interleaved_thinking: true
+```
+
+<div class="callout callout-info" markdown="1">
+<div class="callout-title">Temperature and top_p
+</div>
+  <p>When extended thinking is enabled, Anthropic requires <code>temperature=1.0</code>. docker-agent automatically suppresses any <code>temperature</code> or <code>top_p</code> settings you have configured — they are silently ignored while thinking is active.</p>
+</div>
+
+### Thinking display
+
+Claude Opus 4.7 hides thinking content by default. Use `thinking_display` in `provider_opts` to control what you receive:
+
+```yaml
+models:
+  opus-47:
+    provider: anthropic
+    model: claude-opus-4-7
+    thinking_budget: adaptive
+    provider_opts:
+      thinking_display: summarized   # summarized | display | omitted
+```
+
+| Value        | Behavior                                                                              |
+| ------------ | ------------------------------------------------------------------------------------- |
+| `summarized` | Thinking blocks returned with a text summary (default for Claude 4 models pre-4.7).  |
+| `display`    | Full thinking blocks returned for display.                                            |
+| `omitted`    | Thinking blocks hidden — only the signature is returned (default for Opus 4.7).       |
+
+Full thinking tokens are billed regardless of `thinking_display`.
+
+### Task budget (Anthropic)
+
+`task_budget` caps total tokens across an entire multi-step agentic task (thinking + tool calls + output combined):
+
+```yaml
+models:
+  opus-bounded:
+    provider: anthropic
+    model: claude-opus-4-7
+    thinking_budget: adaptive
+    task_budget: 128000   # total token ceiling for the whole task
+```
+
+See the [Anthropic provider page]({{ '/providers/anthropic/#task-budget' | relative_url }}) for details.
+
+## Google Gemini
+
+Gemini 2.5 and Gemini 3 use different formats.
+
+### Gemini 2.5 (token budget)
+
+```yaml
+models:
+  gemini-off:
+    provider: google
+    model: gemini-2.5-flash
+    thinking_budget: 0      # disable thinking
+
+  gemini-dynamic:
+    provider: google
+    model: gemini-2.5-flash
+    thinking_budget: -1     # let the model decide (default)
+
+  gemini-fixed:
+    provider: google
+    model: gemini-2.5-flash
+    thinking_budget: 8192   # fixed token budget (max 24576 for Flash, 32768 for Pro)
+```
+
+### Gemini 3 (level-based)
+
+```yaml
+models:
+  gemini3-flash:
+    provider: google
+    model: gemini-3-flash
+    thinking_budget: medium   # minimal | low | medium | high
+
+  gemini3-pro:
+    provider: google
+    model: gemini-3-pro
+    thinking_budget: high     # low | high (Pro supports fewer levels)
+```
+
+## AWS Bedrock (Claude)
+
+Bedrock Claude uses a token budget like Anthropic, but only supports integer token values. String effort levels (`minimal`–`max`) are mapped automatically:
+
+| Effort level | Token budget |
+| ------------ | ------------ |
+| `minimal`    | 1,024        |
+| `low`        | 2,048        |
+| `medium`     | 8,192        |
+| `high`       | 16,384       |
+| `xhigh`/`max`| 32,768       |
+
+```yaml
+models:
+  bedrock-claude-thinker:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+    thinking_budget: 8192   # or use an effort level: medium
+    provider_opts:
+      region: us-east-1
+
+  bedrock-claude-interleaved:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+    thinking_budget: high
+    provider_opts:
+      region: us-east-1
+      interleaved_thinking: true
+```
+
+<div class="callout callout-warning" markdown="1">
+<div class="callout-title">Bedrock thinking requirements
+</div>
+  <p>Bedrock Claude requires <code>thinking_budget</code> to be ≥ 1024 and less than <code>max_tokens</code>. docker-agent logs a warning and ignores the budget if either condition is violated. Interleaved thinking requires the <code>interleaved-thinking-2025-05-14</code> beta header, which docker-agent adds automatically when <code>interleaved_thinking: true</code> is set.</p>
+</div>
+
+## xAI (Grok) and Mistral
+
+Both providers use the OpenAI-compatible API and accept the same effort strings:
+
+```yaml
+models:
+  grok-thinker:
+    provider: xai
+    model: grok-3
+    thinking_budget: high   # minimal | low | medium | high | xhigh | none
+
+  mistral-thinker:
+    provider: mistral
+    model: mistral-large-latest
+    thinking_budget: medium  # minimal | low | medium | high | xhigh | none
+```
+
+Thinking is **off by default** for these providers. Set `thinking_budget` explicitly to enable it.
+
+## Disabling Thinking
+
+Use `none` or `0` to disable thinking on any provider:
+
+```yaml
+models:
+  fast-model:
+    provider: openai
+    model: gpt-5-mini
+    thinking_budget: none
+
+  gemini-no-think:
+    provider: google
+    model: gemini-2.5-flash
+    thinking_budget: 0
+```
+
+## Choosing an Effort Level
+
+| Task complexity                  | Recommended level       |
+| -------------------------------- | ----------------------- |
+| Simple factual Q&A               | `none` / `minimal`      |
+| General-purpose chat             | `low` / `medium`        |
+| Coding, debugging, analysis      | `medium` / `high`       |
+| Complex reasoning, planning      | `high` / `xhigh`        |
+| Research, difficult math/logic   | `xhigh` / `max`         |
+| Long agentic tasks (Anthropic)   | `adaptive`              |
+
+## Sharing Thinking Config Across Models
+
+Define a provider with a default `thinking_budget` and all models that reference it inherit it:
+
+```yaml
+providers:
+  deep-anthropic:
+    provider: anthropic
+    thinking_budget: high
+    max_tokens: 32768
+
+models:
+  claude-smart:
+    provider: deep-anthropic
+    model: claude-sonnet-4-5   # inherits thinking_budget: high
+
+  claude-faster:
+    provider: deep-anthropic
+    model: claude-haiku-4-5
+    thinking_budget: low       # overrides to low
+```
+
+## Full Example
+
+See [`examples/thinking_budget.yaml`](https://github.com/docker/docker-agent/blob/main/examples/thinking_budget.yaml) for a runnable config covering all providers.
diff --git a/docs/providers/bedrock/index.md b/docs/providers/bedrock/index.md
index 9a0beba81..1f9e61e2a 100644
--- a/docs/providers/bedrock/index.md
+++ b/docs/providers/bedrock/index.md
@@ -80,7 +80,7 @@ models:
 | `role_session_name`      | string | docker-agent-bedrock-session | Session name for assumed role        |
 | `external_id`            | string | —                      | External ID for role assumption      |
 | `endpoint_url`           | string | —                      | Custom endpoint (VPC/testing)        |
-| `interleaved_thinking`   | bool   | true                   | Reasoning during tool calls (Claude) |
+| `interleaved_thinking`   | bool   | false                  | Allow reasoning between tool calls (Claude); adds the required beta header automatically |
 | `disable_prompt_caching` | bool   | false                  | Disable automatic prompt caching     |
 
 ## Inference Profiles
@@ -101,6 +101,56 @@ Use inference profile prefixes for optimal routing:
 
 </div>
 
+## Thinking Budget (Claude on Bedrock)
+
+Bedrock Claude models support extended thinking — an internal reasoning phase before the model produces its response. Set `thinking_budget` to a token count (1024–32768) or an effort level string that maps automatically:
+
+| Effort level | Token budget |
+| ------------ | ------------ |
+| `minimal`    | 1,024        |
+| `low`        | 2,048        |
+| `medium`     | 8,192        |
+| `high`       | 16,384       |
+| `xhigh`/`max`| 32,768       |
+
+```yaml
+models:
+  bedrock-claude-thinking:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+    thinking_budget: 8192   # tokens, or use an effort string like "medium"
+    max_tokens: 16384       # must be > thinking_budget
+    provider_opts:
+      region: us-east-1
+```
+
+`thinking_budget` must be ≥ 1024 and less than `max_tokens`. Values outside this range are logged as a warning and ignored.
+
+<div class="callout callout-info" markdown="1">
+<div class="callout-title">Temperature and top_p
+</div>
+  <p>Bedrock Claude suppresses <code>temperature</code> and <code>top_p</code> while extended thinking is active — Anthropic requires <code>temperature=1.0</code> internally.</p>
+</div>
+
+## Interleaved Thinking (Claude on Bedrock)
+
+Interleaved thinking lets the model reason between tool calls, not just at the start. This is useful for complex agentic tasks. Enable it alongside a thinking budget:
+
+```yaml
+models:
+  bedrock-claude-interleaved:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+    thinking_budget: high
+    provider_opts:
+      region: us-east-1
+      interleaved_thinking: true
+```
+
+docker-agent automatically adds the `interleaved-thinking-2025-05-14` beta header when `interleaved_thinking: true` is set. If you set `interleaved_thinking: false` while a thinking budget is active, a warning is logged because the budget may be ignored by Bedrock without the beta header.
+
+See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a full cross-provider overview.
+
 ## Prompt Caching
 
 Automatically enabled for supported models to reduce latency and costs. System prompts, tool definitions, and recent messages are cached with a 5-minute TTL.
diff --git a/docs/providers/openai/index.md b/docs/providers/openai/index.md
index 4c69a85b2..fbadfd282 100644
--- a/docs/providers/openai/index.md
+++ b/docs/providers/openai/index.md
@@ -49,16 +49,47 @@ Find more model names at [modelnames.ai](https://modelnames.ai/) or in the [offi
 
 ## Thinking Budget
 
-OpenAI uses effort level strings:
+OpenAI reasoning models (o-series, gpt-5, gpt-5-mini) support extended thinking through the `reasoning_effort` API parameter. Set `thinking_budget` to control the effort level:
 
 ```yaml
 models:
-  gpt-thinking:
+  gpt-thinker:
     provider: openai
     model: gpt-5-mini
-    thinking_budget: low # minimal | low | medium (default) | high | xhigh | max | none | adaptive/<level>
+    thinking_budget: high   # minimal | low | medium | high | xhigh
 ```
 
+**Effort levels:**
+
+| Level     | Description                                              |
+| --------- | -------------------------------------------------------- |
+| `none`    | Disable thinking (a minimum output floor still applies). |
+| `minimal` | Fastest; lightest reasoning pass.                        |
+| `low`     | Quick reasoning for straightforward tasks.               |
+| `medium`  | Balanced default.                                        |
+| `high`    | More thorough; recommended for complex tasks.            |
+| `xhigh`   | Near-maximum effort; slower but most accurate.           |
+
+<div class="callout callout-warning" markdown="1">
+<div class="callout-title">Hidden reasoning tokens
+</div>
+  <p>OpenAI reasoning models always produce hidden reasoning tokens that count against <code>max_tokens</code> — even with <code>thinking_budget: none</code>. docker-agent automatically raises the output-token floor so reasoning cannot starve visible text output.</p>
+</div>
+
+### Adaptive Thinking
+
+Use `adaptive/<level>` to let the model scale effort dynamically based on task complexity:
+
+```yaml
+models:
+  gpt-adaptive:
+    provider: openai
+    model: gpt-5-mini
+    thinking_budget: adaptive/medium   # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
+```
+
+See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a cross-provider overview.
+
 <div class="callout callout-tip" markdown="1">
 <div class="callout-title">Custom endpoints
 </div>

From afe921b8dccb1c9c8240763bf084e3fa0c674c64 Mon Sep 17 00:00:00 2001
From: Djordje Lukic <djordje.lukic@docker.com>
Date: Fri, 5 Jun 2026 06:17:47 +0000
Subject: [PATCH 2/2] docs: clarify that 'max' is only valid as adaptive/max
 for OpenAI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a note in both the thinking guide and OpenAI provider page that
bare 'thinking_budget: max' is not accepted by OpenAI — max is only
valid inside the adaptive/ prefix (adaptive/max). ForOpenAI() in
pkg/effort/effort.go confirms: only minimal/low/medium/high/xhigh
are accepted as standalone levels.
---
 docs/guides/thinking/index.md  | 4 +++-
 docs/providers/openai/index.md | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/guides/thinking/index.md b/docs/guides/thinking/index.md
index d8bb6e5fb..244848b06 100644
--- a/docs/guides/thinking/index.md
+++ b/docs/guides/thinking/index.md
@@ -24,7 +24,7 @@ docker-agent exposes this through a single `thinking_budget` field on any named
 
 | Provider       | Format     | Values                                                       | Default      |
 | -------------- | ---------- | ------------------------------------------------------------ | ------------ |
-| OpenAI         | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/<level>` | `medium`     |
+| OpenAI         | string     | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/<level>` (`max` only via `adaptive/max`) | `medium`     |
 | Anthropic      | int or str | 1024–32768 tokens, or `adaptive`, `low`–`max`, `none`        | off          |
 | Gemini 2.5     | int        | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768) | `-1` (dynamic)|
 | Gemini 3       | string     | `minimal`, `low`, `medium`, `high`                           | model-dependent |
@@ -72,6 +72,8 @@ models:
     thinking_budget: adaptive/medium  # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
 ```
 
+> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI.
+
 ## Anthropic
 
 Anthropic Claude supports two thinking modes: a **token budget** (older models) and **adaptive / effort-based** thinking (newer models).
diff --git a/docs/providers/openai/index.md b/docs/providers/openai/index.md
index fbadfd282..79d589d95 100644
--- a/docs/providers/openai/index.md
+++ b/docs/providers/openai/index.md
@@ -88,6 +88,8 @@ models:
     thinking_budget: adaptive/medium   # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
 ```
 
+> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI.
+
 See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a cross-provider overview.
 
 <div class="callout callout-tip" markdown="1">