[aw-failures] Two daily workflows fail at agent start: Documentation Healer (effort-param 400) & Model Inventory Checker (BYOK auth)

### Recommendation

Two daily scheduled workflows fail **at agent start, before any work**, because of engine/model configuration. Fix both: drop the unsupported `effort` parameter for Documentation Healer, and restore BYOK auth token plumbing for Model Inventory Checker. Neither is a network/firewall issue (`audit-diff` shows 0 new domains, 0 anomalies vs the last green run).

This sub-issue root-causes the shallow auto-notifier issues #37010 and #37014.

### Cluster A — Daily Documentation Healer: `effort` parameter rejected (fresh regression)

**Fix:** remove or guard the `effort` parameter for the Claude `small-agent` model (or select a model that supports `effort`).

- Affected run: [26986947133](https://github.com/github/gh-aw/actions/runs/26986947133) (schedule, 2026-06-05T00:01Z) · notifier issue #37010
- Regression signal: 7 consecutive prior days `success`, **first failure today** — `failure success success success success success success success`.
- Dominant error (all 4 attempts, ~0s each):
```
API Error: 400 This model does not support the effort parameter.
```
- Engine = Claude Code, model = `small-agent`. The `--continue` retry path then hit `Error: No deferred tool marker found in the resumed session` (`isNoDeferredMarkerError=true`) and `--continue` was disabled permanently. Agent produced **0 real turns / 0 tokens**.
- `audit-diff` (base success [26921451398] vs failure 26986947133): `new_domain_count=0`, `anomaly_count=0`, `run2_turns=1` — regression is isolated to the agent engine invocation, not networking.
- Failure class: **config-error**.

### Cluster B — Daily Model Inventory Checker: BYOK auth missing (persistent regression)

**Fix:** restore BYOK token plumbing so the Copilot SDK driver receives valid auth (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN` present in the agent env).

- Affected run: [26987484745](https://github.com/github/gh-aw/actions/runs/26987484745) (schedule, 2026-06-05T00:16Z) · notifier issue #37014
- Regression signal: **2 consecutive days failing**, green before — `failure failure success success success success success success`.
- Dominant error:
```
[Error: Execution failed: Error: Session was not created with authentication info or custom provider]
```
- The Copilot SDK BYOK driver sample (`.github/drivers/copilot_sdk_driver_sample_node.cjs`) threw an uncaught promise rejection on attempt 1. Harness flagged `isAuthError=true`: "no authentication information found — not retrying (COPILOT_GITHUB_TOKEN, GH_TOKEN, and GITHUB_TOKEN are all absent or invalid)". The entrypoint **unset `COPILOT_GITHUB_TOKEN` and `COPILOT_PROVIDER_API_KEY`** just before the driver spawned.
- All upstream `collect_*` setup jobs succeeded; failure isolated to the agent driver; `detection`/`safe_outputs` skipped. **0 tokens**.
- Failure class: **config-error**.
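The harness message above names three environment variables. A minimal preflight sketch, assuming the check runs in the same environment the driver is spawned in: `find_auth_token` is a hypothetical helper (not part of gh-aw or the Copilot SDK), and it only tests presence, not validity.

```python
from typing import Mapping, Optional

# Variable names come from the harness error message.
# The lookup order is an assumption made for this sketch.
AUTH_ENV_VARS = ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")


def find_auth_token(env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the first auth variable that is set and non-blank.

    Returns None when all of them are absent or whitespace-only, which is
    the condition the harness reported for this run.
    """
    for name in AUTH_ENV_VARS:
        if env.get(name, "").strip():
            return name
    return None
```

Calling `find_auth_token(os.environ)` immediately before the driver spawns, and failing fast on `None`, would surface the missing token at the entrypoint rather than as an uncaught rejection inside the driver. Whether the BYOK path should also accept `COPILOT_PROVIDER_API_KEY` is not established by this run.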

### Affected workflows and run IDs

| Cluster | Workflow | Run | Notifier issue | Failure class |
| --- | --- | --- | --- | --- |
| A | Daily Documentation Healer | 26986947133 | #37010 | config-error (effort param) |
| B | Daily Model Inventory Checker | 26987484745 | #37014 | config-error (BYOK auth) |

### Success criteria / verification

- **Cluster A:** Documentation Healer agent phase executes (>0 turns) and the run completes `success`; no `400 ... effort parameter` error.
- **Cluster B:** Model Inventory Checker driver creates a session; no `Session was not created with authentication info` error; auth tokens present in the agent env.
- Both workflows green for at least 2 consecutive scheduled runs.
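The "2 consecutive scheduled runs" criterion can be checked mechanically against the same newest-first conclusion strings used in the regression signals. This is a sketch under that ordering assumption; `leading_streak` and `is_verified` are hypothetical names, not existing gh-aw tooling.

```python
from typing import Sequence


def leading_streak(history: Sequence[str], conclusion: str) -> int:
    """Count how many of the most recent runs share `conclusion`.

    `history` is assumed to be ordered newest first, as in the
    regression-signal strings in this issue.
    """
    count = 0
    for item in history:
        if item != conclusion:
            break
        count += 1
    return count


def is_verified(history: Sequence[str], required_green: int = 2) -> bool:
    """True once the newest `required_green` runs all concluded `success`."""
    return leading_streak(history, "success") >= required_green
```

For example, `leading_streak("failure failure success success".split(), "failure")` counts the two current failures, and `is_verified` only turns true after two new `success` entries are prepended.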

---
Parent: #37005 · Root-causes #37010, #37014 · Analyzed run IDs: 26986947133, 26987484745, 26921451398 · Window: last 6h ending ~2026-06-05T01:34Z.
Related to #37005

> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/26989851467) · opus48 20.2M · 1.3K AIC · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
> - [x] expires on Jun 12, 2026, 2:35 AM UTC

---

### Update 2026-06-05 ~13:46Z — Cluster A (`effort` param) scope expansion: now hits the **`agent`/default variant**, third workflow affected

Fresh evidence from the last-6h sweep shows the `400 ... effort parameter` failure is **not** specific to the Claude `small-agent` model as originally scoped. It also fired on the **default `agent` variant**, on a workflow not previously listed here.

#### New affected run

- Affected run: **Daily Go Function Namer**, run [27014847510](https://github.com/github/gh-aw/actions/runs/27014847510) (schedule, 2026-06-05T12:28Z).
- Engine = `claude`; experiment `model_size` selected variant **`agent`** (confirmed: `ANTHROPIC_MODEL: agent`, `experiment.model_size=agent`), i.e. the **large/default** group (`sonnet-6x, gpt-5.4, gpt-5.3, gemini-pro, any`), **not** `small-agent`.
- Dominant error on **all 3 retry attempts** (~0.4s each), identical to Cluster A:
```
API Error: 400 This model does not support the effort parameter.
```
- Agent produced **1 turn / 0 tokens / $0**; the `--continue` retry path again hit the no-deferred-marker condition and was disabled permanently (same harness path as Documentation Healer).
- `audit-diff` (base last-green [26951925565] vs failure 27014847510): `new_domain_count=0`, `status_change_count=0`, `anomaly_count=0`, `has_anomalies=false`; regression isolated to the agent engine invocation, **not networking/firewall**.
- Regression signal: 7 consecutive prior green days, **first failure today**, matching Documentation Healer's onset pattern (first failure 2026-06-05T00:01Z).
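The networking triage applied to the `audit-diff` output can be expressed as a small predicate. This is a sketch only: it assumes the diff is available as a flat mapping keyed by the field names quoted above, that missing counters mean zero, and that `status_change_count` refers to network/firewall status changes. `networking_implicated` is a hypothetical helper.

```python
from typing import Any, Mapping

# Field names as quoted from the audit-diff summary in this issue.
NETWORK_COUNTERS = ("new_domain_count", "status_change_count", "anomaly_count")


def networking_implicated(diff: Mapping[str, Any]) -> bool:
    """True if any audit-diff counter is non-zero or anomalies are flagged.

    Missing keys are treated as 0 / False (an assumption of this sketch).
    """
    if any(int(diff.get(name, 0)) > 0 for name in NETWORK_COUNTERS):
        return True
    return bool(diff.get("has_anomalies", False))
```

With the values reported for run 27014847510 (all counters 0, `has_anomalies=false`) the predicate returns `False`, which is the basis for classifying the failure as an engine invocation problem.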

#### Implication for the fix

Because both the `small-agent` (Documentation Healer) **and** the `agent` (Daily Go Function Namer) variants fail with the same 400, the defect is **not per-workflow / per-variant configuration**. The `effort` parameter is being attached to the Claude engine request for models that reject it, regardless of model-size variant. The fix should **guard `effort` at the engine/token-steering layer** (omit it whenever the resolved concrete model does not advertise `effort` support) rather than editing individual workflows.
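The guard described above might look like the following. This is an illustrative sketch, not the gh-aw engine code: `build_request_options`, the capability set, and the model name in it are all hypothetical placeholders, and how the engine actually learns which models advertise `effort` support is not established here.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical capability table: resolved concrete models that accept `effort`.
# The entry is a placeholder, not a real model identifier.
EFFORT_CAPABLE_MODELS: Set[str] = {"example-model-with-effort"}


def build_request_options(
    resolved_model: str,
    effort: Optional[str],
    capable: Set[str] = EFFORT_CAPABLE_MODELS,
) -> Dict[str, Any]:
    """Build engine request options, attaching `effort` only when supported.

    The check uses the resolved concrete model, not the model-size variant
    (`agent` / `small-agent`), since both variants hit the same 400.
    """
    options: Dict[str, Any] = {"model": resolved_model}
    if effort is not None and resolved_model in capable:
        options["effort"] = effort
    return options
```

Keying the decision on the resolved model means a variant that maps to a different concrete model later picks up the correct behaviour without any per-workflow edits.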

#### Updated affected-workflow table (Cluster A)

| Workflow | Variant | Run | First failure |
| --- | --- | --- | --- |
| Daily Documentation Healer | small-agent | [26986947133](https://github.com/github/gh-aw/actions/runs/26986947133) | 2026-06-05T00:01Z |
| Daily Go Function Namer | **agent** | [27014847510](https://github.com/github/gh-aw/actions/runs/27014847510) | 2026-06-05T12:28Z |

#### Updated success criteria (Cluster A)

- No `400 ... effort parameter` error on **any** model-size variant (`agent` and `small-agent`).
- Documentation Healer **and** Daily Go Function Namer each green for ≥2 consecutive scheduled runs.

<details>
<summary>Other failures in this 6h window (no action — for the record)</summary>

- **Test Quality Sentinel** run [27013336594](https://github.com/github/gh-aw/actions/runs/27013336594) (pull_request, 11:54Z): Copilot CLI 15-minute execution timeout after 50 turns / ~1.5M tokens on PR branch `copilot/aw-compat-fix-codemod-issue`. Assessed **one-off** (11 surrounding TQS runs succeeded; failure class = execution-timeout on a large PR), not a systemic cluster — no issue filed.
- **CGO** ([27013363820](https://github.com/github/gh-aw/actions/runs/27013363820)) and **CJS** ([27013363789](https://github.com/github/gh-aw/actions/runs/27013363789)): non-agentic compile/build CI checks — out of scope for agentic-failure tracking.

</details>

---
Analyzed run IDs: 27014847510, 26951925565, 27013336594 · Window: last 6h ending ~2026-06-05T13:46Z.

> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/27018483900) · 272.5 AIC · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
