docs: nightly research report 2026-03-21 (report #15)
Key new findings:
- Confirmed 2 broken abc_audit quality-gate checks via pytest (T5 solution
leak + R2 contamination both return PASS for bad tasks; duplicate defs)
- Cost pipeline never passes model ID to calculate_cost_from_tokens;
all costs silently calculated at Opus-4.5 rates (Sonnet runs: 5x overstatement)
- claude-sonnet-4-6 and claude-haiku-4-6 absent from MODEL_PRICING table
- Zero CI workflows run pytest — 2 confirmed test failures invisible to CI
- Hardcoded Feb-2026 staging run IDs in verify_retrieval_eval_smoke.py
- Non-atomic credential write in daytona_runner.py:234
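The non-atomic credential write noted above can usually be closed with the write-to-temp-then-rename pattern. What follows is a minimal sketch, not the code in `daytona_runner.py`: the function name `write_secret_atomically` and its signature are hypothetical, and it assumes the credential is a small text payload.

```python
import os
import tempfile
from pathlib import Path


def write_secret_atomically(path: Path, data: str) -> None:
    """Hypothetical helper: replace `path` with `data` in one step.

    Readers see either the old file or the complete new one, never a
    partially written file.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so that os.replace
    # stays on a single filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # swap the finished file into place
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

On POSIX systems `tempfile.mkstemp` creates the file readable and writable only by the owner, which suits credential material. Unlike the swallowed `mv` failure in the shell scripts, any error here propagates to the caller.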
- Deprecated model in shell: `rerun_fixed_tasks.sh:34`, `rerun_zero_mcp_tasks.sh:29`; add grep CI check.
- `run_selected_tasks.sh:648,699,711`: mktemp+mv race; `mv` failure swallowed by subshell.
- **Cost pipeline**: `extract_task_metrics.py:266`, `discovery.py:310` never pass model → all costs at Opus-4.5 rates. `claude-sonnet-4-6` + `claude-haiku-4-6` absent from `MODEL_PRICING` (`extractors.py:1071`). Sonnet runs: 5× overstatement.
- **CI test gap**: 212 tests / 2 confirmed failing / none of the 4 CI workflows run `pytest`.
- `verify_retrieval_eval_smoke.py:26-30`: 5 hardcoded Feb-2026 run IDs; smoke test breaks if staging rotated.

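One way to keep the cost-pipeline finding from recurring is to make the model ID a required argument and to fail loudly when it is missing or unpriced, rather than falling back to a default rate. The sketch below is illustrative only. `cost_from_tokens`, `Rate`, `UnknownModelError`, and `EXAMPLE_PRICING` are hypothetical names, not the repository's `calculate_cost_from_tokens` or `MODEL_PRICING`, and the rates are placeholder numbers, not real prices.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Rate:
    """Price per million tokens (placeholder units for this sketch)."""
    input_per_mtok: float
    output_per_mtok: float


# Placeholder model names and numbers for illustration; NOT real pricing.
EXAMPLE_PRICING: Mapping[str, Rate] = {
    "example-large": Rate(input_per_mtok=10.0, output_per_mtok=50.0),
    "example-small": Rate(input_per_mtok=2.0, output_per_mtok=10.0),
}


class UnknownModelError(LookupError):
    """Raised when a model ID has no entry in the pricing table."""


def cost_from_tokens(
    model_id: str,
    input_tokens: int,
    output_tokens: int,
    pricing: Mapping[str, Rate] = EXAMPLE_PRICING,
) -> float:
    """Compute cost for one run; there is deliberately no default model."""
    if not model_id:
        raise ValueError("model_id is required to price a run")
    try:
        rate = pricing[model_id]
    except KeyError:
        raise UnknownModelError(f"no pricing entry for model {model_id!r}") from None
    return (
        input_tokens * rate.input_per_mtok
        + output_tokens * rate.output_per_mtok
    ) / 1_000_000
```

With the placeholder table, pricing a run of the small model at the large model's rates gives a fivefold overstatement, which is the kind of silent error the finding describes. A model that is missing from the table raises `UnknownModelError` instead of being priced at another model's rate.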
## Maintenance
- Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.