Commit 0afecda

sjarmak and claude committed
Fix benchmark source links in official results export
Links from docs/official_results/suites/ need ../../../benchmarks/ to reach the repo root, not ../../benchmarks/ which landed in docs/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 419b613 commit 0afecda
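
The path arithmetic behind this fix can be checked with a short sketch. It assumes only what the commit message states (suite pages live in `docs/official_results/suites/`); the helper name `resolve_link` is hypothetical and is not part of the repository.

```python
import posixpath

# Directory holding the suite pages, relative to the repo root (from the commit message).
SUITES_DIR = "docs/official_results/suites"


def resolve_link(page_dir: str, href: str) -> str:
    """Resolve a relative href against the directory of the page that contains it."""
    return posixpath.normpath(posixpath.join(page_dir, href))


# Old prefix: two levels up from suites/ only reaches docs/.
old_target = resolve_link(SUITES_DIR, "../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-061")
# New prefix: three levels up reaches the repo root.
new_target = resolve_link(SUITES_DIR, "../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-061")

print(old_target)  # docs/benchmarks/ccb_mcp_crossorg/ccx-crossorg-061
print(new_target)  # benchmarks/ccb_mcp_crossorg/ccx-crossorg-061
```

Because `suites/` sits three directories below the repo root, any link from a suite page to a root-level directory needs three `..` segments.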

21 files changed: 706 additions and 706 deletions

docs/official_results/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 This bundle is generated from `runs/official/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).
 
-Generated: `2026-02-27T03:28:33.363516+00:00`
+Generated: `2026-02-27T03:35:06.443867+00:00`
 
 ## Local Browse
 

docs/official_results/data/official_results.json

Lines changed: 1 addition & 1 deletion
@@ -83042,7 +83042,7 @@
 "wall_clock_seconds": 175.571784
 }
 ],
-"generated_at": "2026-02-27T03:28:33.225158+00:00",
+"generated_at": "2026-02-27T03:35:06.305042+00:00",
 "run_count": 124,
 "run_summaries": [
 {

docs/official_results/suites/ccb_build.md

Lines changed: 46 additions & 46 deletions
Large diffs are not rendered by default.

docs/official_results/suites/ccb_debug.md

Lines changed: 40 additions & 40 deletions
Large diffs are not rendered by default.

docs/official_results/suites/ccb_design.md

Lines changed: 47 additions & 47 deletions
Large diffs are not rendered by default.

docs/official_results/suites/ccb_document.md

Lines changed: 56 additions & 56 deletions
Large diffs are not rendered by default.

docs/official_results/suites/ccb_fix.md

Lines changed: 61 additions & 61 deletions
Large diffs are not rendered by default.

docs/official_results/suites/ccb_mcp_compliance.md

Lines changed: 34 additions & 34 deletions
Large diffs are not rendered by default.

docs/official_results/suites/ccb_mcp_crossorg.md

Lines changed: 14 additions & 14 deletions
@@ -18,24 +18,24 @@
 
 | Task | Benchmark | Config | Status | Reward | Runs | MCP Ratio |
 |---|---|---|---|---:|---:|---:|
-| [ccx-crossorg-061](../tasks/ccb_mcp_crossorg_haiku_022126--baseline--ccx-crossorg-061.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-061) | `baseline` | `passed` | 0.500 | 1 | 0.000 |
-| [ccx-crossorg-061](../tasks/ccb_mcp_crossorg_haiku_022126--mcp--ccx-crossorg-061.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-061) | `mcp` | `passed` | 1.000 | 1 | 0.889 |
-| [ccx-crossorg-062](../tasks/ccb_mcp_crossorg_haiku_20260226_205845--baseline-local-direct--ccx-crossorg-062.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `baseline-local-direct` | `passed` | 0.658 | 1 | 0.000 |
-| [mcp_CCX-crossorg-062_7tCLGe](../tasks/ccb_mcp_crossorg_haiku_20260226_035628_variance--mcp-remote-direct--mcp_CCX-crossorg-062_7tCLGe.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.680 | 4 | 0.986 |
-| [mcp_CCX-crossorg-062_CJrdeX](../tasks/ccb_mcp_crossorg_haiku_20260226_035622_variance--mcp-remote-direct--mcp_CCX-crossorg-062_CJrdeX.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.680 | 4 | 0.854 |
-| [mcp_CCX-crossorg-062_Dp7ADh](../tasks/ccb_mcp_crossorg_haiku_20260226_035633_variance--mcp-remote-direct--mcp_CCX-crossorg-062_Dp7ADh.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.711 | 4 | 0.929 |
-| [mcp_CCX-crossorg-062_XLX9KX](../tasks/ccb_mcp_crossorg_haiku_20260226_035617--mcp-remote-direct--mcp_CCX-crossorg-062_XLX9KX.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.800 | 4 | 0.987 |
-| [ccx-crossorg-066](../tasks/ccb_mcp_crossorg_haiku_022126--baseline--ccx-crossorg-066.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-066) | `baseline` | `passed` | 1.000 | 1 | 0.000 |
-| [ccx-crossorg-066](../tasks/ccb_mcp_crossorg_haiku_022126--mcp--ccx-crossorg-066.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-066) | `mcp` | `passed` | 1.000 | 1 | 0.857 |
-| [bl_CCX-crossorg-121_PDC0i6](../tasks/ccb_mcp_crossorg_haiku_20260225_011700--baseline-local-artifact--bl_CCX-crossorg-121_PDC0i6.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-121) | `baseline-local-artifact` | `failed` | 0.000 | 1 | 0.000 |
-| [mcp_CCX-crossorg-121_ZILlm2](../tasks/ccb_mcp_crossorg_haiku_20260224_181919--mcp-remote-artifact--mcp_CCX-crossorg-121_ZILlm2.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-121) | `mcp-remote-artifact` | `passed` | 0.343 | 1 | 0.944 |
-| [bl_CCX-crossorg-132_5p0UW6](../tasks/ccb_mcp_crossorg_haiku_20260225_011700--baseline-local-artifact--bl_CCX-crossorg-132_5p0UW6.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-132) | `baseline-local-artifact` | `passed` | 0.125 | 1 | 0.000 |
-| [mcp_CCX-crossorg-132_22a84e](../tasks/ccb_mcp_crossorg_haiku_20260224_181919--mcp-remote-artifact--mcp_CCX-crossorg-132_22a84e.md) | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-132) | `mcp-remote-artifact` | `failed` | 0.000 | 1 | 0.971 |
+| [ccx-crossorg-061](../tasks/ccb_mcp_crossorg_haiku_022126--baseline--ccx-crossorg-061.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-061) | `baseline` | `passed` | 0.500 | 1 | 0.000 |
+| [ccx-crossorg-061](../tasks/ccb_mcp_crossorg_haiku_022126--mcp--ccx-crossorg-061.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-061) | `mcp` | `passed` | 1.000 | 1 | 0.889 |
+| [ccx-crossorg-062](../tasks/ccb_mcp_crossorg_haiku_20260226_205845--baseline-local-direct--ccx-crossorg-062.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `baseline-local-direct` | `passed` | 0.658 | 1 | 0.000 |
+| [mcp_CCX-crossorg-062_7tCLGe](../tasks/ccb_mcp_crossorg_haiku_20260226_035628_variance--mcp-remote-direct--mcp_CCX-crossorg-062_7tCLGe.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.680 | 4 | 0.986 |
+| [mcp_CCX-crossorg-062_CJrdeX](../tasks/ccb_mcp_crossorg_haiku_20260226_035622_variance--mcp-remote-direct--mcp_CCX-crossorg-062_CJrdeX.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.680 | 4 | 0.854 |
+| [mcp_CCX-crossorg-062_Dp7ADh](../tasks/ccb_mcp_crossorg_haiku_20260226_035633_variance--mcp-remote-direct--mcp_CCX-crossorg-062_Dp7ADh.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.711 | 4 | 0.929 |
+| [mcp_CCX-crossorg-062_XLX9KX](../tasks/ccb_mcp_crossorg_haiku_20260226_035617--mcp-remote-direct--mcp_CCX-crossorg-062_XLX9KX.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | `passed` | 0.800 | 4 | 0.987 |
+| [ccx-crossorg-066](../tasks/ccb_mcp_crossorg_haiku_022126--baseline--ccx-crossorg-066.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-066) | `baseline` | `passed` | 1.000 | 1 | 0.000 |
+| [ccx-crossorg-066](../tasks/ccb_mcp_crossorg_haiku_022126--mcp--ccx-crossorg-066.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-066) | `mcp` | `passed` | 1.000 | 1 | 0.857 |
+| [bl_CCX-crossorg-121_PDC0i6](../tasks/ccb_mcp_crossorg_haiku_20260225_011700--baseline-local-artifact--bl_CCX-crossorg-121_PDC0i6.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-121) | `baseline-local-artifact` | `failed` | 0.000 | 1 | 0.000 |
+| [mcp_CCX-crossorg-121_ZILlm2](../tasks/ccb_mcp_crossorg_haiku_20260224_181919--mcp-remote-artifact--mcp_CCX-crossorg-121_ZILlm2.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-121) | `mcp-remote-artifact` | `passed` | 0.343 | 1 | 0.944 |
+| [bl_CCX-crossorg-132_5p0UW6](../tasks/ccb_mcp_crossorg_haiku_20260225_011700--baseline-local-artifact--bl_CCX-crossorg-132_5p0UW6.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-132) | `baseline-local-artifact` | `passed` | 0.125 | 1 | 0.000 |
+| [mcp_CCX-crossorg-132_22a84e](../tasks/ccb_mcp_crossorg_haiku_20260224_181919--mcp-remote-artifact--mcp_CCX-crossorg-132_22a84e.md) | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-132) | `mcp-remote-artifact` | `failed` | 0.000 | 1 | 0.971 |
 
 ## Multi-Run Variance
 
 Tasks with multiple valid runs (1 task/config pairs).
 
 | Task | Benchmark | Config | Runs | Mean | Std | Individual Rewards |
 |---|---|---|---:|---:|---:|---|
-| CCX-crossorg-062 | [source](../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | 4 | 0.718 | 0.057 | 0.800, 0.680, 0.680, 0.711 |
+| CCX-crossorg-062 | [source](../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062) | `mcp-remote-direct` | 4 | 0.718 | 0.057 | 0.800, 0.680, 0.680, 0.711 |
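
One way a generator could avoid hardcoding the `../` depth is to derive each `source` href from the page's own location. The sketch below is illustrative only: `benchmark_href` is a hypothetical helper, not the project's exporter, and it assumes both paths are given relative to the repo root.

```python
import posixpath


def benchmark_href(page_path: str, benchmark_dir: str) -> str:
    """Return the relative href from a generated page to a benchmark directory.

    Both arguments are assumed to be repo-root-relative POSIX paths.
    """
    page_dir = posixpath.dirname(page_path)
    return posixpath.relpath(benchmark_dir, start=page_dir)


href = benchmark_href(
    "docs/official_results/suites/ccb_mcp_crossorg.md",
    "benchmarks/ccb_mcp_crossorg/ccx-crossorg-062",
)
print(href)  # ../../../benchmarks/ccb_mcp_crossorg/ccx-crossorg-062
```

Computing the prefix this way keeps links correct if the output directory is ever moved to a different depth.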

docs/official_results/suites/ccb_mcp_crossrepo.md

Lines changed: 6 additions & 6 deletions
@@ -46,11 +46,11 @@
 | [mcp_CCX-dep-trace-102_6nybzq](../tasks/ccb_mcp_crossrepo_tracing_haiku_20260226_035617--mcp-remote-direct--mcp_CCX-dep-trace-102_6nybzq.md) || `mcp-remote-direct` | `passed` | 0.533 | 4 | 0.882 |
 | [mcp_CCX-dep-trace-102_xB3mHY](../tasks/ccb_mcp_crossrepo_tracing_haiku_20260226_035633_variance--mcp-remote-direct--mcp_CCX-dep-trace-102_xB3mHY.md) || `mcp-remote-direct` | `passed` | 0.467 | 4 | 0.909 |
 | [mcp_CCX-dep-trace-102_93NryZ](../tasks/ccb_mcp_crossrepo_tracing_haiku_20260226_035628_variance--mcp-remote-direct--mcp_CCX-dep-trace-102_93NryZ.md) || `mcp-remote-direct` | `passed` | 0.800 | 4 | 0.952 |
-| [ccx-dep-trace-106](../tasks/ccb_mcp_crossrepo_haiku_20260226_205845--baseline-local-direct--ccx-dep-trace-106.md) | [source](../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `baseline-local-direct` | `passed` | 0.867 | 1 | 0.000 |
-| [mcp_CCX-dep-trace-106_UE5RZ8](../tasks/ccb_mcp_crossrepo_haiku_20260226_035622_variance--mcp-remote-direct--mcp_CCX-dep-trace-106_UE5RZ8.md) | [source](../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.644 | 4 | 0.926 |
-| [mcp_CCX-dep-trace-106_nCV7RO](../tasks/ccb_mcp_crossrepo_haiku_20260226_035633_variance--mcp-remote-direct--mcp_CCX-dep-trace-106_nCV7RO.md) | [source](../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.850 | 4 | 0.960 |
-| [mcp_CCX-dep-trace-106_pKe0DJ](../tasks/ccb_mcp_crossrepo_haiku_20260226_035617--mcp-remote-direct--mcp_CCX-dep-trace-106_pKe0DJ.md) | [source](../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.767 | 4 | 0.957 |
-| [mcp_CCX-dep-trace-106_zMwdbQ](../tasks/ccb_mcp_crossrepo_haiku_20260226_035628_variance--mcp-remote-direct--mcp_CCX-dep-trace-106_zMwdbQ.md) | [source](../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.767 | 4 | 0.929 |
+| [ccx-dep-trace-106](../tasks/ccb_mcp_crossrepo_haiku_20260226_205845--baseline-local-direct--ccx-dep-trace-106.md) | [source](../../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `baseline-local-direct` | `passed` | 0.867 | 1 | 0.000 |
+| [mcp_CCX-dep-trace-106_UE5RZ8](../tasks/ccb_mcp_crossrepo_haiku_20260226_035622_variance--mcp-remote-direct--mcp_CCX-dep-trace-106_UE5RZ8.md) | [source](../../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.644 | 4 | 0.926 |
+| [mcp_CCX-dep-trace-106_nCV7RO](../tasks/ccb_mcp_crossrepo_haiku_20260226_035633_variance--mcp-remote-direct--mcp_CCX-dep-trace-106_nCV7RO.md) | [source](../../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.850 | 4 | 0.960 |
+| [mcp_CCX-dep-trace-106_pKe0DJ](../tasks/ccb_mcp_crossrepo_haiku_20260226_035617--mcp-remote-direct--mcp_CCX-dep-trace-106_pKe0DJ.md) | [source](../../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.767 | 4 | 0.957 |
+| [mcp_CCX-dep-trace-106_zMwdbQ](../tasks/ccb_mcp_crossrepo_haiku_20260226_035628_variance--mcp-remote-direct--mcp_CCX-dep-trace-106_zMwdbQ.md) | [source](../../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | `passed` | 0.767 | 4 | 0.929 |
 | [ccx-dep-trace-116](../tasks/ccb_mcp_crossrepo_tracing_haiku_20260226_214446--baseline-local-direct--ccx-dep-trace-116.md) || `baseline-local-direct` | `passed` | 0.571 | 1 | 0.000 |
 | [mcp_CCX-dep-trace-116_hutEUF](../tasks/ccb_mcp_crossrepo_tracing_haiku_20260226_221038--mcp-remote-direct--mcp_CCX-dep-trace-116_hutEUF.md) || `mcp-remote-direct` | `passed` | 0.800 | 1 | 0.950 |
 | [bl_CCX-dep-trace-123_2Fw9jl](../tasks/ccb_mcp_crossrepo_tracing_haiku_20260225_011700--baseline-local-artifact--bl_CCX-dep-trace-123_2Fw9jl.md) || `baseline-local-artifact` | `failed` | 0.000 | 1 | 0.000 |
@@ -64,4 +64,4 @@ Tasks with multiple valid runs (1 task/config pairs).
 
 | Task | Benchmark | Config | Runs | Mean | Std | Individual Rewards |
 |---|---|---|---:|---:|---:|---|
-| CCX-dep-trace-106 | [source](../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | 4 | 0.757 | 0.085 | 0.644, 0.850, 0.767, 0.767 |
+| CCX-dep-trace-106 | [source](../../../benchmarks/ccb_mcp_crossrepo/ccx-dep-trace-106) | `mcp-remote-direct` | 4 | 0.757 | 0.085 | 0.644, 0.850, 0.767, 0.767 |
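
The Mean and Std columns in the Multi-Run Variance tables can be reproduced from the Individual Rewards column. The sketch below assumes Std is the sample standard deviation (n - 1 denominator); that is inferred from the numbers matching, not stated by the export. The `summarize` helper is hypothetical.

```python
import statistics

# Individual rewards copied from the two Multi-Run Variance rows in this diff.
rewards = {
    "CCX-crossorg-062": [0.800, 0.680, 0.680, 0.711],
    "CCX-dep-trace-106": [0.644, 0.850, 0.767, 0.767],
}


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to 3 decimals."""
    return round(statistics.mean(values), 3), round(statistics.stdev(values), 3)


for task, values in rewards.items():
    mean, std = summarize(values)
    print(f"{task}: mean={mean:.3f} std={std:.3f}")
```

Using `statistics.pstdev` (population standard deviation) instead would give smaller values that do not match the table, which is why the sample form is assumed here.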
