Commit b9fee16

sjarmak and claude committed
fix: ABC audit fixes for MCP-unique suites — test.sh and R.2 false positive
- Add missing tests/test.sh to 4 pre-existing MCP-unique tasks that were missing the Harbor compatibility wrapper: ccx-compliance-057, ccx-platform-091, ccx-onboard-042, ccx-onboard-050
- Fix abc_audit.py R.2 check to skip for ccb_mcp_* suites; instruction.md referencing Sourcegraph MCP tools is correct behavior for MCP-unique tasks, not contamination
- All 11 ccb_mcp_* suites now grade A (were D/F) with 0 critical failures
- 10/10 smoke tests pass across all newly expanded suites (Docker build + verifier)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent f6a59cd commit b9fee16

5 files changed: 45 additions, 0 deletions
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+#!/bin/bash
+# test.sh — Harbor compatibility wrapper
+# Harbor requires tests/test.sh for task discovery (TaskPaths.is_valid() check).
+# The actual evaluation logic lives in eval.sh (SWE-Factory exit-code-first pattern).
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
+exec bash "$(dirname "$0")/eval.sh" "$@"
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+#!/bin/bash
+# test.sh — Harbor compatibility wrapper
+# Harbor requires tests/test.sh for task discovery (TaskPaths.is_valid() check).
+# The actual evaluation logic lives in eval.sh (SWE-Factory exit-code-first pattern).
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
+exec bash "$(dirname "$0")/eval.sh" "$@"
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+#!/bin/bash
+# test.sh — Harbor compatibility wrapper
+# Harbor requires tests/test.sh for task discovery (TaskPaths.is_valid() check).
+# The actual evaluation logic lives in eval.sh (SWE-Factory exit-code-first pattern).
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
+exec bash "$(dirname "$0")/eval.sh" "$@"
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+#!/bin/bash
+# test.sh — Harbor compatibility wrapper
+# Harbor requires tests/test.sh for task discovery (TaskPaths.is_valid() check).
+# The actual evaluation logic lives in eval.sh (SWE-Factory exit-code-first pattern).
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
+exec bash "$(dirname "$0")/eval.sh" "$@"
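
Reviewer note: the four test.sh wrappers are identical, so the gap they close (a task that ships eval.sh without the test.sh Harbor looks for) could be caught mechanically. The following is a minimal sketch of such a check, not code from this commit. It assumes each task is a directory containing a tests/ folder; the function name `find_missing_wrappers` and the `suite_root` layout are hypothetical.

```python
from pathlib import Path
from typing import List


def find_missing_wrappers(suite_root: Path) -> List[Path]:
    """Return task directories that have tests/eval.sh but no tests/test.sh.

    Assumption: every immediate subdirectory of suite_root is one task.
    """
    missing = []
    for task_dir in sorted(p for p in suite_root.iterdir() if p.is_dir()):
        tests = task_dir / "tests"
        has_eval = (tests / "eval.sh").is_file()
        has_wrapper = (tests / "test.sh").is_file()
        if has_eval and not has_wrapper:
            missing.append(task_dir)
    return missing
```

Called on a suite directory, it returns the tasks that would need the wrapper added; an empty list means every task with an eval.sh also has a test.sh.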

scripts/abc_audit.py

Lines changed: 9 additions & 0 deletions
@@ -955,6 +955,15 @@ def audit_suite(suite: str, dimension: Optional[Dimension] = None) -> AuditRepor
             ))
             continue
 
+        # R.2 doesn't apply to MCP-unique suites: instructions intentionally
+        # reference Sourcegraph MCP tools (that's the point of these tasks).
+        if cid == "R.2" and suite.startswith("ccb_mcp_"):
+            report.results.append(CriterionResult(
+                criterion_id=cid, status=Status.SKIP,
+                evidence="MCP-unique suite: MCP tool references in instructions are by design",
+            ))
+            continue
+
         # Run automated check
         if cid in TASK_CHECKS:
             result = TASK_CHECKS[cid](tasks)
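
Reviewer note: the R.2 skip can be exercised in isolation. The sketch below restates the new branch as a standalone function so its three cases (R.2 on an MCP suite, R.2 elsewhere, another criterion on an MCP suite) are easy to verify. `Status` and `CriterionResult` here are stand-ins with only the fields visible in the diff, not the real definitions from abc_audit.py; the helper name `skip_r2_for_mcp`, the `PASS` member, and the example suite names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Stand-in enum: only SKIP appears in the diff; PASS is assumed.
    PASS = "pass"
    SKIP = "skip"


@dataclass
class CriterionResult:
    # Stand-in carrying only the three fields the diff sets.
    criterion_id: str
    status: Status
    evidence: str = ""


MCP_SUITE_PREFIX = "ccb_mcp_"


def skip_r2_for_mcp(cid: str, suite: str) -> Optional[CriterionResult]:
    """Return a SKIP result for R.2 on an MCP-unique suite, otherwise None."""
    if cid == "R.2" and suite.startswith(MCP_SUITE_PREFIX):
        return CriterionResult(
            criterion_id=cid,
            status=Status.SKIP,
            evidence="MCP-unique suite: MCP tool references in instructions are by design",
        )
    return None


if __name__ == "__main__":
    # Hypothetical suite names, for illustration only.
    print(skip_r2_for_mcp("R.2", "ccb_mcp_example"))
    print(skip_r2_for_mcp("R.2", "ccb_other"))
```

Returning None for the non-matching cases mirrors the diff's control flow: the audit loop only short-circuits with `continue` when the skip applies, and otherwise falls through to the automated check.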
