|
102 | 102 | {"id":"CodeContextBench-k0q","title":"US-008a: Scaffold first 3 governance tasks","status":"closed","priority":1,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-15T14:29:12.130442847Z","created_by":"LoCoBench Bot","updated_at":"2026-02-15T14:33:55.402208118Z","closed_at":"2026-02-15T14:33:55.402208118Z","close_reason":"US-008a completed: 3 governance tasks scaffolded"} |
103 | 103 | {"id":"CodeContextBench-k3s","title":"Scaffold ccb_investigation tasks — regression hunt, impact analysis, cross-service debug, migration audit","description":"Implement 3-4 prototype investigation tasks using existing SG-indexed repos. Each task needs: task.toml, instruction.md, Dockerfile, tests/test.sh. Task designs: (a) Regression Hunt using flipt or ansible repo — 'users report X broke, find the commit and fix', requires commit_search + diff_search. (b) Impact Analysis using kubernetes — 'change function Foo signature, find and update all callers', requires find_references cross-repo. (c) Cross-Service Debug — 'Service A fails calling Service B, diagnose contract mismatch', requires multi-repo search with only Service A in workspace. (d) Migration Discovery — 'library X deprecated API Y, find all usages across org', requires org-wide keyword_search. Blocked by design task.","status":"closed","priority":0,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-07T13:00:17.768845617Z","created_by":"LoCoBench Bot","updated_at":"2026-02-07T13:29:12.011467354Z","closed_at":"2026-02-07T13:29:12.011467354Z","close_reason":"Scaffolded 4 investigation tasks: inv-regression-001 (Grafana v38 migration), inv-impact-001 (K8s DRA AllocationMode), inv-debug-001 (Prometheus remote-write resharding), inv-migration-001 (Django ADMINS/MANAGERS). Each has task.toml, instruction.md, Dockerfile, test.sh, ground_truth.json. All commits verified in SG.","dependencies":[{"issue_id":"CodeContextBench-k3s","depends_on_id":"CodeContextBench-4q2","type":"blocks","created_at":"2026-02-07T13:00:37.791203497Z","created_by":"LoCoBench Bot"}]} |
104 | 104 | {"id":"CodeContextBench-kph","title":"Rerun SG_full tasks with Deep Search retry preamble","description":"Deep Search retry fix has been applied to claude_baseline_agent.py preamble. Need to rerun SG_full configs for benchmarks where old runs had \u003e30% polling-only DS responses: K8s Docs (40% success), PyTorch (50% success), SWE-bench Pro (67% success). Also rerun LoCoBench and RepoQA SG_full which used old DS instruction format (H1: LoCoBench 2/23, RepoQA 0/10 compliance).","notes":"2026-02-08: SG_full reruns partially complete. LoCoBench 25/25 (0.499), RepoQA 9/9 (1.000), PyTorch 11/12 (0.243, sgt-025 Docker fail). SWE-Pro 25/36 OK (0.760, 10 AgentSetupTimeoutError + 1 zero-token). K8s Docs 0/5 (auth_failed—tokens expired). SWE-Pro rerun also auth_failed. Auth-failed runs archived. Remaining: K8s Docs SG_full (5 tasks), SWE-Pro (10-11 tasks), RepoQA 1 task (cpp-skypjack-uvw-00), PyTorch 1 task (sgt-025). Tokens expired, need headless_login.py refresh first.","status":"closed","priority":1,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-06T14:50:13.685838976Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T00:50:42.40242785Z","closed_at":"2026-02-16T00:50:42.40242785Z","close_reason":"V3 preamble deployed, Deep Search retry no longer needed (0% DS usage)"} |
105 | | -{"id":"CodeContextBench-kqz","title":"Phase 1: Run K8s Docs isolated pilot","status":"open","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-16T18:42:37.550250999Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T18:42:37.550250999Z"} |
| 105 | +{"id":"CodeContextBench-kqz","title":"Phase 1: Run K8s Docs isolated pilot","status":"in_progress","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-16T18:42:37.550250999Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T18:45:43.876173371Z"} |
106 | 106 | {"id":"CodeContextBench-lqf","title":"Push commits to remote (US-009 complete)","description":"BLOCKING: Cannot push 10 commits - gh auth not configured. Need user to run 'gh auth login' or configure git credentials. Commits: US-009 through US-014 (all PRD user stories complete).","status":"closed","priority":0,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-16T16:30:46.002090815Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T16:47:52.000648447Z","closed_at":"2026-02-16T16:47:52.000648447Z","close_reason":"All US-001 through US-014 are complete and passing. Branch ready to push."} |
107 | 107 | {"id":"CodeContextBench-lr2","title":"Phase 5: Expand sourcegraph_isolated to more suites","status":"open","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-16T18:42:44.702161298Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T18:42:44.702161298Z","dependencies":[{"issue_id":"CodeContextBench-lr2","depends_on_id":"CodeContextBench-c6m","type":"blocks","created_at":"2026-02-16T18:42:50.68748367Z","created_by":"LoCoBench Bot"}]} |
108 | 108 | {"id":"CodeContextBench-m5m","title":"US-007a: Scaffold 2 cross-file refactoring tasks (Tier A)","status":"closed","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-15T23:31:33.028320306Z","created_by":"LoCoBench Bot","updated_at":"2026-02-15T23:35:12.670649307Z","closed_at":"2026-02-15T23:35:12.670649307Z","close_reason":"US-007a complete: K8s ScoreExtensions→ScoreNormalizer (16 files) + Rust SubtypePredicate→SubtypeRelation (19 files)"} |
|
143 | 143 | {"id":"CodeContextBench-yzh","title":"Create configs/run_overnight.sh orchestrator","status":"closed","priority":1,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-08T03:36:13.60028793Z","created_by":"LoCoBench Bot","updated_at":"2026-02-08T03:40:10.064123563Z","closed_at":"2026-02-08T03:40:10.064123563Z","close_reason":"Created configs/run_overnight.sh with sequential benchmark queue, token health checks, canary integration, dry-run, resume-from support"} |
144 | 144 | {"id":"CodeContextBench-z7n","title":"US-006b: Scaffold 3 architectural understanding tasks (Tier B repos)","status":"closed","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-15T23:21:56.729085935Z","created_by":"LoCoBench Bot","updated_at":"2026-02-15T23:23:08.101442069Z","closed_at":"2026-02-15T23:23:08.101442069Z","close_reason":"Scaffolded 3 Tier B architectural understanding tasks (Camel, Flink, QuantLib)"} |
145 | 145 | {"id":"CodeContextBench-zj6","title":"Rerun SG_base after fixing doubled github.com bug","description":"After fixing the doubled github.com prefix bug in claude_baseline_agent.py, ALL SG_base results for benchmarks using sg-benchmarks mirror repos need reruns. Known affected: LinuxFLBench (confirmed 2 tasks scored lower), plus any benchmark where instance_to_mirror.json maps to github.com/sg-benchmarks/*. SG_full may also be affected for keyword_search calls (Deep Search uses different mechanics so may be less impacted).","status":"closed","priority":1,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-06T22:02:14.358067618Z","created_by":"LoCoBench Bot","updated_at":"2026-02-07T18:39:19.01740942Z","closed_at":"2026-02-07T18:39:19.01740942Z","close_reason":"SG_base reruns complete: CodeReview (0.98 vs BL 0.93), LinuxFLBench (0.82 vs BL 0.86)"} |
146 | | -{"id":"CodeContextBench-zku","title":"Phase 2: Add sourcegraph_only config to agent code","status":"in_progress","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-16T18:42:39.88870793Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T18:43:12.869971482Z"} |
| 146 | +{"id":"CodeContextBench-zku","title":"Phase 2: Add sourcegraph_only config to agent code","status":"closed","priority":2,"issue_type":"task","owner":"locobench@anthropic.com","created_at":"2026-02-16T18:42:39.88870793Z","created_by":"LoCoBench Bot","updated_at":"2026-02-16T18:45:33.239193235Z","closed_at":"2026-02-16T18:45:33.239193235Z","close_reason":"sourcegraph_only added to claude_baseline_agent.py (9 locations), eval_matrix.json, aggregate_status.py, generate_manifest.py"} |