Commit a726e6d

sjarmak and claude committed
feat: add SDLC mode to Daytona curator runner + 32 ground truth files
Refactor daytona_curator_runner.py to support CodeScaleBench SDLC tasks alongside the existing ContextBench calibration mode. Add --sdlc-all, --missing-only, --suite, and --task-dir flags with dual-mode dispatch.

The new _extract_repo_info_for_sandbox() resolves repos from five sources: git clone URLs, # Repo: comments, # Source: org/repo (commit) comments, SWEAP FROM tags, and TAC_REPO_MAP (bustub, openhands). Strategies 4/4b/5 are also added to _resolve_repos() in context_retrieval_agent.py for the same patterns.

The first batch run completed 32/56 tasks; 24 remain because of timeouts and VM spindown. Includes a handoff for resuming the remaining tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b768d6a commit a726e6d
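
The repo-resolution strategies described in the commit message could be sketched roughly as below. This is only an illustration under stated assumptions: the regular expressions, the `resolve_repo` name, and the returned dictionary shape are hypothetical and are not taken from `_extract_repo_info_for_sandbox()`. The SWEAP FROM tag and TAC_REPO_MAP strategies are omitted because their formats are not shown on this page.

```python
import re
from typing import Optional

# Hypothetical patterns for three of the strategies named in the commit message.
CLONE_RE = re.compile(
    r"git clone\b[^\n]*?https://github\.com/([\w.-]+)/([\w.-]+?)(?:\.git)?(?=[\s\"']|$)"
)
REPO_COMMENT_RE = re.compile(r"^#\s*Repo:\s*([\w.-]+/[\w.-]+)\s*$", re.MULTILINE)
SOURCE_COMMENT_RE = re.compile(
    r"^#\s*Source:\s*([\w.-]+/[\w.-]+)\s*\(([0-9a-fA-F]{7,40})\)", re.MULTILINE
)


def resolve_repo(dockerfile_text: str) -> Optional[dict]:
    """Try each strategy in order and return the first match, else None."""
    m = CLONE_RE.search(dockerfile_text)
    if m:
        return {"repo": f"{m.group(1)}/{m.group(2)}", "commit": None,
                "strategy": "git_clone"}
    m = REPO_COMMENT_RE.search(dockerfile_text)
    if m:
        return {"repo": m.group(1), "commit": None, "strategy": "repo_comment"}
    m = SOURCE_COMMENT_RE.search(dockerfile_text)
    if m:
        return {"repo": m.group(1), "commit": m.group(2),
                "strategy": "source_comment"}
    return None
```

Ordering the strategies from most to least explicit means a task that carries both a clone URL and a comment resolves to the URL; whether the real runner uses the same priority is not stated here.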

67 files changed, +2945 -96 lines changed

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+{
+  "files": [
+    "internal/server/auth/middleware.go",
+    "internal/server/auth/middleware_test.go",
+    "internal/storage/auth/memory/store.go",
+    "internal/storage/auth/auth.go",
+    "internal/containers/option.go",
+    "cmd/flipt/main.go"
+  ],
+  "symbols": [
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "UnaryInterceptor",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "clientTokenFromMetadata",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "cookieFromMetadata",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "clientTokenFromAuthorization",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "WithServerSkipsAuthentication",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "InterceptorOptions",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "GetAuthenticationFrom",
+      "repo": null
+    },
+    {
+      "file": "internal/server/auth/middleware.go",
+      "symbol": "Authenticator",
+      "repo": null
+    },
+    {
+      "file": "internal/storage/auth/memory/store.go",
+      "symbol": "NewStore",
+      "repo": null
+    },
+    {
+      "file": "internal/storage/auth/memory/store.go",
+      "symbol": "CreateAuthentication",
+      "repo": null
+    },
+    {
+      "file": "internal/containers/option.go",
+      "symbol": "ApplyAll",
+      "repo": null
+    }
+  ]
+}
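
Each ground truth file in this commit is paired with a metadata file whose `files_count` and `symbols_count` match the lengths of the `files` and `symbols` arrays. A consistency check along those lines might look like the sketch below. The `check_counts` helper is hypothetical, and the rule that every symbol's file must also appear in `files` is an assumption that happens to hold for the files shown here, not a documented invariant.

```python
from typing import List


def check_counts(ground_truth: dict, metadata: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    files = ground_truth.get("files", [])
    symbols = ground_truth.get("symbols", [])

    if metadata.get("files_count") != len(files):
        problems.append(
            f"files_count={metadata.get('files_count')} but {len(files)} files listed"
        )
    if metadata.get("symbols_count") != len(symbols):
        problems.append(
            f"symbols_count={metadata.get('symbols_count')} but {len(symbols)} symbols listed"
        )

    # Assumed rule: a symbol should only reference a file that is listed.
    listed = set(files)
    for entry in symbols:
        if entry["file"] not in listed:
            problems.append(f"symbol {entry['symbol']} references unlisted file {entry['file']}")
    return problems
```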
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "has_ground_truth": true,
+  "has_chunk_ground_truth": false,
+  "ground_truth_source": "curator_agent",
+  "ground_truth_confidence": "medium",
+  "task_name": "flipt-auth-cookie-regression-prove-001",
+  "curator_agent_version": "2.0",
+  "model": "claude-opus-4-6",
+  "backend": "hybrid",
+  "timestamp": "2026-03-03T16:26:01Z",
+  "files_count": 6,
+  "edit_files_count": 0,
+  "chunks_count": 0,
+  "symbols_count": 11,
+  "cost_usd": 1.65988845,
+  "elapsed_sec": 538.3,
+  "exploration_notes": "The bug is in internal/server/auth/middleware.go. Before commit 6fe76d02, the UnaryInterceptor only extracted client tokens from the 'authorization' metadata key (expecting 'Bearer <token>' format). It had no cookie fallback and no server skip mechanism. The fix added: (1) clientTokenFromMetadata() which falls back to cookieFromMetadata() when no Authorization header is present, parsing the 'flipt_client_token' cookie from the 'grpcgateway-cookie' metadata key; (2) WithServerSkipsAuthentication("
+}
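
The token lookup order described in the exploration notes (Authorization header first, cookie as fallback) can be illustrated in Python. This is a sketch of the idea only, not a port of the Go code in `middleware.go`: the function names are Python-style stand-ins, and the metadata is modeled as a plain mapping from lowercase keys to lists of values, which is an assumption.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional, Sequence

Metadata = Mapping[str, Sequence[str]]


def cookie_from_metadata(md: Metadata, name: str) -> Optional[str]:
    """Look for a named cookie in the 'grpcgateway-cookie' metadata values."""
    for header in md.get("grpcgateway-cookie", ()):
        cookie = SimpleCookie()
        cookie.load(header)
        if name in cookie:
            return cookie[name].value
    return None


def client_token_from_metadata(md: Metadata) -> Optional[str]:
    """Prefer 'authorization: Bearer <token>'; fall back to the cookie."""
    auth_values = md.get("authorization", ())
    if auth_values:
        for value in auth_values:
            if value.startswith("Bearer "):
                token = value[len("Bearer "):].strip()
                if token:
                    return token
        return None  # header present but not in the expected format
    # No Authorization header at all: try the cookie instead.
    return cookie_from_metadata(md, "flipt_client_token")
```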
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+{
+  "files": [
+    "qutebrowser/config/configtypes.py"
+  ],
+  "symbols": [
+    {
+      "file": "qutebrowser/config/configtypes.py",
+      "symbol": "_parse_value",
+      "repo": null
+    },
+    {
+      "file": "qutebrowser/config/configtypes.py",
+      "symbol": "QtColor",
+      "repo": null
+    }
+  ]
+}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "has_ground_truth": true,
+  "has_chunk_ground_truth": false,
+  "ground_truth_source": "curator_agent",
+  "ground_truth_confidence": "medium",
+  "task_name": "qutebrowser-hsv-color-regression-prove-001",
+  "curator_agent_version": "2.0",
+  "model": "claude-opus-4-6",
+  "backend": "hybrid",
+  "timestamp": "2026-03-03T16:23:04Z",
+  "files_count": 1,
+  "edit_files_count": 0,
+  "chunks_count": 0,
+  "symbols_count": 2,
+  "cost_usd": 2.04564375,
+  "elapsed_sec": 362.6,
+  "exploration_notes": "The bug is in QtColor._parse_value() in qutebrowser/config/configtypes.py. Before commit 6b320dc18, _parse_value used a hardcoded multiplier of 255.0 for all color channels including hue, which should use 359.0. The fix added a 'kind' parameter to _parse_value and uses 359.0 when kind=='h'. The regression test at /workspace/regression_test.py extracts _parse_value from the source via AST parsing (avoiding PyQt5 import dependencies) and verifies hue percentages are scaled to 0-359. It passes on t"
+}
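
The hue-scaling fix described in the notes comes down to choosing the multiplier per channel. The function below is a standalone sketch of that idea and not the qutebrowser implementation: `parse_channel` and its argument order are hypothetical, and only the 255.0 versus 359.0 multipliers and the `kind == 'h'` condition come from the notes.

```python
def parse_channel(value: str, kind: str) -> int:
    """Convert one color component to an int, scaling percentages per channel.

    kind is a single letter naming the channel; 'h' (hue) scales to 0-359,
    every other channel scales to 0-255.
    """
    if value.endswith("%"):
        multiplier = 359.0 if kind == "h" else 255.0
        return int(float(value[:-1]) * multiplier / 100)
    return int(value)
```

With the old hardcoded 255.0 multiplier, a hue of `100%` would have produced 255 instead of 359, which is the regression the task is meant to prove.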
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+{
+  "files": [
+    "lib/auth/auth.go",
+    "lib/auth/grpcserver.go",
+    "lib/client/weblogin.go",
+    "lib/auth/u2f/authenticate.go",
+    "lib/auth/apiserver.go",
+    "lib/web/apiserver.go",
+    "lib/services/local/users.go",
+    "lib/auth/auth_test.go"
+  ],
+  "symbols": [
+    {
+      "file": "lib/auth/auth.go",
+      "symbol": "U2FAuthenticateChallenge",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/auth.go",
+      "symbol": "U2FSignRequest",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/auth.go",
+      "symbol": "mfaAuthChallenge",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/auth.go",
+      "symbol": "validateMFAAuthResponse",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/auth.go",
+      "symbol": "checkU2F",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/grpcserver.go",
+      "symbol": "DeleteMFADevice",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/grpcserver.go",
+      "symbol": "AddMFADevice",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/grpcserver.go",
+      "symbol": "deleteMFADeviceAuthChallenge",
+      "repo": null
+    },
+    {
+      "file": "lib/client/weblogin.go",
+      "symbol": "SSHAgentU2FLogin",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/u2f/authenticate.go",
+      "symbol": "AuthenticateInit",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/u2f/authenticate.go",
+      "symbol": "AuthenticateSignChallenge",
+      "repo": null
+    },
+    {
+      "file": "lib/auth/u2f/authenticate.go",
+      "symbol": "AuthenticateVerify",
+      "repo": null
+    },
+    {
+      "file": "lib/services/local/users.go",
+      "symbol": "GetMFADevices",
+      "repo": null
+    },
+    {
+      "file": "lib/services/local/users.go",
+      "symbol": "DeleteMFADevice",
+      "repo": null
+    }
+  ]
+}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "has_ground_truth": true,
+  "has_chunk_ground_truth": false,
+  "ground_truth_source": "curator_agent",
+  "ground_truth_confidence": "medium",
+  "task_name": "teleport-ssh-regression-prove-001",
+  "curator_agent_version": "2.0",
+  "model": "claude-opus-4-6",
+  "backend": "hybrid",
+  "timestamp": "2026-03-03T16:28:55Z",
+  "files_count": 8,
+  "edit_files_count": 0,
+  "chunks_count": 0,
+  "symbols_count": 14,
+  "cost_usd": 3.1670065999999992,
+  "elapsed_sec": 483.9,
+  "exploration_notes": "Investigation identified three root causes for multi-device U2F authentication issues:\n\n1. **Single-device lock-in for old/web clients** (lib/auth/auth.go:830-884): The `U2FAuthenticateChallenge` struct embeds a single `*u2f.AuthenticateChallenge` for backward compatibility. When JSON-serialized, Go flattens the embedded struct fields to the top level. Old clients (and the web UI via POST /webapi/u2f/signrequest) only read these top-level fields, seeing only 1 device challenge out of N registere"
+}
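
The first root cause in the notes concerns the JSON shape that results when a struct embeds one challenge and also carries a list of all challenges. The Python model below mimics that shape with plain dictionaries to show why an old client sees only one device. It is a loose illustration: the `keyHandle` and `challenges` field names and all three function names are hypothetical and are not taken from the Teleport source.

```python
from typing import Dict, List


def build_challenge_payload(device_challenges: List[Dict[str, str]]) -> dict:
    """Flatten the first challenge to the top level and attach the full list.

    This imitates how an embedded struct's fields appear at the top level
    of the serialized JSON object.
    """
    if not device_challenges:
        raise ValueError("at least one device challenge is required")
    payload = dict(device_challenges[0])           # fields an old client reads
    payload["challenges"] = list(device_challenges)  # hypothetical list field
    return payload


def old_client_key_handles(payload: dict) -> List[str]:
    """An old client only knows the top-level fields."""
    return [payload["keyHandle"]]


def new_client_key_handles(payload: dict) -> List[str]:
    """A newer client reads the per-device list."""
    return [c["keyHandle"] for c in payload["challenges"]]
```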
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "files": [
+    "django/db/models/signals.py",
+    "django/db/models/base.py"
+  ],
+  "symbols": [
+    {
+      "file": "django/db/models/signals.py",
+      "symbol": "pre_validate",
+      "repo": null
+    },
+    {
+      "file": "django/db/models/base.py",
+      "symbol": "full_clean",
+      "repo": null
+    }
+  ]
+}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "has_ground_truth": true,
+  "has_chunk_ground_truth": false,
+  "ground_truth_source": "curator_agent",
+  "ground_truth_confidence": "medium",
+  "task_name": "django-pre-validate-signal-design-001",
+  "curator_agent_version": "2.0",
+  "model": "claude-opus-4-6",
+  "backend": "hybrid",
+  "timestamp": "2026-03-03T16:21:54Z",
+  "files_count": 2,
+  "edit_files_count": 0,
+  "chunks_count": 0,
+  "symbols_count": 2,
+  "cost_usd": 0.6277082500000001,
+  "elapsed_sec": 70.6,
+  "exploration_notes": "Added pre_validate signal to Django models. Three changes were made: (1) Defined `pre_validate = ModelSignal(use_caching=True)` in signals.py alongside existing model signals, (2) Added `pre_validate` to the import list in base.py, (3) Dispatched the signal at the top of `full_clean()` before any validation runs, following the exact same pattern as `pre_save` \u2014 using `self._meta.auto_created` guard, `send()` with `sender=self.__class__` and `instance=self`, plus validation-relevant kwargs (exclu"
+}
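
The dispatch pattern in the notes (send the signal at the top of `full_clean()`, guarded by `auto_created`, before any validation runs) can be shown without a Django install. The `Signal` and `Model` classes below are minimal stand-ins written for this sketch, not Django's `ModelSignal` or `Model`, and forwarding `exclude` as the extra keyword argument is an assumption because the notes are truncated at that point.

```python
from typing import Any, Callable, List, Tuple


class Signal:
    """Tiny stand-in for a signal: receivers are called in connection order."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., Any]] = []

    def connect(self, receiver: Callable[..., Any]) -> None:
        self._receivers.append(receiver)

    def send(self, sender: type, **named: Any) -> List[Tuple[Callable[..., Any], Any]]:
        return [(r, r(sender=sender, **named)) for r in self._receivers]


pre_validate = Signal()


class Model:
    """Stand-in model with only the parts needed to show dispatch order."""

    auto_created = False  # plays the role of self._meta.auto_created

    def clean(self) -> None:
        """Validation hook; subclasses override this."""

    def full_clean(self, exclude=None) -> None:
        # Dispatch first, so receivers run before any validation.
        if not self.auto_created:
            pre_validate.send(sender=self.__class__, instance=self, exclude=exclude)
        self.clean()
```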
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+{
+  "files": [
+    "django/middleware/ratelimit.py",
+    "django/middleware/security.py",
+    "django/middleware/clickjacking.py",
+    "django/middleware/common.py",
+    "django/utils/deprecation.py",
+    "django/http/response.py"
+  ],
+  "symbols": [
+    {
+      "file": "django/middleware/ratelimit.py",
+      "symbol": "RateLimitMiddleware",
+      "repo": null
+    },
+    {
+      "file": "django/utils/deprecation.py",
+      "symbol": "MiddlewareMixin",
+      "repo": null
+    },
+    {
+      "file": "django/http/response.py",
+      "symbol": "HttpResponseForbidden",
+      "repo": null
+    },
+    {
+      "file": "django/middleware/security.py",
+      "symbol": "SecurityMiddleware",
+      "repo": null
+    }
+  ]
+}
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "has_ground_truth": true,
+  "has_chunk_ground_truth": false,
+  "ground_truth_source": "curator_agent",
+  "ground_truth_confidence": "medium",
+  "task_name": "django-rate-limit-design-001",
+  "curator_agent_version": "2.0",
+  "model": "claude-opus-4-6",
+  "backend": "hybrid",
+  "timestamp": "2026-03-03T16:21:41Z",
+  "files_count": 6,
+  "edit_files_count": 0,
+  "chunks_count": 0,
+  "symbols_count": 4,
+  "cost_usd": 0.5020742499999999,
+  "elapsed_sec": 58.3,
+  "exploration_notes": "Created django/middleware/ratelimit.py with RateLimitMiddleware following the actual Django middleware pattern observed in security.py, clickjacking.py, and common.py. The pattern is: extend MiddlewareMixin, call super().__init__(get_response), read settings in __init__ with getattr defaults, and use process_request to intercept requests (returning a response short-circuits the pipeline). The middleware tracks per-IP request timestamps in a class-level dict, returns HttpResponseForbidden (403) w"
+}
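
The per-IP timestamp tracking described in the notes is essentially a sliding-window counter. The class below sketches that logic in plain Python, without Django, so it can be run on its own. It is not the `RateLimitMiddleware` from the task: the class name, the `max_requests` and `window_sec` parameters, the injectable clock, and returning the bare status code 403 in place of an `HttpResponseForbidden` are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class RateLimitSketch:
    # Class-level store shared by every instance, as the notes describe.
    _hits: Dict[str, Deque[float]] = defaultdict(deque)

    def __init__(
        self,
        max_requests: int = 100,
        window_sec: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_sec = window_sec
        self.clock = clock

    def process_request(self, ip: str) -> Optional[int]:
        """Return 403 to short-circuit the request, or None to let it through."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_sec:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 403
        hits.append(now)
        return None
```

A class-level dict is simple but is neither shared across worker processes nor bounded in size, so a cache backend would be the usual choice outside a benchmark task.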
