Commit 3f77719

LoCoBench Bot committed
Merge remote-tracking branch 'origin/ralph/gapfill-crossrepo'
# Conflicts:
#   configs/codereview_2config.sh
#   configs/selected_benchmark_tasks.json
#   prd.json
2 parents ca10e30 + 63a0657 commit 3f77719

42 files changed, +2790 -160 lines changed

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ docs/
 | TAC | 6 | Mixed | Tool-augmented coding | Active |
 | DIBench | 8 | Mixed | Dependency installation | Active |
 | SWE-Perf | 3 | Python | Performance optimization | Active |
-| CodeReview | 3 | TS, C#, Mixed | AI code review: find & fix injected PR defects | Active — harder variant planned |
+| CodeReview | 3 | C#, JS, TS | AI code review: find & fix injected PR defects | Active — harder variant planned |
 | LinuxFLBench | 5 | C | Linux kernel fault localization | Active — verifier needs hardening |
 | Investigation | 4 | Mixed | Codebase investigation tasks | Active — harder variant planned |
 | Enterprise | 6 | Mixed | Enterprise codebase tasks | Active |
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+FROM golang:1.23-alpine AS base
+
+# Install git for cloning
+RUN apk add --no-cache git bash
+
+# Clone repos at pinned commits
+WORKDIR /workspace
+
+# Clone kubernetes/kubernetes
+RUN git clone https://github.com/kubernetes/kubernetes.git kubernetes && \
+    cd kubernetes && \
+    git checkout 31bf3ed48b91b67e5003d8df1b3bd0b918d1fb94
+
+# Clone kubernetes/api (synced from k/k staging)
+RUN git clone https://github.com/kubernetes/api.git api && \
+    cd api && \
+    git checkout f32ed1d60cf0787a512bebd6c06a4b84ae0b7cc7
+
+# Clone kubernetes/apimachinery
+RUN git clone https://github.com/kubernetes/apimachinery.git apimachinery && \
+    cd apimachinery && \
+    git checkout b2e9f88ff6d4c50c13061a53b1239c7707354eda
+
+# Verify repos exist
+RUN ls -la /workspace/ && \
+    test -d /workspace/kubernetes && \
+    test -d /workspace/api && \
+    test -d /workspace/apimachinery
+
+FROM base AS final
+WORKDIR /workspace
+CMD ["/bin/bash"]
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+Trace the definition of `TypeMeta` through a chain of Kubernetes package dependencies.
+
+## Background
+
+In the Kubernetes ecosystem, core API types are defined across multiple repositories forming a dependency chain. The `TypeMeta` struct is a fundamental type embedded in all Kubernetes resource objects (Pod, Deployment, Service, etc.).
+
+## Repositories
+
+Three repositories are available under `/workspace/`:
+
+- `/workspace/kubernetes/` — kubernetes/kubernetes (main Kubernetes repo with controller implementations)
+- `/workspace/api/` — kubernetes/api (API type definitions for core resources)
+- `/workspace/apimachinery/` — kubernetes/apimachinery (shared machinery and meta types)
+
+## Task
+
+Trace the `TypeMeta` type from its **usage site** in the main Kubernetes repo through any **intermediate re-exports** to its **original definition**. Document each step in the chain.
+
+Start from this usage site:
+- **File**: `/workspace/kubernetes/staging/src/k8s.io/api/core/v1/types.go`
+- **Usage**: The `Pod` struct embeds `metav1.TypeMeta`
+
+For each link in the chain, record:
+- `step`: sequence number (1 for usage, 2 for re-export, 3 for definition)
+- `repo`: which repository (e.g., `kubernetes/kubernetes`, `kubernetes/api`, `kubernetes/apimachinery`)
+- `file`: path relative to the repository root
+- `line`: line number where the symbol appears (approximate is acceptable)
+- `context`: what happens at this step (e.g., "Pod embeds TypeMeta", "api/core/v1 re-exports metav1", "TypeMeta defined here")
+
+## Output
+
+Write your results to `/workspace/chain.json`:
+
+```json
+[
+  {
+    "step": 1,
+    "repo": "kubernetes/kubernetes",
+    "file": "staging/src/k8s.io/api/core/v1/types.go",
+    "line": 4500,
+    "context": "Pod struct embeds metav1.TypeMeta"
+  },
+  {
+    "step": 2,
+    "repo": "kubernetes/api",
+    "file": "core/v1/types.go",
+    "line": 4500,
+    "context": "api/core/v1 imports metav1 from apimachinery"
+  },
+  {
+    "step": 3,
+    "repo": "kubernetes/apimachinery",
+    "file": "pkg/apis/meta/v1/types.go",
+    "line": 50,
+    "context": "TypeMeta struct definition with APIVersion and Kind fields"
+  }
+]
+```
+
+## Notes
+
+- The kubernetes/kubernetes repository contains a staging directory (`staging/src/k8s.io/`) with code that is synced to separate repositories (kubernetes/api, kubernetes/apimachinery). For this task, treat them as separate codebases.
+- Use `go_to_definition` or cross-file search to trace imports and type references.
+- You may encounter intermediate re-exports—document all steps.
+- Line numbers are approximate; +/- 50 lines is acceptable if the symbol is in that region.
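A minimal sketch of producing output in the shape the instructions describe, assuming Python is available to the agent (the task does not prescribe a language). The `write_chain` helper and the temporary output path are hypothetical; a real run would write to `/workspace/chain.json`. The line numbers are the placeholder values from the example output, not verified locations.

```python
import json
import os
import tempfile

# Fields the instructions ask for on every link in the chain.
REQUIRED_KEYS = ("step", "repo", "file", "line", "context")

def write_chain(steps, path):
    """Check each link has all required fields, then write a top-level JSON array."""
    for link in steps:
        missing = [k for k in REQUIRED_KEYS if k not in link]
        if missing:
            raise ValueError(f"step {link.get('step')} is missing {missing}")
    with open(path, "w") as f:
        json.dump(steps, f, indent=2)

# Placeholder values copied from the example output in the instructions.
chain = [
    {"step": 1, "repo": "kubernetes/kubernetes",
     "file": "staging/src/k8s.io/api/core/v1/types.go", "line": 4500,
     "context": "Pod struct embeds metav1.TypeMeta"},
    {"step": 2, "repo": "kubernetes/api",
     "file": "core/v1/types.go", "line": 4500,
     "context": "api/core/v1 imports metav1 from apimachinery"},
    {"step": 3, "repo": "kubernetes/apimachinery",
     "file": "pkg/apis/meta/v1/types.go", "line": 50,
     "context": "TypeMeta struct definition with APIVersion and Kind fields"},
]

# Hypothetical stand-in for /workspace/chain.json so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "chain.json")
write_chain(chain, out_path)
```

The file is written as a bare array rather than an object with a `steps` key, since the example output in the instructions uses that form.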
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+version = "1.0"
+
+[metadata]
+task_id = "crossrepo-chain-001"
+category = "symbol_resolution"
+language = "go"
+difficulty = "very_hard"
+tags = ["ccb-crossrepo", "dependency-chain", "kubernetes", "type-resolution", "cross-repo-navigation"]
+
+[verifier]
+timeout_sec = 300.0
+command = "bash /tests/test.sh"
+
+[agent]
+timeout_sec = 1200.0
+
+[verification]
+reward_type = "partial_credit"
+description = "Partial credit for each correct link in the dependency chain (usage → re-export → definition)"
+
+[environment]
+docker_image = "harbor-ccb_crossrepo:crossrepo-chain-001"
+build_timeout_sec = 600.0
+cpus = 2
+memory_mb = 8192
+storage_mb = 20480
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+{
+  "steps": [
+    {
+      "step": 1,
+      "repo": "kubernetes/kubernetes",
+      "file": "staging/src/k8s.io/api/core/v1/types.go",
+      "line": 5465,
+      "context": "Pod struct embeds metav1.TypeMeta"
+    },
+    {
+      "step": 2,
+      "repo": "kubernetes/api",
+      "file": "core/v1/types.go",
+      "line": 21,
+      "context": "core/v1 imports metav1 from k8s.io/apimachinery/pkg/apis/meta/v1"
+    },
+    {
+      "step": 3,
+      "repo": "kubernetes/apimachinery",
+      "file": "pkg/apis/meta/v1/types.go",
+      "line": 42,
+      "context": "TypeMeta struct definition with Kind and APIVersion fields"
+    }
+  ]
+}
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
+#!/bin/bash
+# Reward: Partial credit (0.0-1.0) — each correct link in the chain scores independently
+#
+# DEPENDENCY CHAIN SCORER
+# -----------------------
+# Scores agent output by comparing each step in the dependency chain against
+# ground truth. Each step is worth an equal fraction of the total score.
+# Matching is fuzzy on line numbers (+/- tolerance) and exact on repo/file.
+
+set -e
+
+OUTPUT_PATH="/workspace/chain.json"
+GROUND_TRUTH="/tests/ground_truth.json"
+REWARD_FILE="/logs/verifier/reward.txt"
+LINE_TOLERANCE=50 # +/- 50 lines for line number matching
+
+mkdir -p /logs/verifier
+
+# ── Check prerequisites ───────────────────────────────────────────────────
+if [ ! -f "$GROUND_TRUTH" ]; then
+    echo "ERROR: ground_truth.json not found at $GROUND_TRUTH"
+    echo "0.0" > "$REWARD_FILE"
+    exit 0
+fi
+
+if [ ! -f "$OUTPUT_PATH" ]; then
+    echo "No agent output found at $OUTPUT_PATH"
+    echo "Agent did not produce the required chain.json file."
+    echo "0.0" > "$REWARD_FILE"
+    exit 0
+fi
+
+echo "Scoring dependency chain..."
+echo "Output: $OUTPUT_PATH"
+echo "Ground truth: $GROUND_TRUTH"
+echo ""
+
+# ── Delegate scoring to Python ────────────────────────────────────────────
+OUTPUT_PATH="$OUTPUT_PATH" GROUND_TRUTH="$GROUND_TRUTH" \
+    REWARD_FILE="$REWARD_FILE" LINE_TOLERANCE="$LINE_TOLERANCE" \
+    python3 << 'PYEOF'
+import json, os, re, sys
+
+OUTPUT_PATH = os.environ["OUTPUT_PATH"]
+GT_PATH = os.environ["GROUND_TRUTH"]
+REWARD_PATH = os.environ["REWARD_FILE"]
+LINE_TOLERANCE = int(os.environ.get("LINE_TOLERANCE", "50"))
+
+def write_reward(score):
+    """Write score to reward file and print summary."""
+    with open(REWARD_PATH, "w") as f:
+        f.write(f"{score:.2f}\n")
+    print(f"\nTests completed - Score: {score:.2f}")
+
+def strip_code_fences(text):
+    """Strip markdown code fences if agent wrapped JSON in ```json blocks."""
+    m = re.search(r'```(?:json)?\s*\n(.*?)```', text, re.DOTALL)
+    return m.group(1).strip() if m else text.strip()
+
+def normalize_path(path):
+    """Normalize file paths (remove leading ./ or /workspace/)."""
+    path = path.strip()
+    path = re.sub(r'^\./', '', path)
+    path = re.sub(r'^/workspace/[^/]+/', '', path)
+    return path
+
+def lines_match(line1, line2, tolerance):
+    """Check if two line numbers match within tolerance (both can be None)."""
+    if line1 is None or line2 is None:
+        return True  # Don't penalize if line number not provided
+    return abs(int(line1) - int(line2)) <= tolerance
+
+# ── Load ground truth ────────────────────────────────────────────────────
+with open(GT_PATH) as f:
+    gt = json.load(f)
+
+expected_steps = gt.get("steps", [])
+if not expected_steps:
+    print("ERROR: ground_truth.json must have a 'steps' array")
+    write_reward(0.0)
+    sys.exit(0)
+
+num_expected = len(expected_steps)
+
+# ── Load agent output ────────────────────────────────────────────────────
+try:
+    with open(OUTPUT_PATH) as f:
+        raw = f.read()
+    raw = strip_code_fences(raw)
+    reported_steps = json.loads(raw)
+    if not isinstance(reported_steps, list):
+        print("Agent output is not a JSON array — scoring as empty.")
+        reported_steps = []
+except (json.JSONDecodeError, ValueError) as e:
+    print(f"Malformed JSON in agent output: {e}")
+    reported_steps = []
+
+num_reported = len(reported_steps)
+
+if num_reported == 0:
+    print("Agent output is empty — no chain steps to score.")
+    print(f"Expected {num_expected} steps.")
+    write_reward(0.0)
+    sys.exit(0)
+
+# ── Score each step ──────────────────────────────────────────────────────
+print(f"=== Dependency Chain Scoring ===")
+print(f" Expected steps: {num_expected}")
+print(f" Reported steps: {num_reported}")
+print(f" Line tolerance: +/- {LINE_TOLERANCE}")
+print()
+
+correct_steps = 0
+step_details = []
+
+for i, expected in enumerate(expected_steps, start=1):
+    # Find matching reported step by step number or position
+    reported = None
+    for r in reported_steps:
+        if r.get("step") == expected.get("step", i):
+            reported = r
+            break
+
+    if not reported and i <= num_reported:
+        # Fallback: match by position if step field missing
+        reported = reported_steps[i-1]
+
+    if not reported:
+        step_details.append({
+            "step": i,
+            "status": "MISSING",
+            "expected": expected
+        })
+        continue
+
+    # Check each field
+    repo_match = expected.get("repo", "").strip() == reported.get("repo", "").strip()
+    file_match = normalize_path(expected.get("file", "")) == normalize_path(reported.get("file", ""))
+    line_match = lines_match(expected.get("line"), reported.get("line"), LINE_TOLERANCE)
+
+    all_match = repo_match and file_match and line_match
+
+    if all_match:
+        correct_steps += 1
+        status = "CORRECT"
+    else:
+        status = "PARTIAL" if (repo_match or file_match) else "WRONG"
+
+    step_details.append({
+        "step": i,
+        "status": status,
+        "repo_match": repo_match,
+        "file_match": file_match,
+        "line_match": line_match,
+        "expected": expected,
+        "reported": reported
+    })
+
+# ── Compute score ────────────────────────────────────────────────────────
+# Each step is worth equal credit
+score = correct_steps / num_expected if num_expected > 0 else 0.0
+
+# ── Print detailed results ───────────────────────────────────────────────
+print("=== Step-by-Step Results ===")
+for detail in step_details:
+    status = detail["status"]
+    symbol = "✓" if status == "CORRECT" else "✗" if status == "WRONG" else "~"
+    print(f"\nStep {detail['step']}: [{symbol}] {status}")
+
+    exp = detail["expected"]
+    print(f" Expected: {exp.get('repo')} / {exp.get('file')} : {exp.get('line')}")
+    print(f" {exp.get('context', 'N/A')}")
+
+    if "reported" in detail:
+        rep = detail["reported"]
+        print(f" Reported: {rep.get('repo')} / {rep.get('file')} : {rep.get('line')}")
+        print(f" {rep.get('context', 'N/A')}")
+
+    if status != "MISSING":
+        print(f" Match: repo={detail['repo_match']}, file={detail['file_match']}, line={detail['line_match']}")
+    else:
+        print(f" Reported: (missing)")
+
+print(f"\n=== Summary ===")
+print(f" Correct steps: {correct_steps}/{num_expected}")
+print(f" Score: {score:.2f}")
+
+write_reward(score)
+PYEOF
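To illustrate how the scorer's path normalization treats different spellings of the same file, the sketch below copies the body of `normalize_path` from the heredoc and runs it on a few inputs. The sample paths are made up for illustration and were not taken from any agent run.

```python
import re

def normalize_path(path):
    """Same logic as the scorer: trim, drop a leading ./, drop /workspace/<dir>/."""
    path = path.strip()
    path = re.sub(r'^\./', '', path)
    path = re.sub(r'^/workspace/[^/]+/', '', path)
    return path

samples = [
    "/workspace/api/core/v1/types.go",   # absolute path inside the container
    "./core/v1/types.go",                # relative with leading ./
    "  pkg/apis/meta/v1/types.go ",      # stray whitespace
    "api/core/v1/types.go",              # checkout directory name kept as a prefix
]

for s in samples:
    print(repr(s), "->", repr(normalize_path(s)))
```

The first three inputs reduce to repo-relative paths. The last one is left unchanged, because only a `/workspace/<dir>/` prefix is stripped, so a path that keeps the checkout directory name without the `/workspace/` root would not match the ground-truth `file` value.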
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+FROM golang:1.23-alpine AS base
+
+# Install git for cloning
+RUN apk add --no-cache git bash
+
+# Clone repos at pinned commits
+WORKDIR /workspace
+
+# Clone istio/istio
+RUN git clone https://github.com/istio/istio.git istio && \
+    cd istio && \
+    git checkout 4c1f845d839e9086ee85ad9337f2647492322eb4
+
+# Clone envoyproxy/go-control-plane
+RUN git clone https://github.com/envoyproxy/go-control-plane.git go-control-plane && \
+    cd go-control-plane && \
+    git checkout 71637ad69bbc5f51fbb2562e612a4365292804a5
+
+# Clone envoyproxy/data-plane-api (protobuf definitions)
+RUN git clone https://github.com/envoyproxy/data-plane-api.git data-plane-api && \
+    cd data-plane-api && \
+    git checkout 84e84367f2560cdb47b9bb78fd3e615feb80c3e4
+
+# Verify repos exist
+RUN ls -la /workspace/ && \
+    test -d /workspace/istio && \
+    test -d /workspace/go-control-plane && \
+    test -d /workspace/data-plane-api
+
+FROM base AS final
+WORKDIR /workspace
+CMD ["/bin/bash"]
