New findings from 4 sessions (nightly report #9, compound PRD, prd.json
conversion, learnings extraction):
Skills/Automation:
- 54 stale ~/CodeScaleBench paths in 25 skill files
- 21 stale sourcegraph_full config refs in skills + schemas
- 3 deprecated model IDs in skills
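Below is a minimal sketch of how stale references like these could be counted before a bulk rewrite. The two regexes are assumptions built from the strings named above (`~/CodeScaleBench`, `sourcegraph_full`). The three deprecated model IDs are not named in this commit, so they are left out. `stale_hits` and `scan_tree` are hypothetical helpers, not existing repo scripts.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed patterns, derived from the commit message; adjust to the real stale strings.
STALE_PATTERNS = {
    "codescalebench_path": re.compile(r"~/CodeScaleBench\b"),
    "sourcegraph_full": re.compile(r"\bsourcegraph_full\b"),
}


def stale_hits(text: str) -> Counter:
    """Count matches of each stale pattern in one file's text."""
    hits = Counter()
    for name, pattern in STALE_PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            hits[name] = n
    return hits


def scan_tree(root: Path, glob: str = "**/*.md") -> dict[str, Counter]:
    """Map each relative file path with at least one hit to its per-pattern counts."""
    report = {}
    for path in sorted(root.glob(glob)):
        if path.is_file():
            hits = stale_hits(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                report[path.relative_to(root).as_posix()] = hits
    return report
```

The word boundaries keep longer names such as `~/CodeScaleBenchmark` or `sourcegraph_fullness` from matching. Summing the per-file counters gives totals that can be compared with the 54 and 21 figures above.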
Infrastructure:
- No pyproject.toml; 200+ scripts use sys.path.insert hack
- CI uses 3 Python versions across 4 workflows
- Schema examples embed legacy suite names
- prd-archive/ and prd.json not gitignored
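As a rough sketch, the scripts that rely on the `sys.path.insert` hack could be inventoried with the standard `ast` module, for example before moving to a `pyproject.toml`-based install. `uses_sys_path_insert` and `scripts_with_path_hack` are hypothetical names, and the matcher only recognizes the literal `sys.path.insert(...)` form.

```python
import ast
from pathlib import Path


def uses_sys_path_insert(source: str) -> bool:
    """True if the module calls sys.path.insert(...) anywhere."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute chain sys.path.insert exactly.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "insert"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "path"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "sys"
        ):
            return True
    return False


def scripts_with_path_hack(root: Path) -> list[str]:
    """Relative paths of .py files under root that call sys.path.insert."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        try:
            if uses_sys_path_insert(path.read_text(encoding="utf-8")):
                hits.append(path.relative_to(root).as_posix())
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse as UTF-8 Python
    return hits
```

Aliased forms such as `from sys import path; path.insert(...)` and `sys.path.append(...)` are not matched, so treat the result as a lower bound.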
Also condensed verbose sections to stay within the 12,288-byte limit.
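The limit is in bytes, not characters, which matters here because the guide uses curly apostrophes (`Don’t`, `task’s`) that take 3 bytes each in UTF-8. The following is a minimal sketch of a size check. It assumes the files are saved as UTF-8 and that the limit is inclusive. `BYTE_LIMIT`, `fits`, and `check_file` are illustrative names, not repo code.

```python
from pathlib import Path

# 12,288 bytes = 12 KiB, the limit cited above. Treating it as inclusive
# (size <= limit) is an assumption.
BYTE_LIMIT = 12_288


def utf8_size(text: str) -> int:
    """Size of text once encoded as UTF-8 (assumed on-disk encoding)."""
    return len(text.encode("utf-8"))


def fits(data: bytes, limit: int = BYTE_LIMIT) -> bool:
    """True if the raw bytes are within the limit."""
    return len(data) <= limit


def check_file(path: Path, limit: int = BYTE_LIMIT) -> tuple[bool, int]:
    """Return (within_limit, size_in_bytes) for a file such as AGENTS.md."""
    data = path.read_bytes()
    return fits(data, limit), len(data)
```

For example, `utf8_size("Don’t")` is 7 while `len("Don’t")` is 5, so a character count understates how close a file is to the limit.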
AGENTS.md: 26 additions, 28 deletions
@@ -5,22 +5,17 @@ Keep it small. Use it to route to the right workflow and local guide, not as the
 full operations manual.

 ## Non-Negotiables
-- All work happens on `main` by default. If you use feature branches, keep them small, short-lived, and easy to fast-forward back into `main`.
-- Every `harbor run` must be gated by interactive confirmation.
-- Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
-- Prefer a **remote execution environment** (e.g., Daytona) for large benchmark runs; use local Docker only when a task’s image or registry is incompatible with your cloud environment. See `docs/DAYTONA.md`.
-- Set **parallelism based on your own account and model limits**. Avoid exceeding documented concurrency or rate caps for your environment or provider.
-- Before launching any benchmark batch, check account readiness with `python3 scripts/check_infra.py` or `python3 scripts/account_health.py status`. Do not assume OAuth accounts are usable just because credentials exist.
+- All work on `main`. Feature branches: small, short-lived, fast-forward merge.
+- Every `harbor run` gated by interactive confirmation.
+- Before commit/push: `python3 scripts/repo_health.py` (or `--quick` for docs/config-only).
+- Prefer **Daytona** for large runs; local Docker only for incompatible tasks. See `docs/DAYTONA.md`.
+- Set parallelism to your account/model limits. Don’t exceed documented concurrency caps.
-**no_changes_guard** must use `git diff origin/main HEAD` (not `git diff HEAD`) for auto-committing agents (e.g., OpenHands).
(old line 101 / new line 95)
 - Verifier fallbacks: `${TASK_WORKDIR:-/workspace}` for workdir, `${TASK_REPO_ROOT:-${VERIFY_REPO:-/workspace}}` for repo root.
 - Set `GOWORK=off` in test.sh when sg_only verifier restores full repo (go.work may need newer Go).
-- **122 active tasks** (259 total with backups) hardcode `ANSWER_PATH="/workspace/answer.json"` without fallbacks. Also check `ANSWER_JSON` variable in `answer_json_verifier_lib.sh`. All use same template pattern; bulk fix feasible. Zero scores on non-Harbor harnesses.
+- **122 active tasks** hardcode `ANSWER_PATH="/workspace/answer.json"`. Check `ANSWER_JSON` in verifier lib. Bulk fix feasible; zero scores on non-Harbor.

 ### Scripts / Code Quality
 - **abc_audit.py duplicate functions**: `check_oa_equivalent_solutions`, `check_ob_negated_solutions`, `check_og_determinism`, `check_t10_shared_state` each defined twice. Python uses last definition silently.
-`chown -R /workspace` blocks large repos; edit `runtime_init.py`. Set `PYTHONSAFEPATH=1`.
(old line 155 / new line 153)

 ### CI / Workflows
 - `docs-consistency.yml` redundant (subsumed by `repo_health.yml`). Export HTML truncates at 1200 rows.
+- 4 workflows use 3 Python versions (3.10/3.11/3.12); standardize to 3.10. `roam.yml` unpinned `pip install roam-code`.

 ### Pre-commit / Pytest / Ralph
-- Secret-detection false-positives on detection code. Use `--no-verify` when flagged code is detection logic.
-- Classes named `TestPlan`/`TestCase`/`TestResult` auto-collected by pytest. Rename to `EvaluationPlan` etc.
-- Ralph: `progress.txt` on feature branches, compound after merge. `prd.json` is single-active; archive before overwrite.
+- Secret-detection false-positives: use `--no-verify` when flagged code is detection logic. Classes `TestPlan`/`TestCase`/`TestResult` auto-collected by pytest; rename.
+- Ralph: `prd.json` single-active; archive before overwrite. `prd-archive/` and `prd.json` not gitignored; risk of accidental commit.

 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
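The `abc_audit.py` note says four functions are each defined twice and that Python silently keeps the last definition. The following is a minimal sketch of a check for that, using the standard `ast` module. It assumes the duplicates are module-level functions (methods inside classes are not inspected). `duplicate_top_level_functions` is a hypothetical helper. Linters typically flag the same pattern too, for example pyflakes' redefinition warning (F811 in flake8 and Ruff).

```python
import ast
from collections import defaultdict


def duplicate_top_level_functions(source: str) -> dict[str, list[int]]:
    """Map each module-level function name defined more than once to its line numbers."""
    tree = ast.parse(source)
    seen: dict[str, list[int]] = defaultdict(list)
    for node in tree.body:  # module level only; class methods are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            seen[node.name].append(node.lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


SAMPLE = (
    "def check_og_determinism():\n"
    "    return 1\n"
    "\n"
    "def other():\n"
    "    pass\n"
    "\n"
    "def check_og_determinism():\n"
    "    return 2\n"
)

if __name__ == "__main__":
    print(duplicate_top_level_functions(SAMPLE))  # {'check_og_determinism': [1, 7]}
    namespace: dict = {}
    exec(SAMPLE, namespace)
    # The second definition silently replaced the first.
    print(namespace["check_og_determinism"]())  # 2
```

Running the helper over the real `abc_audit.py` source should list the four names from the note, each with two line numbers.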
CLAUDE.md: 26 additions, 28 deletions. The diff is identical, line for line, to the AGENTS.md diff above (both files are generated from sources in `docs/ops/`).
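The verifier-fallback bullet in these diffs relies on shell `${VAR:-default}` expansion, where a variable that is unset *or empty* falls through to the default. If harness-side Python needs the same resolution, a minimal sketch could look like this. `first_set` and `verifier_paths` are hypothetical names. Only the variable names and the `/workspace` default come from the bullet.

```python
import os
from collections.abc import Mapping


def first_set(env: Mapping[str, str], *names: str, default: str) -> str:
    """Mimic nested ${A:-${B:-default}}: the first set, non-empty variable wins."""
    for name in names:
        value = env.get(name)
        if value:  # ${VAR:-...} also falls through when VAR is set but empty
            return value
    return default


def verifier_paths(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve workdir and repo root the way the documented fallbacks do."""
    return {
        # ${TASK_WORKDIR:-/workspace}
        "workdir": first_set(env, "TASK_WORKDIR", default="/workspace"),
        # ${TASK_REPO_ROOT:-${VERIFY_REPO:-/workspace}}
        "repo_root": first_set(env, "TASK_REPO_ROOT", "VERIFY_REPO", default="/workspace"),
    }
```

Testing truthiness rather than `name in env` is deliberate: it reproduces the `:-` form, which treats an empty value like an unset one. The plain `-` form would not.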