Merged
6 changes: 2 additions & 4 deletions skills/workflow-build/SKILL.md
Original file line number Diff line number Diff line change
@@ -98,13 +98,11 @@ Apply the CEO Review Gate protocol:

*On RELOOP: return to `builder` (max 3 iterations)*

-## Phase 5: Evaluator
+## Step: Eval


```bash
-factory agent evaluator --task "Run eval: factory eval $PROJECT_PATH. Capture composite score and per-dimension breakdown. Report delta from baseline. Interpret which dimensions improved/regressed.
-Read: .factory/reviews/builder-latest.md
-Write output to: .factory/reviews/evaluator-latest.md" --project "$PROJECT_PATH" --timeout 600
+factory eval "$PROJECT_PATH"
```

### Gate — Precheck (Automated)
6 changes: 2 additions & 4 deletions skills/workflow-design/SKILL.md
@@ -93,13 +93,11 @@ Apply the CEO Review Gate protocol:

*On RELOOP: return to `builder` (max 3 iterations)*

-## Phase 5: Evaluator
+## Step: Eval


```bash
-factory agent evaluator --task "Run eval: factory eval $PROJECT_PATH. Capture composite score and per-dimension breakdown. Report delta from baseline. Interpret which dimensions improved/regressed.
-Read: .factory/reviews/builder-latest.md
-Write output to: .factory/reviews/evaluator-latest.md" --project "$PROJECT_PATH" --timeout 600
+factory eval "$PROJECT_PATH"
```

### Gate — Precheck (Automated)
6 changes: 2 additions & 4 deletions skills/workflow-improve/SKILL.md
@@ -93,13 +93,11 @@ Apply the CEO Review Gate protocol:

*On RELOOP: return to `builder` (max 3 iterations)*

-## Phase 5: Evaluator
+## Step: Eval


```bash
-factory agent evaluator --task "Run eval: factory eval $PROJECT_PATH. Capture composite score. Report delta from baseline. Interpret dimension changes.
-Read: .factory/reviews/builder-latest.md
-Write output to: .factory/reviews/evaluator-latest.md" --project "$PROJECT_PATH" --timeout 600
+factory eval "$PROJECT_PATH"
```

### Gate — Precheck (Automated)
2 changes: 1 addition & 1 deletion skills/workflow-refine/SKILL.md
@@ -64,7 +64,7 @@ Write output to: .factory/reviews/builder-latest.md" --project "$PROJECT_PATH" -


```bash
-factory agent reviewer --task "Verify the refinement. Run all 3 verification sections: 1. Health Check — run factory eval. Report composite score and delta. 2. Code Review — read PR diff, evaluate 7-category checklist. Run factory guard with --check-scope. 3. Adversarial QA — run/test the project, verify the refinement works.
+factory agent qa --task "Verify the refinement. Run all 3 verification sections: 1. Health Check — run factory eval. Report composite score and delta. 2. Code Review — read PR diff, evaluate 7-category checklist. Run factory guard with --check-scope. 3. Adversarial QA — run/test the project, verify the refinement works.
Read: .factory/reviews/builder-latest.md
Write output to: .factory/reviews/qa-latest.md" --project "$PROJECT_PATH" --timeout 600
```
6 changes: 3 additions & 3 deletions skills/workflow-research/SKILL.md
@@ -13,7 +13,7 @@ The user wants: **$ARGUMENTS**


```bash
-factory agent evaluator --task "Run eval and report results." --project "$PROJECT_PATH" --timeout 300
+factory eval "$PROJECT_PATH"
```

## Phase 1: Failure Analyst
@@ -98,11 +98,11 @@ Apply the CEO Review Gate protocol:

*On RELOOP: return to `builder` (max 3 iterations)*

-## Step: Evaluator
+## Step: Eval


```bash
-factory agent evaluator --task "Run eval and report results." --project "$PROJECT_PATH" --timeout 300
+factory eval "$PROJECT_PATH"
```

### Gate — Precheck (Automated)
2 changes: 1 addition & 1 deletion skills/workflow-review/SKILL.md
@@ -56,7 +56,7 @@ factory init $PROJECT_PATH


```bash
-factory agent evaluator --task "Run eval and report results." --project "$PROJECT_PATH" --timeout 300
+factory eval "$PROJECT_PATH"
```

## Step: Commit
25 changes: 9 additions & 16 deletions tests/test_qa_delegation.py
@@ -3,7 +3,7 @@
Verifies that:
- The QA prompt covers all 3 verification sections
- The CEO prompt references skill-based routing (mode sections moved to SKILL.md)
-- Generated workflow skills delegate eval to QA Agent, not direct factory eval
+- Generated workflow skills do not reference nonexistent agent roles
- Builder precedes Evaluator in generated workflow skills (graph ordering)
- Event-based flow validation detects Builder→QA sequencing
"""
@@ -91,24 +91,17 @@ def test_ceo_prompt_references_skill_routing(
"CEO prompt must reference SKILL.md files"
)

-    def test_workflow_skills_delegate_eval_to_agents(self) -> None:
-        """Generated workflow skills must not contain standalone factory eval calls."""
-        for skill_dir in SKILLS_DIR.glob("workflow-*"):
-            skill_path = skill_dir / "SKILL.md"
+    def test_workflow_skills_use_valid_agent_roles(self) -> None:
+        """Workflow skills must not reference nonexistent agent roles."""
+        invalid_roles = ['factory agent evaluator', 'factory agent reviewer']
+        for skill_dir in SKILLS_DIR.glob('workflow-*'):
+            skill_path = skill_dir / 'SKILL.md'
             if not skill_path.exists():
                 continue
             content = skill_path.read_text()
-            for match in re.finditer(r"factory eval", content):
-                pos = match.start()
-                preceding = content[:pos]
-                last_agent_task = preceding.rfind('factory agent')
-                last_code_block_end = preceding.rfind('```\n')
-                if last_agent_task > last_code_block_end:
-                    continue
-                context = content[max(0, pos - 80):pos + 40]
-                pytest.fail(
-                    f"Direct 'factory eval' in {skill_path.name} outside "
-                    f"agent task. Context: ...{context}..."
+            for role in invalid_roles:
+                assert role not in content, (
+                    f'Invalid agent role in {skill_path.name}: {role}'
                 )
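
Note on the new check: it is a plain substring scan over each `workflow-*/SKILL.md`. The sketch below reproduces that scan outside pytest so it can be tried against a throwaway directory. Only the two role strings come from this diff; the helper name `find_invalid_roles`, the `demo` function, and the `workflow-stale` / `workflow-fresh` directories are hypothetical and exist purely for illustration.

```python
import tempfile
from pathlib import Path

# Role strings the updated test treats as invalid (copied from the diff).
INVALID_ROLES = ["factory agent evaluator", "factory agent reviewer"]


def find_invalid_roles(skills_dir: Path) -> list[tuple[str, str]]:
    """Return (skill directory name, role) pairs for each invalid role found.

    Hypothetical helper: the test in the PR asserts inline rather than
    collecting hits, but the scan itself is the same substring check.
    """
    hits = []
    for skill_dir in sorted(skills_dir.glob("workflow-*")):
        skill_path = skill_dir / "SKILL.md"
        if not skill_path.exists():
            continue
        content = skill_path.read_text()
        for role in INVALID_ROLES:
            if role in content:
                hits.append((skill_dir.name, role))
    return hits


def demo() -> list[tuple[str, str]]:
    """Build a temporary skills tree with one stale and one updated skill."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stale = root / "workflow-stale"
        stale.mkdir()
        (stale / "SKILL.md").write_text(
            'factory agent evaluator --task "Run eval and report results."\n'
        )
        fresh = root / "workflow-fresh"
        fresh.mkdir()
        (fresh / "SKILL.md").write_text('factory eval "$PROJECT_PATH"\n')
        return find_invalid_roles(root)


if __name__ == "__main__":
    print(demo())  # [('workflow-stale', 'factory agent evaluator')]
```

The sketch reports the skill directory name rather than `skill_path.name`. In the test as written, `skill_path.name` is always `SKILL.md`, so a failure message would not say which workflow is affected; whether that matters here is a judgment call for the author.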

