docs: add expected-behavior specs for all 9 factory agents #790

Open
gx-ai-architect wants to merge 4 commits into main from expected-behavior-docs

@gx-ai-architect

Summary

Closes #788

Creates a detailed expected-behavior doc for each of the 9 factory agents under docs/expected-behaviors/. These docs serve as diagnostic baselines: when an agent misbehaves, compare its execution trace against its expected-behavior doc to pinpoint where it diverged.
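The trace comparison described above can be sketched as a search for the first step where the observed trace departs from the documented order. This is a minimal illustration only: the step names and the idea that a trace reduces to a flat list of step labels are assumptions, not the format the factory actually emits.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def first_divergence(
    expected: Sequence[str], observed: Sequence[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, expected_step, observed_step) at the first mismatch.

    A step is None when one sequence ends before the other.
    Returns None when the two sequences are identical.
    """
    for index, (exp, obs) in enumerate(zip_longest(expected, observed)):
        if exp != obs:
            return index, exp, obs
    return None


# Hypothetical step labels, for illustration only.
expected_steps = ["read_inputs", "plan", "write_output"]
observed_steps = ["read_inputs", "write_output"]
divergence = first_divergence(expected_steps, observed_steps)
```

Here `divergence` is `(1, "plan", "write_output")`: the agent skipped the planning step.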

Files created (9 docs, ~2,500 lines total)

| File | Lines | Workflows covered |
|---|---|---|
| docs/expected-behaviors/ceo.md | 418 | All 8 (Build, Design, Discover, Review, Improve, Research, Refine, Meta) |
| docs/expected-behaviors/researcher.md | 302 | Build, Improve, Research, Meta |
| docs/expected-behaviors/strategist.md | 354 | Build, Design, Improve, Research, Meta |
| docs/expected-behaviors/builder.md | 315 | Build, Design, Improve, Research, Refine, Meta |
| docs/expected-behaviors/qa.md | 301 | Improve, Research, Refine |
| docs/expected-behaviors/archivist.md | 382 | Build, Design, Improve, Research, Refine, Meta |
| docs/expected-behaviors/failure-analyst.md | 136 | Research |
| docs/expected-behaviors/refiner.md | 160 | Refine |
| docs/expected-behaviors/profiler.md | 145 | Cross-cutting |

Each doc contains

  1. Identity & Responsibility — what the agent IS vs IS NOT, relationship to other agents
  2. Per-Workflow Behavior — for each workflow: phase, inputs, ordered steps, outputs, handoffs
  3. Invariants — hard rules (MUST/NEVER/ALWAYS) to check first in any trace
  4. Constraints & Forbidden Actions — exhaustive list of what the agent must not do
  5. Failure Modes & Diagnostic Signals — table format with trace signals for each known failure
  6. Interaction Protocol — output format, CEO review criteria
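A quick structural check for the six sections listed above might look like the sketch below. It assumes each section appears as a Markdown `#`-style heading whose text contains the section name; the actual heading wording and numbering in the docs may differ.

```python
import re
from typing import List

# Section names taken from the list above; matching is case-insensitive.
REQUIRED_SECTIONS = [
    "Identity & Responsibility",
    "Per-Workflow Behavior",
    "Invariants",
    "Constraints & Forbidden Actions",
    "Failure Modes & Diagnostic Signals",
    "Interaction Protocol",
]

# Assumption: sections are ATX headings such as "## 3. Invariants".
_HEADING = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)


def missing_sections(markdown_text: str) -> List[str]:
    """Return the required section names with no matching heading."""
    headings = [m.group(1).strip().lower() for m in _HEADING.finditer(markdown_text)]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(name.lower() in heading for heading in headings)
    ]


sample_doc = "# CEO\n## 1. Identity & Responsibility\n## 3. Invariants\n"
gaps = missing_sections(sample_doc)
```

For `sample_doc`, `gaps` lists the four sections that have no heading, in the order given above.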

Verified against issue #783

The CEO doc explicitly covers the build-mode respawn loop bug (#783) with 3 distinct trace signals:

  • results.tsv header-only after build phases completed
  • Respawn input reporting "0/N phases complete" when git log shows N commits
  • current.md overwritten outside Strategist phase
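The three signals above lend themselves to simple mechanical checks. The sketch below is illustrative: the exact respawn message format, the one-commit-per-phase assumption, and the `(agent, path)` event shape are all hypothetical and would need to be adapted to the real trace format.

```python
import re
from typing import Iterable, List, Tuple


def is_header_only(tsv_text: str) -> bool:
    """Signal 1: results.tsv has at most a header row and no data rows."""
    rows = [line for line in tsv_text.splitlines() if line.strip()]
    return len(rows) <= 1


def respawn_progress_mismatch(respawn_input: str, commit_count: int) -> bool:
    """Signal 2: respawn input says 0/N phases done, yet git shows >= N commits.

    Assumes the message contains text like "0/3 phases complete" and that
    each completed phase produced one commit.
    """
    match = re.search(r"(\d+)/(\d+) phases complete", respawn_input)
    if match is None:
        return False
    done, total = int(match.group(1)), int(match.group(2))
    return done == 0 and total > 0 and commit_count >= total


def overwrites_outside_strategist(
    events: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Signal 3: writes to current.md by any agent other than the Strategist.

    `events` is a hypothetical list of (agent_name, written_path) pairs.
    """
    return [
        (agent, path)
        for agent, path in events
        if path.endswith("current.md") and agent.lower() != "strategist"
    ]
```

In practice `commit_count` could come from something like `git rev-list --count`, scoped to the commits made during the run.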

QA verification

All docs verified against actual agent prompts and playbooks:

  • 9/9 accuracy (all claims match actual prompt files)
  • 8/9 completeness on first pass (the missing builder.md Design workflow was added in a fix commit)
  • Cross-agent consistency verified (handoff descriptions match across docs)
  • Evaluator/QA naming ambiguity documented with clarification note

🤖 Generated with Claude Code

xukai92 and others added 4 commits June 25, 2026 21:40
…ler agents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Sentrux Quality Report

Absolute

Scanning ....
[scan] git ls-files: 307 total, 295 kept, 12 dropped (ext:12, meta:0, big:0)
[build_project_map] 295 files, 53 unique dirs, 49 cache misses, 2.0ms
[resolve] 434 resolved, 748 unresolved (of 1182 total specs)
[resolve_imports] project_map 2.1ms, suffix_idx 0.6ms, suffix_resolve 7.0ms, total 9.7ms
[build_graphs] 295 files | maps 0.8ms, imports 9.7ms, calls+inherit 2.5ms, total 13.1ms | 433 import, 4356 call, 0 inherit edges
sentrux check — 2 rules checked

Quality: 4706

✗ [Error] max_cc: 5 function(s) exceed max cyclomatic complexity of 30
    factory/cli.py:cmd_ceo (cc=78)
    factory/study.py:study_project_local (cc=43)
    factory/cli.py:_welcome_wizard (cc=39)
    factory/cli.py:cmd_run (cc=36)
    factory/workflow/validation.py:validate_workflow (cc=31)

✗ 1 violation(s) found
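For readers unfamiliar with the `max_cc` rule, the sketch below approximates cyclomatic complexity as one plus the number of decision points in the source, using Python's standard `ast` module. It is a common textbook approximation, not Sentrux's implementation; its counting rules are not described in this report, so the numbers it produces may differ from those above.

```python
import ast
from typing import Dict, List

# Node types counted as one decision point each (an approximation).
_BRANCH_NODES = (
    ast.If,
    ast.IfExp,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.comprehension,
)


def approx_cc(source: str) -> int:
    """Approximate cyclomatic complexity of the code in `source`."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def over_limit(cc_by_function: Dict[str, int], limit: int = 30) -> List[str]:
    """Return function names whose complexity exceeds `limit`."""
    return [name for name, cc in cc_by_function.items() if cc > limit]


example = (
    "def f(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        pass\n"
    "    return 0\n"
)
example_cc = approx_cc(example)  # 1 + if + and + for = 4
```

Applying `over_limit` to the five reported values (78, 43, 39, 36, 31) with `limit=30` flags all five, including `validate_workflow`, which exceeds the limit by one.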

Diff (vs base branch)

Scanning ....
[scan] git ls-files: 307 total, 295 kept, 12 dropped (ext:12, meta:0, big:0)
[build_project_map] 295 files, 53 unique dirs, 49 cache misses, 2.1ms
[resolve] 434 resolved, 748 unresolved (of 1182 total specs)
[resolve_imports] project_map 2.2ms, suffix_idx 0.6ms, suffix_resolve 7.1ms, total 9.8ms
[build_graphs] 295 files | maps 0.8ms, imports 9.9ms, calls+inherit 2.5ms, total 13.2ms | 433 import, 4356 call, 0 inherit edges
sentrux gate — structural regression check

Quality:      4706 -> 4706
Coupling:     0.75 → 0.75
Cycles:       4 → 4
God files:    0 → 0

Distance from Main Sequence: 0.35

✓ No degradation detected
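As background on the last metric, "distance from the main sequence" is conventionally defined (following Robert C. Martin's package metrics) as D = |A + I - 1|, where A is abstractness and I = Ce / (Ca + Ce) is instability. The sketch below implements that textbook definition. Whether Sentrux computes or aggregates the value this way is an assumption, and the inputs shown are made up rather than taken from this report.

```python
def instability(afferent: int, efferent: int) -> float:
    """I = Ce / (Ca + Ce); defined as 0.0 for a module with no couplings."""
    total = afferent + efferent
    return efferent / total if total else 0.0


def distance_from_main_sequence(abstractness: float, instability_value: float) -> float:
    """D = |A + I - 1|; 0 is on the main sequence, 1 is farthest from it."""
    return abs(abstractness + instability_value - 1.0)


# Hypothetical module: 7 incoming dependencies, 13 outgoing, no abstract types.
i_value = instability(afferent=7, efferent=13)
d_value = distance_from_main_sequence(abstractness=0.0, instability_value=i_value)
```

With these illustrative inputs, `i_value` is 0.65 and `d_value` is approximately 0.35.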

@codecov

codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.78%. Comparing base (04ce092) to head (bf7a4cf).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #790   +/-   ##
=======================================
  Coverage   86.78%   86.78%           
=======================================
  Files          80       80           
  Lines       12134    12134           
=======================================
  Hits        10531    10531           
  Misses       1603     1603           
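The figures in the diff are internally consistent: hits plus misses equals lines, and hits divided by lines gives the reported percentage. The sketch below reproduces the arithmetic, assuming the percentage is rounded down to two decimals. That rounding mode is an assumption about how the report was configured; plain rounding of the same ratio would give 86.79.

```python
import math


def coverage_percent(hits: int, lines: int, decimals: int = 2) -> float:
    """Coverage as a percentage, rounded down to `decimals` places."""
    if lines == 0:
        return 0.0
    scale = 10 ** decimals
    return math.floor(hits / lines * 100 * scale) / scale


hits, misses, lines = 10531, 1603, 12134
consistent = hits + misses == lines
percent = coverage_percent(hits, lines)
```

Here `consistent` is `True` and `percent` is `86.78`, matching the table.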

Development

Successfully merging this pull request may close these issues.

Create Expected Behavior .md for each agent in the factory, including detailed responsibility and artifacts and process it follows, and constraints.