Skip to content

Persistent agent session UI with orchestration graph visualization#644

Open
gx-ai-architect wants to merge 10 commits into
mainfrom
feat/643-session-persistence
Open

Persistent agent session UI with orchestration graph visualization#644
gx-ai-architect wants to merge 10 commits into
mainfrom
feat/643-session-persistence

Conversation

@gx-ai-architect

@gx-ai-architect gx-ai-architect commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Summary

Implements #642 — a persistent agent session management UI with graphical orchestration visualization for the factory dashboard.

  • SQLite session persistence (factory/sessions.py) — captures every agent invocation in a local sessions.db with hierarchical parent→child linking, alongside existing Langfuse telemetry (additive, not replacing)
  • Session API endpoints — 6 new routes in the dashboard: cycles list, cycle detail, session detail, graph data, interactive resume, and sessions page
  • Session browser UI (sessions.html) — two-column layout with cycle list + chat-style conversation viewer, using marked.js for markdown and Prism.js for syntax highlighting
  • Agent orchestration graph — interactive DAG visualization using Cytoscape.js + dagre layout, showing agent spawning relationships, sequential ordering, role-colored nodes, click-to-navigate, and hover tooltips

Key design decisions

  • Data-driven, not hardcoded: role colors are hash-based, graph structure is derived from session hierarchy + event timeline — no mode-specific templates. Adding/removing agent roles (e.g., QA Agent: consolidate all post-Builder verification into one agent #630 QA agent) requires zero UI code changes.
  • SQLite, not Langfuse-dependent: the UI works standalone with zero external dependencies. Session capture runs alongside Langfuse telemetry via the same try/except pattern.
  • Vanilla HTML/JS: matches the existing dashboard pattern — no React/Vue build step.

Verified

  • 52+ tests across 3 test files (session CRUD, API endpoints, graph API)
  • E2E verified with a real snake-game build cycle (CEO → researcher → strategist → builder)
  • Interactive session resume preserves agent identity and context
  • All conversation item types captured (messages, tool calls, tool outputs, thinking blocks)
  • Graph correctly shows 4 nodes, 3 spawned edges, 2 sequential edges

Test plan

uv run pytest tests/test_sessions.py tests/test_dashboard_sessions.py tests/test_graph_api.py -v

Manual E2E:

# Start dashboard
uv run factory dashboard --port 8420 --projects-dir /path/to/projects

# Open http://localhost:8420/sessions/<project-name>
# - Left panel: cycle list with role badges and cost stats
# - Right panel: click a cycle → conversation viewer with collapsible tool calls
# - Toggle to Graph view → interactive DAG with click-to-navigate

Closes #642, closes #643, closes #645, closes #646

🤖 Generated with Claude Code

Restore and adapt the session capture layer from PR #569 to run alongside
existing Langfuse telemetry. Agent invocations are now recorded in a local
SQLite database (.factory/sessions.db) with parent/child hierarchy support.

- Add factory/sessions.py with full CRUD, transcript ingestion, and backfill
- Wire _begin_session_safe/_complete_session_safe into runner.py invoke_agent
- Thread FACTORY_SESSION_ID env var for parent→child session linking
- Add cycle session tracking in begin_cycle_session/complete_cycle_session
- Add `factory sessions <path>` CLI subcommand with --cycle and --role filters
- Port 29 unit tests covering session CRUD, hierarchy, and transcript parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.47513% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.48%. Comparing base (13a3e43) to head (5cb0b5a).
⚠️ Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
factory/dashboard/app.py 77.38% 38 Missing ⚠️
factory/sessions.py 89.11% 37 Missing ⚠️
factory/cli.py 26.31% 28 Missing ⚠️
factory/agents/runner.py 86.48% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #644      +/-   ##
==========================================
- Coverage   88.23%   87.48%   -0.75%     
==========================================
  Files          70       71       +1     
  Lines       10570    11229     +659     
==========================================
+ Hits         9326     9824     +498     
- Misses       1244     1405     +161     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Add 5 session API endpoints to the dashboard (cycles list, cycle detail,
session detail, interactive message, sessions page) and a sessions.html
page with two-column cycle browser, chat-style conversation viewer,
markdown rendering via marked.js, and syntax highlighting via Prism.js.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gx-ai-architect gx-ai-architect changed the title Phase 1: Session persistence layer — SQLite session capture for agent invocations Session persistence layer + API endpoints + browser UI (#643, #645) Jun 21, 2026
…646)

Add interactive DAG visualization of agent orchestration to sessions.html.
Includes graph API endpoint, dagre layout, click-to-navigate, hover
tooltips, and hash-based role coloring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gx-ai-architect

Copy link
Copy Markdown
Collaborator Author

Phase 3: Agent orchestration graph visualization (#646)

Added interactive DAG visualization of agent orchestration to sessions.html:

Changes

  • factory/dashboard/app.py: Added GET /api/projects/{name}/cycles/{cycle_id}/graph endpoint and _build_cycle_graph() helper that derives graph from session hierarchy + events.jsonl
  • factory/dashboard/static/sessions.html: Added Cytoscape.js + dagre CDN scripts, Conversation/Graph toggle, full graph rendering with dagre layout, click-to-navigate, hover tooltips
  • tests/test_graph_api.py: 9 tests covering node count, spawned/sequential edges, timeline, unknown roles, empty cycles, failed status

Key design decisions

  • Node colors via hash function (same roleColor() as conversation view) — roster-agnostic
  • No hardcoded flow templates — graph is purely data-driven from session hierarchy
  • Dagre top-to-bottom layout for the DAG
  • Click node → switch to conversation view, scroll to agent section
  • Node size proportional to duration, border style indicates status (solid=completed, dashed=running, dotted=failed)

All 23 tests pass, lint clean.

@gx-ai-architect gx-ai-architect marked this pull request as ready for review June 21, 2026 15:30
@gx-ai-architect gx-ai-architect changed the title Session persistence layer + API endpoints + browser UI (#643, #645) Persistent agent session UI with orchestration graph visualization Jun 21, 2026
xukai92 and others added 7 commits June 21, 2026 18:36
The send_session_message endpoint was resuming Claude sessions without
the agent's role prompt, causing resumed agents to lose their factory
specialist identity. Now resolves the session's agent_role to its prompt
file, appends any evolved playbook, writes to a temp file, and passes
via --system-prompt-file for reliable large-prompt injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug 1: Remove hardcoded 8-phase linear pipeline bar from dashboard homepage
that misrepresented the dynamic agent graph workflow. Replace with prominent
Sessions button on project cards and session summary stats (cycle count,
total sessions).

Bug 2: Fix CEO session having zero items, no claude_session_id, and stuck
in 'running' status. Capture CLAUDE_CODE_SESSION_ID env var in
begin_cycle_session() and pass it through to the session row. Pass metadata
with session_id in complete_cycle_session() to trigger transcript ingestion.
Update _discover_claude_session_id() to handle CEO sessions by excluding
known child agent transcripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The session lifecycle functions were defined in runner.py but never
called from cmd_ceo(), so CEO sessions were never persisted to SQLite
and child agents had no parent_id to link to. Wrap both the headless
and interactive code paths with begin/complete_cycle_session calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…etion

The CEO session never gets a claude_session_id because CLAUDE_CODE_SESSION_ID
isn't set in the parent factory process — it only exists inside the spawned
Claude Code subprocess. This fix makes complete_session() discover the
claude_session_id by scanning transcript files when none was provided,
and also prevents COALESCE from overwriting a previously-discovered ID
with NULL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…k in _discover_claude_session_id

The 'factory: ' prefix check was incorrectly matching CEO session titles
like 'factory: i want build snake game', adding them to matched_child_ids
and preventing the CEO fallback path from finding them. Now requires a '/'
in the name to distinguish child agents ('factory: project/role') from CEO
sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… cross-project false matches

The broad fallback in _discover_claude_session_id scanned ALL Claude Code
project directories when the exact one had no candidates. This caused false
matches — e.g., picking up transcripts from the currently running CEO process
for a different project instead of the target worktree's CEO.

Changes:
- Remove the all-directory fallback scan (lines 272-278) — only search the
  exact project directory derived from project_path
- CEO fallback now picks by timestamp proximity to completion time instead
  of just "most recent", preventing stale transcript matches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r on errors

Backend: send_session_message now checks subprocess returncode and returns
HTTP 410 with error detail when claude --resume fails.

Frontend: sendMessage shows error messages inline in the chat area instead
of alert() dialogs, and always re-enables the input controls via finally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants