Propagate --no-github flag to sub-agents via env var + prompt injection by RobotSail · Pull Request #789 · akashgit/remote-factory

RobotSail · 2026-06-25T21:29:32Z

Closes #787

Changes

factory/cli.py: Set FACTORY_NO_GITHUB=1 env var in cmd_ceo() and cmd_run() when --no-github is passed, so the flag propagates to all subprocess environments
factory/agents/runner.py: In invoke_agent(), after resolving the prompt, check FACTORY_NO_GITHUB env var and append a "GitHub Disabled" directive instructing sub-agents to skip all gh CLI commands and GitHub operations
tests/test_agents.py: Added a TestNoGithubPropagation test class with 4 tests covering prompt injection with the env var set and unset, plus env var propagation behavior

Set FACTORY_NO_GITHUB=1 env var in cmd_ceo() and cmd_run() when --no-github is passed. In invoke_agent(), check this env var and append a "GitHub Disabled" directive to the agent prompt, instructing sub-agents to skip all gh CLI commands and GitHub operations.

Closes akashgit#787

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
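A minimal sketch of the CLI half of this change, assuming the parsed argparse namespace exposes the flag as `no_github` (argparse's default name for `--no-github`). `apply_no_github_flag` and `spawn_child` are hypothetical helpers, not the code in factory/cli.py. The point is that once the parent sets the variable in `os.environ`, any child started with a copy of that environment sees it, without the flag itself being forwarded.

```python
import os
import subprocess
import sys
from argparse import Namespace


def apply_no_github_flag(args: Namespace) -> None:
    """Hypothetical helper: export FACTORY_NO_GITHUB=1 when --no-github was passed."""
    if getattr(args, "no_github", False):
        os.environ["FACTORY_NO_GITHUB"] = "1"


def spawn_child(code: str) -> str:
    """Run a child Python process with a copy of the parent's environment."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=os.environ.copy(),  # child inherits FACTORY_NO_GITHUB if the parent set it
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    apply_no_github_flag(Namespace(no_github=True))
    # The child only reads its own environment; it never sees the argparse flag.
    print(spawn_child("import os; print(os.environ.get('FACTORY_NO_GITHUB', 'unset'))"))
```

The same inheritance applies at every hop, which is why setting the variable once in the CLI entry point reaches nested sub-agents.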

codecov · 2026-06-25T21:32:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (8463ba8) to head (733fb15).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #789   +/-   ##
=======================================
  Coverage   86.78%   86.79%           
=======================================
  Files          80       80           
  Lines       12134    12140    +6     
=======================================
+ Hits        10531    10537    +6     
  Misses       1603     1603
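The percentages in this diff follow from the hit and line counts. Codecov computes coverage as hits / (hits + misses + partials), and its documented defaults report two decimals rounded down. Here Lines equals Hits plus Misses, so there are no partials; assuming the repository keeps Codecov's default rounding, both figures can be reproduced with integer arithmetic:

```python
def coverage_pct(hits: int, lines: int) -> str:
    """Coverage as a percentage, rounded down to two decimals (Codecov's default)."""
    basis_points = hits * 10_000 // lines  # integer floor avoids float rounding surprises
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


print(coverage_pct(10531, 12134))  # base (main)
print(coverage_pct(10537, 12140))  # head (#789)
```

Rounding to the nearest hundredth instead would report 86.80% for the head commit, so the 86.79% shown is consistent with round-down. The 6 added lines are all hits, matching "All modified and coverable lines are covered by tests."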


RobotSail · 2026-06-25T21:42:49Z

✅ Factory Review: KEEP

Verdict: KEEP
Reason:

Posted by Factory CEO

RobotSail · 2026-06-26T00:33:59Z

CEO Review — Independent Assessment

PR Summary

This PR fixes #787 by propagating the --no-github flag to all sub-agents via two mechanisms:

Env var propagation (factory/cli.py): Sets FACTORY_NO_GITHUB=1 in both cmd_ceo and cmd_run entry points so all child processes inherit it.
Prompt injection (factory/agents/runner.py): After resolving the agent prompt in invoke_agent(), checks the env var and appends a ## GitHub Disabled directive — ensuring every agent (Builder, QA, Researcher, etc.) receives the instruction in its prompt without modifying individual .md prompt files.
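A sketch of the runner-side injection under stated assumptions: `build_agent_prompt` and its `playbook` parameter are hypothetical stand-ins for the composition inside `invoke_agent()` (resolved prompt, then playbook injection), and the directive wording is illustrative, not the text in factory/agents/runner.py. Because the directive is appended last, it applies to every agent regardless of which .md prompt was resolved.

```python
import os
from typing import Mapping, Optional

# Illustrative wording; the real directive in runner.py may differ.
NO_GITHUB_DIRECTIVE = (
    "## GitHub Disabled\n"
    "GitHub access is disabled for this run. Do not run any `gh` CLI commands "
    "and skip all GitHub operations."
)


def build_agent_prompt(
    resolved_prompt: str,
    playbook: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Compose an agent prompt: resolved prompt, optional playbook, then the directive."""
    env = os.environ if environ is None else environ
    sections = [resolved_prompt]
    if playbook:
        sections.append(playbook)
    # Strict comparison: only the exact value "1" turns the directive on.
    if env.get("FACTORY_NO_GITHUB") == "1":
        sections.append(NO_GITHUB_DIRECTIVE)
    return "\n\n".join(sections)


print(build_agent_prompt("You are the Builder.", environ={"FACTORY_NO_GITHUB": "1"}))
```

Accepting an optional `environ` mapping keeps the function testable without mutating the real process environment.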

What I Verified

Read the full diff: 3 files changed, all within expected scope
No modifications to agent prompt .md files (correct — injection at runner level is cleaner)
Env var check uses strict == "1" comparison, avoiding false positives from empty strings or "0"
The injection point is after resolve_prompt() and playbook injection, so it stacks correctly with existing prompt composition
Confirmed the Claude runner copies os.environ to subprocess env (claude.py:116), so the env var chain is unbroken from CLI → CEO → Builder
The existing CEO task string injection at cli.py:3501-3511 is kept as belt-and-suspenders — intentional redundancy
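The strict comparison noted above matters because environment variable values are always strings, and any non-empty string is truthy in Python. A small comparison of the two checks (both function names are illustrative):

```python
from typing import Optional


def enabled_truthy(value: Optional[str]) -> bool:
    """Naive check: any non-empty string counts as enabled."""
    return bool(value)


def enabled_strict(value: Optional[str]) -> bool:
    """Strict check, as described in the review: only "1" counts as enabled."""
    return value == "1"


for raw in (None, "", "0", "false", "1"):
    print(f"{raw!r:8} truthy={enabled_truthy(raw)!s:5} strict={enabled_strict(raw)}")
```

"0" and "false" are both truthy, so only the strict check treats them as disabled.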

QA Results

Tests: 2626 passed, 0 failed, 12 skipped
Lint: Clean
Type check: Clean
New tests: 4 added in TestNoGithubPropagation — positive/negative injection tests + env var behavior
Adversarial: 4/4 feature scenarios, 5/5 edge cases passed

Notes

Tests 3-4 (test_env_var_propagates_to_subprocess_env, test_env_var_absent_by_default) only exercise Python's os.environ semantics, not the actual cmd_ceo/cmd_run code paths. Not harmful, but a future improvement could test the CLI entry points directly.
The directive text in runner.py differs slightly from the CEO task text in cli.py:3503-3513 — could be unified for consistency but not required.
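One possible shape for testing the entry points directly, using the standard library's unittest and `mock.patch.dict`, which restores `os.environ` after each test. `cmd_run` below is a hypothetical stand-in that models only the env-var side effect; a real test would import it from factory.cli and supply whatever arguments it actually requires.

```python
import os
import unittest
from argparse import Namespace
from unittest import mock


def cmd_run(args: Namespace) -> int:
    """Hypothetical stand-in for factory.cli.cmd_run; models only the env-var side effect."""
    if args.no_github:
        os.environ["FACTORY_NO_GITHUB"] = "1"
    return 0


class TestCmdRunNoGithub(unittest.TestCase):
    def test_flag_sets_env_var(self) -> None:
        # clear=True starts from an empty environment; the original is restored on exit.
        with mock.patch.dict(os.environ, {}, clear=True):
            cmd_run(Namespace(no_github=True))
            self.assertEqual(os.environ.get("FACTORY_NO_GITHUB"), "1")

    def test_env_var_absent_without_flag(self) -> None:
        with mock.patch.dict(os.environ, {}, clear=True):
            cmd_run(Namespace(no_github=False))
            self.assertNotIn("FACTORY_NO_GITHUB", os.environ)
```

Run with `python -m unittest`, or with pytest, which also collects `unittest.TestCase` classes.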

Verdict: KEEP

github-actions · 2026-06-26T01:36:29Z

Benchmark Results

swebench

Field	Value
Benchmark	swebench
Instance	`sympy__sympy-20590`
Result	✅ RESOLVED
Score	1
Duration	152s

Full JSON

{
  "benchmark": "swebench",
  "instance_id": "sympy__sympy-20590",
  "solver": "claude-code",
  "passed": 1,
  "total": 1,
  "score": 1,
  "resolved": true,
  "duration_seconds": 152,
  "status": "success",
  "timestamp": "20260626T005914Z",
  "details": {
    "solver": "claude-code",
    "cost_usd": 0.61376175,
    "input_tokens": 24,
    "output_tokens": 1977,
    "cache_read_tokens": 446961,
    "cache_creation_tokens": 53917
  }
}

featurebench

Field	Value
Benchmark	featurebench
Instance	`pypa__packaging.013f3b03.test_metadata.e00b5801.lv1`
Result	❌ NOT RESOLVED
Score	0
Duration	1219s

Full JSON

{
  "benchmark": "featurebench",
  "instance_id": "pypa__packaging.013f3b03.test_metadata.e00b5801.lv1",
  "solver": "factory",
  "passed": 0,
  "total": 1,
  "score": 0,
  "resolved": false,
  "duration_seconds": 1219,
  "status": "success",
  "timestamp": "20260626T005916Z",
  "details": {
    "pass_rate": 0,
    "solver": "factory",
    "cost_usd": 6.452506050000002,
    "input_tokens": 301,
    "output_tokens": 39428,
    "cache_read_tokens": 7080940,
    "cache_creation_tokens": 0
  }
}

programbench

Field	Value
Benchmark	programbench
Instance	`abishekvashok__cmatrix.5c082c6`
Result	❌ NOT RESOLVED
Score	0
Duration	316s

Full JSON

{
  "benchmark": "programbench",
  "instance_id": "abishekvashok__cmatrix.5c082c6",
  "solver": "claude-code",
  "passed": 0,
  "total": 769,
  "score": 0,
  "resolved": false,
  "duration_seconds": 316,
  "status": "success",
  "timestamp": "20260626T005916Z",
  "details": {
    "solver": "claude-code",
    "cost_usd": 0.8677075000000001,
    "input_tokens": 32,
    "output_tokens": 5251,
    "cache_read_tokens": 938020,
    "cache_creation_tokens": 42762
  }
}

terminalbench

Field	Value
Benchmark	terminalbench
Instance	`fix-git`
Result	✅ RESOLVED
Score	1
Duration	71s

Full JSON

{
  "benchmark": "terminalbench",
  "instance_id": "fix-git",
  "solver": "claude-code",
  "passed": 1,
  "total": 1,
  "score": 1,
  "resolved": true,
  "duration_seconds": 71,
  "status": "success",
  "timestamp": "20260626T005916Z",
  "details": {
    "solver": "claude-code",
    "cost_usd": 0.552441,
    "input_tokens": 15,
    "output_tokens": 1399,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  }
}

terminalbench

Field	Value
Benchmark	terminalbench
Instance	`fix-git`
Result	✅ RESOLVED
Score	1
Duration	792s

Full JSON

{
  "benchmark": "terminalbench",
  "instance_id": "fix-git",
  "solver": "factory",
  "passed": 1,
  "total": 1,
  "score": 1,
  "resolved": true,
  "duration_seconds": 792,
  "status": "success",
  "timestamp": "20260626T005918Z",
  "details": {
    "solver": "factory",
    "cost_usd": 0,
    "input_tokens": 0,
    "output_tokens": 0,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  }
}

featurebench

Field	Value
Benchmark	featurebench
Instance	`pypa__packaging.013f3b03.test_metadata.e00b5801.lv1`
Result	❌ NOT RESOLVED
Score	0
Duration	438s

Full JSON

{
  "benchmark": "featurebench",
  "instance_id": "pypa__packaging.013f3b03.test_metadata.e00b5801.lv1",
  "solver": "claude-code",
  "passed": 0,
  "total": 1,
  "score": 0,
  "resolved": false,
  "duration_seconds": 438,
  "status": "success",
  "timestamp": "20260626T005919Z",
  "details": {
    "pass_rate": 0,
    "solver": "claude-code",
    "cost_usd": 1.3952295,
    "input_tokens": 22,
    "output_tokens": 13221,
    "cache_read_tokens": 905434,
    "cache_creation_tokens": 88642
  }
}

swebench

Field	Value
Benchmark	swebench
Instance	`sympy__sympy-20590`
Result	✅ RESOLVED
Score	1
Duration	2216s

Full JSON

{
  "benchmark": "swebench",
  "instance_id": "sympy__sympy-20590",
  "solver": "factory",
  "passed": 1,
  "total": 1,
  "score": 1,
  "resolved": true,
  "duration_seconds": 2216,
  "status": "success",
  "timestamp": "20260626T005919Z",
  "details": {
    "solver": "factory",
    "cost_usd": 14.172166800000003,
    "input_tokens": 652,
    "output_tokens": 92415,
    "cache_read_tokens": 15765593,
    "cache_creation_tokens": 0
  }
}

programbench

Field	Value
Benchmark	programbench
Instance	`abishekvashok__cmatrix.5c082c6`
Result	❌ NOT RESOLVED
Score	0
Duration	1058s

Full JSON

{
  "benchmark": "programbench",
  "instance_id": "abishekvashok__cmatrix.5c082c6",
  "solver": "factory",
  "passed": 0,
  "total": 769,
  "score": 0,
  "resolved": false,
  "duration_seconds": 1058,
  "status": "success",
  "timestamp": "20260626T005920Z",
  "details": {
    "solver": "factory",
    "cost_usd": 6.039037199999999,
    "input_tokens": 221,
    "output_tokens": 31244,
    "cache_read_tokens": 6124069,
    "cache_creation_tokens": 0
  }
}

Overall: 50.0% accuracy (= +0.0% vs main) | $4.30 avg cost | 783s avg duration
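The overall line can be recomputed from the eight runs above. The bot's aggregation code is not shown, so this is a reconstruction: excluding the factory terminalbench run, whose cost_usd is 0 (N/A in the comparison table), is an assumption that reproduces the $4.30 figure; averaging cost over all eight runs would give about $3.76.

```python
# (benchmark, solver, score, cost_usd, duration_seconds), copied from the runs above
RUNS = [
    ("swebench", "claude-code", 1, 0.61376175, 152),
    ("featurebench", "factory", 0, 6.452506050000002, 1219),
    ("programbench", "claude-code", 0, 0.8677075000000001, 316),
    ("terminalbench", "claude-code", 1, 0.552441, 71),
    ("terminalbench", "factory", 1, 0.0, 792),
    ("featurebench", "claude-code", 0, 1.3952295, 438),
    ("swebench", "factory", 1, 14.172166800000003, 2216),
    ("programbench", "factory", 0, 6.039037199999999, 1058),
]


def summarize(runs):
    """Return (accuracy %, average cost in USD, average duration in seconds)."""
    accuracy = 100 * sum(r[2] for r in runs) / len(runs)
    # Assumption: a cost of 0 means "not reported" and is left out of the average.
    costs = [r[3] for r in runs if r[3] > 0]
    avg_cost = sum(costs) / len(costs)
    avg_duration = sum(r[4] for r in runs) / len(runs)
    return accuracy, avg_cost, avg_duration


acc, cost, dur = summarize(RUNS)
print(f"Overall: {acc:.1f}% accuracy | ${cost:.2f} avg cost | {dur:.0f}s avg duration")
```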

Comparison vs Main

Benchmark	Solver	Score	vs Main	Cost	vs Main	Duration	vs Main
swebench	claude-code	1	+0.0% =	$0.61	= $0.00	152s	= 0s
featurebench	factory	0	= 0%	$6.45	= $0.00	1219s	= 0s
programbench	claude-code	0	= 0%	$0.87	= $0.00	316s	= 0s
terminalbench	claude-code	1	+0.0% =	$0.55	= $0.00	71s	= 0s
terminalbench	factory	1	+0.0% =	N/A	N/A	792s	= 0s
featurebench	claude-code	0	= 0%	$1.40	= $0.00	438s	= 0s
swebench	factory	1	+0.0% =	$14.17	= $0.00	2216s	= 0s
programbench	factory	0	= 0%	$6.04	= $0.00	1058s	= 0s

Baseline: latest main branch run per benchmark+solver. ▲ = improvement, ▼ = regression.
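A sketch of the comparison logic as described: pick the latest main-branch run per (benchmark, solver) pair, then mark each metric. The function names and run-dict shape are assumptions modeled on the JSON above, and treating lower cost and duration as improvements is also an assumption. The compact `YYYYMMDDTHHMMSSZ` timestamps compare correctly as plain strings because they are fixed-width.

```python
from typing import Dict, Iterable, Tuple

Run = Dict[str, object]


def latest_baselines(main_runs: Iterable[Run]) -> Dict[Tuple[str, str], Run]:
    """Keep the most recent main-branch run for each (benchmark, solver) pair."""
    latest: Dict[Tuple[str, str], Run] = {}
    for run in main_runs:
        key = (str(run["benchmark"]), str(run["solver"]))
        if key not in latest or str(run["timestamp"]) > str(latest[key]["timestamp"]):
            latest[key] = run
    return latest


def mark(current: float, baseline: float, lower_is_better: bool = False) -> Tuple[float, str]:
    """Return (delta, marker): ▲ for an improvement, ▼ for a regression, = for no change."""
    delta = current - baseline
    if delta == 0:
        return delta, "="
    improved = delta < 0 if lower_is_better else delta > 0
    return delta, "▲" if improved else "▼"
```

Score would use the default direction, while cost and duration would pass `lower_is_better=True`.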

How these benchmarks run

Factory solver: Runs factory ceo . --headless --no-github --prompt <task> — full factory loop (research → strategize → build → review). See benchmarks/run-swebench.sh.

Claude Code solver: Runs claude -p <task> --model claude-opus-4-6[1m] --max-turns 200 as a single-shot solve. Uses the same scripts as the factory solver, selected via the --solver flag.

TerminalBench: Uses Harbor framework. Factory runs via custom factory_harbor_agent.py, Claude Code uses Harbor's built-in agent.

ProgramBench: Both solvers run inside a Docker cleanroom container. See benchmarks/run-programbench.sh.

Config: claude-opus-4-6[1m], effort=XHIGH, thinking=128K tokens. See benchmarks/lib.sh.
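The two solver invocations, expressed as argument lists. The real harness is shell (benchmarks/run-swebench.sh, benchmarks/lib.sh); `build_solver_command` is a hypothetical Python illustration built only from the flags quoted above. Passing a list to `subprocess.run` without `shell=True` keeps multi-line task text and the brackets in the model name from being interpreted by a shell.

```python
from typing import List

MODEL = "claude-opus-4-6[1m]"


def build_solver_command(solver: str, task: str) -> List[str]:
    """Return argv for one benchmark task (hypothetical helper mirroring the commands above)."""
    if solver == "factory":
        # Full factory loop; --no-github is the flag this PR propagates to sub-agents.
        return ["factory", "ceo", ".", "--headless", "--no-github", "--prompt", task]
    if solver == "claude-code":
        # Single-shot solve with a turn cap.
        return ["claude", "-p", task, "--model", MODEL, "--max-turns", "200"]
    raise ValueError(f"unknown solver: {solver!r}")


print(build_solver_command("factory", "Fix the failing test"))
```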

osilkin98 approved these changes Jun 26, 2026


osilkin98 merged commit 9227f47 into akashgit:main Jun 26, 2026
6 of 7 checks passed

osilkin98 merged 1 commit into akashgit:main from RobotSail:factory/run-953e2477
