Propagate --no-github flag to sub-agents via env var + prompt injection #789
Set FACTORY_NO_GITHUB=1 env var in cmd_ceo() and cmd_run() when --no-github is passed. In invoke_agent(), check this env var and append a "GitHub Disabled" directive to the agent prompt, instructing sub-agents to skip all gh CLI commands and GitHub operations.

Closes akashgit#787

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
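A minimal sketch of the mechanism described in the commit message, for readers who have not opened the diff. Only the FACTORY_NO_GITHUB variable and the function names cmd_ceo(), cmd_run(), and invoke_agent() come from this PR; the helpers `mark_github_disabled` and `apply_no_github_directive`, their signatures, and the directive wording are hypothetical stand-ins, not the actual implementation.

```python
import os

# The env var name is taken from the PR description.
NO_GITHUB_ENV = "FACTORY_NO_GITHUB"

# Illustrative wording only; the real directive text is not shown in this PR page.
GITHUB_DISABLED_DIRECTIVE = (
    "\n\n## GitHub Disabled\n"
    "Do not run any gh CLI commands or perform any GitHub operations."
)


def mark_github_disabled(no_github: bool) -> None:
    """Hypothetical stand-in for what cmd_ceo()/cmd_run() do on --no-github."""
    if no_github:
        # Writing to os.environ makes the flag visible to child processes too.
        os.environ[NO_GITHUB_ENV] = "1"


def apply_no_github_directive(prompt: str) -> str:
    """Hypothetical stand-in for the check inside invoke_agent()."""
    if os.environ.get(NO_GITHUB_ENV) == "1":
        return prompt + GITHUB_DISABLED_DIRECTIVE
    return prompt
```

Appending to the already-resolved prompt, rather than threading a new parameter through every call site, would keep the change small, which is consistent with the +6 lines reported by Codecov.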
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #789 +/- ##
=======================================
Coverage 86.78% 86.79%
=======================================
Files 80 80
Lines 12134 12140 +6
=======================================
+ Hits 10531 10537 +6
Misses 1603 1603
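The percentages in the coverage diff can be reproduced from the Hits and Lines rows. The sketch below assumes the percentage is truncated to two decimals rather than rounded to nearest. That rule is inferred from the numbers (rounding to nearest would give 86.79% and 86.80%) and is not stated in the report itself.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer floor division performs the truncation before converting to float.
    return (hits * 10_000 // lines) / 100


base = coverage_percent(10531, 12134)  # main
head = coverage_percent(10537, 12140)  # this PR

# Misses are simply lines minus hits; they are unchanged at 1603.
base_misses = 12134 - 10531
head_misses = 12140 - 10537
```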
✅ Factory Review: KEEP
Verdict: KEEP
Posted by Factory CEO
CEO Review: Independent Assessment
PR Summary
This PR fixes #787 by propagating the
What I Verified
QA Results
Notes
Verdict: KEEP
Benchmark Results
swebench
Full JSON
{
"benchmark": "swebench",
"instance_id": "sympy__sympy-20590",
"solver": "claude-code",
"passed": 1,
"total": 1,
"score": 1,
"resolved": true,
"duration_seconds": 152,
"status": "success",
"timestamp": "20260626T005914Z",
"details": {
"solver": "claude-code",
"cost_usd": 0.61376175,
"input_tokens": 24,
"output_tokens": 1977,
"cache_read_tokens": 446961,
"cache_creation_tokens": 53917
}
}
featurebench
Full JSON
{
"benchmark": "featurebench",
"instance_id": "pypa__packaging.013f3b03.test_metadata.e00b5801.lv1",
"solver": "factory",
"passed": 0,
"total": 1,
"score": 0,
"resolved": false,
"duration_seconds": 1219,
"status": "success",
"timestamp": "20260626T005916Z",
"details": {
"pass_rate": 0,
"solver": "factory",
"cost_usd": 6.452506050000002,
"input_tokens": 301,
"output_tokens": 39428,
"cache_read_tokens": 7080940,
"cache_creation_tokens": 0
}
}
programbench
Full JSON
{
"benchmark": "programbench",
"instance_id": "abishekvashok__cmatrix.5c082c6",
"solver": "claude-code",
"passed": 0,
"total": 769,
"score": 0,
"resolved": false,
"duration_seconds": 316,
"status": "success",
"timestamp": "20260626T005916Z",
"details": {
"solver": "claude-code",
"cost_usd": 0.8677075000000001,
"input_tokens": 32,
"output_tokens": 5251,
"cache_read_tokens": 938020,
"cache_creation_tokens": 42762
}
}
terminalbench
Full JSON
{
"benchmark": "terminalbench",
"instance_id": "fix-git",
"solver": "claude-code",
"passed": 1,
"total": 1,
"score": 1,
"resolved": true,
"duration_seconds": 71,
"status": "success",
"timestamp": "20260626T005916Z",
"details": {
"solver": "claude-code",
"cost_usd": 0.552441,
"input_tokens": 15,
"output_tokens": 1399,
"cache_read_tokens": 0,
"cache_creation_tokens": 0
}
}
terminalbench
Full JSON
{
"benchmark": "terminalbench",
"instance_id": "fix-git",
"solver": "factory",
"passed": 1,
"total": 1,
"score": 1,
"resolved": true,
"duration_seconds": 792,
"status": "success",
"timestamp": "20260626T005918Z",
"details": {
"solver": "factory",
"cost_usd": 0,
"input_tokens": 0,
"output_tokens": 0,
"cache_read_tokens": 0,
"cache_creation_tokens": 0
}
}
featurebench
Full JSON
{
"benchmark": "featurebench",
"instance_id": "pypa__packaging.013f3b03.test_metadata.e00b5801.lv1",
"solver": "claude-code",
"passed": 0,
"total": 1,
"score": 0,
"resolved": false,
"duration_seconds": 438,
"status": "success",
"timestamp": "20260626T005919Z",
"details": {
"pass_rate": 0,
"solver": "claude-code",
"cost_usd": 1.3952295,
"input_tokens": 22,
"output_tokens": 13221,
"cache_read_tokens": 905434,
"cache_creation_tokens": 88642
}
}
swebench
Full JSON
{
"benchmark": "swebench",
"instance_id": "sympy__sympy-20590",
"solver": "factory",
"passed": 1,
"total": 1,
"score": 1,
"resolved": true,
"duration_seconds": 2216,
"status": "success",
"timestamp": "20260626T005919Z",
"details": {
"solver": "factory",
"cost_usd": 14.172166800000003,
"input_tokens": 652,
"output_tokens": 92415,
"cache_read_tokens": 15765593,
"cache_creation_tokens": 0
}
}
programbench
Full JSON
{
"benchmark": "programbench",
"instance_id": "abishekvashok__cmatrix.5c082c6",
"solver": "factory",
"passed": 0,
"total": 769,
"score": 0,
"resolved": false,
"duration_seconds": 1058,
"status": "success",
"timestamp": "20260626T005920Z",
"details": {
"solver": "factory",
"cost_usd": 6.039037199999999,
"input_tokens": 221,
"output_tokens": 31244,
"cache_read_tokens": 6124069,
"cache_creation_tokens": 0
}
}
Overall: 50.0% accuracy (= +0.0% vs main) | $4.30 avg cost | 783s avg duration
Comparison vs Main
Baseline: latest main branch run per benchmark+solver. ▲ = improvement, ▼ = regression.
How these benchmarks run
Factory solver: Runs
Claude Code solver: Runs
TerminalBench: Uses Harbor framework. Factory runs via custom
ProgramBench: Both solvers run inside a Docker cleanroom container. See
Config:
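The "Overall" line can be cross-checked against the eight JSON records in this comment. The sketch below copies resolved, cost_usd, and duration_seconds from those records. One assumption is needed to reproduce the $4.30 figure: the run that reports a cost of 0 (terminalbench with the factory solver) is left out of the cost average. That exclusion is inferred from the arithmetic and is not documented in the comment; averaging over all eight runs would give about $3.76 instead.

```python
# (benchmark, solver, resolved, cost_usd, duration_seconds), copied from the JSON records.
RUNS = [
    ("swebench", "claude-code", True, 0.61376175, 152),
    ("featurebench", "factory", False, 6.452506050000002, 1219),
    ("programbench", "claude-code", False, 0.8677075000000001, 316),
    ("terminalbench", "claude-code", True, 0.552441, 71),
    ("terminalbench", "factory", True, 0.0, 792),
    ("featurebench", "claude-code", False, 1.3952295, 438),
    ("swebench", "factory", True, 14.172166800000003, 2216),
    ("programbench", "factory", False, 6.039037199999999, 1058),
]


def summarize(runs):
    """Return (accuracy %, average cost in USD, average duration in seconds)."""
    accuracy = 100 * sum(1 for r in runs if r[2]) / len(runs)
    # Assumption: runs with no recorded cost are skipped when averaging cost.
    priced = [r[3] for r in runs if r[3] > 0]
    avg_cost = sum(priced) / len(priced)
    avg_duration = sum(r[4] for r in runs) / len(runs)
    return accuracy, avg_cost, avg_duration


accuracy, avg_cost, avg_duration = summarize(RUNS)
```

Under these assumptions, four of eight runs resolved, the seven priced runs average roughly $4.30, and the mean duration is 782.75 seconds, which rounds to the reported 783s.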
Closes #787
Changes
- Set FACTORY_NO_GITHUB=1 env var in cmd_ceo() and cmd_run() when --no-github is passed, so the flag propagates to all subprocess environments
- In invoke_agent(), after resolving the prompt, check the FACTORY_NO_GITHUB env var and append a "GitHub Disabled" directive instructing sub-agents to skip all gh CLI commands and GitHub operations
- Add a TestNoGithubPropagation test class with 4 tests verifying prompt injection when the env var is set or unset, and env var propagation behavior
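The propagation claim in the first bullet relies on child processes inheriting the parent's environment by default. The sketch below illustrates that behavior with the standard library only. The helper `child_sees_flag` and the test class `EnvPropagationSketch` are hypothetical names for illustration; they are not the PR's TestNoGithubPropagation class, whose contents are not shown on this page.

```python
import os
import subprocess
import sys
import unittest
from unittest.mock import patch

ENV_NAME = "FACTORY_NO_GITHUB"  # name taken from the PR description


def child_sees_flag(env_name: str = ENV_NAME) -> str:
    """Spawn a child Python process and return the value it sees for env_name."""
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({env_name!r}, ''))"],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout.strip()


class EnvPropagationSketch(unittest.TestCase):
    """Illustrative tests; patch.dict restores os.environ after each block."""

    def test_child_inherits_flag_when_set(self):
        with patch.dict(os.environ, {ENV_NAME: "1"}):
            self.assertEqual(child_sees_flag(), "1")

    def test_child_sees_nothing_when_unset(self):
        with patch.dict(os.environ):
            os.environ.pop(ENV_NAME, None)
            self.assertEqual(child_sees_flag(), "")
```

Because subprocess.run() is called without an explicit env argument, the child receives a copy of the parent's environment. Code that passes a custom env mapping to a subprocess would need to include the variable itself.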