📊 Lockfile Statistics Analysis - January 19, 2026 #10699

github-actions[bot], January 19, 2026 (2026-01-19T15:02:18Z)

Executive Summary

This comprehensive analysis examines 131 agentic workflow lock files (.lock.yml) in the githubnext/gh-aw repository, revealing patterns in trigger usage, safe output mechanisms, structural characteristics, and MCP server integrations.

Total Lock Files: 131
Total Size: 9.62 MB
Average File Size: 71.7 KB
Analysis Date: January 19, 2026

Key Findings:

97 workflows (74%) use scheduled triggers for automated operations
116 workflows (89%) support manual workflow_dispatch triggering
126 of the 127 workflows with safe outputs configure multiple output types for flexibility
Average workflow complexity: about 8 jobs per workflow and about 70 steps per job
GitHub MCP server integration found in 26% of workflows

Full Report

File Size Distribution

Size Range	Count	Percentage
< 10 KB	0	0%
10-50 KB	7	5.3%
50-100 KB	116	88.5%
> 100 KB	8	6.1%

Statistics:

Smallest: codex-github-remote-mcp-test.lock.yml (21.9 KB)
Largest: copilot-session-insights.lock.yml (115.2 KB)
Median Size: ~71 KB

The vast majority (88.5%) of lock files fall within the 50-100 KB range, indicating a consistent structure across the repository's compiled workflows.
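The size buckets above can be reproduced with a short script. This is a sketch, not the analysis agent's own script: it assumes 1 KB = 1024 bytes and that each range includes its lower edge and excludes its upper edge, since the report states neither convention.

```python
from pathlib import Path

# Size ranges from the table above, in KB. Assumptions: 1 KB = 1024 bytes, and
# each range includes its lower edge and excludes its upper edge.
BUCKETS = [
    ("< 10 KB", 0.0, 10.0),
    ("10-50 KB", 10.0, 50.0),
    ("50-100 KB", 50.0, 100.0),
    ("> 100 KB", 100.0, float("inf")),
]


def bucket_for(size_kb: float) -> str:
    """Return the size-range label for a file size given in KB."""
    for label, low, high in BUCKETS:
        if low <= size_kb < high:
            return label
    raise ValueError(f"invalid size: {size_kb}")


def size_distribution(workflow_dir: str = ".github/workflows") -> dict[str, int]:
    """Count the .lock.yml files in workflow_dir that fall into each size range."""
    counts = {label: 0 for label, _, _ in BUCKETS}
    for path in Path(workflow_dir).glob("*.lock.yml"):
        counts[bucket_for(path.stat().st_size / 1024)] += 1
    return counts
```

With these conventions, the smallest (21.9 KB) and largest (115.2 KB) files land in the 10-50 KB and > 100 KB rows respectively.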

Trigger Analysis

Most Popular Triggers

Trigger Type	Count	Percentage	Usage Pattern
workflow_dispatch	116	88.5%	Manual triggering capability
schedule	97	74.0%	Automated periodic execution
issues	14	10.7%	Issue lifecycle events
issue_comment	14	10.7%	Issue comment interactions
pull_request	10	7.6%	PR lifecycle events
discussion_comment	5	3.8%	Discussion interactions
discussion	4	3.1%	Discussion lifecycle events
workflow_run	2	1.5%	Triggered by other workflows
release	1	0.8%	Release events
push	1	0.8%	Code push events

Common Trigger Combinations

The most prevalent trigger combinations reveal workflow activation patterns:

schedule + workflow_dispatch: 90 workflows (68.7%)
- Pattern: Daily/periodic automation with manual override capability
- Use case: Reports, analysis, maintenance tasks
workflow_dispatch only: 12 workflows (9.2%)
- Pattern: Fully manual, on-demand execution
- Use case: Interactive tools, one-off operations
pull_request + schedule + workflow_dispatch: 4 workflows (3.1%)
- Pattern: Both reactive (PR events) and proactive (scheduled)
- Use case: PR analysis with periodic batch processing
Multi-event reactive workflows: 3 workflows (2.3%)
- Pattern: discussion + discussion_comment + issue_comment + issues + pull_request
- Use case: Comprehensive event-driven agents
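Counting combinations means treating each workflow's trigger list as an unordered set. A minimal sketch, assuming the `on:` event names have already been extracted into a mapping from lock-file name to a list of events (the sample names in the usage below are hypothetical):

```python
from collections import Counter


def trigger_combinations(workflows: dict[str, list[str]]) -> Counter:
    """Count how many workflows share each exact set of trigger events.

    `workflows` maps a lock-file name to the event names under its `on:` key.
    Deduplicating and sorting the names first makes the combination independent
    of the order in which the triggers were written.
    """
    return Counter(" + ".join(sorted(set(events))) for events in workflows.values())
```

Because the keys are sorted, they come out in alphabetical order, which matches how the combinations above are written (for example `pull_request + schedule + workflow_dispatch`).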

Schedule Patterns

Analysis of the 97 scheduled workflows reveals:

Peak Scheduling Times (Weekdays Only):

2 PM UTC (14:00): 4 workflows (tied with 1 PM and 11 AM for the most popular slot)
1 PM UTC (13:00): 4 workflows
11 AM UTC (11:00): 4 workflows
3 PM UTC (15:00): 2 workflows
9 AM UTC (09:00): 2 workflows

Common Patterns:

Weekday-only schedules (day-of-week field 1-5, as in * * 1-5): 23 workflows (23.7% of scheduled workflows)
- Aligns with business hours for operational tasks
Daily execution: 67 workflows run at least once per day
Weekly execution: 4 workflows run weekly (e.g., Sunday, Monday, Wednesday)
Hourly/frequent: 3 workflows run multiple times per day (interval patterns observed: */30, */1, */4, */6, */12)

Special Schedules:

Monthly: 1 workflow runs on the 1st of each month (0 9 1 * *)
Every 6 hours: 1 workflow runs four times a day, at minute 2 of every sixth hour (2 */6 * * *)
Half-hourly: 1 workflow runs every 30 minutes (*/30 * * * *)
Hourly: 1 workflow runs every hour (49 */1 * * *)
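The run frequencies above follow directly from the minute and hour fields of each cron expression. This hypothetical helper expands those fields for the numeric subset of cron syntax used in the examples (`*`, `*/n`, single values, ranges, and comma lists); month and weekday names such as `MON` are not handled.

```python
def field_values(field: str, low: int, high: int) -> set[int]:
    """Expand one numeric cron field ("*", "*/6", "9", "1-5", "0,30") into its values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        elif step_text:
            # "N/step" means different things in different cron implementations.
            raise ValueError(f"unsupported cron field part: {part!r}")
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def runs_per_day(cron: str) -> int:
    """Runs on each day the schedule is active, from the minute and hour fields."""
    minute, hour, _day_of_month, _month, _day_of_week = cron.split()
    return len(field_values(minute, 0, 59)) * len(field_values(hour, 0, 23))


def is_weekday_only(cron: str) -> bool:
    """True when the day-of-week field limits runs to Monday (1) through Friday (5)."""
    day_of_week = cron.split()[4]
    return day_of_week != "*" and field_values(day_of_week, 0, 6) <= set(range(1, 6))
```

For example, `runs_per_day("2 */6 * * *")` returns 4 (00:02, 06:02, 12:02, and 18:02 UTC) and `runs_per_day("*/30 * * * *")` returns 48.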

Safe Outputs Analysis

Safe Output Types Distribution

Safe outputs enable workflows to produce structured outputs for downstream processing. Nearly all workflows (127/131 = 97%) use safe output mechanisms.

Type	Count	Percentage	Purpose
missing_tool	126	96.2%	Report unavailable tools/capabilities
noop	126	96.2%	Signal completion with no action needed
missing_data	125	95.4%	Report missing data/information
create_issue	104	79.4%	Create GitHub issues
create_discussion	101	77.1%	Create repository discussions
create_pr	49	37.4%	Create pull requests
add_comment	42	32.1%	Add comments to issues/PRs
create_pr_review	27	20.6%	Add PR review comments
update_issue	8	6.1%	Update existing issues

Key Insights

Error Handling is Nearly Universal: 95%+ of workflows include each of missing_tool, noop, and missing_data (126, 126, and 125 workflows respectively), so nearly every workflow has a structured way to report why it could not complete a task.
Dual Output Capability: 79% of workflows can create issues and 77% can create discussions. The overlap is not reported directly, but with 104 + 101 = 205 configurations across only 131 workflows, at least 74 workflows (56.5%) must support both, giving flexibility in how results are communicated.
PR Workflow Support: 37% support creating PRs and 21% support PR review comments, indicating significant automation of code changes.
Multiple Output Types: 126 workflows (96%) configure more than one safe output type, allowing them to choose an output based on context and results.
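Per-type counts alone do not say how many workflows have several types at once, but a union-bound argument gives guaranteed limits. A minimal sketch using only the counts from the table above (the function name is illustrative):

```python
def overlap_bounds(counts: list[int], total: int) -> tuple[int, int]:
    """Bounds on how many of `total` items belong to every set, given only set sizes.

    Upper bound: the common part can be no larger than the smallest set.
    Lower bound: each set misses (total - size) items, so at most
    sum(total - size) items are missing from at least one set.
    """
    upper = min(counts)
    lower = max(0, total - sum(total - c for c in counts))
    return lower, upper
```

`overlap_bounds([104, 101], 131)` gives `(74, 101)`, so between 56.5% and 77.1% of workflows can create both issues and discussions; `overlap_bounds([126, 126, 125], 131)` gives `(115, 125)` for the three error-reporting outputs.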

Example Workflows Using Safe Outputs:

agent-performance-analyzer.lock.yml
daily-news.lock.yml
security-review.lock.yml
copilot-pr-nlp-analysis.lock.yml
technical-doc-writer.lock.yml

Structural Characteristics

Job Complexity

Metric	Value
Average Jobs per Workflow	8.09 jobs
Average Steps per Job	69.67 steps
Maximum Steps in Single Job	90 steps
Minimum Steps	31 steps

The high step count (average 70 steps per job) reflects the comprehensive nature of agentic workflows, which include:

Pre-activation checks and firewall validation
Agent execution with extensive tool access
Post-processing for safe outputs
Asset management and cleanup
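Whether "steps" is counted per job or per workflow changes the numbers considerably. This sketch computes both from a workflow that has already been parsed into a Python dict (for example with a YAML library); the parsing step and the job names in the test data are assumptions, not taken from the report.

```python
def structure_metrics(workflow: dict) -> dict[str, float]:
    """Job and step counts for one workflow parsed into a dict shaped like its YAML."""
    jobs = workflow.get("jobs", {})
    # Jobs that call a reusable workflow (`uses:`) have no `steps` list; count them as 0.
    steps_per_job = [len(job.get("steps", [])) for job in jobs.values()]
    total = sum(steps_per_job)
    return {
        "jobs": len(steps_per_job),
        "total_steps": total,  # per-workflow total
        "mean_steps_per_job": total / len(steps_per_job) if steps_per_job else 0.0,
        "max_steps_in_job": max(steps_per_job, default=0),
    }
```

Averaging `mean_steps_per_job` and `total_steps` separately across all lock files would show which of the two the headline figure corresponds to.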

Typical Lock File Structure

Based on statistical analysis, a typical .lock.yml file has:

Size: ~71.7 KB
Jobs: ~8 jobs
  - activation (pre-checks)
  - agent (main execution)
  - post-processing jobs
Steps per Job: ~70 steps
Permissions:
  - contents: read (97%)
  - issues: write (14.5%)
  - discussions: write (13.7%)
  - pull-requests: write (13.0%)
Triggers:
  - schedule (typically daily)
  - workflow_dispatch (manual override)
Timeout: about 17 minutes on average (most settings are 10-20 minutes)

Permission Patterns

Most Common Permissions:

Permission	Count	Percentage	Access Type
contents: read	127	97.0%	Read
issues: write	19	14.5%	Write
discussions: write	18	13.7%	Write
pull-requests: write	17	13.0%	Write

Permission Distribution:

Read access: 127 workflows (97%) request contents: read
Write permissions: 54 write grants in total (19 issues + 18 discussions + 17 pull-requests)
Minimal permissions: Most workflows appear to follow the principle of least privilege, requesting write access only where needed

No single write scope appears in more than 14.5% of workflows, which is consistent with a security-conscious design: workflows primarily analyze and report rather than modify repository content directly.
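A grant count and a workflow count are different measures: the 54 write grants could belong to as few as 19 workflows (the largest single write scope) or to as many as 54. This sketch tallies both. It assumes each workflow's `permissions` value has already been parsed into Python data and that one block per workflow is enough (GitHub Actions also allows per-job `permissions`, which this sketch ignores).

```python
from collections import Counter


def permission_summary(workflows: dict[str, object]) -> tuple[Counter, int]:
    """Tally permission grants and count workflows that hold any write grant.

    `workflows` maps a lock-file name to its parsed `permissions` value: either a
    scope -> access mapping such as {"contents": "read", "issues": "write"} or a
    shorthand string such as "read-all", which is tallied under the scope "*".
    """
    grants: Counter = Counter()
    workflows_with_write = 0
    for perms in workflows.values():
        if isinstance(perms, str):
            perms = {"*": perms.removesuffix("-all")}
        grants.update(f"{scope}: {access}" for scope, access in perms.items())
        if "write" in perms.values():
            workflows_with_write += 1
    return grants, workflows_with_write
```

Reporting `workflows_with_write` alongside the grant total would show how concentrated write access actually is.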

Tool & MCP Patterns

Most Used MCP Servers

MCP Server	Count	Percentage	Purpose
github	34	26.0%	GitHub API integration
playwright	5	3.8%	Web automation/testing
arxiv	1	0.8%	Academic paper search
deepwiki	1	0.8%	Repository documentation lookup

GitHub MCP Server: The most common MCP integration (26%). It gives workflows access to GitHub's APIs for repository metadata, issues, PRs, discussions, and more, beyond what standard GitHub Actions steps provide.

Web Automation: Playwright MCP server (5 workflows) enables browser automation for testing documentation, analyzing web interfaces, or capturing screenshots.

Research Tools: The arxiv and deepwiki servers indicate specialized workflows for academic paper search and repository documentation lookup.

Timeout Configuration

Distribution

Timeout (minutes)	Occurrences	Percentage
10	171	33.6%
15	150	29.5%
20	141	27.7%
30	22	4.3%
45	11	2.2%
5	10	2.0%
60	2	0.4%
600	1	0.2%
12	1	0.2%

Statistics:

Average Timeout: 17.1 minutes
Most Common: 10 minutes (33.6% of timeout settings)
Maximum: 600 minutes (10 hours) - for extremely long-running analysis
Minimum: 5 minutes - for quick checks

Most timeout settings (90.8%, or 462 of 509) are exactly 10, 15, or 20 minutes, suggesting that most jobs are expected to finish within 20 minutes.
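The summary statistics can be recomputed from the distribution table. This sketch copies the table's counts (509 timeout settings in total). The 90.8% figure corresponds to settings of exactly 10, 15, or 20 minutes; including the single 12-minute setting would give 91.0%.

```python
# Timeout distribution copied from the table above: minutes -> occurrences.
TIMEOUTS = {10: 171, 15: 150, 20: 141, 30: 22, 45: 11, 5: 10, 60: 2, 600: 1, 12: 1}


def weighted_mean(dist: dict[int, int]) -> float:
    """Mean timeout, weighting each value by how many settings use it."""
    return sum(minutes * n for minutes, n in dist.items()) / sum(dist.values())


def share_of(dist: dict[int, int], values: set[int]) -> float:
    """Fraction of all timeout settings whose value is in `values`."""
    return sum(n for minutes, n in dist.items() if minutes in values) / sum(dist.values())
```

`weighted_mean(TIMEOUTS)` is 8717 / 509, about 17.1 minutes; dropping the single 600-minute setting lowers it to about 16.0 minutes, so that one outlier noticeably lifts the average.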

Interesting Findings

1. Near-Universal Manual Override Capability

88.5% of workflows include workflow_dispatch triggers, enabling manual execution even for fully automated workflows. This design pattern provides operational flexibility for:

Testing and debugging workflow behavior
Running reports on-demand outside normal schedules
Responding to urgent issues immediately

2. Weekday-Focused Automation

The scheduling pattern shows a clear weekday focus: 23 of the 97 scheduled workflows (23.7%) explicitly restrict runs to Monday-Friday with a 1-5 day-of-week filter, and the busiest weekday time slots fall between 9 AM and 3 PM UTC. This suggests these workflows are designed to support active development teams rather than run continuously.

3. Comprehensive Error Reporting

Between 115 and 125 workflows (88-95%) configure all three error-handling safe outputs (missing_tool, missing_data, noop); the per-type counts of 126, 126, and 125 bound the overlap to that range. This points to mature error handling practices in which workflows transparently report why they could not complete a task rather than failing silently.

4. Size Consistency Despite Variety

Despite 131 different workflows with varied purposes (security scanning, PR analysis, documentation, metrics collection), 88.5% fall within the 50-100 KB size range. This consistency suggests:

Standardized workflow templates
A shared compilation process that generates similar scaffolding for every workflow
Similar agent configurations

5. Low Push Trigger Usage

Only 1 workflow uses push triggers, while 10 use pull_request triggers. This suggests a deliberate design choice: workflows are primarily scheduled for periodic analysis or react to explicit actions (PR creation, issue comments), rather than running on every code change.

6. Minimal Cross-Workflow Dependencies

Only 2 workflows use workflow_run triggers (triggered by other workflows). This indicates that workflows are largely independent, which limits the risk of cascading failures and simplifies debugging.

7. Adaptive Output Mechanisms

At least 74 workflows (56.5%) can create both issues and discussions (79% and 77% support each type individually). This dual capability may reflect logic for choosing an appropriate output format based on:

Severity of findings (issues for actionable items, discussions for reports)
Audience (developers vs. stakeholders)
Persistence requirements

Historical Trends

Baseline Analysis - This is the first comprehensive statistical analysis of the repository's lock files. Future analyses will compare against this baseline to track:

Growth in workflow count and diversity
Changes in average file sizes (optimization trends)
Shifts in trigger patterns (scheduled vs. reactive)
Evolution of safe output usage
New MCP server integrations

Saved for Future Comparison: Data stored at /tmp/gh-aw/cache-memory/history/2026-01-19.json
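A future run could compare its metrics against this saved baseline. The JSON schema of `2026-01-19.json` is not documented in this report, so the sketch below assumes a flat object of numeric metrics; the metric names in the test data are hypothetical.

```python
import json
from pathlib import Path


def load_snapshot(path: str) -> dict[str, float]:
    """Load a history snapshot, keeping only top-level numeric fields (assumed schema)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def compare_to_baseline(
    baseline: dict[str, float], current: dict[str, float]
) -> dict[str, tuple[float, float, float]]:
    """Return (baseline, current, change) for every metric present in both snapshots."""
    return {
        key: (baseline[key], current[key], current[key] - baseline[key])
        for key in sorted(baseline.keys() & current.keys())
    }
```

Metrics that appear in only one snapshot are skipped, so a renamed or newly added metric would need to be reported separately.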

Recommendations

1. Optimize Timeout Configurations

With 33.6% of timeouts set to 10 minutes and an average of 17.1 minutes, consider:

Analyzing actual execution times to right-size timeouts
Reducing timeouts for consistently fast workflows (< 5 min actual) to fail faster
Increasing timeouts for workflows that occasionally hit limits
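Right-sizing needs a concrete rule for turning observed durations into a timeout. The policy below is hypothetical (95th-percentile duration, 1.5x headroom, rounded up to 5-minute steps, never below 5 minutes) and is meant only to show the shape of such a rule.

```python
import math


def suggest_timeout(durations_min: list[float], headroom: float = 1.5,
                    floor: int = 5, step: int = 5) -> int:
    """Suggest a timeout-minutes value from observed run durations (hypothetical policy).

    Takes the nearest-rank 95th-percentile duration, multiplies it by `headroom`,
    rounds up to a multiple of `step`, and never returns less than `floor`.
    """
    if not durations_min:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations_min)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile, 1-based
    p95 = ordered[rank - 1]
    return max(floor, math.ceil(p95 * headroom / step) * step)
```

With runs of 2-4 minutes this suggests 10 minutes, while consistently 12-minute runs would get 20.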

2. Consolidate Schedule Times

The distribution shows 4 workflows at 2 PM, 4 at 1 PM, and 4 at 11 AM UTC. Consider:

Staggering schedules to avoid resource contention
Grouping related workflows to run sequentially
Using workflow dependencies for related analyses
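One way to stagger workflows that share an hour slot is to derive each one's start minute from its name. The helper below is a hypothetical sketch, not an existing gh-aw feature. It uses `hashlib` because Python's built-in `hash()` is salted per process and would not give stable results.

```python
import hashlib


def staggered_cron(workflow_name: str, hour: int, weekdays_only: bool = True) -> str:
    """Build a cron line whose minute is derived deterministically from the workflow name."""
    digest = hashlib.sha256(workflow_name.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:2], "big") % 60  # stable value in 0-59
    day_of_week = "1-5" if weekdays_only else "*"
    return f"{minute} {hour} * * {day_of_week}"
```

Hash-derived minutes can still collide when many workflows share an hour, so this spreads load rather than guaranteeing unique start times.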

3. Expand MCP Server Usage

Only 26% use the GitHub MCP server despite all workflows operating on GitHub data. Consider:

Documenting GitHub MCP server benefits
Creating templates that include GitHub MCP by default
Exploring additional MCP servers for common tasks (databases, APIs, etc.)

4. Standardize Discussion Categories

The analysis could not reliably extract the discussion categories used by workflows. Recommendations:

Standardize category naming conventions
Document which workflows use which categories
Create category guidelines for new workflows

5. Monitor Workflow Growth

With 131 workflows averaging 71 KB each:

Implement workflow consolidation reviews for overlapping functionality
Consider workflow composition patterns to share common steps
Monitor repository size impact and CI/CD performance

6. Enhance Documentation

Create a workflow catalog documenting:

Purpose and trigger patterns for each workflow
Expected execution time and resource usage
Safe output types and when they're used
Dependencies and interaction patterns

Methodology

Analysis Tools:

Bash scripts for file discovery and pattern extraction
Python for statistical analysis and aggregation
YAML parsing via text processing (grep, sed, awk)
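The grep/sed/awk approach reads values line by line without building a YAML tree. A Python version of the same idea, offered as a sketch rather than the agent's actual scripts, might use regular expressions for two of the extracted fields: `- cron:` entries under `schedule` and `timeout-minutes:` values. Like any text-based extraction, it can miss unusual YAML layouts (flow style, anchors, multi-line strings), which is one reason a real YAML parser is the safer choice when available.

```python
import re

# Line-based patterns; they match values without understanding YAML structure.
CRON_RE = re.compile(
    r"""^[ \t]*-[ \t]*cron:[ \t]*['"]?([^'"#\n]+?)['"]?[ \t]*(?:#.*)?$""", re.MULTILINE
)
TIMEOUT_RE = re.compile(r"^[ \t]*timeout-minutes:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def extract_schedule_and_timeouts(text: str) -> tuple[list[str], list[int]]:
    """Return cron expressions and timeout-minutes values in document order."""
    crons = [m.strip() for m in CRON_RE.findall(text)]
    timeouts = [int(m) for m in TIMEOUT_RE.findall(text)]
    return crons, timeouts
```

Because both job-level and step-level `timeout-minutes` lines match, counts from this approach describe timeout settings, not workflows.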

Data Sources:

131 .lock.yml files in .github/workflows/
Extracted triggers, safe outputs, permissions, schedules, timeouts, MCP servers, and structural metrics

Cache Memory:

Analysis scripts persisted in /tmp/gh-aw/cache-memory/scripts/
Historical data saved in /tmp/gh-aw/cache-memory/history/2026-01-19.json
Reusable patterns documented for future analyses

Validation:

Cross-checked file counts, sizes, and patterns manually
Verified statistical calculations against raw data
Tested extraction scripts on sample workflows

Appendix: Workflow Categories by Purpose

Based on naming patterns, the 131 workflows can be categorized as:

Daily Operations (42 workflows): daily-news, daily-code-metrics, daily-team-status, etc.
Code Analysis (18 workflows): security-review, static-analysis-report, code-simplifier, etc.
PR/Issue Management (15 workflows): pr-nitpick-reviewer, issue-triage-agent, auto-triage-issues, etc.
Documentation (12 workflows): technical-doc-writer, docs-noob-tester, unbloat-docs, etc.
Testing & Quality (10 workflows): smoke-claude, smoke-copilot, super-linter, etc.
Reporting (8 workflows): weekly-issue-summary, org-health-report, portfolio-analyst, etc.
Copilot Analysis (7 workflows): copilot-pr-nlp-analysis, copilot-session-insights, etc.
Workflow Management (6 workflows): workflow-health-manager, workflow-normalizer, etc.
Security (5 workflows): security-compliance, security-fix-pr, daily-malicious-code-scan, etc.
Other Specialized (8 workflows): poem-bot, video-analyzer, ubuntu-image-analyzer, etc.

AI generated by Lockfile Statistics Analysis Agent

expires on Jan 26, 2026, 3:02 PM UTC

github-actions[bot] (Author), January 26, 2026 (2026-01-26T16:52:25Z)

This discussion was automatically closed because it expired on 2026-01-26T15:02:18.559Z.
