Skip to content

ci(llm-review): 👷 include repo instruction file in system prompt#56

Merged
jorben merged 3 commits intomasterfrom
feat/llm-review-agent-instructions
Mar 3, 2026
Merged

ci(llm-review): 👷 include repo instruction file in system prompt#56
jorben merged 3 commits intomasterfrom
feat/llm-review-agent-instructions

Conversation

@jorben
Copy link
Collaborator

@jorben jorben commented Mar 3, 2026

Summary

  • Add optional repository instruction file detection in the LLM PR review workflow
  • Use priority order AGENTS.md > AGENT.md > CLAUDE.md and load only one file
  • Append the selected file content as an additional system message in the Anthropic request

Test Plan

  • Confirm branch diff only changes .github/workflows/llm-pr-review.yml
  • Confirm workflow builds system with base prompt + optional repo instruction message
  • Validate local test hooks passed during commit

🤖 Generated with Codex Cli

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR updates the llm-pr-review workflow to optionally include repository instruction context (AGENTS.md / AGENT.md / CLAUDE.md) in the Anthropic system prompt, which is a solid improvement for review quality and repo-specific adherence. The implementation is mostly correct and follows a clear priority order with bounded size. I’d consider it close to merge-ready, with one important robustness fix around shell substring portability.

Important

  • .github/workflows/llm-pr-review.yml:136Bash-specific substring expansion may break on non-bash shells:
    instruction_content_trimmed="${instruction_content:0:$max_instruction_chars}" relies on bash parameter expansion. GitHub Actions run defaults to bash on Ubuntu, but this can become fragile if runner/shell changes or if the step shell is overridden.
    Suggestion: Make truncation shell-agnostic using head -c:
    instruction_content_trimmed="$(head -c "$max_instruction_chars" "$instruction_file")"
    or explicitly set shell: bash at step level to lock behavior.

Suggestion

  • .github/workflows/llm-pr-review.yml:133Avoid loading full file before trimming:
    instruction_content=$(cat "$instruction_file") reads the entire file into memory before truncation. For very large files this is unnecessary.
    Suggestion: Stream directly into the temp file with byte limit:
    {
      printf "Repository instruction file: %s\n\n" "$instruction_file"
      head -c "$max_instruction_chars" "$instruction_file"
    } > /tmp/repo_instruction_prompt.txt

Praise

  • .github/workflows/llm-pr-review.yml:123Clear deterministic priority handling:
    The AGENTS.md > AGENT.md > CLAUDE.md selection logic is explicit and easy to reason about, reducing ambiguity in prompt construction.
  • .github/workflows/llm-pr-review.yml:142Graceful no-file fallback:
    Creating an empty /tmp/repo_instruction_prompt.txt keeps downstream logic simple and avoids branching complexity later in request construction.
  • .github/workflows/llm-pr-review.yml:161Clean conditional JSON composition with jq:
    Appending repo instructions only when non-empty is done safely and maintainably, with minimal risk of malformed request payloads.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR improves the LLM review workflow robustness by moving large payload handling to temp files, adding truncation guards, and optionally appending repository instruction content (AGENTS.md/AGENT.md/CLAUDE.md) to the Anthropic system messages. Overall this is a solid, practical hardening change and is close to merge-ready. I found one correctness issue that should be fixed before merging.

Critical

  • .github/workflows/llm-pr-review.yml:232Review truncation can still exceed GitHub comment limit: After truncating review_text to 58000 bytes, the workflow appends a truncation notice, which pushes the final payload beyond the configured cap. This can still cause GitHub review API rejection for oversized body.
    Suggestion: reserve space for the suffix before truncating, e.g. compute suffix='...\n[Truncated ...]', then truncate to max_review_chars - ${#suffix} and append suffix. Alternatively, enforce a final hard cap after suffix is appended.

Praise

  • .github/workflows/llm-pr-review.yml:47Switched from in-memory JSON blobs to temp files: Writing PR metadata/diff to files reduces shell ARG_MAX risks and avoids brittle large-string handling.
  • .github/workflows/llm-pr-review.yml:146Deterministic instruction-file precedence: The AGENTS.md > AGENT.md > CLAUDE.md selection is clear and predictable, and loading only one avoids prompt bloat.
  • .github/workflows/llm-pr-review.yml:183Clean Anthropic system-message composition: Building system as an array with conditional repo instruction inclusion is a good contract-safe way to extend behavior without breaking the base prompt.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR enhances the LLM review workflow by adding robust file-based payload handling, truncation safeguards, and optional repository instruction injection (AGENTS.md > AGENT.md > CLAUDE.md) into the Anthropic system messages. Overall, the changes are solid and improve reliability for large PRs. I see one blocking correctness issue around multiline content handling that should be fixed before merge.

Critical

  • .github/workflows/llm-pr-review.yml:223Review text truncation can corrupt multiline content:
    review_text is reconstructed with command substitution:
    review_text="$(printf "%s" "$review_text" | head -c "$remaining_chars")"
    In POSIX shells, command substitution strips trailing newlines, which can unintentionally alter Markdown formatting and content length semantics before posting to GitHub. Since this workflow carefully handles size limits, this subtle mutation is a correctness bug.
    Suggestion: keep all truncation operations file-based (as done elsewhere), e.g. write original text to /tmp/review_text_full.txt, truncate with head -c to /tmp/review_text.txt, append truncation note with file ops, and avoid round-tripping large multiline content through shell variables when possible.

Important

  • .github/workflows/llm-pr-review.yml:152Repository instruction file not line-safe if truncated mid-UTF-8/code fence:
    head -c on AGENTS.md can cut in the middle of a multibyte character or markdown/code block, which may degrade instruction parsing quality for the model. Not a security issue, but it can reduce prompt fidelity.
    Suggestion: after head -c, optionally trim to last full line (sed '$d') similar to diff handling, or at least append a clear marker like \n\n[Instruction file truncated] so the model understands truncation.

Praise

  • .github/workflows/llm-pr-review.yml:48-83Good move to temp-file based handling:
    Switching PR metadata/diff from in-memory variables to temp files reduces ARG_MAX and shell quoting risks for large payloads.

  • .github/workflows/llm-pr-review.yml:171-187Clean conditional system message composition:
    The jq composition of system messages is well-structured and avoids brittle string concatenation.

  • .github/workflows/llm-pr-review.yml:88-99 and :217-233Thoughtful truncation observability:
    Emitting ::notice:: with original and capped sizes is excellent for diagnosing workflow behavior without failing runs.

@jorben jorben merged commit 04a802a into master Mar 3, 2026
3 checks passed
@jorben jorben deleted the feat/llm-review-agent-instructions branch March 3, 2026 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant