ci(llm-review): 👷 include repo instruction file in system prompt#56
Summary
This PR updates the llm-pr-review workflow to optionally include repository instruction context (AGENTS.md / AGENT.md / CLAUDE.md) in the Anthropic system prompt, which is a solid improvement for review quality and repo-specific adherence. The implementation is mostly correct and follows a clear priority order with bounded size. I’d consider it close to merge-ready, with one important robustness fix around shell substring portability.
Important
`.github/workflows/llm-pr-review.yml:136` — Bash-specific substring expansion may break on non-bash shells:
`instruction_content_trimmed="${instruction_content:0:$max_instruction_chars}"` relies on bash parameter expansion. GitHub Actions `run` defaults to bash on Ubuntu, but this can become fragile if the runner or shell changes, or if the step shell is overridden.
Suggestion: make truncation shell-agnostic using `head -c`, e.g. `instruction_content_trimmed="$(head -c "$max_instruction_chars" "$instruction_file")"`, or explicitly set `shell: bash` at step level to lock behavior.
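A minimal standalone sketch of the portable alternative (the variable names mirror the workflow's; the file contents here are illustrative):

```shell
# Demo: truncate a file to N bytes without bash-only parameter expansion.
max_instruction_chars=12
instruction_file="$(mktemp)"
printf 'line one\nline two\nline three\n' > "$instruction_file"

# head -c is POSIX and reads straight from the file, so it works
# regardless of which shell the workflow step runs under.
instruction_content_trimmed="$(head -c "$max_instruction_chars" "$instruction_file")"
printf '%s\n' "$instruction_content_trimmed"

rm -f "$instruction_file"
```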
Suggestion
`.github/workflows/llm-pr-review.yml:133` — Avoid loading the full file before trimming:
`instruction_content=$(cat "$instruction_file")` reads the entire file into memory before truncation. For very large files this is unnecessary.
Suggestion: stream directly into the temp file with a byte limit:

```shell
{
  printf "Repository instruction file: %s\n\n" "$instruction_file"
  head -c "$max_instruction_chars" "$instruction_file"
} > /tmp/repo_instruction_prompt.txt
```
Praise
`.github/workflows/llm-pr-review.yml:123` — Clear deterministic priority handling:
The `AGENTS.md > AGENT.md > CLAUDE.md` selection logic is explicit and easy to reason about, reducing ambiguity in prompt construction.

`.github/workflows/llm-pr-review.yml:142` — Graceful no-file fallback:
Creating an empty `/tmp/repo_instruction_prompt.txt` keeps downstream logic simple and avoids branching complexity later in request construction.

`.github/workflows/llm-pr-review.yml:161` — Clean conditional JSON composition with `jq`:
Appending repo instructions only when non-empty is done safely and maintainably, with minimal risk of malformed request payloads.
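The conditional composition praised here can be sketched roughly like this; the message shape, prompt text, and use of a temp stand-in for `/tmp/repo_instruction_prompt.txt` are assumptions, not the workflow's exact code:

```shell
# Sketch: build an Anthropic-style "system" array with jq, appending the
# repo-instruction text only when the instruction file is non-empty.
base_prompt="You are a code reviewer."
repo_file="$(mktemp)"   # stands in for /tmp/repo_instruction_prompt.txt
printf 'Follow the repo style guide.\n' > "$repo_file"

instruction_text="$(cat "$repo_file")"
if [ -n "$instruction_text" ]; then
  system_json="$(jq -n --arg base "$base_prompt" --arg extra "$instruction_text" \
    '[{type: "text", text: $base}, {type: "text", text: $extra}]')"
else
  system_json="$(jq -n --arg base "$base_prompt" '[{type: "text", text: $base}]')"
fi
printf '%s\n' "$system_json" | jq 'length'   # 2 here; 1 when the file is empty
rm -f "$repo_file"
```

Keeping the base prompt as the first array element means the optional repo context can never displace or malform it.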
Summary
This PR improves the LLM review workflow robustness by moving large payload handling to temp files, adding truncation guards, and optionally appending repository instruction content (AGENTS.md/AGENT.md/CLAUDE.md) to the Anthropic system messages. Overall this is a solid, practical hardening change and is close to merge-ready. I found one correctness issue that should be fixed before merging.
Critical
`.github/workflows/llm-pr-review.yml:232` — Review truncation can still exceed the GitHub comment limit: after truncating `review_text` to 58000 bytes, the workflow appends a truncation notice, which pushes the final payload beyond the configured cap. This can still cause the GitHub review API to reject the oversized body.
Suggestion: reserve space for the suffix before truncating, e.g. compute `suffix='...\n[Truncated ...]'`, then truncate to `max_review_chars - ${#suffix}` and append the suffix. Alternatively, enforce a final hard cap after the suffix is appended.
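A minimal sketch of the suffix-reserving approach, using a tiny cap for the demo in place of the workflow's 58000-byte limit (the suffix wording is illustrative):

```shell
# Reserve room for the truncation notice BEFORE cutting, so the final
# body can never exceed max_review_chars.
max_review_chars=40          # demo value; the workflow uses 58000
suffix='...[Truncated]'
review_text='This is a long review body that will not fit.'

if [ "${#review_text}" -gt "$max_review_chars" ]; then
  keep=$(( max_review_chars - ${#suffix} ))
  review_text="$(printf '%s' "$review_text" | head -c "$keep")${suffix}"
fi
printf '%s\n' "$review_text"   # prints: This is a long review body...[Truncated]
```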
Praise
.github/workflows/llm-pr-review.yml:47— Switched from in-memory JSON blobs to temp files: Writing PR metadata/diff to files reduces shell ARG_MAX risks and avoids brittle large-string handling..github/workflows/llm-pr-review.yml:146— Deterministic instruction-file precedence: TheAGENTS.md > AGENT.md > CLAUDE.mdselection is clear and predictable, and loading only one avoids prompt bloat..github/workflows/llm-pr-review.yml:183— Clean Anthropic system-message composition: Buildingsystemas an array with conditional repo instruction inclusion is a good contract-safe way to extend behavior without breaking the base prompt.
Summary
This PR enhances the LLM review workflow by adding robust file-based payload handling, truncation safeguards, and optional repository instruction injection (AGENTS.md > AGENT.md > CLAUDE.md) into the Anthropic system messages. Overall, the changes are solid and improve reliability for large PRs. I see one blocking correctness issue around multiline content handling that should be fixed before merge.
Critical
`.github/workflows/llm-pr-review.yml:223` — Review text truncation can corrupt multiline content:
`review_text` is reconstructed with command substitution:

```shell
review_text="$(printf "%s" "$review_text" | head -c "$remaining_chars")"
```

In POSIX shells, command substitution strips trailing newlines, which can unintentionally alter Markdown formatting and content-length semantics before posting to GitHub. Since this workflow carefully handles size limits, this subtle mutation is a correctness bug.
Suggestion: keep all truncation operations file-based (as done elsewhere), e.g. write the original text to `/tmp/review_text_full.txt`, truncate with `head -c` to `/tmp/review_text.txt`, append the truncation note with file operations, and avoid round-tripping large multiline content through shell variables when possible.
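A small demonstration of the behavior behind this suggestion: command substitution silently drops trailing newlines, while file-based truncation preserves every byte (the file contents are illustrative; the paths are those named in the suggestion):

```shell
# 20 bytes on disk: two lines plus two extra trailing newlines.
printf 'line one\nline two\n\n\n' > /tmp/review_text_full.txt

# Round-tripping through "$(...)" strips ALL trailing newlines:
via_var="$(cat /tmp/review_text_full.txt)"
printf '%s' "$via_var" | wc -c        # 17 bytes: three newlines lost

# File-based truncation keeps the bytes exactly as written:
head -c 100 /tmp/review_text_full.txt > /tmp/review_text.txt
wc -c < /tmp/review_text.txt          # 20 bytes: everything preserved
```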
Important
`.github/workflows/llm-pr-review.yml:152` — Repository instruction file not line-safe if truncated mid-UTF-8 sequence or mid-code-fence:
`head -c` on `AGENTS.md` can cut in the middle of a multibyte character or a Markdown code block, which may degrade instruction parsing quality for the model. Not a security issue, but it can reduce prompt fidelity.
Suggestion: after `head -c`, optionally trim to the last full line (`sed '$d'`), similar to the diff handling, or at least append a clear marker like `\n\n[Instruction file truncated]` so the model understands the truncation.
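A sketch of the suggested line-safe trim (file contents are illustrative; `sed '$d'` drops the possibly partial final line left by the byte cut):

```shell
# head -c may cut mid-line; deleting the last line and appending an
# explicit marker keeps the truncated prompt well-formed.
instruction_file="$(mktemp)"
printf 'rule one\nrule two\nrule three\n' > "$instruction_file"

head -c 20 "$instruction_file" | sed '$d' > /tmp/instruction_trimmed.txt
printf '\n[Instruction file truncated]\n' >> /tmp/instruction_trimmed.txt
cat /tmp/instruction_trimmed.txt   # "rule three" was cut mid-word and dropped
rm -f "$instruction_file"
```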
Praise

- `.github/workflows/llm-pr-review.yml:48-83` — Good move to temp-file based handling: switching PR metadata/diff from in-memory variables to temp files reduces ARG_MAX and shell-quoting risks for large payloads.
- `.github/workflows/llm-pr-review.yml:171-187` — Clean conditional system message composition: the `jq` composition of `system` messages is well-structured and avoids brittle string concatenation.
- `.github/workflows/llm-pr-review.yml:88-99` and `:217-233` — Thoughtful truncation observability: emitting `::notice::` with original and capped sizes is excellent for diagnosing workflow behavior without failing runs.
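`::notice::` is a standard GitHub Actions workflow command; a minimal sketch of the kind of annotation being praised (the sizes and message wording here are illustrative, not the workflow's exact output):

```shell
# Emit a GitHub Actions notice annotation reporting original vs. capped size.
original_chars=91234        # hypothetical pre-truncation size
max_review_chars=58000
echo "::notice::Review truncated from ${original_chars} to ${max_review_chars} characters"
```

Because `::notice::` produces an annotation rather than a failure, the run surfaces the truncation without breaking the workflow.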
Summary

- Prefer `AGENTS.md` > `AGENT.md` > `CLAUDE.md` and load only one file
- Include the instruction content as a `system` message in the Anthropic request

Test Plan

- `.github/workflows/llm-pr-review.yml`: `system` built with base prompt + optional repo instruction message

🤖 Generated with Codex Cli