ci(llm-review): 👷 include repo instruction file in system prompt by jorben · Pull Request #56 · MarkPDFdown/markpdfdown-desktop

jorben · 2026-03-03T03:29:05Z

Summary

Add optional repository instruction file detection in the LLM PR review workflow
Use priority order AGENTS.md > AGENT.md > CLAUDE.md and load only one file
Append the selected file content as an additional system message in the Anthropic request

Test Plan

Confirm branch diff only changes .github/workflows/llm-pr-review.yml
Confirm workflow builds system with base prompt + optional repo instruction message
Validate local test hooks passed during commit

🤖 Generated with Codex Cli

github-actions

Summary

This PR updates the llm-pr-review workflow to optionally include repository instruction context (AGENTS.md / AGENT.md / CLAUDE.md) in the Anthropic system prompt, which is a solid improvement for review quality and repo-specific adherence. The implementation is mostly correct and follows a clear priority order with bounded size. I’d consider it close to merge-ready, with one important robustness fix around shell substring portability.

Important

.github/workflows/llm-pr-review.yml:136 — Bash-specific substring expansion may break on non-bash shells:
instruction_content_trimmed="${instruction_content:0:$max_instruction_chars}" relies on bash parameter expansion. GitHub Actions run defaults to bash on Ubuntu, but this can become fragile if runner/shell changes or if the step shell is overridden.
Suggestion: Make truncation shell-agnostic using head -c:
```
instruction_content_trimmed="$(head -c "$max_instruction_chars" "$instruction_file")"
```
or explicitly set shell: bash at step level to lock behavior.

Suggestion

.github/workflows/llm-pr-review.yml:133 — Avoid loading full file before trimming:
instruction_content=$(cat "$instruction_file") reads the entire file into memory before truncation. For very large files this is unnecessary.
Suggestion: Stream directly into the temp file with byte limit:
```
{
  printf "Repository instruction file: %s\n\n" "$instruction_file"
  head -c "$max_instruction_chars" "$instruction_file"
} > /tmp/repo_instruction_prompt.txt
```

Praise

.github/workflows/llm-pr-review.yml:123 — Clear deterministic priority handling:
The AGENTS.md > AGENT.md > CLAUDE.md selection logic is explicit and easy to reason about, reducing ambiguity in prompt construction.
.github/workflows/llm-pr-review.yml:142 — Graceful no-file fallback:
Creating an empty /tmp/repo_instruction_prompt.txt keeps downstream logic simple and avoids branching complexity later in request construction.
.github/workflows/llm-pr-review.yml:161 — Clean conditional JSON composition with jq:
Appending repo instructions only when non-empty is done safely and maintainably, with minimal risk of malformed request payloads.

github-actions

Summary

This PR improves the LLM review workflow robustness by moving large payload handling to temp files, adding truncation guards, and optionally appending repository instruction content (AGENTS.md/AGENT.md/CLAUDE.md) to the Anthropic system messages. Overall this is a solid, practical hardening change and is close to merge-ready. I found one correctness issue that should be fixed before merging.

Critical

.github/workflows/llm-pr-review.yml:232 — Review truncation can still exceed GitHub comment limit: After truncating review_text to 58000 bytes, the workflow appends a truncation notice, which pushes the final payload beyond the configured cap. This can still cause GitHub review API rejection for oversized body.
Suggestion: reserve space for the suffix before truncating, e.g. compute suffix='...\n[Truncated ...]', then truncate to max_review_chars - ${#suffix} and append suffix. Alternatively, enforce a final hard cap after suffix is appended.

Praise

.github/workflows/llm-pr-review.yml:47 — Switched from in-memory JSON blobs to temp files: Writing PR metadata/diff to files reduces shell ARG_MAX risks and avoids brittle large-string handling.
.github/workflows/llm-pr-review.yml:146 — Deterministic instruction-file precedence: The AGENTS.md > AGENT.md > CLAUDE.md selection is clear and predictable, and loading only one avoids prompt bloat.
.github/workflows/llm-pr-review.yml:183 — Clean Anthropic system-message composition: Building system as an array with conditional repo instruction inclusion is a good contract-safe way to extend behavior without breaking the base prompt.

github-actions

Summary

This PR enhances the LLM review workflow by adding robust file-based payload handling, truncation safeguards, and optional repository instruction injection (AGENTS.md > AGENT.md > CLAUDE.md) into the Anthropic system messages. Overall, the changes are solid and improve reliability for large PRs. I see one blocking correctness issue around multiline content handling that should be fixed before merge.

Critical

.github/workflows/llm-pr-review.yml:223 — Review text truncation can corrupt multiline content:
review_text is reconstructed with command substitution:
```
review_text="$(printf "%s" "$review_text" | head -c "$remaining_chars")"
```
In POSIX shells, command substitution strips trailing newlines, which can unintentionally alter Markdown formatting and content length semantics before posting to GitHub. Since this workflow carefully handles size limits, this subtle mutation is a correctness bug.
Suggestion: keep all truncation operations file-based (as done elsewhere), e.g. write original text to /tmp/review_text_full.txt, truncate with head -c to /tmp/review_text.txt, append truncation note with file ops, and avoid round-tripping large multiline content through shell variables when possible.

Important

.github/workflows/llm-pr-review.yml:152 — Repository instruction file not line-safe if truncated mid-UTF-8/code fence:
head -c on AGENTS.md can cut in the middle of a multibyte character or markdown/code block, which may degrade instruction parsing quality for the model. Not a security issue, but it can reduce prompt fidelity.
Suggestion: after head -c, optionally trim to last full line (sed '$d') similar to diff handling, or at least append a clear marker like \n\n[Instruction file truncated] so the model understands truncation.

Praise

.github/workflows/llm-pr-review.yml:48-83 — Good move to temp-file based handling:
Switching PR metadata/diff from in-memory variables to temp files reduces ARG_MAX and shell quoting risks for large payloads.
.github/workflows/llm-pr-review.yml:171-187 — Clean conditional system message composition:
The jq composition of system messages is well-structured and avoids brittle string concatenation.
.github/workflows/llm-pr-review.yml:88-99 and :217-233 — Thoughtful truncation observability:
Emitting ::notice:: with original and capped sizes is excellent for diagnosing workflow behavior without failing runs.

ci(llm-review): 👷 inject repo instruction file into system prompt

ac90432

github-actions bot reviewed Mar 3, 2026

View reviewed changes

ci(llm-review): 👷 reduce arg usage with temp files

7afca42

github-actions bot reviewed Mar 3, 2026

View reviewed changes

ci(llm-review): 👷 reserve suffix bytes when truncating review

bceaeea

github-actions bot reviewed Mar 3, 2026

View reviewed changes

jorben merged commit 04a802a into master Mar 3, 2026
3 checks passed

jorben deleted the feat/llm-review-agent-instructions branch March 3, 2026 05:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(llm-review): 👷 include repo instruction file in system prompt#56

ci(llm-review): 👷 include repo instruction file in system prompt#56
jorben merged 3 commits intomasterfrom
feat/llm-review-agent-instructions

jorben commented Mar 3, 2026 •

edited

Loading

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jorben commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Summary

Important

Suggestion

Praise

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Summary

Critical

Praise

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Summary

Critical

Important

Praise

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jorben commented Mar 3, 2026 •

edited

Loading