promptdiff — lint, diff, and score LLM system prompts (works with any provider) #3018
HadiFrt20 started this conversation in Show and tell.
Built a CLI that applies static analysis to LLM system prompts.
If you manage system prompts for OpenAI models, promptdiff catches issues that silently degrade output.

What it catches:
- Semantic diff — not line-by-line. Tells you "word limit tightened 150→100, high impact" with behavioral annotations.
- Quality score — 0-100 across structure, specificity, examples, safety, completeness. Usable as a CI gate.
- A/B compare — run two prompt versions through GPT-4o (or Claude, Ollama) and score both outputs:
```
promptdiff compare v1.prompt v2.prompt --input "test query" --model gpt4o
```

Runs locally, 3 deps, 217 tests.
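To make the "semantic diff" idea concrete: instead of comparing lines textually, you can pair up the numeric constraints two prompt versions state (word limits, item counts) and report how each bound moved. The sketch below is a hypothetical illustration of that approach, not promptdiff's actual implementation — the regex, function names, and impact labels are all assumptions for the example.

```python
import re

# Match numeric constraints like "at most 150 words" or "limit of 3 items".
# (Hypothetical pattern for illustration, not promptdiff's internal logic.)
CONSTRAINT = re.compile(
    r"(?:at most|under|limit of|max(?:imum)? of)\s+(\d+)\s+(\w+)"
)

def numeric_constraints(prompt: str) -> dict:
    """Map each constrained unit (e.g. 'words') to its numeric bound."""
    return {unit: int(n) for n, unit in CONSTRAINT.findall(prompt)}

def semantic_diff(old: str, new: str) -> list:
    """Report how shared numeric constraints changed between versions."""
    before, after = numeric_constraints(old), numeric_constraints(new)
    notes = []
    for unit in sorted(before.keys() & after.keys()):
        a, b = before[unit], after[unit]
        if b < a:
            notes.append(f"{unit} limit tightened {a}\u2192{b}, high impact")
        elif b > a:
            notes.append(f"{unit} limit relaxed {a}\u2192{b}")
    return notes

v1 = "Answer helpfully. Keep responses to at most 150 words."
v2 = "Answer helpfully. Keep responses to at most 100 words."
print(semantic_diff(v1, v2))
# ['words limit tightened 150→100, high impact']
```

A real tool would need to handle paraphrases and implicit constraints, but the key design point survives even in this toy: the diff output is a behavioral annotation ("tightened, high impact") rather than a raw text delta.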
GitHub: https://github.com/HadiFrt20/promptdiff