docs: model graded metrics updates #5285

ladyofcode · 2025-08-18T15:37:17Z

Attempted to make the metric definitions easier to understand.

…-graded-metrics-updates

gru-agent · 2025-08-18T15:37:29Z

TestGru Assignment

Summary

Link	CommitId	Status	Reason
Detail	`defe7b0`	🚫 Skipped	No files need to be tested {"site/docs/configuration/expected-outputs/model-graded/index.md":"File path does not match include patterns.","site/docs/write-for-promptfoo.md":"File path does not match include patterns.","site/sidebars.js":"File path does not match include patterns."}


coderabbitai · 2025-08-18T15:43:17Z

Walkthrough

Restructures site/docs/configuration/expected-outputs/model-graded/index.md: replaces prior Output-based block with a concise list of assertion types; enumerates factuality relationships; promotes Context-based to an H2 with revised evaluative phrasing; updates conversational guidance; capitalizes “Promptfoo”; minor header/wording tweaks.
Adds a new documentation page site/docs/write-for-promptfoo.md with contributor guidelines, process, compensation, FAQs, and contact details.
Updates site/sidebars.js to include the new write-for-promptfoo doc entry after releases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (1)

site/docs/configuration/expected-outputs/model-graded/index.md (1)
1-4: Add required front matter title and align wording with style (“eval” over “evaluation”)

This page is missing the required title field. Also, the style guide prefers “eval” instead of “evaluation”.

Apply:
 ---
 sidebar_position: 7
-description: 'Comprehensive overview of model-graded evaluation techniques leveraging AI models to assess quality, safety, and accuracy'
+title: Model-graded metrics
+description: 'Comprehensive overview of model-graded eval techniques leveraging AI models to assess quality, safety, and accuracy'
 ---
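The front matter requirement above could be checked mechanically. The following is a minimal sketch, not part of promptfoo or CodeRabbit: the function name `missing_front_matter_fields` is hypothetical, and it assumes front matter is a `---`-delimited block of simple top-level `key: value` lines at the very start of the file.

```python
import re

# Required keys, per the docs rule quoted in this review.
REQUIRED_FIELDS = ("title", "description")


def missing_front_matter_fields(markdown_text: str) -> list[str]:
    """Return the required front matter keys that a markdown document lacks.

    Hypothetical helper: only top-level `key: value` lines are recognized,
    so this is not a full YAML parser.
    """
    # Front matter is the block between the first two '---' lines of the file.
    match = re.match(r"\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", markdown_text, re.DOTALL)
    if match is None:
        return list(REQUIRED_FIELDS)

    keys = set()
    for line in match.group(1).splitlines():
        # Indented lines belong to nested values and are skipped.
        key_match = re.match(r"([A-Za-z_][\w-]*)\s*:", line)
        if key_match:
            keys.add(key_match.group(1))
    return [field for field in REQUIRED_FIELDS if field not in keys]


before = "---\nsidebar_position: 7\ndescription: 'Comprehensive overview'\n---\n\n# Heading\n"
after = "---\nsidebar_position: 7\ntitle: Model-graded metrics\ndescription: 'Comprehensive overview'\n---\n"
print(missing_front_matter_fields(before))  # prints ['title']
print(missing_front_matter_fields(after))   # prints []
```

A check like this would flag the page before the suggested change and pass after it.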

🧹 Nitpick comments (6)

site/docs/configuration/expected-outputs/model-graded/index.md (2)

39-39: Use “eval” instead of “evaluation” in link text

Style guide prefers “eval” over “evaluation”.
-Context-based assertions are particularly useful for evaluating RAG systems. For complete RAG evaluation examples, see the [RAG Evaluation Guide](/docs/guides/evaluate-rag).
+Context-based assertions are particularly useful for evaluating RAG systems. For complete RAG eval examples, see the [RAG Eval Guide](/docs/guides/evaluate-rag).
159-161: Specify a language for this code block

All code blocks should declare a language for syntax highlighting.
-   ```
-   promptfoo eval --grader openai:gpt-4.1-mini
-   ```
+   ```bash
+   promptfoo eval --grader openai:gpt-4.1-mini
+   ```
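Unlabeled code fences like the one above can be located with a short script. This is a rough sketch under simplifying assumptions: `unlabeled_fence_lines` is a hypothetical name, only backtick fences are handled (not `~~~`), and fences are assumed not to be nested.

```python
FENCE = "`" * 3  # built programmatically so this example contains no literal fence


def unlabeled_fence_lines(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of opening code fences with no language."""
    unlabeled = []
    inside_block = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            # This fence closes the current block.
            inside_block = False
            continue
        # This fence opens a block; whatever follows the backticks is the language.
        inside_block = True
        info_string = stripped.lstrip("`").strip()
        if not info_string:
            unlabeled.append(number)
    return unlabeled


sample = "\n".join([
    "Run:",
    "",
    "   " + FENCE,
    "   promptfoo eval --grader openai:gpt-4.1-mini",
    "   " + FENCE,
    "",
    FENCE + "bash",
    "npx promptfoo eval",
    FENCE,
])
print(unlabeled_fence_lines(sample))  # prints [3]
```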

site/docs/write-for-promptfoo.md (4)

35-36: Tone and punctuation polish

Avoid “i.e.” in running text and tighten the message.

-We expect the final work to have an authentic voice i.e. some form of personality. Anything entirely AI-generated will be rejected. We can do that ourselves.
+We expect an authentic, human voice. Fully AI-generated drafts will be rejected.

37-38: Clarify update commitment

-We also expect support for up to 30 days to keep the article updated. For example, if a new model version is released the article should be updated to use that instead.
+Please keep the article updated for 30 days after publication. For example, if a new model version is released, update the article to use it.

48-48: Grammar: “from anything” → “within”

-You will receive a response from anything within a few days to a couple of weeks.
+You will receive a response within a few days to a couple of weeks.

50-51: Tighten process language and fix comma splice

-Once your draft is complete you'll need to add it to a GitHub PR, upon which we'll review and provide feedback. Once updates have been made and final checks have passed, we'll publish it on our site. Following this, we'll advertise the work on our social media, newsletter, and Discord.
+Once your draft is complete, add it to a GitHub PR. We'll review and provide feedback. After updates and final checks, we'll publish it on our site and promote it via social media, the newsletter, and Discord.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between a1e2486 and defe7b0.

📒 Files selected for processing (3)

site/docs/configuration/expected-outputs/model-graded/index.md (1 hunks)
site/docs/write-for-promptfoo.md (1 hunks)
site/sidebars.js (1 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

{site/**,examples/**}

📄 CodeRabbit Inference Engine (.cursor/rules/gh-cli-workflow.mdc)

Any pull request that only touches files in 'site/' or 'examples/' directories must use the 'docs:' prefix in the PR title, not 'feat:' or 'fix:'

Files:

site/sidebars.js
site/docs/write-for-promptfoo.md
site/docs/configuration/expected-outputs/model-graded/index.md
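The title rule above can be expressed as a small predicate. This is an illustrative sketch only: `title_prefix_ok` is a hypothetical name, and the treatment of an empty file list (rule does not apply) is an assumption, not something the rule states.

```python
# Directories whose changes count as docs-only, per the rule quoted above.
DOCS_ONLY_DIRS = ("site/", "examples/")


def title_prefix_ok(title: str, changed_files: list[str]) -> bool:
    """Return False only when a docs-only PR lacks the 'docs:' title prefix."""
    # The rule applies only if every changed file sits under site/ or examples/.
    docs_only = bool(changed_files) and all(
        path.startswith(DOCS_ONLY_DIRS) for path in changed_files
    )
    if not docs_only:
        return True  # assumption: the rule says nothing about other PRs
    return title.startswith("docs:")


files = [
    "site/sidebars.js",
    "site/docs/write-for-promptfoo.md",
    "site/docs/configuration/expected-outputs/model-graded/index.md",
]
print(title_prefix_ok("docs: model graded metrics updates", files))  # prints True
print(title_prefix_ok("feat: model graded metrics updates", files))  # prints False
```

With the three files listed above and this PR's title, the predicate passes.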

site/**

📄 CodeRabbit Inference Engine (.cursor/rules/gh-cli-workflow.mdc)

If the change is a feature, update the relevant documentation under 'site/'

Files:

site/sidebars.js
site/docs/write-for-promptfoo.md
site/docs/configuration/expected-outputs/model-graded/index.md

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Follow consistent import order (Biome will handle import sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

Files:

site/sidebars.js

site/docs/**/*.md

📄 CodeRabbit Inference Engine (.cursor/rules/docusaurus.mdc)

site/docs/**/*.md: Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Structure content to reveal information progressively: begin with essential actions and information, then provide deeper context as necessary; organize information from most important to least important.
Use action-oriented language: clearly outline actionable steps users should take, use concise and direct language, prefer active voice over passive voice, and use imperative mood for instructions.
Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
The project name can be written as either 'Promptfoo' (capitalized) or 'promptfoo' (lowercase) depending on context: use 'Promptfoo' at the beginning of sentences or in headings, and 'promptfoo' in code examples, terminal commands, or when referring to the package name; be consistent with the chosen capitalization within each document or section.
Each markdown documentation file must include required front matter fields: 'title' (the page title shown in search results and browser tabs) and 'description' (a concise summary of the page content, ideally 150-160 characters).
Only add a title attribute to code blocks that represent complete, runnable files; do not add titles to code fragments, partial examples, or snippets that aren't meant to be used as standalone files; this applies to all code blocks regardless of language.
Use special comment directives to highlight specific lines in code blocks: 'highlight-next-line' highlights the line immediately after the comment, 'highligh...

Files:

site/docs/write-for-promptfoo.md
site/docs/configuration/expected-outputs/model-graded/index.md

🧠 Learnings (8)

📓 Common learnings

Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.

Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.

Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.

Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.

Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.

📚 Learning: 2025-07-18T17:25:57.700Z

Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-07-18T17:25:57.700Z
Learning: Applies to site/** : If the change is a feature, update the relevant documentation under 'site/'

Applied to files:

site/sidebars.js