docs: model graded metrics updates #5285
Actionable comments posted: 3
🔭 Outside diff range comments (1)
site/docs/configuration/expected-outputs/model-graded/index.md (1)
1-4: Add required front matter `title` and align wording with style (“eval” over “evaluation”)
This page is missing the required `title` field. Also, the style guide prefers “eval” instead of “evaluation”. Apply:
---
sidebar_position: 7
-description: 'Comprehensive overview of model-graded evaluation techniques leveraging AI models to assess quality, safety, and accuracy'
+title: Model-graded metrics
+description: 'Comprehensive overview of model-graded eval techniques leveraging AI models to assess quality, safety, and accuracy'
---
🧹 Nitpick comments (6)
site/docs/configuration/expected-outputs/model-graded/index.md (2)
39-39: Use “eval” instead of “evaluation” in link text
Style guide prefers “eval” over “evaluation”.
-Context-based assertions are particularly useful for evaluating RAG systems. For complete RAG evaluation examples, see the [RAG Evaluation Guide](/docs/guides/evaluate-rag).
+Context-based assertions are particularly useful for evaluating RAG systems. For complete RAG eval examples, see the [RAG Eval Guide](/docs/guides/evaluate-rag).
159-161: Specify a language for this code block
All code blocks should declare a language for syntax highlighting.
- ```
- promptfoo eval --grader openai:gpt-4.1-mini
- ```
+ ```bash
+ promptfoo eval --grader openai:gpt-4.1-mini
+ ```
site/docs/write-for-promptfoo.md (4)
35-36: Tone and punctuation polish
Avoid “i.e.” in running text and tighten the message.
-We expect the final work to have an authentic voice i.e. some form of personality. Anything entirely AI-generated will be rejected. We can do that ourselves.
+We expect an authentic, human voice. Fully AI-generated drafts will be rejected.
37-38: Clarify update commitment
-We also expect support for up to 30 days to keep the article updated. For example, if a new model version is released the article should be updated to use that instead.
+Please keep the article updated for 30 days after publication. For example, if a new model version is released, update the article to use it.
48-48: Grammar: “from anything” → “within”
-You will receive a response from anything within a few days to a couple of weeks.
+You will receive a response within a few days to a couple of weeks.
50-51: Tighten process language and fix comma splice
-Once your draft is complete you'll need to add it to a GitHub PR, upon which we'll review and provide feedback. Once updates have been made and final checks have passed, we'll publish it on our site. Following this, we'll advertise the work on our social media, newsletter, and Discord.
+Once your draft is complete, add it to a GitHub PR. We'll review and provide feedback. After updates and final checks, we'll publish it on our site and promote it via social media, the newsletter, and Discord.
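The nitpick above about declaring a language on every code block could be checked mechanically. The following is a minimal sketch, not part of the repository's tooling: the function name `fences_missing_language` is hypothetical, and it assumes plain three-backtick fences only (no tilde fences, no longer fences, no nesting).

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that declare no language."""
    missing = []
    inside_block = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            # This fence closes the current block; nothing to check.
            inside_block = False
        else:
            inside_block = True
            if stripped == FENCE:
                # Opening fence with no info string such as "bash".
                missing.append(number)
    return missing


sample = "\n".join(
    ["intro", FENCE, "promptfoo eval", FENCE, FENCE + "bash", "promptfoo eval", FENCE]
)
print(fences_missing_language(sample))
```

Run against a page, this would flag the unlabeled fence at lines 159-161 of the model-graded index while leaving fences that already declare `bash` alone.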
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
site/docs/configuration/expected-outputs/model-graded/index.md (1 hunks)
site/docs/write-for-promptfoo.md (1 hunks)
site/sidebars.js (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{site/**,examples/**}
📄 CodeRabbit Inference Engine (.cursor/rules/gh-cli-workflow.mdc)
Any pull request that only touches files in 'site/' or 'examples/' directories must use the 'docs:' prefix in the PR title, not 'feat:' or 'fix:'
Files:
site/sidebars.js
site/docs/write-for-promptfoo.md
site/docs/configuration/expected-outputs/model-graded/index.md
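The path-based rule above (docs-only pull requests must use the `docs:` title prefix) can be sketched as a small check. This is an illustration under stated assumptions, not the repository's actual CI: the function names are hypothetical, and the check is a strict `startswith("docs:")` test that would reject scoped variants such as `docs(site):`.

```python
DOCS_ONLY_ROOTS = ("site/", "examples/")


def is_docs_only(changed_files: list[str]) -> bool:
    """True when every changed path lives under site/ or examples/."""
    return bool(changed_files) and all(
        path.startswith(DOCS_ONLY_ROOTS) for path in changed_files
    )


def title_prefix_ok(title: str, changed_files: list[str]) -> bool:
    """True unless a docs-only PR has a title that does not start with 'docs:'."""
    if not is_docs_only(changed_files):
        # The rule only constrains PRs that touch nothing outside docs/examples.
        return True
    return title.startswith("docs:")


files = [
    "site/sidebars.js",
    "site/docs/write-for-promptfoo.md",
    "site/docs/configuration/expected-outputs/model-graded/index.md",
]
print(title_prefix_ok("docs: model graded metrics updates", files))
```

For the three files in this pull request, the current title passes; a `feat:` or `fix:` title on the same file set would not.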
site/**
📄 CodeRabbit Inference Engine (.cursor/rules/gh-cli-workflow.mdc)
If the change is a feature, update the relevant documentation under 'site/'
Files:
site/sidebars.js
site/docs/write-for-promptfoo.md
site/docs/configuration/expected-outputs/model-graded/index.md
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Follow consistent import order (Biome will handle import sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks
Files:
site/sidebars.js
site/docs/**/*.md
📄 CodeRabbit Inference Engine (.cursor/rules/docusaurus.mdc)
site/docs/**/*.md: Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Structure content to reveal information progressively: begin with essential actions and information, then provide deeper context as necessary; organize information from most important to least important.
Use action-oriented language: clearly outline actionable steps users should take, use concise and direct language, prefer active voice over passive voice, and use imperative mood for instructions.
Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
The project name can be written as either 'Promptfoo' (capitalized) or 'promptfoo' (lowercase) depending on context: use 'Promptfoo' at the beginning of sentences or in headings, and 'promptfoo' in code examples, terminal commands, or when referring to the package name; be consistent with the chosen capitalization within each document or section.
Each markdown documentation file must include required front matter fields: 'title' (the page title shown in search results and browser tabs) and 'description' (a concise summary of the page content, ideally 150-160 characters).
Only add a title attribute to code blocks that represent complete, runnable files; do not add titles to code fragments, partial examples, or snippets that aren't meant to be used as standalone files; this applies to all code blocks regardless of language.
Use special comment directives to highlight specific lines in code blocks: 'highlight-next-line' highlights the line immediately after the comment, 'highligh...
Files:
site/docs/write-for-promptfoo.md
site/docs/configuration/expected-outputs/model-graded/index.md
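The front matter rule above (required `title` and `description`, with the description ideally 150-160 characters) lends itself to a quick check. The sketch below is an assumption-laden illustration: `parse_front_matter` and `front_matter_problems` are hypothetical names, and the parser only handles flat `key: value` lines rather than full YAML.

```python
REQUIRED_FIELDS = ("title", "description")


def parse_front_matter(markdown: str) -> dict[str, str]:
    """Parse flat 'key: value' front matter between the leading '---' lines."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, separator, value = line.partition(":")
        if separator:
            # Drop surrounding whitespace and simple single or double quotes.
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter: treat the page as having no front matter


def front_matter_problems(markdown: str) -> list[str]:
    """List missing required fields and a description outside 150-160 characters."""
    fields = parse_front_matter(markdown)
    problems = [f"missing '{name}'" for name in REQUIRED_FIELDS if not fields.get(name)]
    description = fields.get("description", "")
    if description and not 150 <= len(description) <= 160:
        problems.append(f"description is {len(description)} characters (ideal: 150-160)")
    return problems


page = "\n".join(
    ["---", "id: write-for-promptfoo", "title: Write for Promptfoo", "---", "Body"]
)
print(front_matter_problems(page))
```

Because the guideline says "ideally", the length finding is probably better treated as a warning than as a hard failure.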
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
📚 Learning: 2025-07-18T17:25:57.700Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-07-18T17:25:57.700Z
Learning: Applies to site/** : If the change is a feature, update the relevant documentation under 'site/'
Applied to files:
site/sidebars.js
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Applied to files:
site/docs/write-for-promptfoo.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Applied to files:
site/docs/write-for-promptfoo.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.md : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Applied to files:
site/docs/write-for-promptfoo.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Applied to files:
site/docs/write-for-promptfoo.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/blog/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Applied to files:
site/docs/write-for-promptfoo.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
PR: promptfoo/promptfoo#0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.mdx : Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Applied to files:
site/docs/write-for-promptfoo.md
🪛 LanguageTool
site/docs/write-for-promptfoo.md
[grammar] ~15-~15: There might be a mistake here.
Context: ...p-by-step instructions for common tasks. - Integration use cases - How to use Pro...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...rse of building a project or deployment. - Best practices - Tips and tricks for u...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ... tricks for using Promptfoo effectively. - Case studies - Real-world examples of ...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...examples of how Promptfoo is being used. - Academic research - Using Promptfoo in...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ... process of creating published research. - Troubleshooting guides - Solutions to ...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...guides** - Solutions to common problems. - Integration examples - How to use Prom...
(QB_NEW_EN)
[grammar] ~25-~25: Ensure spelling is correct
Context: ...neering topics more over how-to guides, troublshooting, or integrations. ## Writing guideline...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~43-~43: There might be a mistake here.
Context: ... Email [email protected] with: - Topic - Outline - Estimated length - Samples of ...
(QB_NEW_EN)
[grammar] ~44-~44: There might be a mistake here.
Context: [email protected] with: - Topic - Outline - Estimated length - Samples of your writi...
(QB_NEW_EN)
[grammar] ~45-~45: There might be a mistake here.
Context: ...h: - Topic - Outline - Estimated length - Samples of your writing You will receiv...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Redteam
- GitHub Check: Test on Node 20.x and windows-latest
- GitHub Check: Redteam Custom Enterprise Server
- GitHub Check: Share Test
- GitHub Check: Test on Node 22.x and ubuntu-latest
- GitHub Check: Test on Node 24.x and ubuntu-latest
- GitHub Check: Test on Node 24.x and macOS-latest
- GitHub Check: Test on Node 22.x and macOS-latest
- GitHub Check: Test on Node 20.x and macOS-latest
- GitHub Check: webui tests
- GitHub Check: Test on Node 20.x and ubuntu-latest
- GitHub Check: Build on Node 20.x
- GitHub Check: Test on Node 18.x and windows-latest
- GitHub Check: Build on Node 22.x
- GitHub Check: Test on Node 18.x and macOS-latest
- GitHub Check: Test on Node 18.x and ubuntu-latest
- GitHub Check: Build on Node 24.x
- GitHub Check: Build on Node 18.x
- GitHub Check: Build Docs
- GitHub Check: Style Check
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
site/sidebars.js (2)
415-418: Sidebar entry looks correct and well-placed
The new doc id matches the added page and the position after Releases is fine. No structural issues spotted.
415-418: Docs-only PRs should use a `docs:` title prefix
Per repository guidelines, PRs that only touch files under `site/` or `examples/` should use the `docs:` prefix in the title. Consider renaming to: `docs: model-graded metrics updates` (also note the hyphenation of “model-graded”).
site/docs/configuration/expected-outputs/model-graded/index.md (1)
10-27: Clearer taxonomy for output-based assertions (nice improvement)
This succinct list improves scanability and aligns with action-oriented phrasing. Looks good.
## Context-based

-- [`context-recall`](/docs/configuration/expected-outputs/model-graded/context-recall) - ensure that ground truth appears in context
-- [`context-relevance`](/docs/configuration/expected-outputs/model-graded/context-relevance) - ensure that context is relevant to original query
-- [`context-faithfulness`](/docs/configuration/expected-outputs/model-graded/context-faithfulness) - ensure that LLM output is supported by context
+- [`context-recall`](/docs/configuration/expected-outputs/model-graded/context-recall) - Evaluates whether provided context contains information needed for a question or fact.
+- [`context-relevance`](/docs/configuration/expected-outputs/model-graded/context-relevance) - Evaluates whether context is relevant to the original query.
+- [`context-faithfulness`](/docs/configuration/expected-outputs/model-graded/context-faithfulness) - Evaluates if LLM output is supported by the context.

-Conversational:
+## Conversational

-- [`conversation-relevance`](/docs/configuration/expected-outputs/model-graded/conversation-relevance) - ensure that responses remain relevant throughout a conversation
+- [`conversation-relevance`](/docs/configuration/expected-outputs/model-graded/conversation-relevance) - Evaluates whether responses remain relevant throughout a conversation.
🛠️ Refactor suggestion
Duplicate “Context-based” sections — consolidate to a single H2
There are two H2 “Context-based” sections: an early summary (Lines 29–37) and a later, detailed section (Lines 291–298+). This will produce duplicate anchors and confuse readers.
Recommend keeping the later detailed section and removing the earlier duplicate header and bullets:
-## Context-based
-
-- [`context-recall`](/docs/configuration/expected-outputs/model-graded/context-recall) - Evaluates whether provided context contains information needed for a question or fact.
-- [`context-relevance`](/docs/configuration/expected-outputs/model-graded/context-relevance) - Evaluates whether context is relevant to the original query.
-- [`context-faithfulness`](/docs/configuration/expected-outputs/model-graded/context-faithfulness) - Evaluates if LLM output is supported by the context.
Also applies to: 291-298
🤖 Prompt for AI Agents
In site/docs/configuration/expected-outputs/model-graded/index.md around lines
29 to 37, there is a duplicate "Context-based" H2 and its bullet list that
repeats a more detailed "Context-based" section later (around lines 291–298);
remove the earlier H2 and its three bullets (the summary at 29–37) so only the
later detailed "Context-based" section remains and update surrounding
spacing/anchors if needed to avoid leftover empty lines.
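Duplicate headings like the two “Context-based” H2s flagged above can be found with a short script. This is a rough sketch under assumptions: `duplicate_headings` is a hypothetical helper, it only recognizes ATX (`#`) headings, and it compares lowercased heading text as a proxy for colliding anchors rather than reproducing the site's real slug logic.

```python
from collections import Counter


def duplicate_headings(markdown: str) -> list[str]:
    """Return heading texts (lowercased) that occur more than once, in first-seen order."""
    fence = "`" * 3
    headings = []
    inside_code = False
    for line in markdown.splitlines():
        if line.strip().startswith(fence):
            # Toggle on every fence so '#' comments inside code are ignored.
            inside_code = not inside_code
            continue
        if inside_code or not line.startswith("#"):
            continue
        marker, _, text = line.partition(" ")
        # A heading marker is one to six '#' characters followed by a space and text.
        if set(marker) == {"#"} and len(marker) <= 6 and text.strip():
            headings.append(text.strip().lower())
    counts = Counter(headings)
    return [text for text in dict.fromkeys(headings) if counts[text] > 1]


doc = "\n".join(
    ["# Model-graded metrics", "## Context-based", "text", "## Conversational", "## Context-based"]
)
print(duplicate_headings(doc))
```

On a page shaped like the one under review, the helper would report the repeated heading once, which is enough to prompt consolidating the two sections.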
site/docs/write-for-promptfoo.md
Outdated
---
id: write-for-promptfoo
title: Write for Promptfoo
sidebar_label: Write for Promptfoo
---
Add required front matter description
Docs require both `title` and `description` in front matter. Add a concise 150–160 character description.
---
id: write-for-promptfoo
title: Write for Promptfoo
sidebar_label: Write for Promptfoo
+description: Guidelines, process, and compensation for writing high-quality Promptfoo tutorials, guides, and case studies. Learn how to pitch, submit, and get paid.
---
🤖 Prompt for AI Agents
In site/docs/write-for-promptfoo.md around lines 1 to 5, the front matter is
missing the required description field; add a concise description string
(150–160 characters) to the YAML front matter under a new description key, e.g.
description: "…" ensuring it is between 150 and 160 characters and properly
quoted so the front matter contains both title and description.
site/docs/write-for-promptfoo.md
Outdated
We pay anything between $50 - $300 for the work being produced. We value engineering topics more over how-to guides, troublshooting, or integrations.
🛠️ Refactor suggestion
Fix typo and tighten compensation phrasing
- Spelling: “troublshooting” -> “troubleshooting”
- Grammar: “more over” -> “over”
- Clarity: simplify compensation sentence
-We pay anything between $50 - $300 for the work being produced. We value engineering topics more over how-to guides, troublshooting, or integrations.
+We pay $50 to $300 per article. We prioritize engineering topics over how-to guides, troubleshooting, and integrations.
🤖 Prompt for AI Agents
In site/docs/write-for-promptfoo.md around lines 25 to 26, fix the typo and
tighten the compensation phrasing: change “troublshooting” to “troubleshooting”,
replace “more over” with “over”, and simplify the compensation sentence to
something like “We pay $50–$300 per article.” Ensure the final sentence reads
clearly and concisely and the topics sentence reads “We value engineering topics
over how-to guides, troubleshooting, or integrations.”
Attempted to improve the metrics definitions to be easier to understand.