wip: eval how to revamp #525
base: main
some flyby comments but overall like the structure
@@ -2,7 +2,7 @@
sidebar_position: 6
---

-# Export filtered traces from experiment to dataset
+# How to export filtered traces from experiment to dataset
I'm not sure we gain anything by putting How-to in every title? These guides are already in the how-to section
once you're on the page it's not clear where you are because the sidebar doesn't expand that deep. personally like that langchain docs are all titled "How to..." but don't feel strongly
This one is going to be much needed
You can use the `list_examples` / `listExamples` method to fetch a subset of examples from a dataset to evaluate on. Refer to the guide above to learn more about the different ways to fetch examples.

One common workflow is to fetch examples that have a certain metadata key-value pair.
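A minimal Python sketch of that workflow, for illustration only. It assumes the client object exposes `list_examples(dataset_name=..., metadata=...)` the way the LangSmith SDK's `Client` does. The helper `fetch_examples_by_metadata`, the `StubClient` class, the dataset name `my-dataset`, and the `difficulty` metadata key are all hypothetical; the stub exists only so the example runs without network access or an API key.

```python
from typing import Any, Dict, Iterator, List


def fetch_examples_by_metadata(
    client: Any, dataset_name: str, metadata: Dict[str, Any]
) -> List[Any]:
    """Collect the examples whose metadata contains every given key-value pair.

    `client` is assumed to behave like `langsmith.Client`, i.e. to accept
    `list_examples(dataset_name=..., metadata=...)` and return an iterable.
    """
    return list(client.list_examples(dataset_name=dataset_name, metadata=metadata))


class StubClient:
    """Hypothetical in-memory stand-in for a real client (not part of any SDK)."""

    def __init__(self, examples: List[Dict[str, Any]]) -> None:
        self._examples = examples

    def list_examples(
        self, dataset_name: str, metadata: Dict[str, Any] = None
    ) -> Iterator[Dict[str, Any]]:
        wanted = metadata or {}
        for example in self._examples:
            if example["dataset"] != dataset_name:
                continue
            # Keep the example only if every requested pair matches.
            if all(example["metadata"].get(k) == v for k, v in wanted.items()):
                yield example


stub = StubClient(
    [
        {"id": 1, "dataset": "my-dataset", "metadata": {"difficulty": "hard"}},
        {"id": 2, "dataset": "my-dataset", "metadata": {"difficulty": "easy"}},
        {"id": 3, "dataset": "other-dataset", "metadata": {"difficulty": "hard"}},
    ]
)
selected = fetch_examples_by_metadata(stub, "my-dataset", {"difficulty": "hard"})
selected_ids = [example["id"] for example in selected]
```

With the real SDK, the same helper would presumably be called with a `langsmith.Client()` instance in place of the stub, and the returned examples passed on to the evaluation call.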
would comment that we have much more powerful filtering capabilities (particularly for metadata) using the filter parameter: https://docs.smith.langchain.com/evaluation/how_to_guides/datasets/manage_datasets_programmatically#list-examples-by-structured-filter
- [Run an evaluation using the SDK](./how_to_guides/evaluation/evaluate_llm_application)
- [Run an evaluation asynchronously](./how_to_guides/evaluation/async)
- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise)
- [Run an evaluation of a LangChain / LangGraph object](./how_to_guides/evaluation/langchain_runnable)
Probably worth it to cover LangChain and LangGraph in separate guides?
There are some more interesting things in LG like evaluating a single node, trajectory, etc. It might be worth it to call out the documentation for running a larger eval in the LG eval guide.
We have this tutorial I hacked together with Lance a while back but it's definitely overly complicated and I wonder if bits of it can be adapted to a more concise how to guide https://docs.smith.langchain.com/evaluation/tutorials/agents
i kinda think this should live in langgraph docs and be linked from here
### Run an evaluation
- [Run an evaluation using the SDK](./how_to_guides/evaluation/evaluate_llm_application)
- [Run an evaluation asynchronously](./how_to_guides/evaluation/async)
- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise)
nit: Run a comparative evaluation?
@agola11 can review this one
review
No description provided.