wip: eval how to revamp #525

Draft: wants to merge 15 commits into main


baskaryan (Contributor)

No description provided.


vercel bot commented Nov 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| langsmith-docs | ✅ Ready (Inspect) | Visit Preview | 💬 30 unresolved | Nov 19, 2024 10:59pm |

agola11 (Contributor) left a comment

some flyby comments but overall like the structure

@@ -2,7 +2,7 @@
 sidebar_position: 6
 ---

-# Export filtered traces from experiment to dataset
+# How to export filtered traces from experiment to dataset
Contributor

I'm not sure we gain anything by putting How-to in every title? These guides are already in the how-to section

Contributor Author

once you're on the page it's not clear where you are, because the sidebar doesn't expand that deep. personally I like that the langchain docs are all titled "How to..." but don't feel strongly

Contributor

This one is going to be much needed


You can use the `list_examples` / `listExamples` method to fetch a subset of examples from a dataset to evaluate on. Refer to the guide above to learn more about the different ways to fetch examples.

One common workflow is to fetch examples that have a certain metadata key-value pair.
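A minimal sketch of that workflow with the Python SDK, assuming `Client.list_examples` accepts a `metadata` dict to filter by (as in recent `langsmith` releases). `fetch_examples_by_metadata` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List


def fetch_examples_by_metadata(
    client: Any, dataset_name: str, metadata: Dict[str, Any]
) -> List[Any]:
    """Return the examples in `dataset_name` filtered by the given metadata key-value pairs.

    `client` is expected to be a `langsmith.Client`; any object with a compatible
    `list_examples` method also works, which keeps this helper easy to test.
    """
    # list_examples yields examples lazily; materialize them so the same
    # subset can be inspected and then reused for an evaluation.
    return list(client.list_examples(dataset_name=dataset_name, metadata=metadata))
```

Usage (requires `langsmith` installed and an API key configured), with a placeholder dataset name and metadata pair: `fetch_examples_by_metadata(Client(), "my-dataset", {"split": "test"})`. The resulting list should be usable as the `data` argument of `evaluate`.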
Contributor

would comment that we have much more powerful filtering capabilities (particularly for metadata) using the filter parameter: https://docs.smith.langchain.com/evaluation/how_to_guides/datasets/manage_datasets_programmatically#list-examples-by-structured-filter
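A sketch of the `filter` approach: it composes a structured-filter string and passes it to `list_examples`. The `has(...)`, `exists(...)` and `and(...)` syntax is assumed to follow the linked guide, and the helper names are hypothetical. `metadata_has` also assumes the JSON-encoded pair contains no single quotes, since it is wrapped in `'...'`.

```python
import json
from typing import Any, List


def metadata_has(key: str, value: Any) -> str:
    # Matches examples whose metadata contains key == value.
    # Assumes the JSON text contains no single quotes (it is wrapped in '...').
    payload = json.dumps({key: value})
    return f"has(metadata, '{payload}')"


def metadata_exists(key: str) -> str:
    # Matches examples whose metadata has `key`, whatever its value.
    return f'exists(metadata, "{key}")'


def and_(*clauses: str) -> str:
    # Combine two or more clauses; every clause must match.
    return f"and({', '.join(clauses)})"


def fetch_examples_by_filter(client: Any, dataset_name: str, filter_expr: str) -> List[Any]:
    # `client` is expected to be a langsmith.Client; the filter is evaluated server-side.
    return list(client.list_examples(dataset_name=dataset_name, filter=filter_expr))
```

For example, `and_(metadata_has("foo", "bar"), metadata_exists("tenant_id"))` yields `and(has(metadata, '{"foo": "bar"}'), exists(metadata, "tenant_id"))`, which can be passed to `fetch_examples_by_filter(Client(), "my-dataset", ...)` (placeholder dataset name).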

- [Run an evaluation using the SDK](./how_to_guides/evaluation/evaluate_llm_application)
- [Run an evaluation asynchronously](./how_to_guides/evaluation/async)
- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise)
- [Run an evaluation of a LangChain / LangGraph object](./how_to_guides/evaluation/langchain_runnable)
Contributor

Probably worth it to cover LangChain and LangGraph in separate guides?

There are some more interesting things in LG, like evaluating a single node, a trajectory, etc. It might be worth calling out the documentation for running a larger eval in the LG eval guide.

We have this tutorial I hacked together with Lance a while back but it's definitely overly complicated and I wonder if bits of it can be adapted to a more concise how to guide https://docs.smith.langchain.com/evaluation/tutorials/agents

Contributor Author

i kinda think this should live in langgraph docs and be linked from here

### Run an evaluation
- [Run an evaluation using the SDK](./how_to_guides/evaluation/evaluate_llm_application)
- [Run an evaluation asynchronously](./how_to_guides/evaluation/async)
- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise)
Contributor

nit: Run a comparative evaluation?

Contributor Author

@agola11 can review this one

Contributor Author

review


2 participants