wip: eval how to revamp #525

baskaryan · 2024-11-13T01:48:49Z

No description provided.

vercel · 2024-11-13T01:48:52Z

The latest updates on your projects.

Name	Status	Preview	Comments	Updated (UTC)
langsmith-docs	✅ Ready (Inspect)	Visit Preview	💬 30 unresolved	Nov 19, 2024 10:59pm

agola11

some flyby comments but overall like the structure

agola11 · 2024-11-13T05:21:25Z

docs/evaluation/how_to_guides/datasets/export_filtered_traces_to_dataset.mdx

@@ -2,7 +2,7 @@
 sidebar_position: 6
 ---

-# Export filtered traces from experiment to dataset
+# How to export filtered traces from experiment to dataset


I'm not sure we gain anything by putting How-to in every title? These guides are already in the how-to section

Once you're on the page it's not clear where you are, because the sidebar doesn't expand that deep. Personally I like that the LangChain docs are all titled "How to..." but I don't feel strongly.

agola11 · 2024-11-13T05:23:36Z

docs/evaluation/how_to_guides/evaluation/async.mdx

This one is going to be much needed

agola11 · 2024-11-13T05:25:00Z

docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx

+
+You can use the `list_examples` / `listExamples` method to fetch a subset of examples from a dataset to evaluate on. You can refer to the guide above to learn more about the different ways to fetch examples.
+
+One common workflow is to fetch examples that have a certain metadata key-value pair.


Would mention that we have much more powerful filtering capabilities (particularly for metadata) using the `filter` parameter: https://docs.smith.langchain.com/evaluation/how_to_guides/datasets/manage_datasets_programmatically#list-examples-by-structured-filter
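To make the `filter` suggestion concrete, here is a minimal sketch of small helpers that assemble a structured-filter string for `list_examples`. The clause syntax (`has(metadata, '<json>')`, `exists(metadata, "<key>")`, `and(...)`, `not(...)`) is modeled on the examples in the linked "list examples by structured filter" section. The helper names `has_metadata`, `exists_metadata`, `and_` and `not_` are hypothetical and not part of the LangSmith SDK, so check the filter grammar against the current docs before relying on it.

```python
import json
from functools import reduce

# Hypothetical helpers (not part of the LangSmith SDK) that build filter strings
# in the style of the LangSmith docs' structured-filter examples.


def has_metadata(key: str, value) -> str:
    """Clause matching examples whose metadata contains {key: value}."""
    payload = json.dumps({key: value})
    # The JSON payload is wrapped in single quotes, so it must not contain one.
    if "'" in payload:
        raise ValueError("metadata key/value must not contain a single quote")
    return f"has(metadata, '{payload}')"


def exists_metadata(key: str) -> str:
    """Clause matching examples whose metadata has `key`, whatever its value."""
    return f'exists(metadata, "{key}")'


def not_(clause: str) -> str:
    """Negate a clause."""
    return f"not({clause})"


def and_(*clauses: str) -> str:
    """Combine clauses pairwise: and_(a, b, c) -> and(and(a, b), c)."""
    if not clauses:
        raise ValueError("and_ needs at least one clause")
    return reduce(lambda left, right: f"and({left}, {right})", clauses)


# Examples that do NOT have foo == "bar" but do have a tenant_id key.
query = and_(not_(has_metadata("foo", "bar")), exists_metadata("tenant_id"))
print(query)
```

The resulting string would then be passed as `filter=` to `Client.list_examples`, e.g. `client.list_examples(dataset_name="my-dataset", filter=query)`, where `"my-dataset"` is a placeholder. For a single key-value match, `list_examples` also accepts a `metadata={"foo": "bar"}` argument, which covers the workflow described in the draft text without a filter string.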

agola11 · 2024-11-13T05:27:47Z

docs/evaluation/how_to_guides/index.md

+- [Run an evaluation using the SDK](./how_to_guides/evaluation/evaluate_llm_application)
+- [Run an evaluation asynchronously](./how_to_guides/evaluation/async)
+- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise)
+- [Run an evaluation of a LangChain / LangGraph object](./how_to_guides/evaluation/langchain_runnable)


Probably worth it to cover LangChain and LangGraph in separate guides?

There are some more interesting things in LG, like evaluating a single node, the trajectory, etc. It might be worth calling out the documentation for running a larger eval in the LG eval guide.

We have this tutorial I hacked together with Lance a while back, but it's definitely overly complicated, and I wonder if bits of it can be adapted into a more concise how-to guide: https://docs.smith.langchain.com/evaluation/tutorials/agents

i kinda think this should live in langgraph docs and be linked from here
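For the "trajectory" idea raised above, a custom evaluator could compare the sequence of steps (for example, LangGraph node names) an agent actually took against a reference sequence. This is an assumption-heavy illustration, not an existing LangSmith or LangGraph helper: the `trajectory` key and the `trajectory_match` name are hypothetical, and scoring partial matches by longest common subsequence is one design choice among several. It uses the `(outputs, reference_outputs)` argument names that recent LangSmith SDK versions accept for custom evaluators; check that your SDK version supports them.

```python
# Hypothetical trajectory evaluator. Assumes the target returns its steps under
# a "trajectory" key and the dataset stores the expected steps under the same key.


def _lcs_len(a: list, b: list) -> int:
    """Length of the longest common subsequence of two step lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def trajectory_match(outputs: dict, reference_outputs: dict) -> dict:
    """Score 1.0 for an exact match, otherwise the fraction of expected steps
    that appear in the same relative order in the actual trajectory."""
    actual = list(outputs.get("trajectory", []))
    expected = list(reference_outputs.get("trajectory", []))
    if actual == expected:
        score = 1.0
    elif not expected:
        score = 0.0  # nothing was expected, but the agent took extra steps
    else:
        score = _lcs_len(actual, expected) / len(expected)
    return {"key": "trajectory_match", "score": score}


result = trajectory_match(
    {"trajectory": ["retrieve", "answer"]},
    {"trajectory": ["retrieve", "grade_docs", "answer"]},
)
print(result)
```

Such an evaluator would be passed as `evaluators=[trajectory_match]` alongside the target and dataset when running `evaluate`. LCS gives partial credit when a step is skipped or added while still penalizing reordering; an exact-match-only score would be stricter.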

agola11 · 2024-11-13T05:28:57Z

docs/evaluation/how_to_guides/index.md

+### Run an evaluation
+- [Run an evaluation using the SDK](./how_to_guides/evaluation/evaluate_llm_application)
+- [Run an evaluation asynchronously](./how_to_guides/evaluation/async)
+- [Run an evaluation comparing two experiments](./how_to_guides/evaluation/evaluate_pairwise)


nit: Run a comparative evaluation?

baskaryan · 2024-11-14T03:04:24Z

docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx

@agola11 can review this one

baskaryan · 2024-11-18T18:56:20Z

docs/evaluation/how_to_guides/evaluation/async.mdx

baskaryan · 2024-11-18T18:56:26Z

docs/evaluation/how_to_guides/evaluation/custom_evaluator.mdx

baskaryan · 2024-11-18T18:56:32Z

docs/evaluation/how_to_guides/evaluation/dataset_subset.mdx

baskaryan · 2024-11-18T18:56:36Z

docs/evaluation/how_to_guides/evaluation/dataset_version.mdx

baskaryan · 2024-11-18T18:56:43Z

docs/evaluation/how_to_guides/evaluation/evaluate_llm_application.mdx

baskaryan · 2024-11-18T18:57:07Z

docs/evaluation/how_to_guides/evaluation/metric_type.mdx

baskaryan · 2024-11-18T18:57:13Z

docs/evaluation/how_to_guides/evaluation/multiple_scores.mdx

baskaryan · 2024-11-18T18:57:18Z

docs/evaluation/how_to_guides/evaluation/rate_limiting.mdx

baskaryan · 2024-11-18T18:57:33Z

docs/evaluation/how_to_guides/index.md

baskaryan · 2024-11-18T18:57:43Z

docs/evaluation/index.mdx

wip: eval how to revamp

f7f83a9

vercel bot deployed to Preview November 13, 2024 01:50

agola11 reviewed Nov 13, 2024

agola11 approved these changes Nov 13, 2024

wip

3ae30e2

baskaryan commented Nov 14, 2024


vercel bot had a problem deploying to Preview November 14, 2024 03:05 Failure

wip

b7d4341

vercel bot had a problem deploying to Preview November 14, 2024 17:17 Failure

baskaryan added 2 commits November 14, 2024 13:46

wip

e4efc7d

fix

e258077

vercel bot deployed to Preview November 14, 2024 21:49

wip

26a8101

vercel bot had a problem deploying to Preview November 15, 2024 00:24 Failure

wip

ef5b3ae

vercel bot had a problem deploying to Preview November 15, 2024 16:21 Failure

wip

ebbd945

vercel bot had a problem deploying to Preview November 15, 2024 17:06 Failure

links

028de02

vercel bot deployed to Preview November 15, 2024 17:09

intro

7d021f6

vercel bot deployed to Preview November 15, 2024 17:41

wip

7520c48

vercel bot deployed to Preview November 15, 2024 18:17

wip

0eebf98

vercel bot deployed to Preview November 15, 2024 23:24

wip

25c88a0

vercel bot deployed to Preview November 16, 2024 02:39

wip

ac3e7f3

vercel bot had a problem deploying to Preview November 16, 2024 03:33 Failure

wip

bbcbe33

vercel bot deployed to Preview November 17, 2024 21:21

baskaryan commented Nov 18, 2024
