
Commit

Merge branch 'main' into marc/upgrade-scalar
hinthornw authored Jul 16, 2024
2 parents e181a1b + b435570 commit 5ffeea8
Showing 104 changed files with 2,252 additions and 565 deletions.
2 changes: 1 addition & 1 deletion docs/index.mdx
@@ -182,7 +182,7 @@ Check out the following sections to learn more about LangSmith:

- **[User Guide](./user_guide.mdx)**: Learn about the workflows LangSmith supports at each stage of the LLM application lifecycle.
- **[Pricing](/pricing)**: Learn about the pricing model for LangSmith.
- **[Self-Hosting](/category/self-hosting)**: Learn about self-hosting options for LangSmith.
- **[Self-Hosting](./self_hosting)**: Learn about self-hosting options for LangSmith.
- **[Tracing](./tracing/index.mdx)**: Learn about the tracing capabilities of LangSmith.
- **[Evaluation](./evaluation/index.mdx)**: Learn about the evaluation capabilities of LangSmith.

2 changes: 1 addition & 1 deletion docs/self_hosting/usage.mdx
@@ -9,7 +9,7 @@ table_of_contents: true
This guide will walk you through the process of using your self-hosted instance of LangSmith.

:::important Self-Hosted LangSmith Instance Required
This guide assumes you have already deployed a self-hosted LangSmith instance. If you have not, please refer to the [kubernetes deployment guide](/self_hosting/kubernetes) or the [docker deployment guide](/self_hosting/docker).
This guide assumes you have already deployed a self-hosted LangSmith instance. If you have not, please refer to the [kubernetes deployment guide](/self_hosting/installation/kubernetes) or the [docker deployment guide](/self_hosting/installation/docker).
:::

### Using your deployment:
4 changes: 4 additions & 0 deletions docusaurus.config.js
@@ -133,6 +133,10 @@ const config = {
type: "search",
position: "right",
},
{
type: "custom-RegionSelector",
position: "right",
},
{
href: "https://smith.langchain.com/",
label: "Go to App",
18 changes: 18 additions & 0 deletions src/components/InstructionsWithCode.js
@@ -65,6 +65,24 @@ export function ShellBlock(content, value = "shell", label = "Shell") {
};
}

export function HelmBlock(content, value = "yaml", label = "Helm") {
return {
value,
label,
content,
language: "yaml",
};
}

export function DockerBlock(content, value = ".env", label = "Docker") {
return {
value,
label,
content,
language: "dockerfile",
};
}

/**
* @param {string} code
* @param {"typescript" | "python"} language
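For context, here is a hedged sketch of how the two new helpers could be used from an MDX page, following the same `CodeTabs` pattern that appears later in this diff. The import path and the `CodeTabs` export are assumptions not shown in this commit, and the Helm values and `.env` contents are placeholders:

```jsx
import {
  CodeTabs,
  HelmBlock,
  DockerBlock,
} from "@site/src/components/InstructionsWithCode";

<CodeTabs
  tabs={[
    // Rendered as a "Helm" tab with YAML syntax highlighting (placeholder values.yaml snippet)
    HelmBlock(`config:
  langsmithLicenseKey: "your-license-key"`),
    // Rendered as a "Docker" tab with dockerfile highlighting (placeholder .env snippet)
    DockerBlock(`LANGSMITH_LICENSE_KEY=your-license-key`),
  ]}
  groupId="deployment-method"
/>
```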
47 changes: 47 additions & 0 deletions src/components/Region.js
@@ -0,0 +1,47 @@
import React, { useState, useEffect } from "react";

export default function RegionSelector() {
const [selectedRegion, setSelectedRegion] = useState("US");

useEffect(() => {
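    // localStorage isn't available during Docusaurus' server-side build, so
    // read the persisted region only after the component mounts.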
setSelectedRegion(localStorage.getItem("ls:docs:langsmithRegion") || "US");
}, []);

const handleRegionChange = (region) => {
setSelectedRegion(region);
localStorage.setItem("ls:docs:langsmithRegion", region);
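    // The browser only fires "storage" events in other tabs, so dispatch one
    // manually to notify same-page listeners (e.g. RegionalUrl) of the change.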
window.dispatchEvent(new Event("storage"));
};

return (
<div className="navbar__item dropdown dropdown--hoverable">
<div aria-haspopup="true" className="navbar__link">
Region
</div>
<ul className="dropdown__menu regions-dropdown">
<li
onClick={() => handleRegionChange("US")}
onKeyDown={() => {}}
role="menuitem"
style={{
color:
selectedRegion === "US" ? "var(--ifm-color-primary)" : "gray",
}}
>
US
</li>
<li
onClick={() => handleRegionChange("EU")}
onKeyDown={() => {}}
role="menuitem"
style={{
color:
selectedRegion === "EU" ? "var(--ifm-color-primary)" : "gray",
}}
>
EU
</li>
</ul>
</div>
);
}
37 changes: 37 additions & 0 deletions src/components/RegionalUrls.js
@@ -0,0 +1,37 @@
import React, { useState, useEffect } from "react";

const DOMAINS = {
US: {
langsmith: "smith.langchain.com",
api: "api.smith.langchain.com",
},
EU: {
langsmith: "eu.smith.langchain.com",
api: "eu.api.smith.langchain.com",
},
};

export function RegionalUrl({ text, type = "langsmith", suffix = "" }) {

const [domains, setDomains] = useState(DOMAINS.US);

useEffect(() => {
const storedRegion =
localStorage.getItem("ls:docs:langsmithRegion") || "US";
setDomains(DOMAINS[storedRegion]);
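    // Re-read the region whenever a "storage" event arrives, including the
    // synthetic one RegionSelector dispatches for same-page updates.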
const handleStorageChange = () => {
setDomains(
DOMAINS[localStorage.getItem("ls:docs:langsmithRegion") || "US"]
);
};

window.addEventListener("storage", handleStorageChange);

return () => {
window.removeEventListener("storage", handleStorageChange);
};
}, []);

const domain = domains[type];
const resolvedUrl = `https://${domain}${suffix}`;
return <a href={resolvedUrl}>{text || resolvedUrl}</a>;
}
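As a quick usage sketch (mirroring the MDX changes later in this diff), the component resolves a region-specific link from a `type` and `suffix`, falling back to the resolved URL when no `text` is given:

```jsx
import { RegionalUrl } from "@site/src/components/RegionalUrls";

{/* Links to https://smith.langchain.com/settings/shared, or the EU domain if
    the navbar RegionSelector stored "EU" in localStorage. */}
<RegionalUrl text="this link" suffix="/settings/shared" />

{/* type="api" switches to the API domain, e.g. for ReDoc endpoint links. */}
<RegionalUrl text="API reference" type="api" suffix="/redoc" />
```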
12 changes: 12 additions & 0 deletions src/css/custom.css
@@ -265,6 +265,18 @@ html[data-theme="dark"] {
border-top-right-radius: var(--ifm-code-border-radius);
}

.regions-dropdown {
min-width: 50px;
width: 100px;
}

.regions-dropdown > li {
width: "100%";
cursor: pointer;
padding: 5px;
text-align: center;
}

/* media dark mode */
@media (prefers-color-scheme: dark) {
.tabs-container > .code-tabs {
7 changes: 7 additions & 0 deletions src/theme/NavbarItem/ComponentTypes.js
@@ -0,0 +1,7 @@
import ComponentTypes from "@theme-original/NavbarItem/ComponentTypes";
import RegionSelector from "../../components/Region";

export default {
...ComponentTypes,
"custom-RegionSelector": RegionSelector,
};
42 changes: 24 additions & 18 deletions versioned_docs/version-2.0/concepts/admin/admin.mdx
@@ -15,7 +15,11 @@ There are a few important differences between your personal organization and sha
| Collaboration | Cannot invite users | Can invite users |
| Billing: paid plans | Developer plan only | All other plans available |

## Workspaces {#workspaces}
## Workspaces

:::info
Workspaces were formerly called Tenants. Some code and APIs may still reference the old name for a period of time during the transition.
:::

A workspace is a logical grouping of users and resources within an organization. Users may have permissions in a workspace that grant them access to the resources in that workspace, including tracing projects, datasets, annotation queues, and prompts. For more details, see the [setup guide](../../how_to_guides/setup/set_up_workspace.mdx).

@@ -42,25 +46,27 @@ graph TD

See the table below for details on which features are available in which scope (organization or workspace):

| Resource/Setting | Scope |
| --------------------------------------------------------------------------- | ------------ |
| Trace Projects | Workspace |
| Annotation Queues | Workspace |
| Deployments | Workspace |
| Datasets &amp; Testing | Workspace |
| Prompts | Workspace |
| API Keys | Workspace |
| Settings including Secrets, Feedback config, Models, Rules, and Shared URLs | Workspace |
| User management: Invite User to Workspace | Workspace |
| RBAC: Assigning Workspace Roles | Workspace |
| Data Retention, Usage Limits | Workspace\* |
| Plans and Billing, Credits, Invoices | Organization |
| User management: Invite User to Organization | Organization |
| Adding Workspaces | Organization |
| Assigning Organization Roles | Organization |
| RBAC: Creating/Editing/Deleting Custom Roles | Organization |
| Resource/Setting | Scope |
| --------------------------------------------------------------------------- | ---------------- |
| Trace Projects | Workspace |
| Annotation Queues | Workspace |
| Deployments | Workspace |
| Datasets &amp; Testing | Workspace |
| Prompts | Workspace |
| API Keys | Workspace |
| Settings including Secrets, Feedback config, Models, Rules, and Shared URLs | Workspace |
| User management: Invite User to Workspace | Workspace |
| RBAC: Assigning Workspace Roles | Workspace |
| Data Retention, Usage Limits | Workspace\* |
| Plans and Billing, Credits, Invoices | Organization |
| User management: Invite User to Organization | Organization\*\* |
| Adding Workspaces | Organization |
| Assigning Organization Roles | Organization |
| RBAC: Creating/Editing/Deleting Custom Roles | Organization |

\*&nbsp;Data retention settings and usage limits will be available soon for the organization level as well
\*\*&nbsp;Self-hosted installations may enable workspace-level invites of users to the organization via a feature flag.
See the [self-hosted user management docs](../../self_hosting/configuration/user_management) for details.

## Users

3 changes: 2 additions & 1 deletion versioned_docs/version-2.0/concepts/tracing/tracing.mdx
@@ -1,3 +1,4 @@
import { RegionalUrl } from "@site/src/components/RegionalUrls";
import ThemedImage from "@theme/ThemedImage";

# Tracing
@@ -86,7 +87,7 @@ If you wish to remove a trace from LangSmith sooner than the expiration date, La
This can be accomplished:

- in the LangSmith UI via the "Delete" option on the Project's overflow menu
- via the [Delete Tracer Sessions](https://api.smith.langchain.com/redoc#tag/tracer-sessions/operation/delete_tracer_session_api_v1_sessions__session_id__delete) API endpoint
- via the <RegionalUrl text='Delete Tracer Sessions' type='api' suffix='/redoc#tag/tracer-sessions/operation/delete_tracer_session_api_v1_sessions__session_id__delete' /> API endpoint
- via `delete_project()` (Python) or `deleteProject()` (JS/TS) in the LangSmith SDK

LangSmith does not support self-service deletion of individual traces at this time.
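For reference, a minimal sketch of the SDK route listed above. The exact option name accepted by `deleteProject()` is an assumption here, and `"my-test-project"` is a hypothetical project name — check the SDK reference before relying on this:

```js
import { Client } from "langsmith";

// Assumes your API key is already configured via environment variables.
const client = new Client();

// Deletes the tracing project and the traces it contains.
await client.deleteProject({ projectName: "my-test-project" });
```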
@@ -29,7 +29,7 @@ The LangSmith SDK takes steps to minimize the likelihood of reaching these limit

This 429 is the result of reaching your maximum hourly events ingested and is evaluated in a fixed window starting at the beginning of each clock hour in UTC and resets at the top of each new hour.

An event in this context the creation or update of a run. So if run is created, then subsequently updated in the same hourly window, that will count as 2 events against this limit.
An event in this context is the creation or update of a run. So if a run is created, then subsequently updated in the same hourly window, that will count as 2 events against this limit.

This is thrown by our application and varies by plan tier, with organizations on our Startup/Plus and Enterprise plan tiers having higher hourly limits than our Free and Developer Plan Tiers which are designed for personal use.

7 changes: 7 additions & 0 deletions versioned_docs/version-2.0/how_to_guides/datasets/index.mdx
@@ -0,0 +1,7 @@
# How-to guides: Datasets

This section contains how-to guides related to working with datasets.

import DocCardList from "@theme/DocCardList";

<DocCardList />
@@ -35,7 +35,7 @@ You can do this from any 'run' details page by clicking the 'Add to Dataset' but

:::tip
An extremely powerful technique to build datasets is to drill-down into the most interesting traces, such as traces that were tagged with poor user feedback, and add them to a dataset.
For tips on how to filter traces, see the [filtering traces] guide.
For tips on how to filter traces, see the [filtering traces](../monitoring/filter_traces_in_application) guide.
:::

:::tip automations
@@ -344,3 +344,33 @@ For example, if you have an example with metadata `{"foo": "bar", "baz": "qux"}`
]}
groupId="client-language"
/>

### List examples by structured filter

Similar to how you can use the structured filter query language to [fetch runs](../tracing/export_traces#use-filter-query-language), you can use it to fetch examples.

:::note

This is currently only available in v0.1.83 and later of the Python SDK and v0.1.35 and later of the TypeScript SDK.

Additionally, the structured filter query language is only supported for `metadata` fields.

:::

You can use the `has` operator to fetch examples with metadata fields that contain specific key/value pairs and the `exists` operator to fetch examples with metadata fields that contain a specific key.
Additionally, you can also chain multiple filters together using the `and` operator and negate a filter using the `not` operator.

<CodeTabs
tabs={[
PythonBlock(
`examples = client.list_examples(
dataset_name=dataset_name,
filter='and(not(has(metadata, \\'{"foo": "bar"}\\')), exists(metadata, "tenant_id"))'
)`
),
TypeScriptBlock(
`const examples = await client.listExamples({datasetName: datasetName, filter: 'and(not(has(metadata, \\'{"foo": "bar"}\\')), exists(metadata, "tenant_id"))'});`
),
]}
groupId="client-language"
/>
@@ -2,6 +2,8 @@
sidebar_position: 4
---

import { RegionalUrl } from "@site/src/components/RegionalUrls";

# Share or unshare a dataset publicly

:::caution
@@ -24,5 +26,5 @@ To "unshare" a dataset, either
1. Click on **Unshare** by clicking on **Public** in the upper right-hand corner of any publicly shared dataset, then **Unshare** in the dialog.
![](../static/unshare_dataset.png)

2. Navigate to your organization's list of publicly shared dataset, either by clicking on **Settings** -> **Shared URLs** or [this link](https://smith.langchain.com/settings/shared), then click on **Unshare** next to the dataset you want to unshare.
2. Navigate to your organization's list of publicly shared datasets, either by clicking on **Settings** -> **Shared URLs** or <RegionalUrl text='this link' suffix='/settings/shared' />, then click on **Unshare** next to the dataset you want to unshare.
![](../static/unshare_trace_list.png)
@@ -14,7 +14,9 @@ LLM-as-a-judge evaluators don't always get it right. Because of this, it is ofte

## In the comparison view

In the comparison view, you may click on any feedback tag to bring up the feedback details. From there, click the "edit" icon on the right to bring up the corrections view.
In the comparison view, you may click on any feedback tag to bring up the feedback details. From there, click the "edit" icon on the right to bring up the corrections view. You may then type in your desired score in the text box under "Make correction".
If you would like, you may also attach an explanation to your correction. This is useful if you are using a [few-shot evaluator](./create_few_shot_evaluators) and will be automatically inserted into your few-shot examples
in place of the `few_shot_explanation` prompt variable.

![Audit Evaluator Comparison View](./static/corrections_comparison_view.png)

@@ -0,0 +1,70 @@
---
sidebar_position: 10
---

# Create few-shot evaluators

Using LLM-as-a-Judge evaluators can be very helpful when you can't evaluate your system programmatically. However, improving/iterating on these prompts can add unnecessary
overhead to the development process of an LLM-based application - you now need to maintain both your application **and** your evaluators. To make this process easier, LangSmith allows
you to automatically collect human corrections on evaluator prompts, which are then inserted into your prompt as few-shot examples.

:::tip Recommended Reading
Before learning how to create few-shot evaluators, it might be helpful to learn how to set up automations (both online and offline) and how to leave corrections on evaluator scores:

- [Set up online evaluations](../monitoring/online_evaluations)
- [Bind an evaluator to a dataset in the UI (offline evaluation)](./bind_evaluator_to_dataset)
- [Audit evaluator scores](./audit_evaluator_scores)

:::

## Create your evaluator

:::tip
The default maximum few-shot examples to use in the prompt is 5. Examples are pulled randomly from your dataset (if you have more than the maximum).

:::

When creating an [online](../monitoring/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples. Note that these types of evaluators
are only supported when using mustache prompts - you will not be able to click this option if your prompt uses f-string formatting. When you select this,
we will auto-create a few-shot prompt for you. Each individual few-shot example will be formatted according to this prompt, and inserted into your main prompt in place of the `{{Few-shot examples}}`
template variable which will be auto-added above. Your few-shot prompt should contain the same variables as your main prompt, plus a `few_shot_explanation` and a score variable which should have the same name
as your output key. For example, if your main prompt has variables `question` and `response`, and your evaluator outputs a `correctness` score, then your few-shot prompt should have `question`, `response`,
`few_shot_explanation`, and `correctness`.

You may also specify the number of few-shot examples to use. The default is 5. If your examples will tend to be very long, you may want to set this number lower to save tokens - whereas if your examples tend
to be short, you can set a higher number in order to give your evaluator more examples to learn from. If you have more examples in your dataset than this number, we will randomly choose them for you.

![Use corrections as few-shot examples](./static/use_corrections_as_few_shot.png)

Note that few-shot examples are not currently supported in evaluators that use Hub prompts.

Once you create your evaluator, we will automatically create a dataset for you, which will be auto-populated with few-shot examples once you start making corrections.

## Make corrections

:::note Main Article
[Audit evaluator scores](./audit_evaluator_scores)
:::
As you start logging traces or running experiments, you will likely disagree with some of the scores that your evaluator has given. When you [make corrections to these scores](./audit_evaluator_scores), you will
begin seeing examples populated inside your corrections dataset. As you make corrections, make sure to attach explanations - these will get populated into your evaluator prompt in place of the `few_shot_explanation` variable.

The inputs to the few-shot examples will be the relevant fields from the inputs, outputs, and reference (if this is an offline evaluator) of your chain/dataset.
The outputs will be the corrected evaluator score and the explanations that you created when you left the corrections. Feel free to edit these to your liking. Here is an example of a few-shot example in a corrections dataset:

![Few-shot example](./static/few_shot_example.png)

Note that the corrections may take a minute or two to be populated into your few-shot dataset. Once they are there, future runs of your evaluator will include them in the prompt!

## View your corrections dataset

In order to view your corrections dataset, go to your rule and click "Edit Rule" (or "Edit Evaluator" from a dataset):

![Edit Evaluator](./static/edit_evaluator.png)

If this is an online evaluator (in a tracing project), you will need to click to edit your prompt:

![Edit Prompt](./static/click_to_edit_prompt.png)

From this screen, you will see a button that says "View few-shot dataset". Clicking this will bring you to your dataset of corrections, where you can view and update your few-shot examples:

![View few-shot dataset](./static/view_few_shot_ds.png)