First draft of a Self-Hosted Overview Doc #286

Closed
wants to merge 3 commits into from
2 changes: 1 addition & 1 deletion versioned_docs/version-2.0/self_hosting/docker.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Docker
-sidebar_position: 2
+sidebar_position: 3
table_of_contents: true
---

8 changes: 4 additions & 4 deletions versioned_docs/version-2.0/self_hosting/kubernetes.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Kubernetes
-sidebar_position: 1
+sidebar_position: 2
table_of_contents: true
---

@@ -61,19 +61,19 @@ Ensure you have the following tools/items ready. Some items are marked optional
    1. You can configure OAuth using the `values.yaml` file. You will need to provide a `client_id` and `client_issuer_url` for your OAuth provider.
    2. Note that we rely on the OIDC Authorization Code with PKCE flow. We currently support almost any OIDC-compliant provider; however, Google does not support this flow.
    3. Without OAuth, you will not be able to create users or organizations.
-8. External Postgres(optional).
+8. External Postgres (optional, but strongly recommended).
    1. You can configure external Postgres using the `values.yaml` file. You will need to provide connection parameters for your Postgres instance.
    2. If using a schema other than `public`, ensure that no other schema has the `pgcrypto` extension enabled; otherwise, you must include that schema in your search path.
    3. If your password contains special characters, you may need to URL-encode it in the connection string.
    4. Note: We only officially support Postgres versions >= 14.
-9. External Redis(optional).
+9. External Redis (optional, but strongly recommended).
    1. You can configure external Redis using the `values.yaml` file. You will need to provide a connection URL for your Redis instance.
    2. If using TLS, ensure that you use `rediss://` instead of `redis://`, e.g. `rediss://langsmith-redis:6380/0?password=foo`.
    3. We only officially support Redis versions >= 6. We also do not support sharded/clustered Redis (though you can point to a single node in a cluster).
10. External ClickHouse (optional).
    1. You can configure external ClickHouse using the `values.yaml` file. You will need to provide several connection parameters for your ClickHouse instance.
    2. If using TLS, make sure to set `clickhouse.external.tls` to `true`.
-    3. We only officially support ClickHouse versions >= 23. We also only support standalone ClickHouse or ClickHouse Cloud(not clustered or replicated).
+    3. We only officially support ClickHouse versions >= 23.9. We also only support standalone ClickHouse (not clustered or replicated) or ClickHouse Cloud.

## Configure your Helm Charts:

72 changes: 72 additions & 0 deletions versioned_docs/version-2.0/self_hosting/overview.mdx
@@ -0,0 +1,72 @@
---
sidebar_label: Overview
sidebar_position: 1
table_of_contents: true
---

# Self-hosted LangSmith

:::important Enterprise License Required
Self-hosted LangSmith is an add-on to the Enterprise Plan designed for our largest, most security-conscious customers. See our [pricing page](https://www.langchain.com/pricing) for more detail, and contact us at [email protected] if you want to get a license key to trial LangSmith in your environment.
:::

LangSmith can be run via Kubernetes (recommended) or Docker in a cloud environment that you control.

The LangSmith service consists of three major components that are deployed behind an application load balancer:

- frontend
- backend
- queue

![./static/self_hosted_dataflow.png](./static/self_hosted_dataflow.png)

## Architectural overview

### Frontend

The frontend serves the LangSmith UI and is responsible for making API calls on behalf of a user navigating the LangSmith application via a browser.

### Backend

The backend services LangSmith API calls and is responsible for interacting with the settings datastore (Postgres), the in-memory caching engine (Redis), and the runs and feedback datastore (ClickHouse). When your LLM application submits a trace to LangSmith, it uses an API key to send run data describing the application's interaction with the LLM, including inputs, outputs, and metadata. This API call passes through the load balancer to the LangSmith backend, which checks the request for authentication, authorization, and data integrity. Once these checks pass, the run data is sent to the queue.
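
The order of checks described above can be illustrated with a minimal Python sketch. It is not LangSmith's implementation: `RunSubmission`, `API_KEYS`, `REQUIRED_FIELDS`, and `accept_run` are hypothetical names chosen for illustration, the required fields are an assumed stand-in for the real data-integrity checks, and an in-process `queue.Queue` stands in for the actual queue service.

```python
import queue
from dataclasses import dataclass


class RejectedRequest(Exception):
    """Raised when a submission fails one of the backend checks."""


@dataclass
class RunSubmission:
    api_key: str
    organization: str
    run: dict


# Hypothetical lookup table: API key -> organizations it may write to.
API_KEYS = {"example-key": {"example-org"}}

# Assumed minimal shape of a run; the real schema is richer.
REQUIRED_FIELDS = ("inputs", "outputs", "metadata")


def accept_run(submission: RunSubmission, run_queue: queue.Queue) -> None:
    """Apply the three checks in order, then hand the run to the queue."""
    # 1. Authentication: is the API key known at all?
    allowed_orgs = API_KEYS.get(submission.api_key)
    if allowed_orgs is None:
        raise RejectedRequest("authentication failed")

    # 2. Authorization: may this key write to the target organization?
    if submission.organization not in allowed_orgs:
        raise RejectedRequest("authorization failed")

    # 3. Data integrity: does the run carry the fields we expect?
    missing = [name for name in REQUIRED_FIELDS if name not in submission.run]
    if missing:
        raise RejectedRequest(f"invalid run data, missing: {missing}")

    # Only runs that pass every check reach the queue.
    run_queue.put(submission.run)
```

The point of the sketch is the ordering: a request that fails any check is rejected before anything is enqueued, so the queue only ever sees run data that the backend has already accepted.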

### Queue

The queue handles incoming runs and feedback so that they are ingested and persisted into the runs and feedback datastore asynchronously. It checks data integrity, confirms that each insert into the datastore succeeds, and retries in situations such as database errors or a temporary inability to connect to the database.
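
As a rough illustration of the retry behavior described above, the following Python sketch retries a failing insert with exponential backoff. It assumes a backoff policy that this document does not specify, and every name in it (`TransientDatastoreError`, `persist_with_retries`, the `insert` callable) is hypothetical rather than part of LangSmith.

```python
import time
from typing import Callable, Sequence


class TransientDatastoreError(Exception):
    """Hypothetical error for failures worth retrying, e.g. a dropped connection."""


def persist_with_retries(
    insert: Callable[[Sequence[dict]], None],
    batch: Sequence[dict],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call insert(batch) until it succeeds; return the number of attempts used.

    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    The last failure is re-raised so the caller can decide what to do next.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            insert(batch)
            return attempt
        except TransientDatastoreError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays. Only errors that are plausibly temporary are retried here; anything else propagates immediately.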

## Frequently Asked Questions

### Can I bring my own Postgres & Redis instance? Do I need to?

You can supply your own Postgres and Redis instances, subject to some limitations. _We strongly recommend this_ so that you can take advantage of the sizing, scaling, backup provisions, and performance monitoring that your organization may already have for these services.

For Postgres, we support Postgres v14 and higher.

For Redis, we support Redis versions >= 6. We also do not support sharded/clustered Redis, though you **can** point to a single node in a Redis cluster.
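
Connection strings are a common stumbling block when pointing LangSmith at external instances. The Kubernetes guide notes that a Postgres password with special characters may need to be URL-encoded, and that a TLS-enabled Redis must use `rediss://` rather than `redis://`. The Python sketch below shows both checks using only the standard library. The helper names and the `postgresql://user:password@host:port/db` layout are assumptions for illustration; consult the Kubernetes or Docker reference docs for the exact format your configuration expects.

```python
from urllib.parse import quote, urlsplit


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a connection URL, percent-encoding the user and password.

    safe="" makes quote() encode every reserved character, including
    "/", "@" and ":", which would otherwise corrupt the URL structure.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def redis_url_uses_tls(url: str) -> bool:
    """Return True when the URL uses the TLS scheme (rediss://)."""
    return urlsplit(url).scheme == "rediss"
```

For example, `build_postgres_url("langsmith", "p@ss:w/rd", "db.internal", 5432, "langsmith")` encodes the password as `p%40ss%3Aw%2Frd`, so the `@` that separates credentials from the host stays unambiguous. The host and credentials here are made up.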

### How do I set up authentication?

The default LangSmith deployment is an unauthenticated instance; however, we strongly recommend that you configure authentication for the service. We support single sign-on authentication via the OIDC Authorization Code with PKCE flow. Most OIDC-compliant identity solutions support this (e.g. Okta, Azure), though Google's OIDC implementation is a notable exception.

Once you have worked with your identity provider to configure OIDC with PKCE, see the Kubernetes or Docker reference docs for how to set the appropriate values in the configuration files.

### What is ClickHouse and why do I need it?

ClickHouse is a highly scalable, open-source, column-oriented database management system that can generate analytical reports in real time using SQL queries. Logging a large percentage (or even 100%) of application-LLM interactions, for example when storing traces for compliance and auditability purposes, can produce a very large volume of traces and associated run data. LangSmith therefore needs to support high-throughput ingestion as well as fast filtering for drill-downs on charts in the user interface. We learned that Postgres was unable to support LangSmith's analytical workloads at scale, whereas ClickHouse is specifically designed to handle them.

We include an open-source, single-node instance of ClickHouse that writes to a persistent volume that you supply as part of your configuration. If your environment restricts running stateful services or accessing persistent volumes, we'd like to understand those requirements better; note that we also support ClickHouse Cloud.

For more information about how and why we chose ClickHouse, see [this ClickHouse blog post](https://clickhouse.com/blog/langchain-why-we-choose-clickhouse-to-power-langchain).

### How do I scale my resources?

[We should write a sizing guide and link to it here]

### Has LangSmith been load tested? What scale can it handle?

The same code that we use in deploying LangSmith locally is used to power our SaaS service, which currently receives > xxx M traces per day.

### What is the OpenAI key used for? Do I need to supply it in the config?

The OpenAI key is used for our natural language search feature as well as evaluators. Note that this global setting is optional; you can also specify an OpenAI key interactively in the application.
2 changes: 1 addition & 1 deletion versioned_docs/version-2.0/self_hosting/release_notes.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Release Notes (Self-Hosted)
-sidebar_position: 4
+sidebar_position: 5
---

# LangSmith Release Notes
2 changes: 1 addition & 1 deletion versioned_docs/version-2.0/self_hosting/usage.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Usage
-sidebar_position: 3
+sidebar_position: 4
table_of_contents: true
---
