Docs: update to LLM quickstart (evidentlyai#1181)
elenasamuylova committed Jul 2, 2024
1 parent c466a52 commit 286a44c
Showing 3 changed files with 38 additions and 37 deletions.
71 changes: 36 additions & 35 deletions docs/book/get-started/quickstart-llm.md
@@ -8,7 +8,7 @@ You can run this example in Colab or any Python environment.

Install the Evidently Python library.

```
```python
!pip install evidently[llm]
```

@@ -19,93 +19,94 @@ import pandas as pd
from sklearn import datasets
from evidently.report import Report
from evidently.metric_preset import TextEvals

import nltk
nltk.download('words')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')
from evidently.descriptors import *
```

**Optional**. Import components to send evaluation results to Evidently Cloud:
**Optional**. Import the components to send evaluation results to Evidently Cloud:

```python
from evidently.ui.workspace.cloud import CloudWorkspace
```

# 2. Import the toy dataset

Import a toy dataset with e-commerce reviews. It contains a "Review_Text" column that you'll analyze.
Import a toy dataset with e-commerce reviews. It contains a "Review_Text" column. You will take 100 rows to analyze.

```python
reviews_data = datasets.fetch_openml(name='Womens-E-Commerce-Clothing-Reviews', version=2, as_frame='auto')
reviews_data = datasets.fetch_openml(
    name='Womens-E-Commerce-Clothing-Reviews',
    version=2, as_frame='auto')
reviews = reviews_data.frame[:100]
```

# 3. Run the evals
# 3. Run your first eval

Run an evaluation Preset to check basic descriptive text properties:
* text sentiment (scale -1 to 1)
* text length (number of symbols)
* number of sentences in a text
* percentage of out-of-vocabulary words (scale 0 to 100)
* percentage of non-letter characters (scale 0 to 100)
Run a few basic evaluations for all texts in the "Review_Text" column:
* text sentiment (measured on a scale from -1 for negative to 1 for positive)
* text length (returns an absolute number of symbols)

```python
text_evals_report = Report(metrics=[
    TextEvals(column_name="Review_Text")
]
)
    TextEvals(column_name="Review_Text", descriptors=[
        Sentiment(),
        TextLength(),
    ]
    ),
])

text_evals_report.run(reference_data=None, current_data=reviews)
```

There are more evals to choose from. You can also create custom ones, including LLM-as-a-judge.
There are 20+ built-in evals to choose from. You can also create custom ones, including LLM-as-a-judge. We call the result of each such evaluation a `descriptor`.
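For instance, here is a sketch that swaps in a few other built-in descriptors for the same column. `OOV`, `NonLetterCharacterPercentage`, and `SentenceCount` are among the built-ins (some rely on NLTK data under the hood):

```python
# A sketch with a few other built-in descriptors on the same column.
more_evals_report = Report(metrics=[
    TextEvals(column_name="Review_Text", descriptors=[
        OOV(),                           # % of out-of-vocabulary words
        NonLetterCharacterPercentage(),  # % of non-letter characters
        SentenceCount(),                 # number of sentences per text
    ]),
])

more_evals_report.run(reference_data=None, current_data=reviews)
```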

View a Report in Python:

```python
text_evals_report
```

You will see a summary distribution of results for each evaluation.
You will see the summary results: the distribution of length and sentiment for all evaluated texts.
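To use the results outside a notebook, you can export them. A minimal sketch, using the Report's standard export methods:

```python
# Export the same results: an interactive HTML file and a Python dict.
text_evals_report.save_html("text_evals_report.html")
summary = text_evals_report.as_dict()
```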

# 4. Send results to Evidently Cloud

To record and monitor evaluations over time, send them to Evidently Cloud. You'll need an API key.
* Sign up for an [Evidently Cloud account](https://app.evidently.cloud/signup), and create your Organization.
* Click the **Teams** icon in the left menu. Create a Team - for example, "Personal". Copy and save the team ID. ([Team page](https://app.evidently.cloud/teams)).
* Click the **Key** icon in the left menu. Generate and save the token. ([Token page](https://app.evidently.cloud/token)).

Connect to Evidently Cloud using your token.
To record and monitor evaluations over time, send them to Evidently Cloud.
* **Sign up**. Create an [Evidently Cloud account](https://app.evidently.cloud/signup) and your Organization.
* **Add a Team**. Click **Teams** in the left menu. Create a Team, copy and save the Team ID. ([Team page](https://app.evidently.cloud/teams)).
* **Get your API token**. Click the **Key** icon in the left menu. Generate and save the token. ([Token page](https://app.evidently.cloud/token)).
* **Connect to Evidently Cloud**. Pass your API key to connect from your Python environment.

```python
ws = CloudWorkspace(token="YOUR_TOKEN_HERE", url="https://app.evidently.cloud")
ws = CloudWorkspace(token="YOUR_API_TOKEN", url="https://app.evidently.cloud")
```

Create a Project inside your Team. Pass the `team_id`:
* **Create a Project**. Create a new Project inside your Team, adding your title and description:

```python
project = ws.create_project("My test project", team_id="YOUR_TEAM_ID")
project.description = "My project description"
project.save()
```

Send the Report to the Cloud:
* **Upload the Report to the Project**. Send the evaluation results:

```python
ws.add_report(project.id, text_evals_report)
```

Go to Evidently Cloud. Open your Project and head to "Reports" in the left menu. ([Cloud home](https://app.evidently.cloud/)).
* **View the Report**. Go to Evidently Cloud. Open your Project and head to "Reports" in the left menu. ([Cloud home](https://app.evidently.cloud/)).

![](../.gitbook/assets/cloud/toy_text_report_preview.gif)

In the future, you can log ongoing evaluation results to build monitoring panels and send alerts.
# 5. Get a dashboard

Go to the "Dashboard" tab and enter the "Edit" mode. Add a new tab, and select the "Descriptors" template.

You'll see a set of panels that show Sentiment and Text Length with a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.

![](../.gitbook/assets/cloud/add_descriptor_tab.gif)
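You can also define panels in code instead of the UI, using the dashboard API shown in `design_dashboard_api.md` below. A sketch, assuming the panel classes from `evidently.ui.dashboards` and a descriptor column label of "Review_Text: Text Length" (check the Report output for the exact name):

```python
from evidently.ui.dashboards import (
    DashboardPanelPlot,
    PanelValue,
    PlotType,
    ReportFilter,
)

project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Text Length",
        filter=ReportFilter(metadata_values={}, tag_values=[]),
        values=[
            PanelValue(
                metric_id="ColumnSummaryMetric",
                # Assumed column label for the Text Length descriptor.
                metric_args={"column_name.name": "Review_Text: Text Length"},
                field_path="current_characteristics.mean",
                legend="mean text length",
            ),
        ],
        plot_type=PlotType.LINE,
    )
)
project.save()
```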

# Want to see more?

Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time and more.
Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time, and more.

{% content-ref url="tutorial-llm.md" %}
[Evidently LLM Tutorial](tutorial-llm.md).
4 changes: 2 additions & 2 deletions docs/book/monitoring/design_dashboard_api.md
@@ -434,7 +434,7 @@ project.dashboard.add_panel(

**Aggregated by Status**. To show the total number of failed Tests (status filter), with daily-level aggregation.

```
```python
project.dashboard.add_panel(
    DashboardPanelTestSuite(
        title="All tests: aggregated",
@@ -452,7 +452,7 @@ project.dashboard.add_panel(

**Filtered by Test ID**. To show all results for a specified list of Tests (on constant columns, missing values, empty rows) with daily-level aggregation.

```
```python
project.dashboard.add_panel(
    DashboardPanelTestSuite(
        title="Data quality tests",
