diff --git a/docs/book/.gitbook/assets/cloud/add_descriptor_tab.gif b/docs/book/.gitbook/assets/cloud/add_descriptor_tab.gif
new file mode 100644
index 0000000000..abebd63baa
Binary files /dev/null and b/docs/book/.gitbook/assets/cloud/add_descriptor_tab.gif differ
diff --git a/docs/book/get-started/quickstart-llm.md b/docs/book/get-started/quickstart-llm.md
index 35b8c94744..6cff2d726f 100644
--- a/docs/book/get-started/quickstart-llm.md
+++ b/docs/book/get-started/quickstart-llm.md
@@ -8,7 +8,7 @@ You can run this example in Colab or any Python environment.

 Install the Evidently Python library.

-```
+```python
 !pip install evidently[llm]
 ```

@@ -19,15 +19,10 @@ import pandas as pd
 from sklearn import datasets
 from evidently.report import Report
 from evidently.metric_preset import TextEvals
-
-import nltk
-nltk.download('words')
-nltk.download('wordnet')
-nltk.download('omw-1.4')
-nltk.download('vader_lexicon')
+from evidently.descriptors import *
 ```

-**Optional**. Import components to send evaluation results to Evidently Cloud:
+**Optional**. Import the components to send evaluation results to Evidently Cloud:

 ```python
 from evidently.ui.workspace.cloud import CloudWorkspace
@@ -35,32 +30,34 @@ from evidently.ui.workspace.cloud import CloudWorkspace

 # 2. Import the toy dataset

-Import a toy dataset with e-commerce reviews. It contains a column with "Review_Text" that you'll analyze.
+Import a toy dataset with e-commerce reviews. It contains a "Review_Text" column. You will use the first 100 rows for the analysis.

 ```python
-reviews_data = datasets.fetch_openml(name='Womens-E-Commerce-Clothing-Reviews', version=2, as_frame='auto')
+reviews_data = datasets.fetch_openml(
+    name='Womens-E-Commerce-Clothing-Reviews',
+    version=2, as_frame='auto')
 reviews = reviews_data.frame[:100]
 ```

-# 3. Run the evals
+# 3. Run your first eval

-Run an evaluation Preset to check basic text descriptive text properties:
-* text sentiment (scale -1 to 1)
-* text length (number of symbols)
-* number of sentences in a text
-* percentage of out-of-vocabulary words (scale 0 to 100)
-* percentage of non-letter characters (scale 0 to 100)
+Run a few basic evaluations for all texts in the "Review_Text" column:
+* text sentiment (measured on a scale from -1 for negative to 1 for positive)
+* text length (the number of symbols)

 ```python
 text_evals_report = Report(metrics=[
-    TextEvals(column_name="Review_Text")
-    ]
-)
+    TextEvals(column_name="Review_Text", descriptors=[
+        Sentiment(),
+        TextLength(),
+        ]
+    ),
+])

 text_evals_report.run(reference_data=None, current_data=reviews)
 ```

-There are more evals to choose from. You can also create custom ones, including LLM-as-a-judge.
+There are 20+ built-in evals to choose from. You can also create custom ones, including LLM-as-a-judge. We call the result of each such evaluation a `descriptor`.

 View a Report in Python:

@@ -68,22 +65,20 @@
 text_evals_report
 ```

-You will see a summary distribution of results for each evaluation.
+You will see the summary results: the distribution of text length and sentiment for all evaluated texts.

 # 4. Send results to Evidently Cloud

-To record and monitor evaluations over time, send them to Evidently Cloud. You'll need an API key.
-* Sign up for an [Evidently Cloud account](https://app.evidently.cloud/signup), and create your Organization.
-* Click on the **Teams** icon on the left menu. Create a Team - for example, "Personal". Copy and save the team ID. ([Team page](https://app.evidently.cloud/teams)).
-* Click the **Key** icon in the left menu to go. Generate and save the token. ([Token page](https://app.evidently.cloud/token)).
-
-Connect to Evidently Cloud using your token.
+To record and monitor evaluations over time, send them to Evidently Cloud.
+* **Sign up**. Create an [Evidently Cloud account](https://app.evidently.cloud/signup) and set up your Organization.
+* **Add a Team**. Click **Teams** in the left menu. Create a Team, then copy and save the Team ID. ([Team page](https://app.evidently.cloud/teams)).
+* **Get your API token**. Click the **Key** icon in the left menu. Generate and save the token. ([Token page](https://app.evidently.cloud/token)).
+* **Connect to Evidently Cloud**. Pass your API token to connect from your Python environment.

 ```python
-ws = CloudWorkspace(token="YOUR_TOKEN_HERE", url="https://app.evidently.cloud")
+ws = CloudWorkspace(token="YOUR_API_TOKEN", url="https://app.evidently.cloud")
 ```
-
-Create a Project inside your Team. Pass the `team_id`:
+* **Create a Project**. Create a new Project inside your Team and add a title and description:

 ```python
 project = ws.create_project("My test project", team_id="YOUR_TEAM_ID")
@@ -91,21 +86,27 @@ project.description = "My project description"
 project.save()
 ```

-Send the Report to the Cloud:
+* **Upload the Report to the Project**. Send the evaluation results:

 ```python
 ws.add_report(project.id, text_evals_report)
 ```

-Go to the Evidently Cloud. Open your Project and head to the "Reports" in the left menu. ([Cloud home](https://app.evidently.cloud/)).
+* **View the Report**. Go to Evidently Cloud, open your Project, and head to the "Reports" section in the left menu. ([Cloud home](https://app.evidently.cloud/)).

 ![](../.gitbook/assets/cloud/toy_text_report_preview.gif)

-In the future, you can log ongoing evaluation results to build monitoring panels and send alerts.
+# 5. Get a dashboard
+
+Go to the "Dashboard" tab and enter "Edit" mode. Add a new tab and select the "Descriptors" template.
+
+You'll see a set of panels that show Sentiment and Text Length, each with a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.
+
+![](../.gitbook/assets/cloud/add_descriptor_tab.gif)

 # Want to see more?

-Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time and more.
+Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time, and more.

 {% content-ref url="tutorial-llm.md" %}
 [Evidently LLM Tutorial](tutorial-llm.md).
diff --git a/docs/book/monitoring/design_dashboard_api.md b/docs/book/monitoring/design_dashboard_api.md
index ed2ef30229..8c1f245284 100644
--- a/docs/book/monitoring/design_dashboard_api.md
+++ b/docs/book/monitoring/design_dashboard_api.md
@@ -434,7 +434,7 @@ project.dashboard.add_panel(

 **Aggregated by Status**. To show the total number of failed Tests (status filter), with daily level aggregation.

-```
+```python
 project.dashboard.add_panel(
     DashboardPanelTestSuite(
         title="All tests: aggregated",
@@ -452,7 +452,7 @@ project.dashboard.add_panel(

 **Filtered by Test ID**. To show all results for a specified list of Tests (on constant columns, missing values, empty rows) with daily-level aggregation.

-```
+```python
 project.dashboard.add_panel(
     DashboardPanelTestSuite(
         title="Data quality tests",