Test Framework documentation (#80)
* Test Framework documentation

* docs

chatbot, split&embed, Prompts

* prompt eng doc update

* chatbot documentation page enhancement

* split and embed documentation fixed

* oci_config docs

* final doc updates

---------

Co-authored-by: John Lathouwers <[email protected]>
corradodebari and gotsysdba authored Feb 5, 2025
1 parent 867c385 commit e1d1d39
Showing 26 changed files with 231 additions and 18 deletions.
docs/content/sandbox/chatbot/_index.md (48 additions, 3 deletions)

<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->

{{% notice style="code" title="10-Sept-2024: Documentation In-Progress..." icon="pen" %}}
Thank you for your patience as we work on updating the documentation. Please check back soon for the latest updates.
{{% /notice %}}
Unlike a common LLM playground, which lets you test an LLM only on the information it has been trained on, the OAIM Sandbox works on chunks retrieved from Oracle DB 23ai by similarity to the question provided, as in this example:

![Chatbot](images/chatbot.png)

The playground can be used with or without the available vector stores, letting you check whether a pure LLM configuration already knows the information you are looking for.

First of all, you can:

- **Enable History and Context**: when checked, every further question and answer is re-sent in the context, helping the LLM answer with better-grounded information;
- **Clear History**: this button clears the context, which helps in understanding the LLM's behaviour after a long conversation.

## Chat Model
Depending on the configuration set in the **Configuration**/**Models** page, you can choose one of the **Chat models** listed. For each of them you can modify the most important hyperparameters, such as:
- Temperature
- Maximum Tokens
- Top P
- Frequency penalty
- Presence penalty

For an explanation of each parameter, see for example this document: [Concepts for Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/concepts.htm).
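As a rough illustration, these hyperparameters map onto a chat-completion request payload like the following sketch. An OpenAI-style API shape is assumed here for clarity; the Sandbox's own client code may differ, and the range checks reflect common API limits, not a documented rule.

```python
def build_chat_request(model: str, prompt: str, *,
                       temperature: float = 1.0,
                       max_tokens: int = 256,
                       top_p: float = 1.0,
                       frequency_penalty: float = 0.0,
                       presence_penalty: float = 0.0) -> dict:
    """Assemble a chat-completion payload carrying the playground hyperparameters."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is usually constrained to [0, 2]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p is a probability mass in (0, 1]")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,              # randomness of sampling
        "max_tokens": max_tokens,                # hard cap on generated tokens
        "top_p": top_p,                          # nucleus-sampling cutoff
        "frequency_penalty": frequency_penalty,  # discourage verbatim repetition
        "presence_penalty": presence_penalty,    # encourage new topics
    }

payload = build_chat_request("gpt-4o-mini", "What is a vector store?",
                             temperature=0.2, max_tokens=512)
```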

## RAG params

Clicking the **RAG** checkbox quickly turns the knowledge base behind the chatbot on or off, exploiting the Retrieval Augmented Generation pattern implemented in the Oracle AI Microservices Sandbox.

![Playground](images/playground.png)

Then you can set:

- **Enable Re-Ranking**: *under development*;
- **Search Type**: reflects the two options available in **Oracle DB 23ai**:
  - **Similarity search**
  - **Maximal Marginal Relevance**.
- **Top K**: defines the number of nearest chunks, found by comparing the embedding vector derived from the question with the vectors associated with each chunk in the vector store. Keep in mind that a large number of chunks could fill the maximum context size accepted by the LLM, making any text that exceeds that limit useless.
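A minimal sketch (not the Sandbox's actual code) of what **Top K** retrieval amounts to: rank the stored chunks by cosine similarity to the question's embedding and keep the k nearest. In the real system this ranking is pushed down into Oracle DB 23ai rather than done in Python.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(question_vec, store, k=3):
    """store: list of (chunk_text, embedding) pairs; returns k nearest chunk texts."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(question_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 2-D "embeddings" just to show the mechanics
store = [("chunk about vectors", [1.0, 0.0]),
         ("chunk about SQL",     [0.0, 1.0]),
         ("chunk about search",  [0.9, 0.1])]
print(top_k([1.0, 0.05], store, k=2))
```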

To search for and select one of the vector store tables created in the DB and use it for RAG, you can filter the desired vector store by one, or a combination, of the parameters adopted in the chunking process:

- **Embedding Alias**
- **Embedding Model**
- **Chunk Size**
- **Chunk Overlap**
- **Distance Strategy**

Until the following message disappears, the final vector store has not yet been selected:

![Rag Error message](images/ragmessage.png)

The **Reset RAG** button lets you restart the selection and pick another vector store table.
Binary file added docs/content/sandbox/chatbot/images/chatbot.png
docs/content/sandbox/configuration/import_settings.md (21 additions, 3 deletions)

<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->
## Import Settings

Once you are satisfied with a specific configuration for your chatbot, the Sandbox allows you to `Download Settings` as they are, in **.json** format.

![Download settings](images/download-settings.png)

### Configuration -> Import Settings

To import the settings that were downloaded previously, you can navigate to `Configuration -> Settings`:

![Import settings](images/import-settings.png)

After clicking on `Browse files`, you will have to select your Sandbox settings **.json** file:

![Sandbox settings](images/sandbox-settings.png)

The Sandbox will detect the saved configuration and ask you to apply the new settings:

![Apply settings](images/apply-settings.png)

Once you upload the saved settings, the Sandbox automatically updates the chatbot parameters to match the ones you saved previously, and that's it!

![Uploaded settings](images/uploaded-settings.png)
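Conceptually, the import step amounts to overlaying the saved JSON values on the current configuration. A minimal sketch follows; the key names are invented for illustration, and the Sandbox's actual settings schema may differ.

```python
import json

def apply_settings(current: dict, saved_json: str) -> dict:
    """Merge a downloaded settings file into the running configuration."""
    saved = json.loads(saved_json)
    merged = dict(current)
    merged.update(saved)  # saved values win over current ones
    return merged

current = {"model": "gpt-4o-mini", "temperature": 1.0, "rag": False}
saved = '{"temperature": 0.2, "rag": true, "top_k": 4}'
result = apply_settings(current, saved)
print(result)
```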
docs/content/sandbox/configuration/oci_config.md (45 additions, 3 deletions)

<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->

When using the split/embed functionality of the Sandbox, you can use OCI Object Storage. This page shows how to configure the OAIM Sandbox to use it.

## Configuration

The OCI credentials can be configured either through a `~/.oci/config` file or through the **Sandbox** interface.

### Sandbox Interface

To configure the OCI credentials from the Sandbox, navigate to `Configuration -> OCI`:

![OCI config](images/oci-config.png)

Provide the following input:

- **User OCID**: your personal [User OCID](#OCI-credentials), which can be retrieved from your OCI tenancy interface
- **Fingerprint**: the fingerprint associated with your OCI private API key
- **Tenancy OCID**: the OCID of the tenancy you want to connect to, which can be retrieved from your OCI interface
- **Region**: the tenancy region you want to connect to
- **Key File**: the file path to your OCI private API key

Once all fields are set, click the `Save` button.

### ~/.oci/config file

If you have a `~/.oci/config` file configured, the Sandbox will read the **DEFAULT** profile at startup and automatically load the same credentials listed above (User OCID, Fingerprint, Tenancy OCID, Region, Key File).

### OCI credentials

Here's a summary of all the OCI credentials and where to find them:

| Entry | Description and Where to Get the Value | Required? |
| -------------------- | ---------------------------------------- | ----------|
| user | OCID of the user calling the API. To get the value, see [Required Keys and OCIDs](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm#Required_Keys_and_OCIDs) <br> <br> Example: ocid1.user.oc1..<unique_ID> (shortened for brevity) | Yes |
| fingerprint | Fingerprint for the public key that was added to this user. To get the value, see [Required Keys and OCIDs](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm#Required_Keys_and_OCIDs) | Yes |
| key_file | Full path and filename of the private key. <br><br> **Important:** The key pair must be in PEM format. For instructions on generating a key pair in PEM format, see [Required Keys and OCIDs](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm#Required_Keys_and_OCIDs). <br><br>Example (Linux/Mac OS): ~/.oci/oci_api_key.pem. <br><br> Example (Windows): ~/.oci/oci_api_key.pem. <br><br> This corresponds to the file %HOMEDRIVE%%HOMEPATH%\.oci\oci_api_key.pem. | Yes |
| tenancy | OCID of your tenancy. To get the value, see [Required Keys and OCIDs](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm#Required_Keys_and_OCIDs) <br><br> Example: ocid1.tenancy.oc1..<unique_ID> | Yes |
| region | An Oracle Cloud Infrastructure region. See [Regions and Availability Domains](https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm) <br><br> Example: us-ashburn-1 | Yes |
| security_token_file | If session token authentication is being used, then this parameter is required. <br><br> Using this authentication method makes fingerprint, user, and pass_phrase not required. Starting a session with the OCI CLI will populate all of the required parameters for this authentication method. | Conditional |
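For reference, a minimal **DEFAULT** profile in `~/.oci/config` looks like the following sketch; every value is a placeholder to replace with your own credentials.

```ini
[DEFAULT]
user=ocid1.user.oc1..<unique_ID>
fingerprint=<your_api_key_fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<unique_ID>
region=us-ashburn-1
```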
docs/content/sandbox/test_framework/_index.md (67 additions, 3 deletions)

<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->
Generating a test dataset of Q&A pairs through an external LLM accelerates the massive test phase. The platform integrates a framework designed for this purpose, called Giskard, which analyzes the documents, identifies the high-level topics related to the generated Q&A pairs, and includes them in the test dataset.

![Generation](images/generation.png)

The generation phase is optional, but normally very welcome in reducing the cost of proofs-of-concept, since preparing a test dataset by hand requires a huge human effort.

Then the questions are submitted to the configured agent; each answer is collected and compared with the correct answer by an LLM elected as judge, which classifies them and provides a justification for the positive or negative verdict, in the process described in the following picture.

![Test](images/test.png)


## Test Framework page
From the left-side menu you access the page on which, selecting **Generate new Test Dataset**, you can upload as many PDF documents as you want; contexts will be extracted from them to generate a defined number of Q&A pairs, as shown in the following snapshot:

![GenerateNew](images/generate.png)

You can choose any of the available models to perform the Q&A generation process, since you may want a high-profile, expensive model to generate the crucial dataset used to evaluate the RAG app, while putting a cheaper LLM into production as the chat model. This phase not only generates the number of Q&A pairs you need, but also analyzes the provided documents to extract a set of topics that help classify the generated questions and identify the areas to be improved.

When the generation is over (it could take time), as shown in the following snapshot:

![Generate](images/qa_dataset.png)

you can:

* exclude a Q&A: clicking **Hide** drops the question from the final dataset if you consider it not meaningful;
* modify the text of the **question** and the **Reference answer**: if you disagree, you can update the raw generated text, in accordance with the **Reference context**, which is fixed, like the **Metadata**.

After your updates, you can download the dataset and store it for future test sessions.

In any case, the generation process is optional. If you have already prepared a JSONL file with your Q&A pairs, following this schema:

* **id**: an alphanumeric unique id like "2f6d5ec5-4111-4ba3-9569-86a7bec8f971";
* **question**: the question to submit;
* **reference_answer**: an example of an answer considered correct;
* **reference_context**: the piece of document from which the question has been extracted;
* **conversation_history**: an empty array [], not evaluated at the moment;
* **metadata**: a nested JSON document with extra info for analytics purposes, which must include the following fields:
  - **question_type**: **[simple|complex]**;
  - **seed_document_id** (numeric);
  - **topic**.

you can simply upload it as shown here:

![Upload](images/upload.png)
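A record following the schema above can be assembled and validated with a few lines of Python. This is only a sketch: the field names come from the schema, while the sample values are invented.

```python
import json

REQUIRED = {"id", "question", "reference_answer", "reference_context",
            "conversation_history", "metadata"}
REQUIRED_META = {"question_type", "seed_document_id", "topic"}

def validate_record(line: str) -> dict:
    """Parse one JSONL line and check it against the test-dataset schema."""
    rec = json.loads(line)
    missing = REQUIRED - rec.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not REQUIRED_META <= rec["metadata"].keys():
        raise ValueError("metadata must carry question_type, seed_document_id, topic")
    return rec

record = {
    "id": "2f6d5ec5-4111-4ba3-9569-86a7bec8f971",
    "question": "What is a vector store?",
    "reference_answer": "A table holding text chunks and their embeddings.",
    "reference_context": "A vector store keeps chunks and their embeddings.",
    "conversation_history": [],
    "metadata": {"question_type": "simple", "seed_document_id": 1, "topic": "RAG"},
}
line = json.dumps(record)  # one line of the JSONL file
```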

If you need an example, generate just one Q&A pair, download it, and add it to your own Q&A test dataset.

At this point, whether you have generated a dataset or are using an existing one, you can run the overall test on the configuration currently selected on the left side:

![Playground](images/playground.png)

The top part relates to the LLM that will be used for chat generation, and includes the most relevant hyperparameters for the call. The lower part relates to the vector store used, in which, apart from the **Embedding Model**, **Chunk Size**, **Chunk Overlap** and **Distance Strategy**, which are fixed and come from the **Split/Embed** process you must have performed before, you can modify:

* **Top K**: how many of the chunks nearest to the question should be included in the prompt's context;
* **Search Type**: either Similarity or Maximal Marginal Relevance. The first is the one commonly used; the second is related to an Oracle DB 23ai feature that excludes overly similar chunks from the top K, giving space in the list to different chunks that provide more relevant information.

At the end of the evaluation an **Overall Correctness Score** is provided, which is simply the percentage of correct answers over the total number of questions submitted:

![Correctness](images/correctness.png)
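The score itself is simple arithmetic: correct answers divided by questions asked. A sketch over evaluated records carrying the boolean `correctness` flag:

```python
def overall_correctness(records) -> float:
    """Percentage of evaluated Q&A records whose 'correctness' flag is true."""
    if not records:
        return 0.0
    correct = sum(1 for r in records if r["correctness"])
    return 100.0 * correct / len(records)

evaluated = [{"correctness": True}, {"correctness": True},
             {"correctness": False}, {"correctness": True}]
print(overall_correctness(evaluated))  # 3 correct out of 4 -> 75.0
```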

Moreover, a percentage by topic, the list of failures and the full list of Q&As are provided. To each Q&A included in the test dataset, the following fields are added:

* **agent_answer**: the actual answer provided by the RAG app;
* **correctness**: a true/false flag stating whether the agent_answer matches the reference_answer;
* **correctness_reason**: the reason why an answer has been judged wrong by the judge LLM.

The list of **Failures**, the **Correctness** of each Q&A, as well as a **Report**, can be downloaded and stored for future audit activities.

*In this way you can run several tests using the same curated test dataset, generated or self-made, looking for the best-performing RAG configuration*.
Binary file added docs/content/sandbox/tools/images/embed.png
Binary file added docs/content/sandbox/tools/images/prompt.png
Binary file added docs/content/sandbox/tools/images/split.png
docs/content/sandbox/tools/prompt_eng.md (8 additions, 3 deletions)

<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->

An important factor influencing the quality of the answers is the prompt provided to the LLM, which includes the context information. To customize and test its effect, the **Prompts** menu entry offers a pre-configured list of prompt templates that can be customized and associated with the RAG system.

![Prompt](images/prompt.png)

There are three options available:
- **Basic Example**: automatically paired with the no-RAG, pure-LLM chatbot configuration;
- **RAG Example**: automatically paired when the RAG checkbox is set to True;
- **Custom**: applied to any RAG/no-RAG configuration.
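As a rough illustration of what such a template contains once the retrieved chunks are stitched in (the wording below is invented, not the Sandbox's built-in **RAG Example**):

```python
# A hypothetical RAG prompt template: the {context} slot is filled with the
# retrieved chunks, the {question} slot with the user's question.
RAG_TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the context is not sufficient, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def render_prompt(chunks, question):
    """Join the retrieved chunks and fill the template slots."""
    context = "\n---\n".join(chunks)
    return RAG_TEMPLATE.format(context=context, question=question)

prompt = render_prompt(["Chunk A text.", "Chunk B text."], "What is in chunk A?")
print(prompt)
```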
docs/content/sandbox/tools/split_embed.md (42 additions, 3 deletions)

<!--
Copyright (c) 2023, 2024, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v1.0 as shown at http://oss.oracle.com/licenses/upl.
-->

The first phase of building a RAG chatbot is document chunking and vector-embedding generation: the embeddings are stored in a vector store, retrieved by vector-distance search, and added to the context so that answers are grounded in the information provided.

We chose to leave the freedom to exploit embedding LLMs provided by public services like Cohere, OpenAI, and Perplexity, or models running on a user-managed GPU compute node exposed through open-source platforms like Ollama or HuggingFace, to avoid sharing data with external services beyond the customer's full control.

From the **Split/Embed** entry of the left-side menu, you access the ingestion page:

![Split](images/split.png)

The Load and Split Documents part of the Split/Embed form lets you choose documents (txt, pdf, html, etc.) stored on the Object Storage service available on Oracle Cloud Infrastructure, on the client's desktop, or fetched from URLs, as shown in the following snapshot:

![Embed](images/embed.png)

A "speaking" table will be created, like TEXT_EMBEDDING_3_SMALL_8191_1639_COSINE in the example. On the same set of documents you can create several variants of the vector store table, since nobody normally knows in advance which chunking size is best, and then test them independently.

## Embedding Configuration

Choose one of the **Embedding models available** from the listbox; the list depends on the **Configuration/Models** page.
The **Embedding Server** URL associated with the chosen model will be shown. The **Chunk Size (tokens)** changes according to the kind of embedding model selected, as does the **Chunk Overlap (% of Chunk Size)**.
Then you have to choose one of the **Distance Metrics** available in Oracle DB 23ai:

- COSINE
- EUCLIDEAN_DISTANCE
- DOT_PRODUCT
- MAX_INNER_PRODUCT

To understand the meaning of these metrics, please refer to [Vector Distance Metrics](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/vector-distance-metrics.html) in the Oracle DB 23ai "*AI Vector Search User's Guide*".

The **Embedding Alias** field lets you add more meaningful info to the vector store table name, allowing more than one vector table with the same *model + chunk_size + chunk_overlap + distance_strategy* combination.


## Load and Split Documents

The process started by clicking the **Populate Vector Store** button needs:

- **File Source**: you can include txt, pdf, and html documents from one of these sources:
  - **OCI**: browse and add more than one document at a time into the same vector store table;
  - **Local**: upload more than one document at a time into the same vector store table;
  - **Web**: load one txt, pdf, or html document from the URL provided.
- **Rate Limit (RPM)**: prevents a public LLM embedding service from banning you for sending too many requests per minute, beyond your subscription limits.

The **Vector Store** field shows the name of the table that will be populated in the DB, according to the naming convention that reflects the parameters used.
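The example name TEXT_EMBEDDING_3_SMALL_8191_1639_COSINE suggests a convention of *model + chunk size + chunk overlap + distance metric*. A hedged sketch of such a naming rule follows; the Sandbox's actual convention may differ, for example in how the **Embedding Alias** is inserted.

```python
import re

def vector_store_name(model: str, chunk_size: int, chunk_overlap: int,
                      distance: str, alias: str = "") -> str:
    """Build a 'speaking' table name from the embedding parameters (inferred rule)."""
    parts = [alias, model, str(chunk_size), str(chunk_overlap), distance]
    name = "_".join(p for p in parts if p)
    # SQL identifiers: replace anything non-alphanumeric and uppercase the result
    return re.sub(r"[^A-Za-z0-9_]", "_", name).upper()

print(vector_store_name("text-embedding-3-small", 8191, 1639, "COSINE"))
```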


