diff --git a/RAG/01_Basic_RAG/notebook.ipynb b/RAG/01_Basic_RAG/notebook.ipynb index 0daded3..df7c1cc 100644 --- a/RAG/01_Basic_RAG/notebook.ipynb +++ b/RAG/01_Basic_RAG/notebook.ipynb @@ -1,735 +1,3927 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Basic RAG

\n", - "
\n", - "\n", - "\"Open\n", - "\n", - "
\n", - "

AI Engineering.academy

\n", - " \n", - " \n", - "
\n", - "\n", - "
\n", - "\n", - "\"Ai\n", - "\n", - "
\n", - "\n", - "\n", - "
\n", - "\n", - "[![GitHub Stars](https://img.shields.io/github/stars/adithya-s-k/AI-Engineering.academy?style=social)](https://github.com/adithya-s-k/AI-Engineering.academy/stargazers)\n", - "[![GitHub Forks](https://img.shields.io/github/forks/adithya-s-k/AI-Engineering.academy?style=social)](https://github.com/adithya-s-k/AI-Engineering.academy/network/members)\n", - "[![GitHub Issues](https://img.shields.io/github/issues/adithya-s-k/AI-Engineering.academy)](https://github.com/adithya-s-k/AI-Engineering.academy/issues)\n", - "[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/adithya-s-k/AI-Engineering.academy)](https://github.com/adithya-s-k/AI-Engineering.academy/pulls)\n", - "[![License](https://img.shields.io/github/license/adithya-s-k/AI-Engineering.academy)](https://github.com/adithya-s-k/AI-Engineering.academy/blob/main/LICENSE)\n", - "\n", - "
\n", - "\n", - "## Introduction\n", - "Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of large language models with the ability to retrieve relevant information from a knowledge base. This approach enhances the quality and accuracy of generated responses by grounding them in specific, retrieved information.\n", - "\n", - "This notebook aims to provide a clear and concise introduction to RAG, suitable for beginners who want to understand and implement this technology.\n", - "\n", - "### Motivation\n", - "\n", - "Traditional language models generate text based on learned patterns from training data. However, when they are presented with queries that require specific, updated, or niche information, they may struggle to provide accurate responses. RAG addresses this limitation by incorporating a retrieval step that provides the language model with relevant context to generate more informed answers.\n", - "\n", - "### Method Details\n", - "\n", - "#### Document Preprocessing and Vector Store Creation\n", - "\n", - "1. **Document Chunking**: The knowledge base documents (e.g., PDFs, articles) are preprocessed and split into manageable chunks. This is done to create a searchable corpus that can be efficiently used in the retrieval process.\n", - " \n", - "2. **Embedding Generation**: Each chunk is converted into a vector representation using pre-trained embeddings (e.g., OpenAI's embeddings). This allows the documents to be stored in a vector database, such as Qdrant, enabling efficient similarity searches.\n", - "\n", - "#### Retrieval-Augmented Generation Workflow\n", - "\n", - "1. **Query Input**: A user provides a query that needs to be answered.\n", - " \n", - "2. **Retrieval Step**: The query is embedded into a vector using the same embedding model that was used for the documents. A similarity search is then performed in the vector database to find the most relevant document chunks.\n", - "\n", - "3. 
**Generation Step**: The retrieved document chunks are passed to a large language model (e.g., GPT-4) as additional context. The model uses this context to generate a more accurate and relevant response.\n", - "\n", - "### Key Features of RAG\n", - "\n", - "1. **Contextual Relevance**: By grounding responses in actual retrieved information, RAG models can produce more contextually relevant and accurate answers.\n", - " \n", - "2. **Scalability**: The retrieval step can scale to handle large knowledge bases, allowing the model to draw from vast amounts of information.\n", - "\n", - "3. **Flexibility in Use Cases**: RAG can be adapted for a variety of applications, including question answering, summarization, recommendation systems, and more.\n", - "\n", - "4. **Improved Accuracy**: Combining generation with retrieval often yields more precise results, especially for queries requiring specific or lesser-known information.\n", - "\n", - "### Benefits of this Approach\n", - "\n", - "1. **Combines Strengths of Both Retrieval and Generation**: RAG effectively merges retrieval-based methods with generative models, allowing for both precise fact-finding and natural language generation.\n", - "\n", - "2. **Enhanced Handling of Long-Tail Queries**: It is particularly effective for queries where specific and less frequently occurring information is needed.\n", - "\n", - "3. **Domain Adaptability**: The retrieval mechanism can be tuned to specific domains, ensuring that the generated responses are grounded in the most relevant and accurate domain-specific information.\n", - "\n", - "### Conclusion\n", - "\n", - "Retrieval-Augmented Generation (RAG) represents an innovative fusion of retrieval and generation techniques, significantly enhancing the capabilities of language models by grounding their outputs in relevant external information. 
This approach can be particularly valuable in scenarios requiring precise, context-aware responses, such as customer support, academic research, and more. As AI continues to evolve, RAG stands out as a powerful method for building more reliable and context-sensitive AI systems.\n", - "\n", - "### Prerequisites\n", - "- Preferably Python 3.11\n", - "- Jupyter Notebook or JupyterLab\n", - "- LLM API Key\n", - " - You can use any llm of your choice in this notebook we have use OpenAI and Gpt-4o-mini\n", - "\n", - "With these steps, you can implement a basic RAG system to enhance the capabilities of language models by incorporating real-world, up-to-date information, improving their effectiveness in various applications." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setting up the Environment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install llama-index\n", - "!pip install llama-index-vector-stores-qdrant \n", - "!pip install llama-index-readers-file \n", - "!pip install llama-index-embeddings-fastembed \n", - "!pip install llama-index-llms-openai\n", - "!pip install llama-index-llms-groq\n", - "!pip install -U qdrant_client fastembed\n", - "!pip install python-dotenv" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ + "cells": [ { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/adithya/miniconda3/envs/01_Basic_RAG/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n", - "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. 
Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n" - ] - } - ], - "source": [ - "# Standard library imports\n", - "import logging\n", - "import sys\n", - "import os\n", - "\n", - "# Third-party imports\n", - "from dotenv import load_dotenv\n", - "from IPython.display import Markdown, display\n", - "\n", - "# Qdrant client import\n", - "import qdrant_client\n", - "\n", - "# LlamaIndex core imports\n", - "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n", - "from llama_index.core import Settings\n", - "\n", - "# LlamaIndex vector store import\n", - "from llama_index.vector_stores.qdrant import QdrantVectorStore\n", - "\n", - "# Embedding model imports\n", - "from llama_index.embeddings.fastembed import FastEmbedEmbedding\n", - "from llama_index.embeddings.openai import OpenAIEmbedding\n", - "\n", - "# LLM import\n", - "from llama_index.llms.openai import OpenAI\n", - "from llama_index.llms.groq import Groq\n", - "# Load environment variables\n", - "load_dotenv()\n", - "\n", - "# Get OpenAI API key from environment variables\n", - "OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")\n", - "GROK_API_KEY = os.getenv(\"GROQ_API_KEY\")\n", - "\n", - "# Setting up Base LLM\n", - "# Settings.llm = OpenAI(\n", - "# model=\"gpt-4o-mini\", temperature=0.1, max_tokens=1024, streaming=True\n", - "# )\n", - "\n", - "Settings.llm = Groq(model=\"llama-3.1-70b-versatile\" , api_key=GROK_API_KEY)\n", - "\n", - "# Set the embedding model\n", - "# Option 1: Use FastEmbed with BAAI/bge-base-en-v1.5 model (default)\n", - "Settings.embed_model = FastEmbedEmbedding(model_name=\"BAAI/bge-base-en-v1.5\")\n", - "\n", - "# Option 2: Use OpenAI's embedding model (commented out)\n", - "# If you want to use OpenAI's embedding model, uncomment the following line:\n", - "# Settings.embed_model = OpenAIEmbedding(embed_batch_size=10, api_key=OPENAI_API_KEY)\n", - "\n", - "# Qdrant configuration (commented out)\n", - "# If you're using 
Qdrant, uncomment and set these variables:\n", - "# QDRANT_CLOUD_ENDPOINT = os.getenv(\"QDRANT_CLOUD_ENDPOINT\")\n", - "# QDRANT_API_KEY = os.getenv(\"QDRANT_API_KEY\")\n", - "\n", - "# Note: Remember to add QDRANT_CLOUD_ENDPOINT and QDRANT_API_KEY to your .env file if using Qdrant Hosted version" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load the Data" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ + "cell_type": "markdown", + "metadata": { + "id": "0XpuBSCRAUr4" + }, + "source": [ + "
\n", + "

Basic RAG

\n", + "
\n", + "\n", + "\"Open\n", + "\n", + "
\n", + "

AI Engineering.academy

\n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "\n", + "\"Ai\n", + "\n", + "
\n", + "\n", + "\n", + "
\n", + "\n", + "[![GitHub Stars](https://img.shields.io/github/stars/adithya-s-k/AI-Engineering.academy?style=social)](https://github.com/adithya-s-k/AI-Engineering.academy/stargazers)\n", + "[![GitHub Forks](https://img.shields.io/github/forks/adithya-s-k/AI-Engineering.academy?style=social)](https://github.com/adithya-s-k/AI-Engineering.academy/network/members)\n", + "[![GitHub Issues](https://img.shields.io/github/issues/adithya-s-k/AI-Engineering.academy)](https://github.com/adithya-s-k/AI-Engineering.academy/issues)\n", + "[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/adithya-s-k/AI-Engineering.academy)](https://github.com/adithya-s-k/AI-Engineering.academy/pulls)\n", + "[![License](https://img.shields.io/github/license/adithya-s-k/AI-Engineering.academy)](https://github.com/adithya-s-k/AI-Engineering.academy/blob/main/LICENSE)\n", + "\n", + "
\n", + "\n", + "## Introduction\n", + "Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of large language models with the ability to retrieve relevant information from a knowledge base. This approach enhances the quality and accuracy of generated responses by grounding them in specific, retrieved information.\n", + "\n", + "This notebook aims to provide a clear and concise introduction to RAG, suitable for beginners who want to understand and implement this technology.\n", + "\n", + "### Motivation\n", + "\n", + "Traditional language models generate text based on learned patterns from training data. However, when they are presented with queries that require specific, updated, or niche information, they may struggle to provide accurate responses. RAG addresses this limitation by incorporating a retrieval step that provides the language model with relevant context to generate more informed answers.\n", + "\n", + "### Method Details\n", + "\n", + "#### Document Preprocessing and Vector Store Creation\n", + "\n", + "1. **Document Chunking**: The knowledge base documents (e.g., PDFs, articles) are preprocessed and split into manageable chunks. This is done to create a searchable corpus that can be efficiently used in the retrieval process.\n", + " \n", + "2. **Embedding Generation**: Each chunk is converted into a vector representation using pre-trained embeddings (e.g., OpenAI's embeddings). This allows the documents to be stored in a vector database, such as Qdrant, enabling efficient similarity searches.\n", + "\n", + "#### Retrieval-Augmented Generation Workflow\n", + "\n", + "1. **Query Input**: A user provides a query that needs to be answered.\n", + " \n", + "2. **Retrieval Step**: The query is embedded into a vector using the same embedding model that was used for the documents. A similarity search is then performed in the vector database to find the most relevant document chunks.\n", + "\n", + "3. 
**Generation Step**: The retrieved document chunks are passed to a large language model (e.g., GPT-4) as additional context. The model uses this context to generate a more accurate and relevant response.\n", + "\n", + "### Key Features of RAG\n", + "\n", + "1. **Contextual Relevance**: By grounding responses in actual retrieved information, RAG models can produce more contextually relevant and accurate answers.\n", + " \n", + "2. **Scalability**: The retrieval step can scale to handle large knowledge bases, allowing the model to draw from vast amounts of information.\n", + "\n", + "3. **Flexibility in Use Cases**: RAG can be adapted for a variety of applications, including question answering, summarization, recommendation systems, and more.\n", + "\n", + "4. **Improved Accuracy**: Combining generation with retrieval often yields more precise results, especially for queries requiring specific or lesser-known information.\n", + "\n", + "### Benefits of this Approach\n", + "\n", + "1. **Combines Strengths of Both Retrieval and Generation**: RAG effectively merges retrieval-based methods with generative models, allowing for both precise fact-finding and natural language generation.\n", + "\n", + "2. **Enhanced Handling of Long-Tail Queries**: It is particularly effective for queries where specific and less frequently occurring information is needed.\n", + "\n", + "3. **Domain Adaptability**: The retrieval mechanism can be tuned to specific domains, ensuring that the generated responses are grounded in the most relevant and accurate domain-specific information.\n", + "\n", + "### Conclusion\n", + "\n", + "Retrieval-Augmented Generation (RAG) represents an innovative fusion of retrieval and generation techniques, significantly enhancing the capabilities of language models by grounding their outputs in relevant external information. 
This approach can be particularly valuable in scenarios requiring precise, context-aware responses, such as customer support, academic research, and more. As AI continues to evolve, RAG stands out as a powerful method for building more reliable and context-sensitive AI systems.\n", + "\n", + "### Prerequisites\n", + "- Preferably Python 3.11\n", + "- Jupyter Notebook or JupyterLab\n", + "- LLM API Key\n", + " - You can use any llm of your choice in this notebook we have use OpenAI and Gpt-4o-mini\n", + "\n", + "With these steps, you can implement a basic RAG system to enhance the capabilities of language models by incorporating real-world, up-to-date information, improving their effectiveness in various applications." + ] + }, { - "name": "stdout", - "output_type": "stream", - "text": [ - "🔃 Loading Data\n" - ] + "cell_type": "markdown", + "metadata": { + "id": "WrF1xTGmAUr9" + }, + "source": [ + "## Setting up the Environment" + ] }, { - "name": "stderr", - "output_type": "stream", - "text": [ - "Loading files: 100%|██████████| 1/1 [00:00<00:00, 4.27file/s]\n" - ] - } - ], - "source": [ - "# lets loading the documents using SimpleDirectoryReader\n", - "\n", - "print(\"🔃 Loading Data\")\n", - "\n", - "from llama_index.core import Document\n", - "reader = SimpleDirectoryReader(\"/content/data\" , recursive=True)\n", - "documents = reader.load_data(show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setting up Vector Database\n", - "\n", - "We will be using qDrant as the Vector database\n", - "There are 4 ways to initialize qdrant \n", - "\n", - "1. Inmemory\n", - "```python\n", - "client = qdrant_client.QdrantClient(location=\":memory:\")\n", - "```\n", - "2. Disk\n", - "```python\n", - "client = qdrant_client.QdrantClient(path=\"./data\")\n", - "```\n", - "3. 
Self hosted or Docker\n", - "```python\n", - "\n", - "client = qdrant_client.QdrantClient(\n", - " # url=\"http://:\"\n", - " host=\"localhost\",port=6333\n", - ")\n", - "```\n", - "\n", - "4. Qdrant cloud\n", - "```python\n", - "client = qdrant_client.QdrantClient(\n", - " url=QDRANT_CLOUD_ENDPOINT,\n", - " api_key=QDRANT_API_KEY,\n", - ")\n", - "```\n", - "\n", - "for this notebook we will be using qdrant cloud" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# creating a qdrant client instance\n", - "\n", - "client = qdrant_client.QdrantClient(\n", - " # you can use :memory: mode for fast and light-weight experiments,\n", - " # it does not require to have Qdrant deployed anywhere\n", - " # but requires qdrant-client >= 1.1.1\n", - " # location=\":memory:\"\n", - " # otherwise set Qdrant instance address with:\n", - " # url=QDRANT_CLOUD_ENDPOINT,\n", - " # otherwise set Qdrant instance with host and port:\n", - " # host=\"localhost\",\n", - " # port=6333\n", - " # set API KEY for Qdrant Cloud\n", - " # api_key=QDRANT_API_KEY,\n", - " path=\"./db/\"\n", - ")\n", - "\n", - "vector_store = QdrantVectorStore(client=client, collection_name=\"01_Basic_RAG\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Ingest Data into vector DB" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "1-L5bLrDAUr-", + "outputId": "58d6833a-dea2-445e-dd3f-b7a334162376", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting gradio\n", + " Downloading gradio-5.3.0-py3-none-any.whl.metadata (15 kB)\n", + "Collecting aiofiles<24.0,>=22.0 (from gradio)\n", + " Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)\n", + "Requirement already satisfied: 
anyio<5.0,>=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.7.1)\n", + "Collecting fastapi<1.0,>=0.115.2 (from gradio)\n", + " Downloading fastapi-0.115.3-py3-none-any.whl.metadata (27 kB)\n", + "Collecting ffmpy (from gradio)\n", + " Downloading ffmpy-0.4.0-py3-none-any.whl.metadata (2.9 kB)\n", + "Collecting gradio-client==1.4.2 (from gradio)\n", + " Downloading gradio_client-1.4.2-py3-none-any.whl.metadata (7.1 kB)\n", + "Requirement already satisfied: httpx>=0.24.1 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.27.2)\n", + "Collecting huggingface-hub>=0.25.1 (from gradio)\n", + " Downloading huggingface_hub-0.26.1-py3-none-any.whl.metadata (13 kB)\n", + "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.1.4)\n", + "Collecting markupsafe~=2.0 (from gradio)\n", + " Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)\n", + "Requirement already satisfied: numpy<3.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.26.4)\n", + "Collecting orjson~=3.0 (from gradio)\n", + " Downloading orjson-3.10.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (50 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.6/50.6 kB\u001b[0m \u001b[31m3.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from gradio) (24.1)\n", + "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.2.2)\n", + "Requirement already satisfied: pillow<11.0,>=8.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (10.4.0)\n", + "Requirement already satisfied: pydantic>=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.9.2)\n", + "Collecting pydub (from gradio)\n", + " Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 
kB)\n", + "Collecting python-multipart>=0.0.9 (from gradio)\n", + " Downloading python_multipart-0.0.12-py3-none-any.whl.metadata (1.9 kB)\n", + "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.2)\n", + "Collecting ruff>=0.2.2 (from gradio)\n", + " Downloading ruff-0.7.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)\n", + "Collecting semantic-version~=2.0 (from gradio)\n", + " Downloading semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)\n", + "Collecting starlette<1.0,>=0.40.0 (from gradio)\n", + " Downloading starlette-0.41.0-py3-none-any.whl.metadata (6.0 kB)\n", + "Collecting tomlkit==0.12.0 (from gradio)\n", + " Downloading tomlkit-0.12.0-py3-none-any.whl.metadata (2.7 kB)\n", + "Requirement already satisfied: typer<1.0,>=0.12 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.12.5)\n", + "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.12.2)\n", + "Collecting uvicorn>=0.14.0 (from gradio)\n", + " Downloading uvicorn-0.32.0-py3-none-any.whl.metadata (6.6 kB)\n", + "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from gradio-client==1.4.2->gradio) (2024.6.1)\n", + "Collecting websockets<13.0,>=10.0 (from gradio-client==1.4.2->gradio)\n", + " Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)\n", + "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5.0,>=3.0->gradio) (3.10)\n", + "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<5.0,>=3.0->gradio) (1.3.1)\n", + "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5.0,>=3.0->gradio) (1.2.2)\n", + "Requirement already satisfied: certifi in 
/usr/local/lib/python3.10/dist-packages (from httpx>=0.24.1->gradio) (2024.8.30)\n", + "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx>=0.24.1->gradio) (1.0.6)\n", + "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx>=0.24.1->gradio) (0.14.0)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.25.1->gradio) (3.16.1)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.25.1->gradio) (2.32.3)\n", + "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.25.1->gradio) (4.66.5)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2.8.2)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2024.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2024.2)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>=2.0->gradio) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic>=2.0->gradio) (2.23.4)\n", + "Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from typer<1.0,>=0.12->gradio) (8.1.7)\n", + "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from typer<1.0,>=0.12->gradio) (1.5.4)\n", + "Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.10/dist-packages (from typer<1.0,>=0.12->gradio) (13.9.3)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from 
python-dateutil>=2.8.2->pandas<3.0,>=1.0->gradio) (1.16.0)\n", + "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (3.0.0)\n", + "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (2.18.0)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.25.1->gradio) (3.4.0)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.25.1->gradio) (2.2.3)\n", + "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0,>=0.12->gradio) (0.1.2)\n", + "Downloading gradio-5.3.0-py3-none-any.whl (56.7 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.7/56.7 MB\u001b[0m \u001b[31m13.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading gradio_client-1.4.2-py3-none-any.whl (319 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m319.8/319.8 kB\u001b[0m \u001b[31m20.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading tomlkit-0.12.0-py3-none-any.whl (37 kB)\n", + "Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)\n", + "Downloading fastapi-0.115.3-py3-none-any.whl (94 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m94.6/94.6 kB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading huggingface_hub-0.26.1-py3-none-any.whl (447 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m447.4/447.4 kB\u001b[0m \u001b[31m27.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading 
MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)\n", + "Downloading orjson-3.10.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m144.5/144.5 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading python_multipart-0.0.12-py3-none-any.whl (23 kB)\n", + "Downloading ruff-0.7.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.0 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.0/11.0 MB\u001b[0m \u001b[31m83.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)\n", + "Downloading starlette-0.41.0-py3-none-any.whl (73 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.2/73.2 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading uvicorn-0.32.0-py3-none-any.whl (63 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m63.7/63.7 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading ffmpy-0.4.0-py3-none-any.whl (5.8 kB)\n", + "Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", + "Downloading websockets-12.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (130 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m130.2/130.2 kB\u001b[0m \u001b[31m8.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hInstalling collected packages: pydub, websockets, uvicorn, tomlkit, semantic-version, ruff, python-multipart, orjson, markupsafe, ffmpy, aiofiles, starlette, huggingface-hub, gradio-client, fastapi, gradio\n", + " Attempting uninstall: markupsafe\n", + " Found existing installation: 
MarkupSafe 3.0.2\n", + " Uninstalling MarkupSafe-3.0.2:\n", + " Successfully uninstalled MarkupSafe-3.0.2\n", + " Attempting uninstall: huggingface-hub\n", + " Found existing installation: huggingface-hub 0.24.7\n", + " Uninstalling huggingface-hub-0.24.7:\n", + " Successfully uninstalled huggingface-hub-0.24.7\n", + "Successfully installed aiofiles-23.2.1 fastapi-0.115.3 ffmpy-0.4.0 gradio-5.3.0 gradio-client-1.4.2 huggingface-hub-0.26.1 markupsafe-2.1.5 orjson-3.10.10 pydub-0.25.1 python-multipart-0.0.12 ruff-0.7.1 semantic-version-2.10.0 starlette-0.41.0 tomlkit-0.12.0 uvicorn-0.32.0 websockets-12.0\n" + ] + }, + { + "output_type": "display_data", + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "huggingface_hub" + ] + }, + "id": "730870414f444465ab5355d77327f907" + } + }, + "metadata": {} + } + ], + "source": [ + "# !pip install llama-index\n", + "# !pip install llama-index-vector-stores-qdrant\n", + "# !pip install llama-index-readers-file\n", + "# !pip install llama-index-embeddings-fastembed\n", + "# !pip install llama-index-llms-openai\n", + "# !pip install llama-index-llms-groq\n", + "# !pip install -U qdrant_client fastembed\n", + "# !pip install python-dotenv\n", + "!pip install gradio" + ] + }, { - "name": "stderr", - "output_type": "stream", - "text": [ - "Parsing nodes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 58/58 [00:00<00:00, 555.31it/s]\n", - "Generating embeddings: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 58/58 [00:08<00:00, 7.04it/s]\n" - ] + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "-gTNuZdRAUr_", + "outputId": "c7030ef7-0ab1-47db-cad1-176be615a70d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 313, + "referenced_widgets": [ + "974130d59678476c816490e811dd9d0c", + "3e282f38c7824afa93fcb536aac62fd6", + "799aea4acaeb411192b13f361e8c0024", + "09812b5df16a448393bdf2c041a53de8", + "d0f7116ea17e4fb4923cc164854a00ea", + "ce91819733f148439af76702c4ec09a3", + 
"283e4de04016498b8db0eb60b39664bf", + "666d2321f0a7453281c1b8708b8ca890", + "5f4f0d28283e4457aa468d9efe988682", + "325a4b2f5c8c444f85f548144278c10c", + "1caf90c44bb3456d865e56e0ac4fe477", + "57096c6b29f24f5296df86eea3d7811a", + "5abbe5f8b0b342d39741a2a5c11d39aa", + "4073077e42f24455b8d0c37f1090f7b2", + "8ed837b12a7e44c19718fac9d3406ad5", + "6e757a7b4b884c358b195c1f3e49c6e4", + "7a4c9b9455a04d068229a3e68ef1e50d", + "6a14951f083d42909375b85b35197210", + "57dc591ae8c144518fa6aa152d0897b0", + "26d985a95cad457e9978a5b4f528f7cb", + "790d06f9d981469b850e9b9eeee8afcb", + "be1cae8bac844e33b69d7b622c0becce", + "6c16feff964343f18b4b5007e5cf22a4", + "d30ee40a1b074b7d879b02e03e8d9e48", + "16dd86e911b0495e825e84abaff5820b", + "de6518c9d6ec49ab8adab3ec75a2109d", + "844df52b584447a388c9f0cf6e338f04", + "dc72f2e2d4e34196a78a52f31be7e382", + "99cc77655cbf45c3ab0ae195bb193e34", + "2730764ae3c64109bd333f082b600c67", + "80f33c34da664ac4b3406da2d3a058e8", + "7009700c43d6467b8d41b4eb3c872145", + "86abce6241b047aaaa00684a1f90f16b", + "0a25e436076543b79b3fb4a761ba57c6", + "5d44ec7885a143d991d3c16d42447087", + "c27491dc14274b42b1f32be1dce232f2", + "cc8884d95ea944738ad0ac43192eb372", + "63b11bf7fdff4bcbbc8250acb7680eb5", + "a21c49e7a463469aaecd413ab00ce37c", + "d889d922ed89481cbb35daaab5f1889f", + "3166e8558de843cea600b0ed1d94e743", + "8bc40e3a76c941cea54146162e91ddee", + "04f2c6951f874c9698d8d865b71f2d25", + "88778e4e837a48878a16e187500cc31d", + "e56b15721e0243119ae5212202d10b09", + "9a330115152e49bdb9e550a1af8ede51", + "4ef8c8119ef440068ae3c5c6363a6637", + "e45411d842064712b4ca9789f2f13dcc", + "b1584a95428344358296f246a448cd88", + "99ddbf36927e4b0b95e76c00784fe445", + "56d8bf40a38e4f1a83e13eb37e4db65c", + "d26a297998bd4a93aa3621aea0255c00", + "20270200548a42a6bca13d1845bf2e45", + "fbeffeaa13e14d25bb6324427564d3f1", + "71ba07b6088649d3977899be6950d244", + "727d670dfd30404490a2e3643708271d", + "dd75c9b373314f68ba2ca97606e91159", + "b4a4a843cdf04b1bb4ad4468b7d8a186", + 
"7f1c600a40514ef68cee5f4dc7ff8cd2", + "9e6a24d2e3d04691a808b323db222a5a", + "b5923a05957248d3a33958284165a178", + "b61adb0a736b419784f302bdf46f76be", + "973b0db081e449efa9f6d9e9dea24e40", + "7531c8df220842dcada708918c5409fc", + "0c8c10450f1449efbdf552a0531b8860", + "c7ec84334ff14b639443a8dec9648d93" + ] + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n", + "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", + "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", + "You will be able to reuse this secret in all of your notebooks.\n", + "Please note that authentication is recommended but still optional to access public models or datasets.\n", + " warnings.warn(\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Fetching 5 files: 0%| | 0/5 [00:00" + "source": [ + "# lets loading the documents using SimpleDirectoryReader\n", + "\n", + "print(\"πŸ”ƒ Loading Data\")\n", + "\n", + "from llama_index.core import Document\n", + "reader = SimpleDirectoryReader(\"/content/data\" , recursive=True)\n", + "documents = reader.load_data(show_progress=True)" ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# Setting up Query Engine\n", - "BASE_RAG_QUERY_ENGINE = index.as_query_engine(\n", - " similarity_top_k=5,\n", - " text_qa_template=text_qa_template,\n", - " refine_template=refine_template,)\n", - "\n", - "\n", - "response = BASE_RAG_QUERY_ENGINE.query(\"How many encoders are stacked in the encoder?\")\n", - "display(Markdown(str(response)))" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hOR0NTnEAUsC" + }, + "source": [ + "## Setting 
up Vector Database\n", + "\n", + "We will use Qdrant as the vector database.\n", + "There are four ways to initialize a Qdrant client:\n", + "\n", + "1. In-memory\n", + "```python\n", + "client = qdrant_client.QdrantClient(location=\":memory:\")\n", + "```\n", + "2. On disk\n", + "```python\n", + "client = qdrant_client.QdrantClient(path=\"./data\")\n", + "```\n", + "3. Self-hosted or Docker\n", + "```python\n", + "\n", + "client = qdrant_client.QdrantClient(\n", + " # url=\"http://:\"\n", + " host=\"localhost\",port=6333\n", + ")\n", + "```\n", + "\n", + "4. Qdrant Cloud\n", + "```python\n", + "client = qdrant_client.QdrantClient(\n", + " url=QDRANT_CLOUD_ENDPOINT,\n", + " api_key=QDRANT_API_KEY,\n", + ")\n", + "```\n", + "\n", + "This notebook uses the on-disk mode with a local path." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "INsqBrkUAUsD" + }, + "outputs": [], + "source": [ + "# creating a qdrant client instance\n", + "\n", + "client = qdrant_client.QdrantClient(\n", + " # you can use :memory: mode for fast and light-weight experiments,\n", + " # it does not require to have Qdrant deployed anywhere\n", + " # but requires qdrant-client >= 1.1.1\n", + " # location=\":memory:\"\n", + " # otherwise set Qdrant instance address with:\n", + " # url=QDRANT_CLOUD_ENDPOINT,\n", + " # otherwise set Qdrant instance with host and port:\n", + " # host=\"localhost\",\n", + " # port=6333\n", + " # set API KEY for Qdrant Cloud\n", + " # api_key=QDRANT_API_KEY,\n", + " path=\"./db/\"\n", + ")\n", + "\n", + "vector_store = QdrantVectorStore(client=client, collection_name=\"01_Basic_RAG\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xC9q0MXlAUsD" + }, + "source": [ + "### Ingest Data into vector DB" + ] + }, { - "data": { - "text/markdown": [ - "The number of encoders stacked in the encoder is 6." 
+ "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "eVlDR0hLAUsD", + "outputId": "65928b00-0759-46be-d4dc-df6d39769c0b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 116, + "referenced_widgets": [ + "fab75fe1834446daaa1eb8de5391c725", + "ace3256368fc4901a5b162498297c75a", + "74da7a0fc3cb43eab0f0239834f0f001", + "d800e0732c854fa6b3a9862941999f5f", + "17d815ccb3a74a50a3068eb3b3e59180", + "29e41d0369954010956f6562083bc201", + "b0076f5ff4974cb5bfcdca6136930211", + "60271df3d9ce42e49188bc0564f500ee", + "e5560eda7291468b8500fa3af60a6afb", + "80afb7a0373b4238b620321a7fe62ca9", + "c88069201134451b9c1a39729e7d347d", + "6392bd678d564d8e8ed24acf0319f60f", + "036932d2e9de4c378f6fe08a78f58c59", + "d441cf0920d04e19a70ef53e365b14ea", + "3fd01c29df2f4648a33103b14d0336ca", + "38ae2836dfa5448aab9938bc14aa5879", + "23e60470ae384102a22120a7de878652", + "66acdec460044439aed8be3857931fdc", + "4f3bd12c7ec34952abc6f873539341a7", + "24b76b3eb7514cfb89801e9849d84a5c", + "f7bd1242939f433b90bfbd774ee108e9", + "a90b254671844fb6ad6b636d57499ab7" + ] + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Parsing nodes: 0%| | 0/19 [00:00" + "source": [ + "## ingesting data into vector database\n", + "\n", + "## lets set up an ingestion pipeline\n", + "\n", + "from llama_index.core.node_parser import TokenTextSplitter\n", + "from llama_index.core.node_parser import SentenceSplitter\n", + "from llama_index.core.node_parser import MarkdownNodeParser\n", + "from llama_index.core.node_parser import SemanticSplitterNodeParser\n", + "from llama_index.core.ingestion import IngestionPipeline\n", + "\n", + "pipeline = IngestionPipeline(\n", + " transformations=[\n", + " # MarkdownNodeParser(include_metadata=True),\n", + " # TokenTextSplitter(chunk_size=500, chunk_overlap=20),\n", + " SentenceSplitter(chunk_size=1024, chunk_overlap=20),\n", + " # SemanticSplitterNodeParser(buffer_size=1, breakpoint_percentile_threshold=95 , 
embed_model=Settings.embed_model),\n", + " Settings.embed_model,\n", + " ],\n", + " vector_store=vector_store,\n", + ")\n", + "\n", + "# Ingest directly into a vector db\n", + "nodes = pipeline.run(documents=documents , show_progress=True)\n", + "print(\"Number of chunks added to vector DB :\",len(nodes))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jieE3ifpAUsE" + }, + "source": [ + "## Setting Up Index" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "Qp2hOdxWAUsE" + }, + "outputs": [], + "source": [ + "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFBs7K19AUsE" + }, + "source": [ + "## Modifying Prompts and Prompt Tuning" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "jF3qGRfIAUsE" + }, + "outputs": [], + "source": [ + "from llama_index.core import ChatPromptTemplate\n", + "\n", + "qa_prompt_str = (\n", + " \"Context information is below.\\n\"\n", + " \"---------------------\\n\"\n", + " \"{context_str}\\n\"\n", + " \"---------------------\\n\"\n", + " \"Given the context information and not prior knowledge, \"\n", + " \"answer the question: {query_str}\\n\"\n", + ")\n", + "\n", + "refine_prompt_str = (\n", + " \"We have the opportunity to refine the original answer \"\n", + " \"(only if needed) with some more context below.\\n\"\n", + " \"------------\\n\"\n", + " \"{context_msg}\\n\"\n", + " \"------------\\n\"\n", + " \"Given the new context, refine the original answer to better \"\n", + " \"answer the question: {query_str}. 
\"\n", + " \"If the context isn't useful, output the original answer again.\\n\"\n", + " \"Original Answer: {existing_answer}\"\n", + ")\n", + "\n", + "# Text QA Prompt\n", + "chat_text_qa_msgs = [\n", + " (\"system\",\"You are a AI assistant who is well versed with answering questions from the provided context\"),\n", + " (\"user\", qa_prompt_str),\n", + "]\n", + "text_qa_template = ChatPromptTemplate.from_messages(chat_text_qa_msgs)\n", + "\n", + "# Refine Prompt\n", + "chat_refine_msgs = [\n", + " (\"system\",\"Always answer the question, even if the context isn't helpful.\",),\n", + " (\"user\", refine_prompt_str),\n", + "]\n", + "refine_template = ChatPromptTemplate.from_messages(chat_refine_msgs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yuyOPvPLAUsE" + }, + "source": [ + "### Example of Retrivers\n", + "\n", + "- Query Engine\n", + "- Chat Engine" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "fM6BKSmbAUsF", + "outputId": "82435b4c-ad37-428c-a491-fad96738194f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 46 + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/markdown": "Hello." + }, + "metadata": {} + } + ], + "source": [ + "# Setting up Query Engine\n", + "BASE_RAG_QUERY_ENGINE = index.as_query_engine(\n", + " similarity_top_k=5)\n", + "\n", + "\n", + "response = BASE_RAG_QUERY_ENGINE.query(\"hello there\")\n", + "display(Markdown(str(response)))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "qmNXNnLMAUsF", + "outputId": "09666770-f8d6-44ef-aa6e-e1cdcf9f1f7b", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 46 + } + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/markdown": "The datasets used for Tamil-LLaMA include the Oscar dataset, the IndicNLP dataset, the Alpaca dataset, and the OpenOrca dataset." 
+ }, + "metadata": {} + } + ], + "source": [ + "# Setting up Chat Engine\n", + "BASE_RAG_CHAT_ENGINE = index.as_chat_engine()\n", + "\n", + "response = BASE_RAG_CHAT_ENGINE.chat(\"what is the dataset used with tamil llama\")\n", + "display(Markdown(str(response)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vem2m7IpAUsF" + }, + "source": [ + "### Simple Chat Application with RAG" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "R9CJbiroAUsF" + }, + "outputs": [], + "source": [ + "from typing import List\n", + "from llama_index.core.base.llms.types import ChatMessage, MessageRole\n", + "\n", + "class ChatEngineInterface:\n", + " def __init__(self, index):\n", + " self.chat_engine = index.as_chat_engine()\n", + " self.chat_history: List[ChatMessage] = []\n", + "\n", + " def display_message(self, role: str, content: str):\n", + " if role == \"USER\":\n", + " display(Markdown(f\"**Human:** {content}\"))\n", + " else:\n", + " display(Markdown(f\"**AI:** {content}\"))\n", + "\n", + " def chat(self, message: str) -> str:\n", + " # Create a ChatMessage for the user input\n", + " user_message = ChatMessage(role=MessageRole.USER, content=message)\n", + " self.chat_history.append(user_message)\n", + "\n", + " # Get response from the chat engine\n", + " response = self.chat_engine.chat(message, chat_history=self.chat_history)\n", + "\n", + " # Create a ChatMessage for the AI response\n", + " ai_message = ChatMessage(role=MessageRole.ASSISTANT, content=str(response))\n", + " self.chat_history.append(ai_message)\n", + "\n", + " # Display the conversation\n", + " self.display_message(\"USER\", message)\n", + " self.display_message(\"ASSISTANT\", str(response))\n", + "\n", + " print(\"\\n\" + \"-\"*50 + \"\\n\") # Separator for readability\n", + "\n", + " return str(response)\n", + "\n", + " def get_chat_history(self) -> List[ChatMessage]:\n", + " return self.chat_history" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": { + "id": "pRIVhL31AUsF" + }, + "outputs": [], + "source": [ + "chat_interface = ChatEngineInterface(index)\n", + "while True:\n", + " user_input = input(\"You: \").strip()\n", + " if user_input.lower() == 'exit':\n", + " print(\"Thank you for chatting! Goodbye.\")\n", + " break\n", + " chat_interface.chat(user_input)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1I-pBLV4AUsG" + }, + "outputs": [], + "source": [ + "# To view chat history:\n", + "history = chat_interface.get_chat_history()\n", + "for message in history:\n", + " print(f\"{message.role}: {message.content}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ieWzBCYsAUsG" + }, + "source": [ + "## Gradio Applicaiton" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "oX4F3l9fAUsG", + "outputId": "05a7b94d-6e9c-4d5c-ee3a-9cf992ce467d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 646 + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/gradio/components/chatbot.py:223: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.\n", + " warnings.warn(\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n", + "* Running on public URL: https://50a35ce6bac83f9bb6.gradio.live\n", + "\n", + "This share link expires in 72 hours. 
For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "
" + ] + }, + "metadata": {} + } + ], + "source": [ + "import gradio as gr\n", + "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document, Settings\n", + "from llama_index.vector_stores.qdrant import QdrantVectorStore\n", + "from llama_index.embeddings.openai import OpenAIEmbedding\n", + "from llama_index.llms.openai import OpenAI\n", + "import qdrant_client\n", + "import os\n", + "import tempfile\n", + "import shutil\n", + "from typing import List\n", + "from llama_index.core.base.llms.types import ChatMessage, MessageRole\n", + "\n", + "class RAGChatbot:\n", + " def __init__(self):\n", + " self.client = qdrant_client.QdrantClient(path=\"./db_demo/\")\n", + " self.vector_store = None\n", + " self.index = None\n", + " self.chat_engine = None\n", + " self.chat_history = []\n", + " # Initialize vector store and index\n", + " self.vector_store = QdrantVectorStore(\n", + " client=self.client,\n", + " collection_name=\"Demo_RAG\"\n", + " )\n", + "\n", + " # Create the index and ingest documents\n", + " self.index = VectorStoreIndex.from_documents(\n", + " documents,\n", + " vector_store=self.vector_store\n", + " )\n", + "\n", + " # Initialize chat engine\n", + " self.chat_engine = self.index.as_chat_engine(\n", + " streaming=True,\n", + " verbose=True\n", + " )\n", + "\n", + "\n", + " def process_uploaded_files(self, files) -> str:\n", + " try:\n", + " # Create a temporary directory for processing\n", + " with tempfile.TemporaryDirectory() as temp_dir:\n", + " # Save uploaded files to temporary directory\n", + " for file in files:\n", + " shutil.copy(file.name, temp_dir)\n", + "\n", + " # Load documents\n", + " reader = SimpleDirectoryReader(temp_dir)\n", + " documents = reader.load_data()\n", + "\n", + " pipeline = IngestionPipeline(\n", + " transformations=[\n", + " # MarkdownNodeParser(include_metadata=True),\n", + " # TokenTextSplitter(chunk_size=500, chunk_overlap=20),\n", + " SentenceSplitter(chunk_size=1024, chunk_overlap=20),\n", + " # 
SemanticSplitterNodeParser(buffer_size=1, breakpoint_percentile_threshold=95 , embed_model=Settings.embed_model),\n", + " Settings.embed_model,\n", + " ],\n", + " vector_store=self.vector_store,\n", + " )\n", + "\n", + " # Ingest directly into a vector db\n", + " nodes = pipeline.run(documents=documents , show_progress=True)\n", + "\n", + " return f\"Successfully processed {len(documents)} documents and inserted {len(nodes)} chunks into the database. Ready to chat!\"\n", + "\n", + " except Exception as e:\n", + " return f\"Error processing files: {str(e)}\"\n", + "\n", + " def chat(self, message: str, history: List[List[str]]) -> List[List[str]]:\n", + " if self.chat_engine is None:\n", + " return history + [[message, \"Please upload documents first before starting the chat.\"]]\n", + "\n", + " try:\n", + " # Convert history to ChatMessage format\n", + " chat_history = []\n", + " for h in history:\n", + " chat_history.extend([\n", + " ChatMessage(role=MessageRole.USER, content=h[0]),\n", + " ChatMessage(role=MessageRole.ASSISTANT, content=h[1])\n", + " ])\n", + "\n", + " # Add current message to history\n", + " chat_history.append(ChatMessage(role=MessageRole.USER, content=message))\n", + "\n", + " # Get response from chat engine\n", + " response = self.chat_engine.chat(message, chat_history=chat_history)\n", + "\n", + " # gr.Chatbot expects the full list of [user, assistant] pairs, not a bare string\n", + " return history + [[message, str(response)]]\n", + "\n", + " except Exception as e:\n", + " return history + [[message, f\"Error generating response: {str(e)}\"]]\n", + "\n", + "def create_demo():\n", + " # Initialize the chatbot\n", + " chatbot = RAGChatbot()\n", + "\n", + " with gr.Blocks(theme=gr.themes.Soft()) as demo:\n", + " gr.Markdown(\"# RAG Chatbot\")\n", + " gr.Markdown(\"Upload your documents and start chatting!\")\n", + "\n", + " with gr.Row():\n", + " with gr.Column(scale=1):\n", + " file_output = gr.File(\n", + " file_count=\"multiple\",\n", + " 
label=\"Upload Documents\",\n", + " file_types=[\".txt\", \".pdf\", \".docx\", \".md\"]\n", + " )\n", + " upload_button = gr.Button(\"Process Documents\")\n", + " status_box = 
gr.Textbox(label=\"Status\", interactive=False)\n", + "\n", + " with gr.Column(scale=2):\n", + " chatbot_interface = gr.Chatbot(\n", + " label=\"Chat History\",\n", + " height=400,\n", + " bubble_full_width=False,\n", + " )\n", + " with gr.Row():\n", + " msg = gr.Textbox(\n", + " label=\"Type your message\",\n", + " placeholder=\"Ask me anything about the uploaded documents...\",\n", + " lines=2,\n", + " scale=4\n", + " )\n", + " submit_button = gr.Button(\"Submit\", scale=1)\n", + "\n", + " # Event handlers\n", + " upload_button.click(\n", + " fn=chatbot.process_uploaded_files,\n", + " inputs=[file_output],\n", + " outputs=[status_box],\n", + " )\n", + "\n", + " msg.submit(\n", + " fn=chatbot.chat,\n", + " inputs=[msg, chatbot_interface],\n", + " outputs=[chatbot_interface],\n", + " )\n", + "\n", + " # The Clear button no longer exists; wire the Submit button to the same handler as msg.submit\n", + " submit_button.click(\n", + " fn=chatbot.chat,\n", + " inputs=[msg, chatbot_interface],\n", + " outputs=[chatbot_interface],\n", + " )\n", + "\n", + " return demo\n", + "\n", + "if __name__ == \"__main__\":\n", + " demo = create_demo()\n", + " demo.launch(share=True)" ] - }, - "metadata": {}, - "output_type": "display_data" } - ], - "source": [ - "# Setting up Chat Engine\n", - "BASE_RAG_CHAT_ENGINE = index.as_chat_engine()\n", - "\n", - "response = BASE_RAG_CHAT_ENGINE.chat(\"How many encoders are stacked in the encoder?\")\n", - "display(Markdown(str(response)))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Simple Chat Application with RAG" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "from typing import List\n", - "from llama_index.core.base.llms.types import ChatMessage, MessageRole\n", - "\n", - "class ChatEngineInterface:\n", - " def __init__(self, index):\n", - " self.chat_engine = index.as_chat_engine()\n", - " self.chat_history: List[ChatMessage] = []\n", - "\n", - " 
def display_message(self, role: str, content: str):\n", - " if role == \"USER\":\n", - " display(Markdown(f\"**Human:** 
{content}\"))\n", - " else:\n", - " display(Markdown(f\"**AI:** {content}\"))\n", - "\n", - " def chat(self, message: str) -> str:\n", - " # Create a ChatMessage for the user input\n", - " user_message = ChatMessage(role=MessageRole.USER, content=message)\n", - " self.chat_history.append(user_message)\n", - " \n", - " # Get response from the chat engine\n", - " response = self.chat_engine.chat(message, chat_history=self.chat_history)\n", - " \n", - " # Create a ChatMessage for the AI response\n", - " ai_message = ChatMessage(role=MessageRole.ASSISTANT, content=str(response))\n", - " self.chat_history.append(ai_message)\n", - " \n", - " # Display the conversation\n", - " self.display_message(\"USER\", message)\n", - " self.display_message(\"ASSISTANT\", str(response))\n", - " \n", - " print(\"\\n\" + \"-\"*50 + \"\\n\") # Separator for readability\n", - "\n", - " return str(response)\n", - "\n", - " def get_chat_history(self) -> List[ChatMessage]:\n", - " return self.chat_history" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "chat_interface = ChatEngineInterface(index)\n", - "while True:\n", - " user_input = input(\"You: \").strip()\n", - " if user_input.lower() == 'exit':\n", - " print(\"Thank you for chatting! 
Goodbye.\")\n", - " break\n", - " chat_interface.chat(user_input)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# To view chat history:\n", - "history = chat_interface.get_chat_history()\n", - "for message in history:\n", - " print(f\"{message.role}: {message.content}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Gradio Applicaiton" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import gradio as gr\n", - "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document, Settings\n", - "from llama_index.vector_stores.qdrant import QdrantVectorStore\n", - "from llama_index.embeddings.openai import OpenAIEmbedding\n", - "from llama_index.llms.openai import OpenAI\n", - "import qdrant_client\n", - "import os\n", - "import tempfile\n", - "import shutil\n", - "from typing import List\n", - "from llama_index.core.base.llms.types import ChatMessage, MessageRole\n", - "\n", - "class RAGChatbot:\n", - " def __init__(self):\n", - " self.client = qdrant_client.QdrantClient(host=\"localhost\", port=6333)\n", - " self.vector_store = None\n", - " self.index = None\n", - " self.chat_engine = None\n", - " self.chat_history = []\n", - " \n", - "\n", - " def process_uploaded_files(self, files) -> str:\n", - " try:\n", - " # Create a temporary directory for processing\n", - " with tempfile.TemporaryDirectory() as temp_dir:\n", - " # Save uploaded files to temporary directory\n", - " for file in files:\n", - " shutil.copy(file.name, temp_dir)\n", - " \n", - " # Load documents\n", - " reader = SimpleDirectoryReader(temp_dir)\n", - " documents = reader.load_data()\n", - " \n", - " # Create new collection name based on timestamp\n", - " import time\n", - " collection_name = f\"chat_collection_{int(time.time())}\"\n", - " \n", - " # Initialize vector store and index\n", - " self.vector_store = 
QdrantVectorStore(\n", - " client=self.client,\n", - " collection_name=collection_name\n", - " )\n", - " \n", - " # Create the index and ingest documents\n", - " self.index = VectorStoreIndex.from_documents(\n", - " documents,\n", - " vector_store=self.vector_store\n", - " )\n", - " \n", - " # Initialize chat engine\n", - " self.chat_engine = self.index.as_chat_engine(\n", - " streaming=True,\n", - " verbose=True\n", - " )\n", - " \n", - " return f\"Successfully processed {len(documents)} documents. Ready to chat!\"\n", - " \n", - " except Exception as e:\n", - " return f\"Error processing files: {str(e)}\"\n", - "\n", - " def chat(self, message: str, history: List[List[str]]) -> str:\n", - " if self.chat_engine is None:\n", - " return \"Please upload documents first before starting the chat.\"\n", - " \n", - " try:\n", - " # Convert history to ChatMessage format\n", - " chat_history = []\n", - " for h in history:\n", - " chat_history.extend([\n", - " ChatMessage(role=MessageRole.USER, content=h[0]),\n", - " ChatMessage(role=MessageRole.ASSISTANT, content=h[1])\n", - " ])\n", - " \n", - " # Add current message to history\n", - " chat_history.append(ChatMessage(role=MessageRole.USER, content=message))\n", - " \n", - " # Get response from chat engine\n", - " response = self.chat_engine.chat(message, chat_history=chat_history)\n", - " \n", - " return str(response)\n", - " \n", - " except Exception as e:\n", - " return f\"Error generating response: {str(e)}\"\n", - "\n", - "def create_demo():\n", - " # Initialize the chatbot\n", - " chatbot = RAGChatbot()\n", - " \n", - " with gr.Blocks(theme=gr.themes.Soft()) as demo:\n", - " gr.Markdown(\"# RAG Chatbot\")\n", - " gr.Markdown(\"Upload your documents and start chatting!\")\n", - " \n", - " with gr.Row():\n", - " with gr.Column(scale=1):\n", - " file_output = gr.File(\n", - " file_count=\"multiple\",\n", - " label=\"Upload Documents\",\n", - " file_types=[\".txt\", \".pdf\", \".docx\", \".md\"]\n", - " )\n", - " 
upload_button = gr.Button(\"Process Documents\")\n", - " status_box = gr.Textbox(label=\"Status\", interactive=False)\n", - " \n", - " with gr.Column(scale=2):\n", - " chatbot_interface = gr.Chatbot(\n", - " label=\"Chat History\",\n", - " height=400,\n", - " bubble_full_width=False,\n", - " )\n", - " msg = gr.Textbox(\n", - " label=\"Type your message\",\n", - " placeholder=\"Ask me anything about the uploaded documents...\",\n", - " lines=2\n", - " )\n", - " clear = gr.Button(\"Clear\")\n", - " \n", - " # Event handlers\n", - " upload_button.click(\n", - " fn=chatbot.process_uploaded_files,\n", - " inputs=[file_output],\n", - " outputs=[status_box],\n", - " )\n", - " \n", - " msg.submit(\n", - " fn=chatbot.chat,\n", - " inputs=[msg, chatbot_interface],\n", - " outputs=[chatbot_interface],\n", - " )\n", - " \n", - " clear.click(\n", - " lambda: None,\n", - " None,\n", - " chatbot_interface,\n", - " queue=False\n", - " )\n", - " \n", - " return demo\n", - "\n", - "if __name__ == \"__main__\":\n", - " demo = create_demo()\n", - " demo.launch(share=True)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "llm-venv", - "language": "python", - "name": "python3" + ], + "metadata": { + "kernelspec": { + "display_name": "llm-venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + }, + "colab": { + "provenance": [] + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "974130d59678476c816490e811dd9d0c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_3e282f38c7824afa93fcb536aac62fd6", + "IPY_MODEL_799aea4acaeb411192b13f361e8c0024", + "IPY_MODEL_09812b5df16a448393bdf2c041a53de8" + ], + "layout": "IPY_MODEL_d0f7116ea17e4fb4923cc164854a00ea" + } + }, + "3e282f38c7824afa93fcb536aac62fd6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ce91819733f148439af76702c4ec09a3", + "placeholder": "​", + "style": "IPY_MODEL_283e4de04016498b8db0eb60b39664bf", + "value": "Fetching 5 files: 100%" + } + }, + "799aea4acaeb411192b13f361e8c0024": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_666d2321f0a7453281c1b8708b8ca890", + "max": 5, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5f4f0d28283e4457aa468d9efe988682", + "value": 5 + } + }, + "09812b5df16a448393bdf2c041a53de8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_325a4b2f5c8c444f85f548144278c10c", + "placeholder": "​", + "style": "IPY_MODEL_1caf90c44bb3456d865e56e0ac4fe477", + "value": " 5/5 [00:05<00:00,  3.20s/it]" + } + }, + "d0f7116ea17e4fb4923cc164854a00ea": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ce91819733f148439af76702c4ec09a3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "283e4de04016498b8db0eb60b39664bf": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "666d2321f0a7453281c1b8708b8ca890": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5f4f0d28283e4457aa468d9efe988682": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "325a4b2f5c8c444f85f548144278c10c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1caf90c44bb3456d865e56e0ac4fe477": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "57096c6b29f24f5296df86eea3d7811a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5abbe5f8b0b342d39741a2a5c11d39aa", + "IPY_MODEL_4073077e42f24455b8d0c37f1090f7b2", + "IPY_MODEL_8ed837b12a7e44c19718fac9d3406ad5" + ], + "layout": "IPY_MODEL_6e757a7b4b884c358b195c1f3e49c6e4" + } + }, + "5abbe5f8b0b342d39741a2a5c11d39aa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": 
"", + "description_tooltip": null, + "layout": "IPY_MODEL_7a4c9b9455a04d068229a3e68ef1e50d", + "placeholder": "​", + "style": "IPY_MODEL_6a14951f083d42909375b85b35197210", + "value": "special_tokens_map.json: 100%" + } + }, + "4073077e42f24455b8d0c37f1090f7b2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_57dc591ae8c144518fa6aa152d0897b0", + "max": 695, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_26d985a95cad457e9978a5b4f528f7cb", + "value": 695 + } + }, + "8ed837b12a7e44c19718fac9d3406ad5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_790d06f9d981469b850e9b9eeee8afcb", + "placeholder": "​", + "style": "IPY_MODEL_be1cae8bac844e33b69d7b622c0becce", + "value": " 695/695 [00:00<00:00, 7.90kB/s]" + } + }, + "6e757a7b4b884c358b195c1f3e49c6e4": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": 
"1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7a4c9b9455a04d068229a3e68ef1e50d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": 
null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "6a14951f083d42909375b85b35197210": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "57dc591ae8c144518fa6aa152d0897b0": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "26d985a95cad457e9978a5b4f528f7cb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + 
"state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "790d06f9d981469b850e9b9eeee8afcb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "be1cae8bac844e33b69d7b622c0becce": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + 
"description_width": "" + } + }, + "6c16feff964343f18b4b5007e5cf22a4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d30ee40a1b074b7d879b02e03e8d9e48", + "IPY_MODEL_16dd86e911b0495e825e84abaff5820b", + "IPY_MODEL_de6518c9d6ec49ab8adab3ec75a2109d" + ], + "layout": "IPY_MODEL_844df52b584447a388c9f0cf6e338f04" + } + }, + "d30ee40a1b074b7d879b02e03e8d9e48": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_dc72f2e2d4e34196a78a52f31be7e382", + "placeholder": "​", + "style": "IPY_MODEL_99cc77655cbf45c3ab0ae195bb193e34", + "value": "config.json: 100%" + } + }, + "16dd86e911b0495e825e84abaff5820b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2730764ae3c64109bd333f082b600c67", + "max": 740, + "min": 0, + 
"orientation": "horizontal", + "style": "IPY_MODEL_80f33c34da664ac4b3406da2d3a058e8", + "value": 740 + } + }, + "de6518c9d6ec49ab8adab3ec75a2109d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7009700c43d6467b8d41b4eb3c872145", + "placeholder": "​", + "style": "IPY_MODEL_86abce6241b047aaaa00684a1f90f16b", + "value": " 740/740 [00:00<00:00, 7.17kB/s]" + } + }, + "844df52b584447a388c9f0cf6e338f04": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": 
null, + "width": null + } + }, + "dc72f2e2d4e34196a78a52f31be7e382": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99cc77655cbf45c3ab0ae195bb193e34": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2730764ae3c64109bd333f082b600c67": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + 
"_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "80f33c34da664ac4b3406da2d3a058e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7009700c43d6467b8d41b4eb3c872145": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "86abce6241b047aaaa00684a1f90f16b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "0a25e436076543b79b3fb4a761ba57c6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_5d44ec7885a143d991d3c16d42447087", + "IPY_MODEL_c27491dc14274b42b1f32be1dce232f2", + "IPY_MODEL_cc8884d95ea944738ad0ac43192eb372" + ], + "layout": "IPY_MODEL_63b11bf7fdff4bcbbc8250acb7680eb5" + } + }, + "5d44ec7885a143d991d3c16d42447087": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a21c49e7a463469aaecd413ab00ce37c", + "placeholder": "​", + "style": "IPY_MODEL_d889d922ed89481cbb35daaab5f1889f", + "value": "tokenizer.json: 100%" + } + }, + "c27491dc14274b42b1f32be1dce232f2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3166e8558de843cea600b0ed1d94e743", + "max": 711396, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_8bc40e3a76c941cea54146162e91ddee", + "value": 711396 + } + }, + "cc8884d95ea944738ad0ac43192eb372": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_04f2c6951f874c9698d8d865b71f2d25", + "placeholder": "​", + "style": "IPY_MODEL_88778e4e837a48878a16e187500cc31d", + "value": " 711k/711k [00:00<00:00, 3.17MB/s]" + } + }, + "63b11bf7fdff4bcbbc8250acb7680eb5": 
{ + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a21c49e7a463469aaecd413ab00ce37c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": 
null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d889d922ed89481cbb35daaab5f1889f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3166e8558de843cea600b0ed1d94e743": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": 
null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8bc40e3a76c941cea54146162e91ddee": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "04f2c6951f874c9698d8d865b71f2d25": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "88778e4e837a48878a16e187500cc31d": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e56b15721e0243119ae5212202d10b09": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_9a330115152e49bdb9e550a1af8ede51", + "IPY_MODEL_4ef8c8119ef440068ae3c5c6363a6637", + "IPY_MODEL_e45411d842064712b4ca9789f2f13dcc" + ], + "layout": "IPY_MODEL_b1584a95428344358296f246a448cd88" + } + }, + "9a330115152e49bdb9e550a1af8ede51": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_99ddbf36927e4b0b95e76c00784fe445", + "placeholder": "​", + "style": "IPY_MODEL_56d8bf40a38e4f1a83e13eb37e4db65c", + "value": "model_optimized.onnx: 100%" + } + }, + "4ef8c8119ef440068ae3c5c6363a6637": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + 
"_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d26a297998bd4a93aa3621aea0255c00", + "max": 217824172, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_20270200548a42a6bca13d1845bf2e45", + "value": 217824172 + } + }, + "e45411d842064712b4ca9789f2f13dcc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fbeffeaa13e14d25bb6324427564d3f1", + "placeholder": "​", + "style": "IPY_MODEL_71ba07b6088649d3977899be6950d244", + "value": " 218M/218M [00:05<00:00, 42.9MB/s]" + } + }, + "b1584a95428344358296f246a448cd88": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99ddbf36927e4b0b95e76c00784fe445": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "56d8bf40a38e4f1a83e13eb37e4db65c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d26a297998bd4a93aa3621aea0255c00": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "20270200548a42a6bca13d1845bf2e45": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fbeffeaa13e14d25bb6324427564d3f1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", 
+ "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "71ba07b6088649d3977899be6950d244": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "727d670dfd30404490a2e3643708271d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + 
"box_style": "", + "children": [ + "IPY_MODEL_dd75c9b373314f68ba2ca97606e91159", + "IPY_MODEL_b4a4a843cdf04b1bb4ad4468b7d8a186", + "IPY_MODEL_7f1c600a40514ef68cee5f4dc7ff8cd2" + ], + "layout": "IPY_MODEL_9e6a24d2e3d04691a808b323db222a5a" + } + }, + "dd75c9b373314f68ba2ca97606e91159": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b5923a05957248d3a33958284165a178", + "placeholder": "​", + "style": "IPY_MODEL_b61adb0a736b419784f302bdf46f76be", + "value": "tokenizer_config.json: 100%" + } + }, + "b4a4a843cdf04b1bb4ad4468b7d8a186": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_973b0db081e449efa9f6d9e9dea24e40", + "max": 1242, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_7531c8df220842dcada708918c5409fc", + "value": 1242 + } + }, + "7f1c600a40514ef68cee5f4dc7ff8cd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0c8c10450f1449efbdf552a0531b8860", + "placeholder": "​", + "style": "IPY_MODEL_c7ec84334ff14b639443a8dec9648d93", + "value": " 1.24k/1.24k [00:00<00:00, 15.6kB/s]" + } + }, + "9e6a24d2e3d04691a808b323db222a5a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b5923a05957248d3a33958284165a178": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b61adb0a736b419784f302bdf46f76be": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "973b0db081e449efa9f6d9e9dea24e40": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + 
"grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7531c8df220842dcada708918c5409fc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0c8c10450f1449efbdf552a0531b8860": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": 
null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7ec84334ff14b639443a8dec9648d93": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "fab75fe1834446daaa1eb8de5391c725": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ace3256368fc4901a5b162498297c75a", + "IPY_MODEL_74da7a0fc3cb43eab0f0239834f0f001", + "IPY_MODEL_d800e0732c854fa6b3a9862941999f5f" + ], + "layout": "IPY_MODEL_17d815ccb3a74a50a3068eb3b3e59180" + } + }, + "ace3256368fc4901a5b162498297c75a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_29e41d0369954010956f6562083bc201", + "placeholder": "​", + 
"style": "IPY_MODEL_b0076f5ff4974cb5bfcdca6136930211", + "value": "Parsing nodes: 100%" + } + }, + "74da7a0fc3cb43eab0f0239834f0f001": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_60271df3d9ce42e49188bc0564f500ee", + "max": 19, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_e5560eda7291468b8500fa3af60a6afb", + "value": 19 + } + }, + "d800e0732c854fa6b3a9862941999f5f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_80afb7a0373b4238b620321a7fe62ca9", + "placeholder": "​", + "style": "IPY_MODEL_c88069201134451b9c1a39729e7d347d", + "value": " 19/19 [00:00<00:00, 223.86it/s]" + } + }, + "17d815ccb3a74a50a3068eb3b3e59180": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + 
"bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "29e41d0369954010956f6562083bc201": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"b0076f5ff4974cb5bfcdca6136930211": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "60271df3d9ce42e49188bc0564f500ee": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e5560eda7291468b8500fa3af60a6afb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "80afb7a0373b4238b620321a7fe62ca9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c88069201134451b9c1a39729e7d347d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6392bd678d564d8e8ed24acf0319f60f": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_036932d2e9de4c378f6fe08a78f58c59", + "IPY_MODEL_d441cf0920d04e19a70ef53e365b14ea", + "IPY_MODEL_3fd01c29df2f4648a33103b14d0336ca" + ], + "layout": "IPY_MODEL_38ae2836dfa5448aab9938bc14aa5879" + } + }, + "036932d2e9de4c378f6fe08a78f58c59": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_23e60470ae384102a22120a7de878652", + "placeholder": "​", + "style": "IPY_MODEL_66acdec460044439aed8be3857931fdc", + "value": "Generating embeddings: 100%" + } + }, + "d441cf0920d04e19a70ef53e365b14ea": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4f3bd12c7ec34952abc6f873539341a7", + "max": 30, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_24b76b3eb7514cfb89801e9849d84a5c", 
+ "value": 30 + } + }, + "3fd01c29df2f4648a33103b14d0336ca": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f7bd1242939f433b90bfbd774ee108e9", + "placeholder": "​", + "style": "IPY_MODEL_a90b254671844fb6ad6b636d57499ab7", + "value": " 30/30 [00:50<00:00,  1.66s/it]" + } + }, + "38ae2836dfa5448aab9938bc14aa5879": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "23e60470ae384102a22120a7de878652": { + 
"model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "66acdec460044439aed8be3857931fdc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4f3bd12c7ec34952abc6f873539341a7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "24b76b3eb7514cfb89801e9849d84a5c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f7bd1242939f433b90bfbd774ee108e9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, 
+ "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a90b254671844fb6ad6b636d57499ab7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.14" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file