This quickstart guide shows how to use the pycolbertdb package to integrate ColbertDB with LlamaIndex, using OpenAI's GPT-4 Turbo model to process and query documents.
Ensure you have the following installed and configured:
- Python 3.x
- An OpenAI API key
- Environment variables configured for ColbertDB (a sample .env is sketched below)
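For reference, here is a minimal sketch of the .env file this guide assumes. All values are placeholders; substitute your own ColbertDB URL, API key, store name, and OpenAI key.
# .env (placeholder values - replace with your own)
COLBERTDB_URL=http://localhost:8080
COLBERTDB_API_KEY=your-colbertdb-api-key
COLBERTDB_STORE_NAME=my_store
OPENAI_API_KEY=sk-your-openai-api-key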
Install the necessary packages
pip install pycolbertdb -U
pip install llama-index
pip install llama-index-readers-web
pip install requests
pip install python-dotenv
Below is an example of how to use the pycolbertdb package to fetch, process, and query documents.
Start by importing the necessary dependencies.
import os
from dotenv import load_dotenv
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import Document, PromptTemplate
from llama_index.llms.openai import OpenAI
from pycolbertdb.client import Colbertdb
from pycolbertdb.models import CreateCollectionDocument
from pycolbertdb.helpers import from_llama_index_documents
Load your environment variables from a .env file.
load_dotenv()
URL = os.getenv('COLBERTDB_URL')
API_KEY = os.getenv('COLBERTDB_API_KEY')
STORE_NAME = os.getenv('COLBERTDB_STORE_NAME')
OPEN_AI_KEY = os.getenv('OPENAI_API_KEY')
URLS = ['https://en.wikipedia.org/wiki/Onigiri']
Initialize the ColbertDB and OpenAI clients.
client = Colbertdb(url=URL, api_key=API_KEY, store_name=STORE_NAME)
open_ai_client = OpenAI(model="gpt-4-turbo", api_key=OPEN_AI_KEY)
Define a prompt template that combines the retrieved context with your query.
qa_prompt_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Please write the answer in the style of {tone_name}
Query: {query_str}
Answer: \
"""
prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
Fetch and process HTML content from the specified URLs.
docs = from_llama_index_documents(SimpleWebPageReader(html_to_text=True).load_data(URLS))
Create a new collection in ColbertDB with the processed documents.
collection = client.create_collection(documents=docs, name='rice_ball_facts', options={"force_create": True})
Perform a search query on the created collection.
result = collection.search(query="What are some popular fillings for onigiri?", k=3)
Format the retrieved documents and generate a response using OpenAI.
context = ''
for document in result.documents:
    print("Source: " + document.metadata['source'] + "\n", document.content)
    context += (document.content + "\n\n")
prompt = prompt_tmpl.format(context_str=context, tone_name="shakespeare", query_str="What are some typical onigiri fillings")
response = open_ai_client.complete(prompt)
print(response)
Fetch additional documents and add them to the existing collection. This time the documents are formatted as plain dictionaries with content and metadata keys rather than with the from_llama_index_documents helper.
new_docs = SimpleWebPageReader(html_to_text=True).load_data(["https://en.wikipedia.org/wiki/Kewpie_(mayonnaise)"])
new_formatted = [{"content": doc.text, "metadata": {"source": doc.id_}} for doc in new_docs[0:2]]
collection = collection.add_documents(documents=new_formatted)
Perform a new search query on the updated collection.
new_result = collection.search(query="When was kewpie mayo founded?", k=3)
new_context = ''
for document in new_result.documents:
    print("Source: " + document.metadata['source'] + "\n", document.content)
    new_context += (document.content + "\n\n")
prompt = prompt_tmpl.format(context_str=new_context, tone_name="bruce springsteen", query_str="When and where was kewpie mayo founded")
new_response = open_ai_client.complete(prompt)
print(new_response)
This guide provides a quickstart overview of using the pycolbertdb package for document processing and querying. Customize the prompt, queries, and collection as needed for your specific use case; a rough sketch of one such customization follows.
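For instance, here is a minimal sketch that reuses the collection, prompt_tmpl, and open_ai_client objects defined above to retrieve more passages and answer in a different tone; the query and tone strings are only illustrative.
# Illustrative only: pull more passages (k=5) and switch the answer's tone.
wider_result = collection.search(query="How does onigiri differ from sushi?", k=5)
wider_context = "\n\n".join(doc.content for doc in wider_result.documents)
prompt = prompt_tmpl.format(
    context_str=wider_context,
    tone_name="a formal encyclopedia entry",
    query_str="How does onigiri differ from sushi?",
)
print(open_ai_client.complete(prompt))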