A small semantic Q&A demo using LangChain and OpenAI
The only hard requirements are:
- Python 3.10+ with pip and virtualenv.
- An OpenAI API key. Although this will require setting up a payment plan with a credit card, per-call costs are very low.
Although not a prerequisite, a CUDA-capable GPU is strongly advised if you want to generate text embeddings locally with larger models.
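Both the openai client and langchain's wrappers read the API key from the `OPENAI_API_KEY` environment variable by default. Assuming the demo follows that convention (rather than a config file), export the key before running anything:

```
$ export OPENAI_API_KEY="sk-..."        # Linux/macOS
PS > $env:OPENAI_API_KEY = "sk-..."     # Windows PowerShell
```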
After cloning this repo, create and activate a Python virtual environment, then install the required Python packages using pip:
On Windows (PowerShell):

```
PS > virtualenv venv
PS > venv\scripts\activate.ps1
(venv) PS > pip install -r requirements_dev.txt
(venv) PS > pip install -r requirements.txt
```
On Linux/macOS:

```
$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements_dev.txt
(venv) $ pip install -r requirements.txt
```
If you have a CUDA-capable GPU, install the matching torch packages as described at https://pytorch.org/get-started/locally/:

```
(venv) $ pip install -U --force-reinstall --no-deps torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
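To confirm that the CUDA build of torch is actually being picked up, you can check with torch's standard API:

```
(venv) $ python -c "import torch; print(torch.cuda.is_available())"
```

This should print `True`; if it prints `False`, local embedding still works, just on the CPU.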
Point the demo at whatever document folder you fancy. Why not try a local copy of Godel's security policies from SharePoint?
The demo currently supports the following vector stores:
- ChromaDB
- pgvector. Set-up instructions
- Redis Stack. Set-up instructions
- Pinecone. Set-up instructions
- MongoDB Atlas. Set-up instructions
- Elasticsearch. Set-up instructions
- Neo4j. Set-up instructions
The first one, ChromaDB, uses file-based SQLite for storage and works out of the box; all the others need some set-up, detailed in the links above.
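To make the difference concrete, here is a minimal sketch of how a file-based store and a server-based store might be constructed with the classic langchain wrappers. The variable names, paths, URL, and index name are illustrative assumptions, not code taken from the demo:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma
from langchain.vectorstores.redis import Redis

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunks = [Document(page_content="example text")]  # stand-in for the real corpus

# ChromaDB: file-based, no server to run; just pick a directory to persist to
chroma_db = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")

# Redis Stack: needs a running server (see the set-up instructions above)
redis_db = Redis.from_documents(
    chunks,
    embeddings,
    redis_url="redis://localhost:6379",  # hypothetical local instance
    index_name="semantic_qa",            # hypothetical index name
)
```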
For generating embeddings, the demo currently supports:
- Calling the OpenAI embeddings API, which requires an API key and a payment plan; the default model is `text-embedding-ada-002`
- Generating embeddings locally using torch and a pre-trained model downloaded from Hugging Face; the default model is `all-MiniLM-L6-v2`
- Generating embeddings locally using torch and one of the pre-trained Instructor models; the default is `hkunlp/instructor-large`
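These three options map naturally onto langchain's embedding classes. A rough sketch follows; whether the demo instantiates them exactly like this is an assumption:

```python
from langchain.embeddings import (
    HuggingFaceEmbeddings,
    HuggingFaceInstructEmbeddings,
    OpenAIEmbeddings,
)

# OpenAI API: requires OPENAI_API_KEY and a payment plan
openai_emb = OpenAIEmbeddings(model="text-embedding-ada-002")

# Local sentence-transformers model, downloaded from Hugging Face on first use
minilm_emb = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Local Instructor model, also downloaded on first use
instructor_emb = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
```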
To run the console demo:

```
(venv) $ python semantic_qa.py
```
The first time it runs, leave `REBUILD = True` so the script iterates over the files in the corpus and generates the embeddings. On subsequent runs, you can set `REBUILD = False` and just test different values of `QUERY_STR` or tweaks to the GPT prompt.
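To make the `REBUILD` behaviour concrete, here is a minimal sketch of the usual pattern; the paths, chunk sizes, loader, and the choice of Chroma are assumptions for illustration, not code lifted from `semantic_qa.py`:

```python
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

REBUILD = True  # flip to False once the index has been built
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

if REBUILD:
    # Walk the corpus, split it into chunks, then embed and persist the chunks
    docs = DirectoryLoader("./docs").load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    db = Chroma.from_documents(splitter.split_documents(docs), embeddings,
                               persist_directory="./chroma_db")
else:
    # Reuse the embeddings persisted by a previous run
    db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
```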
To run the web UI instead:

```
(venv) $ chainlit run ./chainlit_app.py -w
```

This starts a small web UI, on port 8000 by default; the `-w` flag makes chainlit reload the app automatically when the source files change.