GuPT is a project developed by a student group for the course Machine Learning for Natural Language Processing (DIT247). The system draws on information extracted from Gothenburg University’s (GU) bachelor’s and master’s courses (~590) and programs (~90), taken from their web pages and syllabus PDFs. GuPT uses this data in a Retrieval-Augmented Generation (RAG) pipeline to answer user queries.
GuPT’s RAG pipeline is built with LangChain, OpenAI embeddings, and GPT-4o mini. Through multi-querying and logic routing, GuPT can handle ambiguous questions and give both specific and general answers about GU courses and programs. The goal is a tool that quickly provides information on entry requirements, learning objectives, and assessment methods, reducing confusion and administrative workload.
Access our interactive demo and start asking questions about GU courses and programs.
- Features
- Getting Started
- Installation
- Usage
- Data Collection
- Architecture
- Evaluation
- Technologies Used
- Video Presentation
- Natural Language Querying: Ask questions about GU courses and programs in plain English.
- Contextual RAG System: Retrieves relevant information from a local database of course and program details.
- Multi-Querying and Logic Routing: Handles ambiguous questions by rephrasing them into several queries and routing them to the relevant data to get precise answers.
- Scalable: Built to handle a large volume of course and program data.
- Efficient Retrieval: Reduces time spent searching for course or program information manually.
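The retrieval features listed above can be illustrated with a small, dependency-free sketch. It is not GuPT's implementation: the store contents are invented, `embed` is a bag-of-words stand-in for OpenAI embeddings, and `route` and `expand` are hypothetical helpers that use a keyword rule and fixed rewrites where the real system would presumably rely on LangChain and an LLM.

```python
import math
import re
from collections import Counter

# Toy stand-ins for the course and program stores; the texts are invented for illustration.
STORES = {
    "courses": [
        "DIT247 course: assessment is a written project report and an oral presentation.",
        "DIT247 course: entry requirements include programming and basic statistics.",
    ],
    "programs": [
        "Applied Data Science master's program: entry requirements include a bachelor's degree.",
    ],
}


def embed(text):
    """Bag-of-words vector; a real system would call an embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(question):
    """Logic routing: pick a store by keyword (an LLM classifier could do this instead)."""
    return "programs" if re.search(r"\bprogram(me)?s?\b", question.lower()) else "courses"


def expand(question):
    """Multi-querying: fixed rewrites stand in for LLM-generated paraphrases."""
    return [question, f"information about {question}", f"{question} at Gothenburg University"]


def retrieve(question, k=2):
    """Score every document against each query variant and keep the best score per document."""
    store = STORES[route(question)]
    best = {}
    for variant in expand(question):
        query_vec = embed(variant)
        for doc in store:
            best[doc] = max(best.get(doc, 0.0), cosine(query_vec, embed(doc)))
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for hit in retrieve("What is the assessment for DIT247?"):
        print(hit)
```

Taking the maximum score across query variants means a document only has to match one phrasing of the question, which is the main reason to issue several queries for an ambiguous request. The retrieved passages would then be passed to the language model as context for the answer.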
These instructions will help you set up a local copy of GuPT for development and testing purposes.
- Python 3.8+: Ensure you have Python installed.
- pip: Python package manager.
- OpenAI API Key: Required for embedding and text generation. Obtain one from OpenAI's website.
- Clone the Repository
git clone https://github.com/faerazo/DIT247-NLP-Final-Project.git
cd DIT247-NLP-Final-Project
- Set Up Your .env File
Create a file named .env in the project root and include your OpenAI API Key:
OPENAI_API_KEY=[YOUR_API_KEY]
- Install Required Libraries
pip install -r requirements.txt
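Before starting the chatbot, it can help to confirm that the key from the .env step is readable. The following is a minimal standard-library sketch; `read_env_file` and `get_openai_key` are hypothetical helpers written for this example, and the project itself may load the file differently (for instance with a dotenv package).

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_openai_key(path=".env"):
    """Prefer an already-exported variable, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY", "")
    # A value that still starts with "[" is the unedited [YOUR_API_KEY] placeholder.
    if not key or key.startswith("["):
        raise RuntimeError("OPENAI_API_KEY is missing or still a placeholder")
    return key


if __name__ == "__main__":
    try:
        get_openai_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Run from the project root, this reports whether a usable key was found without printing the key itself.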
Once you have the environment set up and the necessary dependencies installed, you can run GuPT and interact with the RAG Chatbot.
- Start the GuPT RAG Chatbot
python rag.py
- Ask Your Questions
Simply type your question or query into the chatbot interface or use one of the provided template questions.
Data on GU courses and programs is crawled from the GU website and stored in the data folder. The process is summarized in the following diagram:
The architecture of GuPT is shown in the following diagram:
To evaluate GuPT’s responses on the test set (or a subset of it), run the following command:
python run_evaluation.py --subset 3
Here, --subset 3 indicates the subset of the test data you want to evaluate. Adjust this value as needed.