Our problem statement is “Running GenAI on Intel AI Laptops and Simple LLM Inference on CPU and fine-tuning of LLM Models using Intel® OpenVINO™.” The challenge is to run Generative AI applications and perform LLM inference efficiently on Intel AI laptops and CPUs while maintaining high performance without specialized hardware. In addition, fine-tuning LLM models with Intel® OpenVINO™ for real-time applications requires addressing computational efficiency and resource constraints.
This project leverages Intel® OpenVINO™ to optimize and run GenAI and LLM inference on the CPUs of Intel AI laptops, minimizing reliance on GPUs and enabling efficient, high-performance AI deployment in consumer-grade environments. By fine-tuning LLM models with OpenVINO™, we aim to improve the performance and accessibility of AI applications. Specifically, we have built a text-generation chatbot on TinyLlama/TinyLlama-1.1B-Chat-v1.0 to showcase these capabilities.
- Rahul Biju (Team Leader): CPU Inference
- Nandakrishnan A: Model Optimization and Quantization
- Nandana S Nair: Project Report
- Krishna Sagar P: Project Report
- Rahul Zachariah: User Interface Implementation
1. Clone the repository.
git clone https://github.com/Rahul-Biju-03/Technix.git
2. Move into the project directory.
cd Technix
3. Install the required libraries from requirements.txt.
pip install -r requirements.txt
4. (Optional) Run the project in a virtual environment.
- Download and install virtualenv.
pip install virtualenv
- Create a Python 3 virtual environment.
virtualenv -p path\to\your\python.exe test_env
- Activate the virtual environment.
For Windows:
test_env\Scripts\Activate
For Unix:
source test_env/bin/activate
5. Converting and Quantizing TinyLlama Model with OpenVINO.
- This script converts the TinyLlama model from its original format to ONNX and then quantizes it with OpenVINO for optimized performance; a sketch of the general flow follows the command.
python Conversion_and_Optimisation.py
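The repository's script is the source of truth; below is a minimal sketch of the same export-and-quantize flow, assuming the optimum-intel package (`pip install optimum[openvino]`) is installed. The output directory names `tinyllama_ov` and `tinyllama_ov_int8` are illustrative, not taken from the script.

```python
# Sketch: convert TinyLlama to OpenVINO IR and quantize its weights to 8-bit.
# Assumes optimum-intel is installed; directory names are illustrative.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the original checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained("tinyllama_ov")

# load_in_8bit=True applies 8-bit weight quantization during export
quantized = OVModelForCausalLM.from_pretrained(
    model_id, export=True, load_in_8bit=True
)
quantized.save_pretrained("tinyllama_ov_int8")

# Save the tokenizer alongside the quantized model for later use
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained("tinyllama_ov_int8")
```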
6. Benchmarking Original and Quantized TinyLlama Model with OpenVINO
- This script benchmarks the performance and memory usage of the original TinyLlama model against the quantized version using OpenVINO, including model size calculations and inference time measurements; a sketch of the approach follows the command.
python CPU_INFERENCE.py
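For orientation, here is a minimal benchmarking sketch (CPU_INFERENCE.py remains the source of truth). It assumes the `tinyllama_ov` and `tinyllama_ov_int8` directories produced in the previous step, and compares on-disk size and generation latency for both models.

```python
# Sketch: compare disk footprint and CPU inference time of the original
# and quantized OpenVINO models. Paths are assumed outputs of step 5.
import os
import time
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

def dir_size_mb(path):
    # Total size of all files under the model directory, in MB
    total = sum(os.path.getsize(os.path.join(root, f))
                for root, _, files in os.walk(path) for f in files)
    return total / (1024 ** 2)

def time_generation(model, tokenizer, prompt, new_tokens=64):
    # Wall-clock time to generate a fixed number of new tokens
    inputs = tokenizer(prompt, return_tensors="pt")
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=new_tokens)
    return time.perf_counter() - start

tokenizer = AutoTokenizer.from_pretrained("tinyllama_ov_int8")
prompt = "Explain what OpenVINO does in one sentence."

for path in ("tinyllama_ov", "tinyllama_ov_int8"):
    model = OVModelForCausalLM.from_pretrained(path)
    print(f"{path}: {dir_size_mb(path):.0f} MB on disk, "
          f"{time_generation(model, tokenizer, prompt):.2f}s for 64 tokens")
```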
7. TinyLlama Chatbot with Gradio Interface
- This script sets up a TinyLlama chatbot with a Gradio interface, including preprocessing and postprocessing functions for improved text handling; a sketch of the interface wiring follows the command.
python Chatbot.py
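Below is a minimal sketch of how such a Gradio front end can be wired to the quantized model (Chatbot.py is the source of truth; its preprocessing and postprocessing functions are omitted here, and the model path is assumed from step 5).

```python
# Sketch: a Gradio chat front end over the quantized OpenVINO model.
# The "tinyllama_ov_int8" path is an assumption carried over from step 5.
import gradio as gr
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model = OVModelForCausalLM.from_pretrained("tinyllama_ov_int8")
tokenizer = AutoTokenizer.from_pretrained("tinyllama_ov_int8")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

def chat(message, history):
    # Generate a reply and strip the echoed prompt from the model output
    output = generator(message, max_new_tokens=128)[0]["generated_text"]
    return output[len(message):].strip()

# ChatInterface handles the chat history and message box UI
gr.ChatInterface(chat).launch()
```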
Below are two images illustrating the chatbot interface on a mobile device.