PrivateGPT is a production-ready AI project that allows users to chat over their documents and other content using local LLMs. By integrating it with ipex-llm, users can now easily run those LLMs on an Intel GPU (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex or Max).
See the demo of PrivateGPT running Mistral:7B on an Intel Arc A770 below. You can also click here to watch the demo video.
Follow the steps in the Run Ollama on Intel GPU Guide to install and run Ollama on Intel GPU. Ensure that ollama serve is running correctly and can be accessed through a local URL (e.g., http://127.0.0.1:11434) or a remote URL (e.g., http://your_ip:11434).
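As a quick sanity check, you can query the Ollama endpoint before continuing. The URL below is an assumption matching Ollama's default port; adjust it to wherever your ollama serve is listening:

```bash
# Confirm that the Ollama server is reachable at the URL PrivateGPT will use.
curl http://127.0.0.1:11434
# A short status message (e.g., "Ollama is running") indicates the server is up.
```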
We recommend pulling the desired model before proceeding with PrivateGPT. For instance, to pull the Mistral:7B model, you can use the following command:
ollama pull mistral:7b
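If you plan to query documents, PrivateGPT's embeddings-ollama setup also needs an embedding model served by Ollama. Assuming nomic-embed-text as the embedding model (any embedding model supported by Ollama will do, as long as it matches your configuration later), pull it the same way:

```bash
# Pull an embedding model for document indexing; the model choice here is an
# assumption and must match the embedding_model set later in settings-ollama.yaml
ollama pull nomic-embed-text
```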
You can either clone the repository or download the source zip from GitHub:
git clone https://github.com/zylon-ai/private-gpt
Execute the following commands in a terminal to install the dependencies of PrivateGPT:
cd private-gpt
pip install poetry
pip install ffmpy==0.3.1
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
For more details, refer to the PrivateGPT Installation Guide.
To configure PrivateGPT to use Ollama for running local LLMs, edit the private-gpt/settings-ollama.yaml file. In the ollama section, set the llm_model and embedding_model you wish to use, and update api_base and embedding_api_base to point to your Ollama URL.
Below is an example of how settings-ollama.yaml should look.
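The sketch below shows only the Ollama-related sections. The values are illustrative assumptions (mistral:7b as the chat model, nomic-embed-text as the embedding model, the default Ollama URL); adapt them to your setup and check the full file in your checkout for the complete set of options:

```yaml
# settings-ollama.yaml (excerpt) -- values are illustrative assumptions
llm:
  mode: ollama

embedding:
  mode: ollama

ollama:
  llm_model: mistral:7b                        # the model pulled earlier with `ollama pull`
  embedding_model: nomic-embed-text            # an embedding model served by Ollama
  api_base: http://127.0.0.1:11434             # point to your Ollama URL
  embedding_api_base: http://127.0.0.1:11434   # may differ if embeddings run on another Ollama instance
  request_timeout: 120.0                       # seconds; raise for slow responses from large models
```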
Note
settings-ollama.yaml is loaded when the ollama profile is specified in the PGPT_PROFILES environment variable. It can override configurations from the default settings.yaml.
For more information on configuring PrivateGPT, please visit the PrivateGPT Main Concepts page.
Please ensure that the Ollama server keeps running in its terminal while you are using PrivateGPT.
Run the commands below in another terminal to start the service:
- For Linux users:

  export no_proxy=localhost,127.0.0.1
  PGPT_PROFILES=ollama make run

  Note: Setting PGPT_PROFILES=ollama will load the configuration from settings.yaml and settings-ollama.yaml.

- For Windows users:

  set no_proxy=localhost,127.0.0.1
  set PGPT_PROFILES=ollama
  make run

  Note: Setting PGPT_PROFILES=ollama will load the configuration from settings.yaml and settings-ollama.yaml.
Upon successful deployment, you will see startup logs in the terminal, including the URL at which the web UI is being served.
Open a browser (if it doesn't open automatically) and navigate to the URL displayed in the terminal. If it shows http://0.0.0.0:8001, you can access the UI locally via http://127.0.0.1:8001 or remotely via http://your_ip:8001.
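If you prefer to confirm from the command line that the service is reachable before opening a browser, a simple check (assuming the default port 8001 shown above) is:

```bash
# Send a HEAD request to the PrivateGPT web server; an HTTP response line
# (e.g., "HTTP/1.1 200 OK") means the UI is being served on this port.
curl -I http://127.0.0.1:8001
```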
To chat with the LLM, select the "LLM Chat" option located in the upper left corner of the page. Type your messages at the bottom of the page and click the "Submit" button to receive responses from the model.
To interact with documents, select the "Query Files" option in the upper left corner of the page. Click the "Upload File(s)" button to upload documents. After the documents have been vectorized, you can type your messages at the bottom of the page and click the "Submit" button to receive responses from the model based on the uploaded content.