First complete draft of pgsql vector article
1 parent
9b6dfd8
commit 0a86865
Showing 1 changed file with 80 additions and 18 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,31 +1,34 @@ | ||
+++ | ||
title = 'Install Django with PostgreSQL and PGVector on Upsun' | ||
date = 2024-07-17T11:00:00+02:00 | ||
draft = true | ||
draft = false | ||
+++ | ||
|
||
The order of the imports and settings in your settings.py file determines which settings take precedence. In Django, settings defined later override earlier settings. Given this, the final definition of DATABASES will depend on whether PLATFORM_DB_RELATIONSHIP is set and the Upsun environment variables are present. | ||
In my article ["Install Django with SQLite on Upsun"](/posts/install-django-sqlite-upsun/), I explain why I love the [Upsun PaaS](https://upsun.com) and some of the great features you get when you use it. The next step in developing a production-worthy Django site on Upsun is to move to an enterprise-grade database, PostgreSQL. In this tutorial, I also show how to install and use the [PGVector](https://github.com/pgvector/pgvector) extension, because the apps that I'm building need to run semantic queries on vectors, in part to do Retrieval Augmented Generation (RAG) with Large Language Models (LLMs) such as ChatGPT or Claude. | ||
|
||
Here is a summary of how the precedence works: | ||
Here is a summary of the steps that I explained in the previous tutorial. | ||
|
||
Initial Definition in settings.py: Your initial database settings are defined in the main settings.py file. | ||
Override in settings_psh.py: If PLATFORM_DB_RELATIONSHIP is set and the Upsun environment variables are present, settings_psh.py will override the initial DATABASES definition. | ||
## 1. Prepare the environment and start a Django project | ||
|
||
```bash | ||
mkdir upsun_django_postgresql | ||
cd upsun_django_postgresql | ||
python -m venv myenv | ||
source myenv/bin/activate | ||
pip install django | ||
pip install gunicorn | ||
pip install psycopg2 | ||
|
||
pip freeze > requirements.txt | ||
|
||
cat requirements.txt | ||
asgiref==3.8.1 | ||
Django==5.0.7 | ||
gunicorn==22.0.0 | ||
packaging==24.1 | ||
psycopg2==2.9.9 | ||
sqlparse==0.5.0 | ||
django-admin startproject myproject | ||
cd myproject | ||
python manage.py startapp myapp | ||
cd .. | ||
``` | ||
|
||
## 2. Make the project compatible with Upsun | ||
|
||
Assuming you have started an Upsun project and installed the Upsun CLI, run this in the root directory (`upsun_django_postgresql`): | ||
|
||
```bash | ||
upsun project:init | ||
``` | ||
|
@@ -60,6 +63,13 @@ Use arrows to move, space to select, type to filter | |
[ ] OpenSearch | ||
``` | ||
|
||
This command has done the following: | ||
|
||
1. Added `.upsun/config.yaml` which is where Upsun settings live. | ||
2. Added `myproject/myproject/settings_psh.py`, which contains code that reads Upsun environment variables. | ||
3. Added a line at the end of `myproject/myproject/settings.py` to include `settings_psh.py`. | ||
|
||
This is a summary of the parts of `.upsun/config.yaml` that pertain to the PostgreSQL database. The important parts are the service definition, which results in a PostgreSQL database server running in its own container, and the `postgresql` relationship, which instructs Upsun to do everything needed to allow the Django app to connect to the database server. | ||
|
||
```yaml | ||
applications: | ||
|
@@ -79,6 +89,12 @@ services: | |
type: postgresql:15 | ||
``` | ||
{{% notice info %}} | ||
The `DATABASES` definition in `settings.py` will be overridden by the `DATABASES` definition in `settings_psh.py` if the environment variable `PLATFORM_DB_RELATIONSHIP` is set and one of the compatible database engines is specified. The `config.yaml` example above meets both of these conditions, so the database configuration will come from the Upsun environment. | ||
{{% /notice %}} | ||
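
The override described in the notice can be sketched roughly as follows. This is a minimal illustration, not the code that `upsun project:init` generates: the function name `databases_from_upsun` is hypothetical, and it assumes that the relationship data arrives as base64-encoded JSON in `PLATFORM_RELATIONSHIPS` with `host`, `port`, `username`, `password` and `path` keys.

```python
import base64
import json


def databases_from_upsun(environ, default):
    """Return a DATABASES dict built from Upsun variables, or `default`.

    Hypothetical helper for illustration only. `default` stands in for
    the DATABASES value already defined in settings.py.
    """
    relationship = environ.get("PLATFORM_DB_RELATIONSHIP")
    encoded = environ.get("PLATFORM_RELATIONSHIPS")
    if not relationship or not encoded:
        # Not running on Upsun: keep the definition from settings.py.
        return default

    # Assumption: the variable holds base64-encoded JSON.
    relationships = json.loads(base64.b64decode(encoded))
    if relationship not in relationships:
        return default

    db = relationships[relationship][0]
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": db["path"],
            "USER": db["username"],
            "PASSWORD": db["password"],
            "HOST": db["host"],
            "PORT": db["port"],
        }
    }


# Simulated environment with made-up credentials, to show the override.
fake_relationships = {
    "postgresql": [
        {"path": "main", "username": "main", "password": "secret",
         "host": "postgresql.internal", "port": 5432}
    ]
}
fake_environ = {
    "PLATFORM_DB_RELATIONSHIP": "postgresql",
    "PLATFORM_RELATIONSHIPS": base64.b64encode(
        json.dumps(fake_relationships).encode()
    ).decode(),
}
sqlite_default = {"default": {"ENGINE": "django.db.backends.sqlite3"}}

print(databases_from_upsun(fake_environ, sqlite_default)["default"]["HOST"])
print(databases_from_upsun({}, sqlite_default)["default"]["ENGINE"])
```

Because the included file runs after the rest of `settings.py`, whatever it assigns to `DATABASES` wins; outside Upsun the original definition is left untouched.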
|
||
|
||
{{% notice warning %}} | ||
As of 2024-07-17, `upsun project:init` has a bug when generating the `settings_psh.py` file. The variable seen on line 54 should be all lowercase. | ||
|
||
|
@@ -93,6 +109,44 @@ The bug has been reported and fixed upstream, and will be fixed in upcoming rele | |
|
||
{{% /notice %}} | ||
|
||
## 3. Put it in Git and Upsun it | ||
|
||
```bash | ||
git add . | ||
git commit -m "Initial deployment of Django with PostgreSQL" | ||
upsun project:set-remote # select the pre-created Upsun project from the list | ||
upsun push | ||
``` | ||
|
||
This will build a new Django site with PostgreSQL on Upsun. We can create the superuser and confirm that the database has been populated with all of the initial tables. | ||
|
||
|
||
```bash | ||
upsun ssh | ||
_ _ | ||
| | | |_ __ ____ _ _ _ | ||
| |_| | '_ (_-< || | ' \ | ||
\___/| .__/__/\_,_|_||_| | ||
|_| | ||
Welcome to Upsun. | ||
Environment: main-bvxea6i | ||
Branch: main | ||
Project: oes36x5dtgp2u | ||
web@upsun_django_sqlite.0:~$ | ||
``` | ||
|
||
You're now on the command line of your web environment, where you can interact with your Django project. | ||
|
||
```bash | ||
python myproject/manage.py createsuperuser | ||
exit | ||
``` | ||
|
||
Now let's interact directly with the PostgreSQL database: | ||
|
||
```bash | ||
upsun sql | ||
|
@@ -119,6 +173,8 @@ main=> \q | |
Connection to ssh.eu-5.platform.sh closed. | ||
``` | ||
|
||
## 4. Install the PGVector extension and test it | ||
|
||
PostgreSQL has a [lot of extensions available](https://docs.upsun.com/add-services/postgresql.html#available-extensions). One that I'm particularly interested in is [PGVector](https://github.com/pgvector/pgvector). Vector databases, in combination with Large Language Models (LLMs), a.k.a. "AI", have given developers new ways to search for semantically similar texts. This applies to many fields, especially LLM-based applications that need Retrieval Augmented Generation (RAG). PostgreSQL on Upsun supports PGVector, and this is how you configure it to be installed. | ||
|
||
In `.upsun/config.yaml`, update the `services` definition like this: | ||
|
@@ -145,7 +201,7 @@ The careful observer will note that I also bumped the PostgreSQL version from 15 | |
|
||
Below is a sequence of SQL commands to create a new table using `pgvector`, insert a few embeddings into it, and perform a similarity search query to test its functionality. | ||
|
||
### Step 1: Create a Table with a Vector Column | ||
### Create a table with a vector column | ||
|
||
First, create a table with a column for storing vector embeddings. Here, we'll create a table named `items` with an `id` and an `embedding` column. | ||
|
||
|
@@ -156,7 +212,7 @@ CREATE TABLE items ( | |
); | ||
``` | ||
|
||
### Step 2: Insert Embeddings into the Table | ||
### Insert embeddings into the table | ||
|
||
Next, insert some sample embeddings into the `items` table. Here, we add three example embeddings. | ||
|
||
|
@@ -167,7 +223,7 @@ INSERT INTO items (embedding) VALUES | |
('[0.4, 0.3, 0.1]'); | ||
``` | ||
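
When the embeddings come from Python rather than being typed by hand, they need to be turned into the same bracketed text form used in the `INSERT` above. The helper below is a small sketch of that conversion; `to_vector_literal` is a hypothetical name, not part of pgvector or Django.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.4,0.3,0.1]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


literal = to_vector_literal([0.4, 0.3, 0.1])
print(literal)

# With a database driver such as psycopg2, the literal would be passed as a
# query parameter rather than pasted into the SQL string (not executed here):
#   cursor.execute(
#       "INSERT INTO items (embedding) VALUES (%s::vector)", (literal,)
#   )
```

Passing the literal as a parameter keeps the SQL text fixed and leaves quoting to the driver.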
|
||
### Step 3: Perform a Similarity Search Query | ||
### Perform a similarity search query | ||
|
||
To test that the setup is working, perform a similarity search query. Here, we find the top 3 most similar embeddings to a given query vector `[0.2, 0.2, 0.2]`. | ||
|
||
|
@@ -215,4 +271,10 @@ Run these commands in your `psql` session, which you can open from the Upsun CLI | |
(3 rows) | ||
main=> | ||
``` | ||
``` | ||
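
To sanity-check a similarity ranking by hand, the same calculation can be reproduced in plain Python. The sketch below assumes the query orders by Euclidean (L2) distance, which is what pgvector's `<->` operator computes. Only the query vector `[0.2, 0.2, 0.2]` and the embedding `[0.4, 0.3, 0.1]` come from this tutorial; the other two vectors are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [0.2, 0.2, 0.2]
stored = {
    "from the tutorial": [0.4, 0.3, 0.1],
    "hypothetical A": [0.2, 0.2, 0.2],  # made-up vector, identical to the query
    "hypothetical B": [1.0, 0.0, 0.0],  # made-up vector, far from the query
}

# Smallest distance first, like ORDER BY distance in SQL.
ranked = sorted(stored.items(), key=lambda item: l2_distance(query, item[1]))
for name, vector in ranked:
    print(f"{name}: {l2_distance(query, vector):.4f}")
```

A smaller distance means a closer match, so the vector identical to the query ranks first.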
|
||
## Conclusion | ||
|
||
This tutorial shows how to create a Django project that uses PostgreSQL with the PGVector extension on the Upsun platform. You now have a production-ready environment capable of handling the semantic vector queries needed for applications like Retrieval Augmented Generation with Large Language Models. | ||
|
||
Thanks for reading the tutorial! If you have any questions, my email address is [email protected] and you can [find me on LinkedIn](https://www.linkedin.com/in/roberttdouglass/). There is also an [Upsun Discord forum](https://discord.gg/PkMc2pVCDV) where I hang out, and you're welcome to find me there. |