First complete draft of pgsql vector article
robertDouglass committed Jul 17, 2024
1 parent 9b6dfd8 commit 0a86865
Showing 1 changed file with 80 additions and 18 deletions.
98 changes: 80 additions & 18 deletions hugosite/content/posts/install-django-postgresql-pgvector-upsun.md
@@ -1,31 +1,34 @@
+++
title = 'Install Django with PostgreSQL and PGVector on Upsun'
date = 2024-07-17T11:00:00+02:00
draft = true
draft = false
+++

The order of the imports and settings in your settings.py file determines which settings take precedence. In Django, settings defined later override earlier settings. Given this, the final definition of DATABASES will depend on whether PLATFORM_DB_RELATIONSHIP is set and the Upsun environment variables are present.
In my article ["Install Django with SQLite on Upsun"](/posts/install-django-sqlite-upsun/), I explain why I love the [Upsun PaaS](https://upsun.com) and some of the great features you get when you use it. The next step in developing a production-worthy Django site on Upsun is to move to an enterprise-grade database, PostgreSQL. In this tutorial, I also show how to install and use the [PGVector](https://github.com/pgvector/pgvector) extension, because the apps that I'm building need to run semantic queries on vectors, in part to do Retrieval Augmented Generation (RAG) with Large Language Models (LLMs) such as ChatGPT or Claude.

Here is a summary of how the precedence works:
Here is a summary of the steps that I explained in the previous tutorial.

Initial Definition in settings.py: Your initial database settings are defined in the main settings.py file.
Override in settings_psh.py: If PLATFORM_DB_RELATIONSHIP is set and the Upsun environment variables are present, settings_psh.py will override the initial DATABASES definition.
## 1. Prepare the environment and start a Django project

```bash
mkdir upsun_django_postgresql
cd upsun_django_postgresql
python -m venv myenv
source myenv/bin/activate
pip install django
pip install gunicorn
pip install psycopg2

pip freeze > requirements.txt

cat requirements.txt
asgiref==3.8.1
Django==5.0.7
gunicorn==22.0.0
packaging==24.1
psycopg2==2.9.9
sqlparse==0.5.0
django-admin startproject myproject
cd myproject
python manage.py startapp myapp
cd ..
```

## 2. Make the project compatible with Upsun

Assuming you have started an Upsun project and installed the Upsun CLI, run this in the root directory (`upsun_django_postgresql`):

```bash
upsun project:init
```
@@ -60,6 +63,13 @@ Use arrows to move, space to select, type to filter
[ ] OpenSearch
```

This command has done the following:

1. Added `.upsun/config.yaml` which is where Upsun settings live.
2. Added `myproject/myproject/settings_psh.py`, code which reads Upsun environment variables.
3. Added a line at the end of `myproject/myproject/settings.py` to include the `settings_psh.py`.

This is a summary of the parts of `.upsun/config.yaml` that pertain to the PostgreSQL database. The important parts are the service definition, which results in a PostgreSQL database server running in its own container, and the `postgresql` relationship, which instructs Upsun to do everything needed to allow the Django app to connect to the database server.

```yaml
applications:
@@ -79,6 +89,12 @@ services:
type: postgresql:15
```
{{% notice info %}}
The `DATABASES` definition in `settings.py` will be overridden by the `DATABASES` definition in `settings_psh.py` if the environment variable `PLATFORM_DB_RELATIONSHIP` is set and one of the compatible database engines is specified. The `config.yaml` example above meets both of these conditions, so the database configuration will come from the Upsun environment.
{{% /notice %}}
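
To make the precedence concrete, here is a minimal Python sketch of the idea. It is not the generated `settings_psh.py`: the function name `databases_from_relationships` is hypothetical, the engine compatibility check is left out, and the relationship field names (`path`, `username`, `password`, `host`, `port`) are assumptions about the JSON that Upsun exposes in the base64-encoded `PLATFORM_RELATIONSHIPS` variable.

```python
import base64
import json


def databases_from_relationships(environ, fallback):
    """Return Django DATABASES from Upsun relationship data, else the fallback.

    Hypothetical helper for illustration; the real settings_psh.py may differ.
    """
    relationship_name = environ.get("PLATFORM_DB_RELATIONSHIP")
    encoded = environ.get("PLATFORM_RELATIONSHIPS")
    if not relationship_name or not encoded:
        # Not on Upsun, or no database relationship: keep the settings.py value.
        return fallback

    # PLATFORM_RELATIONSHIPS is base64-encoded JSON keyed by relationship name.
    relationships = json.loads(base64.b64decode(encoded))
    endpoints = relationships.get(relationship_name)
    if not endpoints:
        return fallback

    endpoint = endpoints[0]  # assumed: the first endpoint is the one to use
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": endpoint["path"],
            "USER": endpoint["username"],
            "PASSWORD": endpoint["password"],
            "HOST": endpoint["host"],
            "PORT": endpoint["port"],
        }
    }
```

Because Django reads `settings.py` top to bottom, a later assignment along the lines of `DATABASES = databases_from_relationships(os.environ, DATABASES)` would replace the earlier definition only when the Upsun variables are present.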


{{% notice warning %}}
As of 2024-07-17, `upsun project:init` has a bug when generating the `settings_psh.py` file. The variable seen on line 54 should be all lowercase.

@@ -93,6 +109,44 @@ The bug has been reported and fixed upstream, and will be fixed in upcoming rele

{{% /notice %}}

## 3. Put it in Git and Upsun it

```bash
git add .
git commit -m "Initial deployment of Django with PostgreSQL"
upsun project:set-remote # select the pre-created Upsun project from the list
upsun push
```

This will build a new Django site with PostgreSQL on Upsun. We can now create the superuser and confirm that the database has been populated with all of the initial tables.


```bash
upsun ssh
_ _
| | | |_ __ ____ _ _ _
| |_| | '_ (_-< || | ' \
\___/| .__/__/\_,_|_||_|
|_|
Welcome to Upsun.
Environment: main-bvxea6i
Branch: main
Project: oes36x5dtgp2u
web@upsun_django_sqlite.0:~$
```

You're now on the command line of your web environment, where you can interact with your Django project.

```bash
python myproject/manage.py createsuperuser
exit
```

Now let's interact directly with the PostgreSQL database:

```bash
upsun sql
@@ -119,6 +173,8 @@ main=> \q
Connection to ssh.eu-5.platform.sh closed.
```

## 4. Install the PGVector extension and test it

PostgreSQL has a [lot of extensions available](https://docs.upsun.com/add-services/postgresql.html#available-extensions). One that I'm particularly interested in is [PGVector](https://github.com/pgvector/pgvector). Vector databases, in combination with Large Language Models (LLMs), also known as "AI", have given developers new ways to search for semantically similar texts. This applies to many fields, especially LLM-based applications that need Retrieval Augmented Generation (RAG). PostgreSQL on Upsun supports the extension, and this is how you configure it to be installed.

In `.upsun/config.yaml`, update the `services` definition like this:
@@ -145,7 +201,7 @@ The careful observer will note that I also bumped the PostgreSQL version from 15

Below is a sequence of SQL commands to create a new table using `pgvector`, insert a few embeddings into it, and perform a similarity search query to test its functionality.

### Step 1: Create a Table with a Vector Column
### Create a table with a vector column

First, create a table with a column for storing vector embeddings. Here, we'll create a table named `items` with an `id` and an `embedding` column.

@@ -156,7 +212,7 @@ CREATE TABLE items (
);
```

### Step 2: Insert Embeddings into the Table
### Insert embeddings into the table

Next, insert some sample embeddings into the `items` table. Here, we add three example embeddings.

@@ -167,7 +223,7 @@ INSERT INTO items (embedding) VALUES
('[0.4, 0.3, 0.1]');
```
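
If you later insert embeddings from Django code rather than from `psql`, you need to produce the same text form that the `INSERT` above uses, such as `'[0.4, 0.3, 0.1]'`. The following is a small sketch of that formatting step only. The helper name `to_vector_literal` and its `dim` parameter are hypothetical, and the default of 3 dimensions is an assumption based on the three-element sample vector.

```python
import math


def to_vector_literal(values, dim=3):
    """Format a sequence of numbers as a pgvector text literal like '[0.4, 0.3, 0.1]'.

    Hypothetical helper; `dim` is assumed to match the vector column's dimension.
    """
    floats = [float(v) for v in values]
    if len(floats) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(floats)}")
    if not all(math.isfinite(f) for f in floats):
        # Sanity check: reject NaN and infinity before they reach the database.
        raise ValueError("vector components must be finite numbers")
    return "[" + ", ".join(repr(f) for f in floats) + "]"
```

The resulting string can be passed as an ordinary query parameter, which keeps the embedding values out of hand-built SQL strings.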

### Step 3: Perform a Similarity Search Query
### Perform a similarity search query

To test that the setup is working, perform a similarity search query. Here, we find the top 3 most similar embeddings to a given query vector `[0.2, 0.2, 0.2]`.

@@ -215,4 +271,10 @@ Run these commands in your `psql` session, which you can open from the Upsun CLI
(3 rows)
main=>
```
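
To build intuition for what the similarity search ranks by, here is a plain-Python sketch of nearest-neighbour ordering. It assumes the query orders rows by pgvector's `<->` operator, which is Euclidean (L2) distance. Apart from `[0.4, 0.3, 0.1]` and the query vector `[0.2, 0.2, 0.2]`, the sample vectors are made up for illustration, and the function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, embeddings, limit=3):
    """Return up to `limit` embeddings, closest to `query` first."""
    return sorted(embeddings, key=lambda e: l2_distance(e, query))[:limit]


# Illustrative data: only [0.4, 0.3, 0.1] appears in the tutorial's INSERT.
rows = [[0.9, 0.8, 0.7], [0.4, 0.3, 0.1], [0.1, 0.2, 0.3]]
for row in nearest([0.2, 0.2, 0.2], rows):
    print(row, round(l2_distance(row, [0.2, 0.2, 0.2]), 4))
```

A smaller distance means a more similar embedding, which is why the database query sorts in ascending order of distance.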

## Conclusion

This tutorial shows how to set up a Django project that uses PostgreSQL with the PGVector extension on the Upsun platform. You now have a production-ready environment capable of handling the semantic vector queries that applications like Retrieval Augmented Generation with Large Language Models depend on.

Thanks for reading the tutorial! If you have any questions, my email address is [email protected] and you can [find me on LinkedIn](https://www.linkedin.com/in/roberttdouglass/). There is also an [Upsun Discord forum](https://discord.gg/PkMc2pVCDV) where I hang out, and you're welcome to find me there.
