feat: if upgrade 17 -> 17 modify upgrade process #1583

Open: wants to merge 28 commits into base: develop
28 commits
9e18547
feat: if upgrade 17 -> 17 or 17-orioledb -> 17-orioledb do not run th…
samrose May 2, 2025
a1a0b6e
feat: handle all cases of SERVER_LC_COLLATE and SERVER_LC_CTYPE
samrose May 5, 2025
1dbf0fe
chore: bump version for testing
samrose May 5, 2025
fdcd1c4
fix: explicit set on 17/oriole
samrose May 6, 2025
cf59660
feat: handling max_slot_wal_keep_size for pg 17 was needed as well
samrose May 6, 2025
11dab86
feat: binary upgrades require max_slot_wal_keep_size to be -1 during …
samrose May 6, 2025
3ec9818
fix: Better to override that during the upgrade process by specifying…
samrose May 6, 2025
715afe0
fix: cover only pg 17
samrose May 7, 2025
b079397
chore: bump version
samrose May 7, 2025
1a70efe
fix: rm oriole handling
samrose May 7, 2025
0a9fb88
fix: do not need max_slot_wal_keep_size on old version
samrose May 7, 2025
86c93bd
fix: temp config on new-options too
samrose May 7, 2025
cf74e75
test: bump test version
samrose May 7, 2025
1ca8473
fix: remove unbound var
samrose May 7, 2025
cfa3f19
chore: remove complete.sh change that should not have been committed
samrose May 7, 2025
ecff047
chore: bump for testing
samrose May 7, 2025
8d2ccb3
chore: stash code
samrose May 9, 2025
05d8a6f
feat: working pg 17 upgrade
samrose May 9, 2025
7876ed8
chore: bump version
samrose May 9, 2025
1352205
feat: pg 15 handling
samrose May 9, 2025
23b9d7c
chore: bump versions
samrose May 9, 2025
7a38abb
chore: versions for release
samrose May 10, 2025
6754a60
feat: rm oriole handling, refine 15 -> 17 config
samrose May 10, 2025
b81245b
chore: bump ver for testing
samrose May 10, 2025
888bd47
feat: make sure old pg stops if not force stop
samrose May 11, 2025
c27f987
chore: bump version
samrose May 11, 2025
be98ca1
chore: cleanup + bump version for test
samrose May 12, 2025
ed3ea78
chore: release bump
samrose May 12, 2025
68 changes: 60 additions & 8 deletions ansible/files/admin_api_scripts/pg_upgrade_scripts/initiate.sh
@@ -41,10 +41,14 @@ LOG_FILE="/var/log/pg-upgrade-initiate.log"

POST_UPGRADE_EXTENSION_SCRIPT="/tmp/pg_upgrade/pg_upgrade_extensions.sql"
POST_UPGRADE_POSTGRES_PERMS_SCRIPT="/tmp/pg_upgrade/pg_upgrade_postgres_perms.sql"
OLD_PGVERSION=$(run_sql -A -t -c "SHOW server_version;")
OLD_PGVERSION=$(pg_config --version | sed 's/PostgreSQL \([0-9]*\.[0-9]*\).*/\1/')

# Skip locale settings if both the old and the new version are PostgreSQL 17
if ! [[ "$OLD_PGVERSION" =~ ^17.* && "$PGVERSION" =~ ^17.* ]]; then
SERVER_LC_COLLATE=$(run_sql -A -t -c "SHOW lc_collate;")
SERVER_LC_CTYPE=$(run_sql -A -t -c "SHOW lc_ctype;")
fi

SERVER_LC_COLLATE=$(run_sql -A -t -c "SHOW lc_collate;")
SERVER_LC_CTYPE=$(run_sql -A -t -c "SHOW lc_ctype;")
SERVER_ENCODING=$(run_sql -A -t -c "SHOW server_encoding;")

POSTGRES_CONFIG_PATH="/etc/postgresql/postgresql.conf"
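
The version gate in this hunk reduces `pg_config --version` output to a version string and then prefix-matches it against `17`. A minimal sketch of the same check follows. The helper names `pg_major` and `is_same_major_17` are hypothetical and do not exist in `initiate.sh`, and comparing the major number exactly (instead of matching `^17.*`) is an assumption about the intent.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of initiate.sh).

# Print the major version from `pg_config --version`-style text,
# e.g. "PostgreSQL 17.4" -> "17".
pg_major() {
  printf '%s\n' "$1" | sed -E 's/^PostgreSQL ([0-9]+).*/\1/'
}

# Succeed only when both the old and the target version are major 17.
# $1: output of `pg_config --version`
# $2: target version, such as "17" or "17.4.1.029"
is_same_major_17() {
  old_major="$(pg_major "$1")"
  new_major="${2%%.*}"   # strip everything from the first dot onwards
  [ "$old_major" = "17" ] && [ "$new_major" = "17" ]
}

if is_same_major_17 "PostgreSQL 17.4" "17"; then
  echo "17 -> 17: skip querying lc_collate and lc_ctype"
fi
```

An exact comparison avoids accidental matches on strings that merely start with the characters `17`, at the cost of one extra parsing step.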
@@ -251,7 +255,12 @@ function initiate_upgrade {
if [ -n "$IS_LOCAL_UPGRADE" ]; then
mkdir -p "$PG_UPGRADE_BIN_DIR"
mkdir -p /tmp/persistent/
echo "a7189a68ed4ea78c1e73991b5f271043636cf074" > "$PG_UPGRADE_BIN_DIR/nix_flake_version"
if [ -n "$NIX_FLAKE_VERSION" ]; then
echo "$NIX_FLAKE_VERSION" > "$PG_UPGRADE_BIN_DIR/nix_flake_version"
else
echo "a7189a68ed4ea78c1e73991b5f271043636cf074" > "$PG_UPGRADE_BIN_DIR/nix_flake_version"
fi

tar -czf "/tmp/persistent/pg_upgrade_bin.tar.gz" -C "/tmp/pg_upgrade_bin" .
rm -rf /tmp/pg_upgrade_bin/
fi
@@ -394,9 +403,14 @@ function initiate_upgrade {
rm -rf "${PGDATANEW:?}/"

if [ "$IS_NIX_UPGRADE" = "true" ]; then
LC_ALL=en_US.UTF-8 LC_CTYPE=$SERVER_LC_CTYPE LC_COLLATE=$SERVER_LC_COLLATE LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -c ". /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh && $PGBINNEW/initdb --encoding=$SERVER_ENCODING --lc-collate=$SERVER_LC_COLLATE --lc-ctype=$SERVER_LC_CTYPE -L $PGSHARENEW -D $PGDATANEW/ --username=supabase_admin" -s "$SHELL" postgres
if [[ "$PGVERSION" =~ ^17.* ]]; then
LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -c ". /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh && $PGBINNEW/initdb --encoding=$SERVER_ENCODING --locale-provider=icu --icu-locale=en_US.UTF-8 -L $PGSHARENEW -D $PGDATANEW/ --username=supabase_admin" -s "$SHELL" postgres
else
LC_ALL=en_US.UTF-8 LC_CTYPE=$SERVER_LC_CTYPE LC_COLLATE=$SERVER_LC_COLLATE LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -c ". /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh && $PGBINNEW/initdb --encoding=$SERVER_ENCODING --lc-collate=$SERVER_LC_COLLATE --lc-ctype=$SERVER_LC_CTYPE -L $PGSHARENEW -D $PGDATANEW/ --username=supabase_admin" -s "$SHELL" postgres
fi
else
su -c "$PGBINNEW/initdb -L $PGSHARENEW -D $PGDATANEW/ --username=supabase_admin" -s "$SHELL" postgres

fi

# This line avoids the need to supply the supabase_admin password on the old
@@ -409,6 +423,25 @@ $(cat /etc/postgresql/pg_hba.conf)" > /etc/postgresql/pg_hba.conf
run_sql -c "select pg_reload_conf();"
fi

TMP_CONFIG="/tmp/pg_upgrade/postgresql.conf"
cp "$POSTGRES_CONFIG_PATH" "$TMP_CONFIG"

# Check if max_slot_wal_keep_size exists in the config
if grep -q "max_slot_wal_keep_size" "$TMP_CONFIG"; then
# Find and replace the existing setting
sed -i 's/^\s*max_slot_wal_keep_size\s*=.*$/max_slot_wal_keep_size = -1/' "$TMP_CONFIG"
else
# Add the setting if not found
echo "max_slot_wal_keep_size = -1" >> "$TMP_CONFIG"
fi

# Remove db_user_namespace if upgrading from PG15
if [[ "$OLD_PGVERSION" =~ ^15.* && "$PGVERSION" =~ ^17.* ]]; then
sed -i '/^db_user_namespace/d' "$TMP_CONFIG"
fi

chown postgres:postgres "$TMP_CONFIG"

UPGRADE_COMMAND=$(cat <<EOF
time ${PGBINNEW}/pg_upgrade \
--old-bindir="${PGBINOLD}" \
@@ -417,17 +450,23 @@ $(cat /etc/postgresql/pg_hba.conf)" > /etc/postgresql/pg_hba.conf
--new-datadir=${PGDATANEW} \
--username=supabase_admin \
--jobs="${WORKERS}" -r \
--old-options='-c config_file=${POSTGRES_CONFIG_PATH}' \
--old-options="-c config_file=$TMP_CONFIG" \
--old-options="-c shared_preload_libraries='${SHARED_PRELOAD_LIBRARIES}'" \
--new-options="-c data_directory=${PGDATANEW}" \
--new-options="-c config_file=$TMP_CONFIG" \
--new-options="-c shared_preload_libraries='${SHARED_PRELOAD_LIBRARIES}'"
EOF
)

if [ "$IS_NIX_BASED_SYSTEM" = "true" ]; then
UPGRADE_COMMAND=". /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh && $UPGRADE_COMMAND"
fi
GRN_PLUGINS_DIR=/var/lib/postgresql/.nix-profile/lib/groonga/plugins LC_ALL=en_US.UTF-8 LC_CTYPE=$SERVER_LC_CTYPE LC_COLLATE=$SERVER_LC_COLLATE LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -pc "$UPGRADE_COMMAND --check" -s "$SHELL" postgres

if [[ "$PGVERSION" =~ ^17.* ]]; then
GRN_PLUGINS_DIR=/var/lib/postgresql/.nix-profile/lib/groonga/plugins LC_ALL=en_US.UTF-8 LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -pc "$UPGRADE_COMMAND --check" -s "$SHELL" postgres
else
GRN_PLUGINS_DIR=/var/lib/postgresql/.nix-profile/lib/groonga/plugins LC_ALL=en_US.UTF-8 LC_CTYPE=$SERVER_LC_CTYPE LC_COLLATE=$SERVER_LC_COLLATE LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -pc "$UPGRADE_COMMAND --check" -s "$SHELL" postgres
fi

echo "10. Stopping postgres; running pg_upgrade"
# Extra work to ensure postgres is actually stopped
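
The hunks above copy `postgresql.conf` to a temporary file, force `max_slot_wal_keep_size = -1` in it, and pass it to `pg_upgrade` through `config_file`. The sketch below shows one way to package that edit. `set_conf_value` is a hypothetical helper, not something in `initiate.sh`, and it assumes GNU `sed`. It anchors the existence check to an uncommented assignment, on the assumption that a commented-out line such as `#max_slot_wal_keep_size = 4096` should not count as an existing setting.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of initiate.sh): force "key = value" in a
# postgresql.conf-style file, replacing an active assignment or appending one.
set_conf_value() {
  conf_file="$1"; key="$2"; value="$3"
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf_file"; then
    # Rewrite every uncommented assignment of the key (GNU sed -i assumed).
    sed -i -E "s/^[[:space:]]*${key}[[:space:]]*=.*/${key} = ${value}/" "$conf_file"
  else
    # No active assignment (absent, or only commented out): append one.
    printf '%s = %s\n' "$key" "$value" >> "$conf_file"
  fi
}

# Usage against a scratch copy, mirroring the TMP_CONFIG approach in the diff.
tmp_conf="$(mktemp)"
printf '%s\n' "#max_slot_wal_keep_size = 4096" "max_connections = 100" > "$tmp_conf"
set_conf_value "$tmp_conf" max_slot_wal_keep_size -1
cat "$tmp_conf"
```

Using the same anchored pattern for both the check and the replacement keeps the two branches consistent: if the check passes, the substitution is guaranteed to match at least one line.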
@@ -439,11 +478,24 @@ EOF

sleep 3
systemctl stop postgresql

# Additional check to ensure postgres is really stopped
if [ -f "${PGDATAOLD}/postmaster.pid" ]; then
echo "PostgreSQL still running, forcing stop..."
pid=$(head -n 1 "${PGDATAOLD}/postmaster.pid")
kill -9 "$pid" || true
rm -f "${PGDATAOLD}/postmaster.pid"
fi
else
CI_stop_postgres
fi

GRN_PLUGINS_DIR=/var/lib/postgresql/.nix-profile/lib/groonga/plugins LC_ALL=en_US.UTF-8 LC_CTYPE=$SERVER_LC_CTYPE LC_COLLATE=$SERVER_LC_COLLATE LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -pc "$UPGRADE_COMMAND" -s "$SHELL" postgres
# Run pg_upgrade with version-specific locale environment variables
if [[ "$PGVERSION" =~ ^17.* ]]; then
GRN_PLUGINS_DIR=/var/lib/postgresql/.nix-profile/lib/groonga/plugins LC_ALL=en_US.UTF-8 LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -pc "$UPGRADE_COMMAND" -s "$SHELL" postgres
else
GRN_PLUGINS_DIR=/var/lib/postgresql/.nix-profile/lib/groonga/plugins LC_ALL=en_US.UTF-8 LC_CTYPE=$SERVER_LC_CTYPE LC_COLLATE=$SERVER_LC_COLLATE LANGUAGE=en_US.UTF-8 LANG=en_US.UTF-8 LOCALE_ARCHIVE=/usr/lib/locale/locale-archive su -pc "$UPGRADE_COMMAND" -s "$SHELL" postgres
fi

# copying custom configurations
echo "11. Copying custom configurations"
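
The forced-stop block in the hunk above treats a leftover `postmaster.pid` as a sign that PostgreSQL is still running. A slightly more defensive variant, sketched here with a hypothetical `postmaster_alive` helper, first checks whether the recorded PID still refers to a live process, since a stale PID file can outlive its process. This illustrates the idea only; it is not the script's actual behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of initiate.sh): succeed only if the PID on the
# first line of postmaster.pid refers to a live process.
postmaster_alive() {
  pid_file="$1/postmaster.pid"
  [ -f "$pid_file" ] || return 1
  pid="$(head -n 1 "$pid_file")"
  # Reject empty or non-numeric content before signalling anything.
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Signal 0 checks for existence without delivering a signal.
  # Run as root or as the owning user; otherwise this can fail on permissions.
  kill -0 "$pid" 2>/dev/null
}

# Usage with a scratch directory standing in for PGDATAOLD.
datadir="$(mktemp -d)"
echo "$$" > "$datadir/postmaster.pid"   # the current shell is certainly alive
if postmaster_alive "$datadir"; then
  echo "process still running; a forced stop would be justified"
fi
```

With a check like this, `kill -9` and the removal of `postmaster.pid` could be split: kill only when the process is alive, and remove the file in either case.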
6 changes: 3 additions & 3 deletions ansible/vars.yml
@@ -9,9 +9,9 @@ postgres_major:

# Full version strings for each major version
postgres_release:
postgresorioledb-17: "17.0.1.078-orioledb"
postgres17: "17.4.1.028"
postgres15: "15.8.1.085"
postgresorioledb-17: "17.0.1.079-orioledb"
postgres17: "17.4.1.029"
postgres15: "15.8.1.086"

# Non Postgres Extensions
pgbouncer_release: "1.19.0"
115 changes: 115 additions & 0 deletions nix/docs/testing-pg-upgrade-scripts.md
@@ -0,0 +1,115 @@
# Testing PostgreSQL Upgrade Scripts

This document describes how to test changes to the PostgreSQL upgrade scripts on a running machine.

## Prerequisites

- A running PostgreSQL instance
- Access to the Supabase Postgres repository
- Permissions to run GitHub Actions workflows
- SSH access to the EC2 instance

## Development Workflow

1. **Make Changes to Upgrade Scripts**
- Make your changes to the scripts in `ansible/files/admin_api_scripts/pg_upgrade_scripts/`
- Commit and push your changes to your feature branch
- For quick testing, you can also edit the script directly on the server at `/etc/adminapi/pg_upgrade_scripts/initiate.sh`

2. **Publish Script Changes** (Only needed for deploying to new instances)
- Go to [publish-nix-pgupgrade-scripts.yml](https://github.com/supabase/postgres/actions/workflows/publish-nix-pgupgrade-scripts.yml)
- Click "Run workflow"
- Select your branch
- Run the workflow

3. **Publish Binary Flake Version** (Only needed for deploying to new instances)
- Go to [publish-nix-pgupgrade-bin-flake-version.yml](https://github.com/supabase/postgres/actions/workflows/publish-nix-pgupgrade-bin-flake-version.yml)
- Click "Run workflow"
- Select your branch
- Run the workflow
- Note: Make sure the flake version includes the PostgreSQL version you're testing (e.g., 17)

4. **Test on Running Machine**
SSH into the machine, then run:
```bash
# Stop PostgreSQL
sudo systemctl stop postgresql

# Run the upgrade script in local mode with your desired flake version
sudo NIX_FLAKE_VERSION="your-flake-version-here" IS_LOCAL_UPGRADE=true /etc/adminapi/pg_upgrade_scripts/initiate.sh 17
```
Note: This uses the version of the script that exists at `/etc/adminapi/pg_upgrade_scripts/initiate.sh` on the server.
Run the script as the `ubuntu` user with sudo privileges; it switches to the `postgres` user when needed.

In local mode:
- The script at `/etc/adminapi/pg_upgrade_scripts/initiate.sh` will be used (your edited version)
- Only the PostgreSQL binaries will be downloaded from the specified flake version
- No new upgrade scripts will be downloaded
- You can override the flake version by setting the `NIX_FLAKE_VERSION` environment variable
- If `NIX_FLAKE_VERSION` is not set, the script falls back to the default flake version hardcoded in `initiate.sh`

5. **Monitor Progress**
```bash
# Watch the upgrade log
tail -f /var/log/pg-upgrade-initiate.log
```

6. **Check Results**
In local mode, the script will:
- Create a new data directory at `/data_migration/pgdata`
- Run pg_upgrade to test the upgrade process
- Generate SQL files in `/data_migration/sql/` for any needed post-upgrade steps
- Log the results in `/var/log/pg-upgrade-initiate.log`

To verify success:
```bash
# Check the upgrade log for completion
grep "Upgrade complete" /var/log/pg-upgrade-initiate.log

# Check for any generated SQL files
ls -l /data_migration/sql/

# Check the new data directory
ls -l /data_migration/pgdata/
```

Note: The instance will not be upgraded to the new version in local mode. This is just a test run to verify the upgrade process works correctly.
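
The three verification commands in step 6 can be combined into a single pass/fail check. The sketch below is one possible wrapper. The function name `verify_local_upgrade` is hypothetical, the default paths are the ones this document lists, and the `Upgrade complete` marker is taken from the `grep` example above rather than confirmed against the script's output.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the manual checks in step 6.
verify_local_upgrade() {
  log_file="${1:-/var/log/pg-upgrade-initiate.log}"
  new_pgdata="${2:-/data_migration/pgdata}"
  sql_dir="${3:-/data_migration/sql}"

  if ! grep -q "Upgrade complete" "$log_file" 2>/dev/null; then
    echo "FAIL: no completion marker in $log_file"
    return 1
  fi
  if [ ! -d "$new_pgdata" ] || [ -z "$(ls -A "$new_pgdata" 2>/dev/null)" ]; then
    echo "FAIL: $new_pgdata is missing or empty"
    return 1
  fi
  # Generated SQL files are optional, so they are listed rather than required.
  if [ -d "$sql_dir" ]; then
    ls -l "$sql_dir"
  fi
  echo "OK: upgrade log and new data directory look complete"
}
```

Called with no arguments, the function uses the default paths. Reading the log and the data directory may require running it with sudo.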

## Important Notes

- The `IS_LOCAL_UPGRADE=true` flag makes the script run in the foreground and skip disk mounting steps
- The script will use the existing data directory
- All output is logged to `/var/log/pg-upgrade-initiate.log`
- The script will automatically restart PostgreSQL after completion or failure
- For testing, you can edit the script directly on the server - the GitHub Actions workflows are only needed for deploying to new instances
- Run the script as the ubuntu user with sudo privileges - the script will handle user switching internally
- Local mode is for testing only - it will not actually upgrade the instance
- The Nix flake version must include the PostgreSQL version you're testing (e.g., 17)
- In local mode, only the PostgreSQL binaries are downloaded from the flake - the upgrade scripts are used from the local filesystem
- You can override the flake version by setting the `NIX_FLAKE_VERSION` environment variable when running the script
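
The flake-version override mentioned in these notes comes down to a default-if-unset choice. A minimal sketch follows, using a hypothetical `resolve_flake_version` helper (the script itself uses an explicit `if`/`else`). The fallback hash is the one hardcoded in the `initiate.sh` diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of initiate.sh).
DEFAULT_NIX_FLAKE_VERSION="a7189a68ed4ea78c1e73991b5f271043636cf074"

resolve_flake_version() {
  # ${VAR:-default} yields the default when VAR is unset or empty,
  # matching the `[ -n "$NIX_FLAKE_VERSION" ]` test in the diff.
  printf '%s\n' "${NIX_FLAKE_VERSION:-$DEFAULT_NIX_FLAKE_VERSION}"
}

resolve_flake_version
```

Because the variable is read from the environment, it has to be set on the same `sudo` command line as the script, as in the step 4 example, so that it reaches the script's process.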

## Troubleshooting

If the upgrade fails:
1. Check the logs at `/var/log/pg-upgrade-initiate.log`
2. Look for any error messages in the PostgreSQL logs
3. The script will attempt to clean up and restore the original state
4. If you see an error about missing Nix flake attributes, make sure the flake version includes the PostgreSQL version you're testing

Common Errors:
- `error: flake 'github:supabase/postgres/...' does not provide attribute 'packages.aarch64-linux.psql_17/bin'`
- This means the Nix flake version doesn't include PostgreSQL 17 binaries
- You need to specify a flake version that includes your target version
- You can find valid flake versions by looking at the commit history of the publish-nix-pgupgrade-bin-flake-version.yml workflow
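
To spot this error quickly, the log can be scanned for the `does not provide attribute` text. The sketch below assumes the error is written to the upgrade log in the form quoted above; `find_flake_attr_errors` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print any "missing flake attribute" errors found in a log
# (with line numbers) and return non-zero when at least one is present.
find_flake_attr_errors() {
  log_file="${1:-/var/log/pg-upgrade-initiate.log}"
  if grep -n "does not provide attribute" "$log_file"; then
    echo "Flake version lacks the requested package; use a flake version that includes the target PostgreSQL version." >&2
    return 1
  fi
  return 0
}
```

If the error instead appears only on the terminal and not in the log, the same `grep` can be applied to captured terminal output.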

## Cleanup

After testing:
1. The script will automatically clean up temporary files
2. PostgreSQL will be restarted
3. The original configuration will be restored

## References

- [publish-nix-pgupgrade-scripts.yml](https://github.com/supabase/postgres/actions/workflows/publish-nix-pgupgrade-scripts.yml)
- [publish-nix-pgupgrade-bin-flake-version.yml](https://github.com/supabase/postgres/actions/workflows/publish-nix-pgupgrade-bin-flake-version.yml)