Restore WP1 from a backup

benoit74 edited this page Nov 1, 2024 · 17 revisions

Aside from a Zimfarm instance, which is not covered in this documentation, WP1 relies on a compute instance (mwcurator) and a Trove DB.

This documentation details the procedure to restore everything from scratch. Depending on the failure we are encountering, some parts can obviously be skipped.

Prerequisites

  • admin access to the target wikimedia cloud project (currently mwoffliner)
  • a machine (probably in the cloud) with
    • significant bandwidth to the borg servers (which hold the backups) and to Wikimedia Cloud (the target of the restore)
    • sufficient local disk space to hold a database backup (it will be stored there temporarily)
    • mariadb CLI (use apt install mariadb-client on Debian)
    • SSH credentials on this machine to access Wikimedia Cloud machines (you can add a temporary new key to your user at https://idm.wikimedia.org/keymanagement/)

It is also recommended to use on this machine:

  • screen so that long-running processes are not stopped should your SSH connection be dropped (use apt install screen on Debian)
  • pv to track mariadb command progress (use apt install pv on Debian)

All CLI steps described below are expected to be done on this machine. Anything done in a browser can obviously be done on any machine.
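
Before starting, it can help to confirm that every tool is actually on the PATH. The following is a minimal sketch and not part of the original procedure: the helper name missing_tools is hypothetical, and docker is included only because the borg-backup commands further down rely on it.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list that is not on PATH.
# Returns 0 when all tools are present, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# mariadb, ssh and docker are needed by the commands on this page;
# screen and pv are only recommended.
if missing=$(missing_tools mariadb ssh docker screen pv); then
    echo "all tools found"
else
    echo "missing tools:" $missing
fi
```

On Debian, anything reported as missing can be installed with the apt commands listed above.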

Recreate mwcurator cloud VPS instance and database

You may need to request additional quota in your cloud project from Wikitech, especially if the original machines are still running. See https://phabricator.wikimedia.org/T375977 for inspiration on how to do it (this is the ticket we opened to request increased quota while building this documentation and testing the restore procedure).

  1. Go to https://horizon.wikimedia.org/
  2. Select mwoffliner as your project
  3. Re-create the application server
    1. Under Compute -> Instances
    2. Select “Launch Instance”
    3. Under “Source” select appropriate image (currently we use debian-11-bullseye)
    4. Under “Flavor” select an appropriate number of vCPUs, RAM and disk (currently we use g4.cores8.ram16.disk20)
    5. Under “Security groups”, add the web security group to the instance to expose ports 80 and 443 (more details in these instructions).
  4. Re-create the database server
    1. Under Database -> Instances
    2. Select “Launch Instance”
    3. Under “Volume Size”, choose an appropriate number of GB to handle the size of the database (currently DB is configured with 75 GB)
    4. Under “Datastore”, choose mariadb
    5. Under “Flavor”, choose an appropriate number of vCPUs, RAM and disk (currently we use g4.cores2.ram4.disk20 | 4GB RAM).
    6. Under “Initialize Databases”
      1. Initial Databases: enwp10_prod
      2. Initial Admin User: wp1
      3. Password: generate a secure password externally and paste it in. Store it safely; you will need it later.
    7. Under “Advanced”
      1. Configuration Group: wp1-db-import (this makes the DB import faster; make sure to remove this configuration after the restore)

Connect to server via SSH

If necessary, add entries for the new hosts to your ~/.ssh/config file.

Typical configuration looks like this (replace <your_username> with your Wikimedia cloud SSH user, see https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances for help):

Host bastion.wmcloud.org
    HostName bastion.wmcloud.org
    User <your_username>

Host login.toolforge.org
    HostName login.toolforge.org
    User <your_username>

Host mwcurator
    HostName mwcurator.mwoffliner.eqiad1.wikimedia.cloud
    ProxyJump bastion.wmcloud.org
    User <your_username>

You should now be able to run this command:

ssh mwcurator

login.toolforge.org will be used to access the database.

Restore the database from a backup

Set up credentials

Backups are stored in BorgBase. To download them, you need the read-only credentials:

# These are static values that you need to fill in.
# They all belong to the read-only (_slave_) Bitwarden account.
export BW_CLIENTID=user.xxxxxxxxx
export BW_CLIENTSECRET=xxxxxxxxxxxx
export BW_PASSWORD=xxxxxxxxxxxx

Select a backup

docker run -v $PWD/data/restore:/restore:rw -e BW_CLIENTID=$BW_CLIENTID -e BW_CLIENTSECRET=$BW_CLIENTSECRET -e BW_PASSWORD=$BW_PASSWORD ghcr.io/kiwix/borg-backup restore --name wp1db --list

wp1db is the name of the Borgbase repository in which we archive the WP1 backups.

The output looks like this:

List avaible archives ...
Remote: Warning: Permanently added the ECDSA host key for IP address '94.130.217.50' to the list of known hosts.
Warning: Attempting to access a previously unknown unencrypted repository!
Do you want to continue? [yN] yes (from BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK)
wp1db__backup__2022-12-31T04:00:51   Sat, 2022-12-31 04:00:53 [6bf09f64fa4fd04215bb07f47e2bea7a217e83ce664b09bf6f5af4c35bdf5db8]
wp1db__backup__2023-10-31T04:12:10   Tue, 2023-10-31 04:12:13 [0ef5c9bfee8d5694136d32895a21965a1d79471a2ab2b0552ff9617a8b692579]
wp1db__backup__2023-11-30T04:04:01   Thu, 2023-11-30 04:04:03 [b7f6962236a5755a27b838f6848d24251ddd38238e3c513a02881defe8aa581c]
...
wp1db__backup__2024-10-27T04:01:41   Sun, 2024-10-27 04:01:43 [f63300f6eb97a8e41079f115eab1a6c23188772490211ce4e630ed49b622fe4b]
wp1db__backup__2024-10-28T04:03:10   Mon, 2024-10-28 04:03:13 [a99d5a77c156c26ab20da82de3f68759fffcfd098be9de575322713a77557a9d]
wp1db__backup__2024-10-29T04:01:27   Tue, 2024-10-29 04:01:29 [23da787371ea4921913b2d299b643a35dcb4e8adf9c037923ef815cbad72e984]

Choose one based on its date.

Note: the archive name is the first column (stops at first space), e.g. wp1db__backup__2024-10-29T04:01:27.
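
Picking the name out of the listing can be scripted. The sketch below is only an illustration: the function names latest_archive and archive_for_date are hypothetical, and it assumes the listing is sorted oldest first, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helpers: read the output of `restore --list` on stdin and print an archive name.
# Archive lines are recognised by their wp1db__backup__ prefix; the name is the first column.

# Print the last archive listed (the most recent one if the list is sorted oldest first).
latest_archive() {
    awk '/^wp1db__backup__/ { name = $1 } END { if (name != "") print name }'
}

# Print the last archive listed for a given day, e.g. archive_for_date 2024-10-28.
archive_for_date() {
    awk -v prefix="wp1db__backup__$1" \
        'index($1, prefix) == 1 { name = $1 } END { if (name != "") print name }'
}

# Demonstration on two lines copied from the sample output above.
latest_archive <<'EOF'
wp1db__backup__2024-10-28T04:03:10   Mon, 2024-10-28 04:03:13 [a99d5a77c156c26ab20da82de3f68759fffcfd098be9de575322713a77557a9d]
wp1db__backup__2024-10-29T04:01:27   Tue, 2024-10-29 04:01:29 [23da787371ea4921913b2d299b643a35dcb4e8adf9c037923ef815cbad72e984]
EOF
```

To use it on the real listing, pipe the output of the docker command above into latest_archive or archive_for_date. Only archive lines that arrive on standard output are considered.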

Extract a backup file

With your selected archive name, download and extract it to your filesystem:

docker run -v $PWD/data/restore:/restore:rw -e BW_CLIENTID=$BW_CLIENTID -e BW_CLIENTSECRET=$BW_CLIENTSECRET -e BW_PASSWORD=$BW_PASSWORD ghcr.io/kiwix/borg-backup restore --name wp1db --extract "wp1db__backup__2024-10-29T04:01:27"

The backup will be extracted to $PWD/data/restore in this example. It contains:

  • a dump of the MySQL database

If needed (e.g. on a Linux box), ensure that you own all restored files:

sudo chown -R $(id -u -n):$(id -g -n) $PWD/data

Restore the backup to DB instance

Find the hostname of your new Trove (database) instance that you created above. This is in Horizon under Databases -> Instances. Click the instance name and you should see something like this:

[Screenshot: instance details page in Horizon, showing the Trove hostname]

Set up an SSH tunnel to that database host through the Toolforge bastion (login.toolforge.org). For this you need a free TCP port on your machine, written <LOCAL_PORT> in the commands below. By default, you can use 3306, but any free (and bindable) TCP port will work. You can use the command:

ssh -L <LOCAL_PORT>:ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud:3306 login.toolforge.org

NOTE: See the Wikimedia Cloud docs for more information on setting up the tunnel. You will need to have Tools or Toolforge credentials set up in your ~/.ssh/config file for this to work. See this help file for details on setting up SSH.

Find your backup file. In our case it was in $PWD/data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/ (the directory is named after the Trove host that was backed up), so if the production Trove hostname changes, yours could be different.
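
If the directory name is not known in advance, the dump can be located by file name instead. This is a minimal sketch under the assumption that the dump is a regular file named after the database (enwp10_prod), as in the restore command on this page; find_dump is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list regular files named after the database below a restore directory.
# Usage: find_dump <restore_dir> [db_name]   (db_name defaults to enwp10_prod)
find_dump() {
    find "$1" -type f -name "${2:-enwp10_prod}"
}

# Only search if the extraction directory from the previous step exists.
if [ -d "$PWD/data/restore" ]; then
    find_dump "$PWD/data/restore"
fi
```

If more than one path is printed, several hosts were backed up into the archive and you need to pick the right one by hand.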

Use the following command, entering the password you chose above, to start restoring the database:

pv data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod | mariadb -h 127.0.0.1 -P <LOCAL_PORT> -u wp1 -p enwp10_prod

With pv, you will have a visual progress bar. In testing, this took about 7 hours.
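
The tunnel and restore commands use two different Trove hostnames: the tunnel targets the new instance, while the dump path contains the host that was backed up. To avoid mixing them up, the commands can be assembled from three values and reviewed before running them. The sketch below only prints the commands; print_restore_commands is a hypothetical helper, and the hostnames are the example values from this page.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the tunnel and restore commands.
# Usage: print_restore_commands <local_port> <new_trove_host> <dump_file>
print_restore_commands() {
    # Run this one first, in its own terminal or screen window.
    printf 'ssh -L %s:%s:3306 login.toolforge.org\n' "$1" "$2"
    # Run this one second; it prompts for the wp1 password.
    printf 'pv %s | mariadb -h 127.0.0.1 -P %s -u wp1 -p enwp10_prod\n' "$3" "$1"
}

print_restore_commands 3306 ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud \
    data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod
```

Given the duration of the import, running both printed commands inside screen sessions keeps them alive if your own SSH connection drops.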

Set up the replacement application server

  1. Install docker using these directions.
  2. Create the following directories:
sudo mkdir -p /data/wp1bot /data/code/ /data/wp1bot/db/ /srv/log/wp1bot/ /srv/data/wp1bot/

Note that the /srv directory is an NFS mount. The /data directory, on the original server, is an attached Cinder volume that is needed for other operations on that server (wp1_selection_tools), but is not needed for WP1 service itself. These paths are hardcoded in docker-compose.yml but could be updated there if you're having trouble creating the directories.

  3. cd /data/code and clone the wp1 repository: sudo git clone https://github.com/openzim/wp1.git

Restore credentials.py on worker machine

Ideally, you should have a backup of credentials.py. If so, restore it at /data/wp1bot/credentials.py. If not, follow the steps below (and/or read the documentation in the example file credentials.py.example).

  1. Copy the example credentials: sudo cp /data/code/wp1/wp1/credentials.py.example /data/wp1bot/credentials.py
  2. Edit the file (sudo nano /data/wp1bot/credentials.py): delete the Environment.DEVELOPMENT and Environment.TEST keys and the existing empty Environment.PRODUCTION key, then provide the necessary values (commented out in the example file) as follows:
    1. WIKIDB is the Wikipedia replica db, also known as enwiki_p. The credentials are your project Toolforge credentials, which can be found by logging in with ssh login.toolforge.org and reading the file replica.my.cnf.
    2. WP10DB is the application database that you restored to Trove. User should be wp1, password is the password you set when creating the Trove instance, host is the Trove host (ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud in our example). You can leave out the port (it defaults to 3306, which is where Trove is listening).
    3. 'REDIS': { 'host': 'redis', 'port': 6379 }
    4. 'API': { 'user': 'WP 1.0 bot@WP_1.0_Bot', 'pass': ??? }, TODO: figure out how we would find/reset this password.
    5. 'MWOAUTH': If you've lost this credential, you will need to register a new OAuth application. The client secret cannot be recovered from any Wikimedia web interface.
    6. 'SESSION': { 'secret_key': 'any sufficiently long string of random characters, like a password' }. If you do not want users to be logged out, you need to set the same 'secret_key' as on the previous application server.
    7. 'CLIENT_URL' should stay the same as the example values. If these values change, you will need to update the VIRTUAL_HOST keys in docker-compose.yml.
    8. 'STORAGE' is the AWS S3 config, where we store created selections and ZIMs. These credentials should be available from Kiwix.
    9. 'ZIMFARM' holds the credentials for the Zimfarm that is used to create ZIMs. Get these credentials from Kiwix.
  3. Edit the ENV = line to read ENV = Environment.PRODUCTION
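
If the previous 'secret_key' is lost and a new one is needed, one possible way to generate a sufficiently long random string is shown below. This is a sketch, not the method used by the project: gen_secret is a hypothetical helper, and it assumes /dev/urandom, head, od and tr are available, as on Debian.

```shell
#!/bin/sh
# Hypothetical helper: print N random bytes (default 32) as lowercase hex.
gen_secret() {
    head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}

# 32 random bytes give a 64-character string suitable for pasting into credentials.py.
gen_secret
```

Note that using a fresh key instead of the previous one logs existing users out, as described above.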

Restore wp1db_backup.env

Ideally, you should have a backup of .wp1db_backup.env. If so, restore it at /data/wp1bot/db/.wp1db_backup.env. If not, use the sample content below.

Sample with redacted secrets:

BORGBASE_NAME=wp1db
BW_CLIENTID=user.*************************************
BW_CLIENTSECRET=******************************
BW_PASSWORD=****************
[email protected]
DATABASES=mysql://***:*****************=****************@tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod
BACKUP_HOUR=4
BACKUP_MINUTE=0
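
After recreating the file from the sample, a quick check that no key was left out or left empty can save a failed backup run. The sketch below is an illustration only: missing_env_keys is a hypothetical helper, and the key list is taken from the sample above, so it may not be exhaustive (the line whose key is not readable in the sample is not checked).

```shell
#!/bin/sh
# Hypothetical helper: print every key that is absent from, or empty in, an env file.
# Usage: missing_env_keys <env_file> <KEY>...
# Returns 0 when every key has a non-empty value, 1 otherwise.
missing_env_keys() {
    env_file=$1
    shift
    rc=0
    for key in "$@"; do
        # A key counts as present only if at least one character follows the "=".
        if ! grep -q "^${key}=." "$env_file"; then
            echo "$key"
            rc=1
        fi
    done
    return "$rc"
}

env_file=/data/wp1bot/db/.wp1db_backup.env
if [ -r "$env_file" ]; then
    missing_env_keys "$env_file" BORGBASE_NAME BW_CLIENTID BW_CLIENTSECRET \
        BW_PASSWORD DATABASES BACKUP_HOUR BACKUP_MINUTE \
        || echo "the keys listed above are missing or empty in $env_file"
fi
```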

Restore yoyo.ini

Ideally, you should have a backup of yoyo.ini. If so, restore it at /data/wp1bot/db/yoyo.ini. If not, use the sample content below.

Sample with redacted secrets:

[DEFAULT]
sources = /usr/src/app/db/migrations/
migration_table = _yoyo_migration
batch_mode = off
verbosity = 0
database = mysql://***:********************************@tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod

Mapping the IP/DNS

The application server is mapped to a floating IP address, which allows it to be mapped to its domain, wp1.openzim.org. If you are restoring to a new application server, you should go to the floating IP management screen and detach the IP from the existing (crashed) server and reattach it to your new server.

Starting the app server

  1. Follow the deploy instructions in the README, starting with 'Pull the docker images from docker hub'.

Note that SSL certificates are automatically retrieved from Let's Encrypt. This works only once the domains (wp1.openzim.org, api.wp1.openzim.org) are correctly routed to the machine. If needed, restart the container to retry certificate issuance.