Restore WP1 from a backup
Aside from a Zimfarm instance which is not covered in this documentation, WP1 relies on a compute instance (mwcurator) and a Trove DB.
This documentation details the procedure to restore everything from scratch. Depending on the failure we are encountering, some parts can obviously be skipped.
- admin access to the target wikimedia cloud project (currently `mwoffliner`)
- a machine (probably in the cloud) with
  - significant bandwidth to borg servers (contains the backup) and wikimedia cloud (target of the restore)
  - sufficient space to hold a database backup on local disk (will be stored there temporarily)
  - mariadb CLI (use `apt install mariadb-client` on Debian)
  - SSH credentials on this machine to access wikimedia cloud machine with SSH (you can add a temporary new key on your user at https://idm.wikimedia.org/keymanagement/)
It is also recommended to use on this machine:
- `screen` so that long-running processes are not stopped should your SSH connection be dropped (use `apt install screen` on Debian)
- `pv` to track mariadb command progress (use `apt install pv` on Debian)
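Before starting, it can help to confirm that the tools listed above are actually installed. The snippet below is a minimal sketch: `missing_tools` is a hypothetical helper name, not part of WP1 or any existing tooling, and it only checks that each command is found on `PATH`.

```shell
# Print the name of every given command that is not found on PATH.
# Prints nothing and returns 0 when all commands are available,
# returns 1 when at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools ssh mariadb screen pv docker` prints one line per missing command (`docker` is included here because the restore commands further down run a docker image).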
All CLI steps described below are expected to be done on this machine. Anything done in a browser can obviously be done on any machine.
You may need to request additional quota in your cloud project from Wikitech, especially if the original machines are still running. See https://phabricator.wikimedia.org/T375977 for inspiration on how to do it (this is the ticket we opened to request increased quota while building this documentation and testing the restore procedure).
- Go to https://horizon.wikimedia.org/
- Select `mwoffliner` as your project
- Re-create the application server
  - Under Compute -> Instances
  - Select “Launch Instance”
  - Under “Source”, select an appropriate image (currently we use debian-11-bullseye)
  - Under “Flavor”, select an appropriate number of vCPUs, RAM and disk (currently we use g4.cores8.ram16.disk20)
  - Under “Security groups”, add the `web` security group to the instance to expose ports 80 and 443 (more details in these instructions).
- Re-create the database server
  - Under Database -> Instances
  - Select “Launch Instance”
  - Under “Volume Size”, choose an appropriate number of GB to handle the size of the database (currently DB is configured with 75 GB)
  - Under “Datastore”, choose `mariadb`
  - Under “Flavor”, choose an appropriate number of vCPUs, RAM and disk (currently we use `g4.cores2.ram4.disk20 | 4GB RAM`).
  - Under “Initialize Databases”
    - Initial Databases: `enwp10_prod`
    - Initial Admin User: `wp1`
    - Password: generate a secure password externally and paste it in. Write it down, you'll need it later.
  - Under “Advanced”
    - Configuration Group: `wp1-db-import` (will make the db import faster, make sure to remove this config after the restore)
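One possible way to generate the database password externally is sketched below. `generate_password` is a hypothetical helper, and restricting the password to letters and digits is an assumption made here because the password later ends up in `mysql://` URLs (in `yoyo.ini` and `.wp1db_backup.env`), where special characters would need escaping.

```shell
# Print a random alphanumeric password of the requested length (default 32).
# Reads random bytes from /dev/urandom and keeps only A-Z, a-z and 0-9.
generate_password() {
    length="${1:-32}"
    LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$length"
    echo
}
```

Running `generate_password 40` prints a 40-character password; any password manager's generator works just as well.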
If necessary, add the hosts below to your `~/.ssh/config` file.
A typical configuration looks like this (replace `<your_username>` with your Wikimedia cloud SSH user, see https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances for help):
Host bastion.wmcloud.org
HostName bastion.wmcloud.org
User <your_username>
Host login.toolforge.org
HostName login.toolforge.org
User <your_username>
Host mwcurator
HostName mwcurator.mwoffliner.eqiad1.wikimedia.cloud
ProxyJump bastion.wmcloud.org
User <your_username>
You should now be able to run this command:
ssh mwcurator
login.toolforge.org will be used to access the database.
Backups are in borgbase. To download them, you need the read-only credentials:
# those are all static values you need to enter
# those are all for the _slave_ (aka readonly) bitwarden account
export BW_CLIENTID=user.xxxxxxxxx
export BW_CLIENTSECRET=xxxxxxxxxxxx
export BW_PASSWORD=xxxxxxxxxxxx
docker run -v $PWD/data/restore:/restore:rw -e BW_CLIENTID=$BW_CLIENTID -e BW_CLIENTSECRET=$BW_CLIENTSECRET -e BW_PASSWORD=$BW_PASSWORD ghcr.io/kiwix/borg-backup restore --name wp1db --list
`wp1db` is the name of the Borgbase repository in which we archive the WP1 backups.
The output should look like this:
List avaible archives ...
Remote: Warning: Permanently added the ECDSA host key for IP address '94.130.217.50' to the list of known hosts.
Warning: Attempting to access a previously unknown unencrypted repository!
Do you want to continue? [yN] yes (from BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK)
wp1db__backup__2022-12-31T04:00:51 Sat, 2022-12-31 04:00:53 [6bf09f64fa4fd04215bb07f47e2bea7a217e83ce664b09bf6f5af4c35bdf5db8]
wp1db__backup__2023-10-31T04:12:10 Tue, 2023-10-31 04:12:13 [0ef5c9bfee8d5694136d32895a21965a1d79471a2ab2b0552ff9617a8b692579]
wp1db__backup__2023-11-30T04:04:01 Thu, 2023-11-30 04:04:03 [b7f6962236a5755a27b838f6848d24251ddd38238e3c513a02881defe8aa581c]
...
wp1db__backup__2024-10-27T04:01:41 Sun, 2024-10-27 04:01:43 [f63300f6eb97a8e41079f115eab1a6c23188772490211ce4e630ed49b622fe4b]
wp1db__backup__2024-10-28T04:03:10 Mon, 2024-10-28 04:03:13 [a99d5a77c156c26ab20da82de3f68759fffcfd098be9de575322713a77557a9d]
wp1db__backup__2024-10-29T04:01:27 Tue, 2024-10-29 04:01:29 [23da787371ea4921913b2d299b643a35dcb4e8adf9c037923ef815cbad72e984]
Choose one based on its date.
Note: the archive name is the first column (stops at first space), e.g. `wp1db__backup__2024-10-29T04:01:27`.
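If you simply want the most recent archive, the name can be pulled out of the listing mechanically. This is a small sketch with hypothetical helper names (`archive_names`, `latest_archive`); it assumes every archive line starts with the `wp1db__backup__` prefix seen in the listing above.

```shell
# Print the archive name (first whitespace-delimited column) of every
# line on stdin that looks like a WP1 archive entry. Warnings and
# prompts in the listing are skipped because they lack the prefix.
archive_names() {
    awk '$1 ~ /^wp1db__backup__/ { print $1 }'
}

# Print the most recent archive name. The timestamp embedded in the
# name is ISO 8601, so a lexical sort is also a chronological sort.
latest_archive() {
    archive_names | LC_ALL=C sort | tail -n 1
}
```

To use it, save the output of the `--list` command to a file and run `latest_archive < listing.txt` (the file name is only an example).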
With your selected archive name, download and extract it to your filesystem:
docker run -v $PWD/data/restore:/restore:rw -e BW_CLIENTID=$BW_CLIENTID -e BW_CLIENTSECRET=$BW_CLIENTSECRET -e BW_PASSWORD=$BW_PASSWORD ghcr.io/kiwix/borg-backup restore --name wp1db --extract "wp1db__backup__2024-10-29T04:01:27"
The WP1 backup will be extracted to `$PWD/data/restore` in this example. It contains:
- a dump of the MySQL database
If needed (e.g. on a Linux box), ensure that you own all restored files:
sudo chown -R $(id -u -n):$(id -g -n) $PWD/data
Find the hostname of your new Trove (database) instance that you created above. This is in Horizon under Databases -> Instances. Click the instance name to see its details, including its hostname (`ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud` in the examples below).
Set up an SSH tunnel to that database host, through your toolforge bastion. For this you need a free TCP port on your machine, noted `<LOCAL_PORT>` in the command below. By default, you can use 3306 but any free (and bindable) TCP port will work. You can use the command:
ssh -L <LOCAL_PORT>:ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud:3306 login.toolforge.org
NOTE: See the Wikimedia Cloud docs for more information on setting up the tunnel. You will need to have Tools or Toolforge credentials set up in your `~/.ssh/config` file for this to work. See this help file for details on setting up SSH.
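To avoid typos when substituting the local port and the Trove hostname, the tunnel command above can be assembled by a small helper. This is only a sketch: `tunnel_command` is a hypothetical name, and it validates the port number without checking whether the port is actually free on your machine.

```shell
# Print the ssh tunnel command for a given local port and Trove hostname.
# Returns 1 if the port is not a number between 1 and 65535.
tunnel_command() {
    port="$1"
    trove_host="$2"
    case "$port" in
        ''|*[!0-9]*|??????*)
            echo "invalid port: $port" >&2
            return 1
            ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi
    # Ports below 1024 usually require root to bind, so warn about them.
    if [ "$port" -lt 1024 ]; then
        echo "warning: port $port is privileged and may not be bindable" >&2
    fi
    echo "ssh -L ${port}:${trove_host}:3306 login.toolforge.org"
}
```

For example, `tunnel_command 3307 ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud` prints the command to run for local port 3307.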
Find your backup file. Mine was in `$PWD/data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/` but if the production trove hostname changes, yours could be different.
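Since the directory name depends on the production Trove hostname, searching for the dump by file name may be easier than guessing the path. The sketch below uses a hypothetical helper, `find_dump`, and assumes the dump is a regular file named after the database (`enwp10_prod`), as in the restore command shown in this document.

```shell
# Print the path of every regular file named after the database found
# below the restore directory.
# $1: restore directory (default: $PWD/data/restore)
# $2: database name     (default: enwp10_prod)
find_dump() {
    restore_dir="${1:-$PWD/data/restore}"
    db_name="${2:-enwp10_prod}"
    find "$restore_dir" -type f -name "$db_name"
}
```

Calling `find_dump` from the directory where you ran the extraction should print a single path; use that path in place of the one in the restore command if it differs.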
Use the following command, entering the password you chose above, to start restoring the database:
pv data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod | mariadb -h 127.0.0.1 -P <LOCAL_PORT> -u wp1 -p enwp10_prod
With `pv`, you will have a visual progress bar. In testing, this took about 7 hours.
- Install docker using these directions.
- Create the following directories:
sudo mkdir -p /data/wp1bot /data/code/ /data/wp1bot/db/ /srv/log/wp1bot/ /srv/data/wp1bot/
Note that the `/srv` directory is an NFS mount. The `/data` directory, on the original server, is an attached Cinder volume that is needed for other operations on that server (wp1_selection_tools), but is not needed for the WP1 service itself. These paths are hardcoded in `docker-compose.yml` but could be updated there if you're having trouble creating the directories.
- `cd /data/code` and check out the wp1 repository: `sudo git clone https://github.com/openzim/wp1.git`
Ideally, you should have a backup of credentials.py. If so, restore it at `/data/wp1bot/credentials.py`. If not, follow the steps below (and/or read the documentation in the example file `credentials.py.example`).
- Copy the example credentials: `sudo cp /data/code/wp1/wp1/credentials.py.example /data/wp1bot/credentials.py`
- Edit the file (`sudo nano /data/wp1bot/credentials.py`), providing the necessary values (commented out) and deleting the keys: `Environment.DEVELOPMENT`, `Environment.TEST`, and the existing empty `Environment.PRODUCTION` key.
- Edit the `ENV =` line to read `ENV = Environment.PRODUCTION`
- `WIKIDB` is the Wikipedia replica db, also known as `enwiki_p`. The credentials are your project toolforge credentials, which can be found by logging in with `ssh login.toolforge.org` and reading the file `replica.my.cnf`.
- `WP10DB` is the application database that you restored to Trove. User should be `wp1`, password is the password you chose when creating the Trove instance, host is the Trove host (`ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud` in our example). You can leave out the port (it defaults to 3306 which is where Trove is running).
- `'REDIS': { 'host': 'redis', 'port': 6379 }`
- `'API': { 'user': 'WP 1.0 bot@WP_1.0_Bot', 'pass': ??? }`, TODO: figure out how we would find/reset this password.
- `'MWOAUTH'`: If you've lost this credential, you will need to register a new OAuth application. The client secret cannot be recovered from any Wikimedia web interface.
- `'SESSION': { 'secret_key': 'any sufficiently long string of random characters, like a password' }`. If you do not want users to be logged out, you need to set the same 'secret_key' as the previous application server.
- `'CLIENT_URL'` should stay the same as the example values. If these values change, you will need to update the `VIRTUAL_HOST` keys in `docker-compose.yml`.
- `'STORAGE'` is the AWS S3 config, where we store created selections and ZIMs. These should be available from Kiwix.
- `'ZIMFARM'` holds the credentials for the Zimfarm that is used to create ZIMs. Get these credentials from Kiwix.
Ideally, you should have a backup of .wp1db_backup.env. If so, restore it at `/data/wp1bot/db/.wp1db_backup.env`. If not, use the sample content below.
Sample with redacted secrets:
BORGBASE_NAME=wp1db
BW_CLIENTID=user.*************************************
BW_CLIENTSECRET=******************************
BW_PASSWORD=****************
[email protected]
DATABASES=mysql://***:*****************=****************@tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod
BACKUP_HOUR=4
BACKUP_MINUTE=0
Ideally, you should have a backup of yoyo.ini. If so, restore it at `/data/wp1bot/db/yoyo.ini`. If not, use the sample content below.
Sample with redacted secrets:
[DEFAULT]
sources = /usr/src/app/db/migrations/
migration_table = _yoyo_migration
batch_mode = off
verbosity = 0
database = mysql://***:********************************@tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod
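Both `.wp1db_backup.env` and `yoyo.ini` embed the Trove hostname in a `mysql://` URL. If the files you restored (or the samples above) still point at the previous Trove host, they presumably need to be updated to the new one. The snippet below is a sketch of one way to do that: `replace_trove_host` is a hypothetical helper, not part of WP1.

```shell
# Replace every occurrence of the old Trove hostname with the new one in
# the given file, keeping a copy of the original next to it as FILE.bak.
# $1: old hostname, $2: new hostname, $3: file to update
replace_trove_host() {
    old_host="$1"
    new_host="$2"
    file="$3"
    cp "$file" "$file.bak" || return 1
    # Escape dots so they match literally rather than as "any character".
    old_pattern="$(printf '%s' "$old_host" | sed 's/\./\\./g')"
    sed "s/${old_pattern}/${new_host}/g" "$file.bak" > "$file"
}
```

For example, `replace_trove_host tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud /data/wp1bot/db/yoyo.ini` would switch `yoyo.ini` to the host used in this document's examples. The files under `/data/wp1bot` are likely owned by root, so the function may need to run in a root shell.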
The application server is mapped to a floating IP address, which allows it to be mapped to its domain, `wp1.openzim.org`. If you are restoring to a new application server, you should go to the floating IP management screen and detach the IP from the existing (crashed) server and reattach it to your new server.
- Follow the deploy instructions in the README, starting with 'Pull the docker images from docker hub'.
Note that SSL certificates are automatically retrieved by `lets-encrypt`. This will work only once the domains (wp1.openzim.org, api.wp1.openzim.org) are correctly routed to the machine. If needed, restart the container to trigger certificate issuance again.