You will need AWS API credentials and an SSH key to provision a cluster from your local machine.
Following the AWS documentation, use aws configure to create a ~/.aws/credentials file, or create one manually. Enter the access key and secret key, and leave the default region and output format blank. The file should be a plain text file of the form:
[default]
aws_access_key_id = ABCDEF...
aws_secret_access_key = A1B2C3...
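If you have the AWS CLI installed, you can optionally verify that these credentials are being picked up with a read-only identity check (this assumes no other AWS profile is overriding the default):
aws sts get-caller-identity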
The SSH key is a 2048-bit RSA key that will enable you to access your AWS resources. To create a key pair with ssh-keygen, execute:
ssh-keygen -m PEM -t rsa -b 2048 -C "<NAME OF SSH KEY>"
This key name exists in a namespace shared with co-workers. Please include your name in the key name.
You will be prompted to enter a file to save the key in:
~/.ssh/"<NAME_OF_SSH_KEY>"
DO NOT enter a passphrase. Passphrase-protected keys will not work with AWS.
This will also create a public key file:
~/.ssh/"<NAME_OF_SSH_KEY>".pub
You must upload the SSH public key to all regions (remove from the list any regions that already have the key). You can automate that process with a script like this one:
for a in us-east-1 us-east-2 us-west-1 us-west-2 ap-south-1 ap-northeast-1 ap-southeast-1 ap-southeast-2 ap-northeast-2 eu-central-1 eu-west-1 eu-west-2 ; do aws ec2 import-key-pair --key-name NAME_OF_SSH_KEY --public-key-material file://~/.ssh/NAME_OF_SSH_KEY.pub --region $a ; done
To view the list of all SSH key pairs in all regions, you can run this script:
for a in us-east-1 us-east-2 us-west-1 us-west-2 ap-south-1 ap-northeast-2 ap-southeast-1 ap-southeast-2 ap-northeast-1 eu-central-1 eu-west-1 eu-west-2 ; do echo $a ; aws ec2 describe-key-pairs --region $a ; done
To ensure that your SSH agent has key access, execute:
ssh-add /path/to/keyfile
(Example: ssh-add ~/.ssh/my_ssh_key)
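To confirm that the key is now loaded, you can list the identities held by the agent:
ssh-add -l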
Check out the required repos into an easily accessible location.
DSI:
git clone [email protected]:10gen/dsi.git
Mongo:
git clone [email protected]:mongodb/mongo.git
Various benchmark clients supported by DSI:
git clone [email protected]:mongodb-labs/YCSB.git
git clone [email protected]:mongodb-labs/py-tpcc.git
git clone [email protected]:mongodb-labs/benchmarks.git # sysbench benchmarks
git clone [email protected]:mongodb/genny.git
Note: If you check out benchmark clients to your workstation, you can point DSI at their paths in your bootstrap.yml file:
overrides:
  workload_setup:
    local_repos:
      ycsb: ...
      tpcc: ...
      benchmarks: ...
      genny: ...
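For example, if you cloned the clients under a hypothetical ~/repos directory, the override might look like this (the paths are illustrative only; use wherever you actually cloned them):
overrides:
  workload_setup:
    local_repos:
      ycsb: ~/repos/YCSB
      tpcc: ~/repos/py-tpcc
      benchmarks: ~/repos/benchmarks
      genny: ~/repos/genny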
DSI uses Terraform. To save time, you can install it on your PATH ahead of time. You must use version 0.12.16, even though newer versions are available.
Hint:
terraform version
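As a rough sketch, on 64-bit Linux you could fetch that exact release from HashiCorp's release archive and place it on your PATH (adjust the platform string for macOS, and verify the URL against releases.hashicorp.com if it has moved):
curl -LO https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_linux_amd64.zip
unzip terraform_0.12.16_linux_amd64.zip
sudo mv terraform /usr/local/bin/
terraform version   # should report Terraform v0.12.16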
If desired, create and activate a virtualenv to store required dependencies:
virtualenv venv; source venv/bin/activate
To install necessary dependencies, run:
pip install --user -r PATH_TO_DSI/requirements.txt
When finished using DSI, to escape the virtualenv, run:
deactivate
To set up a work directory:
DSI=./dsi
WORK=any-path
$EDITOR $DSI/configurations/bootstrap/bootstrap.example.yml
$DSI/bin/bootstrap.py --directory $WORK --bootstrap-file configurations/bootstrap/bootstrap.example.yml
cd $WORK
WORK can be an arbitrary directory path of your choosing. It will be created by bootstrap.py if it doesn’t already exist and the environment will be set up within it. If --directory is not used, the environment will be set up within the current working directory.
At this point you have a functioning DSI working directory and can provision a cluster. From this point forward, we assume you are in the working directory.
NOTE: You are provisioning resources in AWS. You need to clean them up later. See below for how to do that.
To provision some servers, execute:
infrastructure_provisioning.py
This will allocate your requested cluster and also apply some common operating system configuration, such as mounting and formatting disks and configuring ulimits. The input configuration for this step is in the file infrastructure_provisioning.yml. Information about the infrastructure that gets provisioned is written to infrastructure_provisioning.out.yml.
To set up the workloads, execute:
workload_setup.py
This will set up the hosts for the various workload types specified in workload_setup.yml. Note that setup is only done for the types that match test.run.type in test_control.yml. This step only has to be run once, even if you re-deploy the MongoDB cluster and rerun tests.
Note that the benchmark client repositories are uploaded to the workload client host at this step. Keep this in mind if you edit your benchmark client files later: changes made after this step will not automatically appear on the client host.
In the working directory, execute:
mongodb_setup.py
This will start a MongoDB cluster as specified in mongodb_setup.yml. It will download and install the binary archive specified by the mongodb_binary_archive key.
To supply your own binary, such as from your Evergreen compile task, add its URL to mongodb_setup.yml:
mongodb_binary_archive: http://s3.amazonaws.com/mciuploads/dsi/<patch_info>.tar.gz
If you want to upload your own binary (such as via SCP), then you must set this option to the empty string: "". In that case this step will simply start MongoDB using ~/mongodb/bin/mongod (or mongos).
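For example, to skip the download entirely and use binaries that you have already copied onto the hosts yourself, mongodb_setup.yml would contain:
mongodb_binary_archive: ""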
The tests to run are specified in test_control.yml. To run the tests, execute the following in the working directory:
test_control.py
Running the tests will create a reports/ directory containing the results from the run, as well as mongod.log and diagnostic.data.
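For instance, to locate the mongod logs that were collected (assuming you are still in the working directory):
find reports/ -name mongod.log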
You can connect to any of the machines using conn.py from the working directory. See infrastructure_provisioning.out.yml for a list of all the machines that have been allocated and their IP addresses. For instance, to connect to the workload client host:
conn.py wc
Other targets you can connect to if desired:
- wc: The client machine running the workload
- md.N: Server instance N (for mongod)
- ms.N: Server instance N (for mongos)
- cs.N: Server instance N (for config servers)
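For example, to connect to the first mongod host (instance numbering is assumed to start at 0):
conn.py md.0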
The simplest way to tear down the cluster and destroy its AWS resources is to execute:
infrastructure_teardown.py
This will output a message confirming that your resources were destroyed:
Destroy complete! Resources: 8 destroyed.
Note: The Terraform state of your cluster is stored in your work directory. Don't delete the directory before you have successfully executed infrastructure_teardown.py.
Note: You must run infrastructure_teardown.py in the work directory whose resources you want to destroy. If you run the script in the wrong directory, it won't give an error; it will just report that "0 resources" were destroyed.
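Putting the two notes together, a safe teardown sequence (assuming the DSI scripts are on your PATH as in the earlier steps) looks like:
cd $WORK                      # the same working directory used for provisioning
infrastructure_teardown.py    # expect "Destroy complete!" in the output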