sudo apt install awscli  # or: pip install awscli / brew install awscli
aws configure # enter your AWS API credentials
ssh-keygen -m PEM -t rsa -b 2048 -C $(whoami)-dsikey \
    -f ~/.ssh/$(whoami)-dsikey # no passphrase
ssh-agent bash # initialize ssh-agent, assuming you are using bash
ssh-add ~/.ssh/$(whoami)-dsikey
for a in $(aws ec2 describe-regions --query 'Regions[].{Name:RegionName}' --output text); do aws ec2 import-key-pair --key-name $(whoami)-dsikey --public-key-material file://~/.ssh/$(whoami)-dsikey.pub --region $a --cli-binary-format raw-in-base64-out ; done
git clone git@github.com:10gen/dsi.git; cd dsi; # git checkout stable
# Activate virtualenv / workon here if you want (python3)
pip3 install --user -r requirements.txt
curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_linux_amd64.zip
# mac: curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_darwin_amd64.zip
sudo unzip terraform.zip -d /usr/local/bin
WORK=any-path
$EDITOR configurations/bootstrap/bootstrap.example.yml
./bin/bootstrap.py --directory $WORK --bootstrap-file configurations/bootstrap/bootstrap.example.yml
cd $WORK
# You can put the following line in .bashrc if you don't mind adding a relative path to PATH
export PATH=./.bin:$PATH
infrastructure_provisioning.py
workload_setup.py
mongodb_setup.py
test_control.py
analysis.py
infrastructure_teardown.py
- The above steps in long form: Getting Started
- Frequently Asked Questions
- DSI is a complex system with hundreds of configuration options. All of them are documented under docs/config-specs/.
- Our paper from DBTest.io 2020 describes how we developed and used DSI to test MongoDB performance.
- The branch mongodb-2020 is a DSI version frozen in time to reflect the state of this project as described in that paper. (I left MongoDB shortly afterward and removed some hard dependencies on infrastructure only available to MongoDB employees.)
DSI = Distributed Systems Infrastructure. At MongoDB we use this for system-level performance tests where we deploy real MongoDB clusters in AWS.
DSI is the orchestrator which drives all of the below:
- bin/infrastructure_provisioning.py: Deploy EC2 resources with Terraform.
  - terraform/remote-scripts/system-setup.sh: Linux configuration (mount disks, install packages...)
- bin/workload_setup.py: Install test-specific dependencies (e.g. Java for YCSB).
- bin/mongodb_setup.py: Deploy a MongoDB cluster.
- bin/test_control.py: Execute a test, collect and parse results.
  - Currently supported benchmark tools: Mongo shell (Benchrun), YCSB, py-tpcc, Linkbench, Genny, Sysbench
- bin/analysis.py: Run various checks on test log files: core files, replication lag, etc.
- bin/infrastructure_teardown.py: terraform destroy
A key principle in developing DSI was that DSI owns and has access to all configuration. For example, we use vanilla AMI images, and all system setup is in terraform/remote-scripts/system-setup.sh. If you look at a file called mongodb_setup.yml, you will see that it embeds a mongod.conf file (among other things). Similarly, infrastructure_provisioning.yml embeds some input parameters to the terraform *.tf files. All DSI config is in YAML. Since terraform accepts JSON, DSI converts the YAML to JSON when executing terraform.
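To make the embedding concrete, here is a rough sketch of what a mongodb_setup.yml fragment might look like, with a mongod.conf embedded as a YAML subtree. The key names below are illustrative only; the authoritative key reference lives in docs/config-specs/.

```yaml
# Hypothetical mongodb_setup.yml fragment -- keys are illustrative,
# see docs/config-specs/ for the real schema.
mongod_config_file:
  # This subtree is rendered out as the mongod.conf on each node.
  storage:
    engine: wiredTiger
  net:
    port: 27017
topology:
  - cluster_type: standalone
```

Because the mongod.conf lives inside the YAML rather than as a separate file, changes to it are versioned and reviewed together with the rest of the benchmark configuration.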
The reasons for having all configuration in DSI are:
- Consistency: All configuration is in the same syntax (YAML) and in a limited set of files, which always have the same names, whether you use YCSB or Linkbench.
- Tracking: All configuration changes are committed to this repo. This avoids situations where performance changes are due to changes to a specially crafted AMI, generated by scripts in another repo, by a person on a different team.
- Globally shared, "normalized" config: All DSI binaries always read the entire set of config files. For example, mongodb_setup.py will use the same SSH key as terraform used in infrastructure_provisioning.py.
You use DSI by creating a work directory and putting some configuration files into it. (At least once upon a time it was even possible to run all DSI commands using just defaults, without any configuration files.) This directory will also hold your terraform tfstate files, benchmark output, logs, etc...
A helper script, bin/bootstrap.py, is a convenient way to create a work directory and copy some canned configuration files into it. In fact, we almost always use the files available under configurations/. You list the combination of configs you want to use in a simple bootstrap.yml file. See configurations/bootstrap/bootstrap.example.yml to get started!
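As a rough illustration, a bootstrap.yml names one canned config per section; the values below are hypothetical, and the real set of available names is whatever exists under configurations/ (see the bootstrap.example.yml shipped with the repo).

```yaml
# Hypothetical bootstrap.yml -- values are illustrative; the files
# under configurations/ are the authoritative list of choices.
infrastructure_provisioning: single    # one EC2 node
workload_setup: ycsb
mongodb_setup: standalone              # a single mongod, no replication
test_control: ycsb
```

bootstrap.py then copies the matching canned files from configurations/ into your work directory, where you can edit them further.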
All configuration is in files; command-line options aren't supported. This way there's a permanent record of all config that was used to create a specific benchmark result. (In CI we tar and store the entire work directory, containing both all your configuration and all the result files.) It's also simple to rerun the exact same test without having to copy-paste CLI options from a log file or a friend.
The effective runtime configuration is a blend of three levels of configuration:
- configurations/defaults.yml
- infrastructure_provisioning.yml, workload_setup.yml, mongodb_setup.yml, test_control.yml
- overrides.yml
...where each later level overrides values from the earlier ones.
The second level is split into one file per section, but the files are logically a single configuration. The reason for splitting into multiple files is modularity: whether you want to deploy a 1-node or a 3-node replica set, you can use the same test_control.yml with both.
The file overrides.yml is a small config file where you can conveniently add manual changes if you don't want to edit the files in level 2, as they tend to be bigger. However, editing those files is perfectly allowed too. It's up to you!
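For illustration, an overrides.yml uses the same shape as the level-2 files, so a small tweak can sit on top of them. The keys below are hypothetical; consult docs/config-specs/ for the real option names.

```yaml
# Hypothetical overrides.yml -- keys are illustrative.
# Anything here wins over defaults.yml and the level-2 files.
mongodb_setup:
  mongod_config_file:
    storage:
      engine: inMemory   # override the engine chosen in mongodb_setup.yml
```

This keeps your one-off experiment visible in a single small file, while the larger level-2 files stay untouched.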
Run all validations, linters and tests:
testscripts/runtests.sh
Run all the unit tests:
testscripts/run-nosetest.sh
Run a specific test:
testscripts/run-nosetest.sh bin/tests/test_bootstrap.py