sudo apt install awscli  # or: pip install awscli / brew install awscli
aws configure # enter your AWS API credentials
ssh-keygen -m PEM -t rsa -b 2048 -C $(whoami)-dsikey \
    -f ~/.ssh/$(whoami)-dsikey # no passphrase
ssh-agent bash # initialize ssh-agent, assuming you are using bash
ssh-add ~/.ssh/$(whoami)-dsikey
for a in $(aws ec2 describe-regions --query 'Regions[].{Name:RegionName}' --output text); do aws ec2 import-key-pair --key-name $(whoami)-dsikey --public-key-material file://~/.ssh/$(whoami)-dsikey.pub --region $a --cli-binary-format raw-in-base64-out ; done
git clone git@github.com:10gen/dsi.git; cd dsi; # git checkout stable
# Activate virtualenv / workon here if you want (python3)
pip3 install --user -r requirements.txt
curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_linux_amd64.zip
# mac: curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_darwin_amd64.zip
sudo unzip terraform.zip -d /usr/local/bin
WORK=any-path
$EDITOR configurations/bootstrap/bootstrap.example.yml
./bin/bootstrap.py --directory $WORK --bootstrap-file configurations/bootstrap/bootstrap.example.yml
cd $WORK
# You can put the following line in .bashrc if you don't mind adding a relative path to PATH
export PATH=./.bin:$PATH
infrastructure_provisioning.py
workload_setup.py
mongodb_setup.py
test_control.py
analysis.py
infrastructure_teardown.py
- The above steps in long form: Getting Started
- Frequently Asked Questions
- DSI is a complex system with hundreds of configuration options. All of them are documented under docs/config-specs/.
- Our paper from DBTest.io 2020 describes how we developed and used DSI to test MongoDB performance.
- The branch mongodb-2020 is a DSI version frozen in time to reflect the state of this project as described in that paper. (I left MongoDB shortly afterward and removed some hard dependencies on infrastructure only available to MongoDB employees.)
DSI = Distributed Systems Infrastructure. At MongoDB we use this for system-level performance tests where we deploy real MongoDB clusters in AWS.
DSI is the orchestrator which drives all of the below:
- bin/infrastructure_provisioning.py: Deploy EC2 resources with Terraform.
  - terraform/remote-scripts/system-setup.sh: Linux configuration (mount disks, install packages...)
- bin/workload_setup.py: Install test-specific dependencies (e.g. Java for YCSB).
- bin/mongodb_setup.py: Deploy a MongoDB cluster.
- bin/test_control.py: Execute a test, collect and parse results.
  - Currently supported benchmark tools: Mongo shell (Benchrun), YCSB, py-tpcc, Linkbench, Genny, Sysbench
- bin/analysis.py: Run various checks on test log files: core files, replication lag, etc.
- bin/infrastructure_teardown.py: terraform destroy
A key principle in developing DSI was that DSI owns and has access to all configuration. For example, we use vanilla AMI images, and all system setup is in terraform/remote-scripts/system-setup.sh. If you look at a file called mongodb_setup.yml, you will see that it embeds a mongod.conf file (among other things). Similarly, infrastructure_provisioning.yml embeds some input parameters to the terraform *.tf files. All DSI config is in YAML. Since terraform accepts JSON, DSI converts the YAML to JSON when executing terraform.
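To make the embedding concrete, here is a rough sketch of what a mongodb_setup.yml fragment might look like, with a mongod.conf embedded as a YAML subtree. The key names below are illustrative only; the authoritative key reference lives in docs/config-specs/.

```yaml
# Hypothetical mongodb_setup.yml fragment -- keys are illustrative,
# see docs/config-specs/ for the real schema.
mongod_config_file:
  # This subtree is rendered out as the mongod.conf on each node.
  storage:
    engine: wiredTiger
  net:
    port: 27017
topology:
  - cluster_type: standalone
```

Because the mongod.conf lives inside the YAML rather than as a separate file, changes to it are versioned and reviewed together with the rest of the benchmark configuration.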
The reasons for having all configuration in DSI are:
- Consistency: All configuration is in the same syntax (YAML) and in a limited set of files, which always have the same names, whether you use YCSB or Linkbench.
- Tracking: All configuration changes are committed to this repo. This avoids situations where performance changes are due to changes to a specially crafted AMI, generated by scripts in another repo, by a person on a different team.
- Globally shared, "normalized" config: All DSI binaries always read the entire set of config files. For example, mongodb_setup.py will use the same SSH key as terraform used in infrastructure_provisioning.py.
You use DSI by creating a work directory and putting some configuration files into it. (At least once upon a time it was even possible to run all DSI commands using just defaults, without any configuration files.) This directory will also hold your terraform tfstate files, benchmark output, logs, etc...
A helper script, bin/bootstrap.py, is a convenient way to create a work directory and copy some canned configuration files into it. In fact, we almost always use the files available under configurations/. You list the combination of configs you want to use in a simple bootstrap.yml file. See configurations/bootstrap/bootstrap.example.yml to get started!
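As a rough illustration, a bootstrap.yml names one canned config per section; the values below are hypothetical, and the real set of available names is whatever exists under configurations/ (see the bootstrap.example.yml shipped with the repo).

```yaml
# Hypothetical bootstrap.yml -- values are illustrative; the files
# under configurations/ are the authoritative list of choices.
infrastructure_provisioning: single    # one EC2 node
workload_setup: ycsb
mongodb_setup: standalone              # a single mongod, no replication
test_control: ycsb
```

bootstrap.py then copies the matching canned files from configurations/ into your work directory, where you can edit them further.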
All configuration is in files; command-line options aren't supported. This way there's a permanent record of all config that was used to create a specific benchmark result. (In CI we tar and store the entire work directory, containing both all your configuration and all the result files.) It's also simple to rerun the exact same test without having to copy-paste CLI options from a log file or a friend.
The effective runtime configuration is a blend of three levels of configuration:
- configurations/defaults.yml
- infrastructure_provisioning.yml, workload_setup.yml, mongodb_setup.yml, test_control.yml
- overrides.yml
...where each later level overrides values from the earlier ones.
The second level is split into one file per section, but the files are logically a single configuration. The reason for splitting into multiple files is modularity: whether you want to deploy a 1-node or a 3-node replica set, you can use the same test_control.yml with both.
The file overrides.yml is a small config file where you can conveniently add manual changes if you don't want to edit the files in level 2, as they tend to be bigger. However, editing those files is perfectly allowed too. It's up to you!
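For illustration, an overrides.yml uses the same shape as the level-2 files, so a small tweak can sit on top of them. The keys below are hypothetical; consult docs/config-specs/ for the real option names.

```yaml
# Hypothetical overrides.yml -- keys are illustrative.
# Anything here wins over defaults.yml and the level-2 files.
mongodb_setup:
  mongod_config_file:
    storage:
      engine: inMemory   # override the engine chosen in mongodb_setup.yml
```

This keeps your one-off experiment visible in a single small file, while the larger level-2 files stay untouched.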
Run all validations, linters and tests:
testscripts/runtests.sh
Run all the unit tests:
testscripts/run-nosetest.sh
Run a specific test:
testscripts/run-nosetest.sh bin/tests/test_bootstrap.py