
2015-02-05 (Feb. 5 2015) Trifork Test Work Summary and Conclusions

Trifork has improved rtcloud in the following areas:

  • Provisioning of pre-made EBS volumes with Riak data.
  • Automated build and provisioning of riak_ee from github.
  • Automated build and provisioning of riak_test from github.
  • Ansible-driven riak_test runs on provisioned clusters.
  • Test report generation to capture config and output of riak_test runs.
  • Test report upload to S3.
  • AWS Command line utility.
  • Jenkins job setup.
  • Python code restructuring.

Each point is elaborated below.

Provisioning of pre-made EBS volumes with Riak data.

Riak often behaves very differently when tested with a significant amount of data. Because of the time and IO resources it takes to generate a usable dataset, it is not practical to have the test do this itself, e.g. by populating Riak using basho_bench. Instead, we want a pre-generated dataset that the test can start out with.

We built such a dataset on a copy-on-write (CoW) filesystem, because CoW makes it easy to roll back to a previous snapshot even after a test has modified data. Btrfs is readily available on Linux, so this was the weapon of choice. Creating a 30-million-key dataset went well on a 5-node Riak cluster. To use the resulting snapshots in a test, add the following line to cluster.config under each relevant cluster:

ebs_snapshots = snap-0ec90bf0,snap-08c90bf6,snap-07c90bf9,snap-02c90bfc,snap-01c90bff

The snapshots contain the entire Riak data-directory including a leveldb backend, anti-entropy and ring.

Because the ring is included, the EBS volumes must be mounted in the right order, i.e. node 1 must get the first snapshot, and so on. When running dual-cluster tests, rtcloud automatically adjusts the ring files of cluster 2 to the correct node names (riak201.priv instead of riak101.priv, and so on).
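For illustration only, this is roughly what the provisioning has to do per node; a minimal boto3 sketch (the instance IDs, availability zone and device name are placeholders, and boto3 is an assumption rather than what rtcloud itself uses internally):

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Snapshot IDs from the cluster.config line above; instance IDs are placeholders.
snapshots = ["snap-0ec90bf0", "snap-08c90bf6", "snap-07c90bf9",
             "snap-02c90bfc", "snap-01c90bff"]
instances = ["i-aaaaaaaa", "i-bbbbbbbb", "i-cccccccc", "i-dddddddd", "i-eeeeeeee"]

# Node 1 gets the first snapshot, node 2 the second, and so on,
# because the ring file baked into each snapshot is node specific.
for snap_id, instance_id in zip(snapshots, instances):
    vol = ec2.create_volume(SnapshotId=snap_id, AvailabilityZone="eu-west-1a",
                            Size=600, VolumeType="io1", Iops=1000)
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id,
                      Device="/dev/sdf")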

Data shape

Data is modelled on the Danish Medical use case: an average value size of 400 bytes of uncompressible data, with 2 secondary indices per key. Due to overhead, each object replica takes up about 600 bytes. anti_entropy adds about 25% to the space cost for this data shape.
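As a rough sanity check on capacity, the implied on-disk footprint can be estimated from these numbers (assuming N=3, as used elsewhere in this plan):

def estimated_footprint_tb(num_keys, bytes_per_replica=600, n_val=3, aae_overhead=0.25):
    # ~600 bytes per object replica incl. overhead, plus ~25% extra for anti_entropy.
    return num_keys * bytes_per_replica * n_val * (1 + aae_overhead) / 1e12

print(estimated_footprint_tb(30_000_000))     # ~0.07 TB for the 30M key dataset
print(estimated_footprint_tb(1_000_000_000))  # ~2.25 TB for the 1B key dataset

The 1B key figure is consistent with the 5x600GB volume budget used for that dataset in the test plan below.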

Building a 1B key dataset on EBS/Btrfs/Linux

Significant effort was put into generating a 1B key dataset, which would be half the estimated Danish Medical use case target of 2B keys. However, EBS problems kept emerging: the device would become unavailable to the kernel, the kernel would wait more than 2 minutes for IO, and ultimately the Riak instance would crash. Numerous attempts were made to fix the nodes whose EBS volume had crashed, including:

  • Destroying and recreating the EBS volume, restarting the cluster, and waiting for AAE to populate the empty node.
  • Destroying and recreating EC2 instance.
  • Upgrading the Linux kernel.
  • Installing EC2 Enhanced networking.
  • Changing region between eu-west-1 and us-east-1.

These efforts were fruitless; one or two of the cluster nodes continued to crash after 1-10 hours of data initialization load. Changing the filesystem, e.g. to ext4, was not tried, as that would have meant giving up CoW and would require redesigning the test. Another option would be a different CoW filesystem on Linux.

stop-test-and-revert-data.yml

The Ansible playbook stop-test-and-revert-data.yml stops any beam process running on the testx01 machines and reverts the clusters to their startup state, so they are ready for another test from scratch. This is faster than provisioning new EC2 instances and also cheaper in EC2 costs.
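Example invocation, assuming the same inventory layout as the other playbooks described below:

ansible-playbook -i /path/to/rtcloud/clusters/my_cluster_name/generated/inventory ../../playbooks/stop-test-and-revert-data.yml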

Automated build and provisioning of riak_ee from github.

When working on riak_ee and testing during the work, the previous two ways of getting a riak_ee release onto rtcloud were both somewhat involved:

  1. Create and wrap up the ee-release, upload it to S3, modify rtcloud to download and use it.
  2. With an "active" rtcloud, go into the riak_ee root dir, and run the rtcloud-riak-build command, which would build riak and provision the clusters with it.

To make it easier to automate tests using rtcloud, support was added for specifying riak_ee_branch in cluster.config. This makes rtcloud build and deploy riak_ee from the given branch as part of the cluster provisioning.

Example cluster.config line: riak_ee_branch = 2.0

Automated build and provisioning of riak_test from github.

Much like custom riak_ee releases, the tests themselves are also often work in progress when doing rtcloud tests. To accommodate this, rtcloud now builds and deploys riak_test from the GitHub branch/tag specified in cluster.config.

Example cluster.config line: riak_test_branch = enhance/krab/cluster-realtime-rebalance-cleanup
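Putting the pieces together, a cluster.config for a prefilled cluster running a work-in-progress test could contain lines like the following (the values are just the examples from this page):

riak_ee_branch = 2.0
riak_test_branch = enhance/krab/cluster-realtime-rebalance-cleanup
ebs_snapshots = snap-0ec90bf0,snap-08c90bf6,snap-07c90bf9,snap-02c90bfc,snap-01c90bff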

Ansible-driven riak_test runs on provisioned clusters.

To actually run a riak_test test against a provisioned cluster, choose which test you want to run and let Ansible do it like this: ansible-playbook -i /path/to/rtcloud/clusters/my_cluster_name/generated/inventory ../../playbooks/run-test.yml -e "test=replication2_large_scale"

Test report generation to capture config and output of riak_test runs.

After a test has completed, generating the test report is as simple as: ansible-playbook -i /path/to/rtcloud/clusters/my_cluster_name/generated/inventory ../../playbooks/generate-test-report.yml

This extracts information from the bench and test machines and bundles it together with the cluster.config in an archive named something like test-report-20150120174833.tar.gz

Test report upload to S3.

All test reports are automatically uploaded to Basho's S3 account, into the S3 bucket rtcloud-test-reports.
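To pull a report back down for inspection, a minimal boto3 sketch like the following will do (assuming standard AWS credentials are configured; the report name is the example from above, and boto3 is an assumption rather than part of rtcloud):

import boto3

s3 = boto3.client("s3")
report = "test-report-20150120174833.tar.gz"
# Download the named report from the shared bucket to the current directory.
s3.download_file("rtcloud-test-reports", report, report)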

AWS Command line utility.

Out of necessity, and for lack of AWS web access, we built the ebs.py tool, which supports a range of AWS-related operations:

rsl@linux-54fw:~/projects/rtcloud> python bin/ebs.py

usage:
ebs.py create_volume sizeInGB [snapshot_id]
ebs.py create_volume_eu sizeInGB [snapshot_id]
ebs.py delete_volume volume_id
ebs.py delete_volume_eu volume_id
ebs.py attach_volume volume_id instance_id
ebs.py attach_volume_eu volume_id instance_id
ebs.py detach_volume volume_id
ebs.py detach_volume_eu volume_id
ebs.py create_snapshot volume_id
ebs.py create_snapshot_eu volume_id
ebs.py delete_snapshot snapshot_id
ebs.py delete_snapshot_eu snapshot_id
ebs.py copy_snapshot_to_eu snapshot_id
ebs.py copy_snapshot_to_us snapshot_id
ebs.py show_volumes region
ebs.py show_snapshots region
ebs.py reboot instance_id region
ebs.py terminate instance_id region
ebs.py placement_groups region
ebs.py delete_placement_group region name
ebs.py security_groups region
ebs.py instances region
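For example, to create a 600GB volume in eu-west-1 from the first dataset snapshot listed above:

python bin/ebs.py create_volume_eu 600 snap-0ec90bf0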

Jenkins job setup.

Python code restructuring

12/12/2014 Week Status

  • Large scale test can now be executed by script.
  • Some EC2 issues encountered with hanging EBS vols and beam processes dying for no apparent reason.
  • VPN problem still not resolved, meaning no inter-datacenter test has been possible so far. According to Joe/Engel, 2 VPN implementations exist in rtcloud, one of which works. Engel and Rune to get in contact on Monday.
  • Data load of (almost) 1B keys is under way. It was stopped temporarily by a hanging EBS volume, but is running again.
  • The riak_repl 2.0 branch is currently being cleaned up, so better to use another branch for the large scale test until the 2.0 branch stabilizes.
  • Hipchat is a good way to stay/get in touch, so use it.
  • Trifork will try to get hold of Greg to coordinate CI'ing of the large scale rtcloud test with him.
  • test report from a run on empty clusters: https://docs.google.com/document/d/1IOZxpUKLdu8z4Fj_oLjQ5lMgGvrw7sPBblb2D24ceik/edit
  • We're still struggling to get the test to complete on the prefilled clusters due to the EC2 issues, and will produce a report for such a test run once we have one completed.

12/1/2014 Week Status

  • Overall

    • Good progress on L2.5: the large scale cluster test
    • Example run without any preloaded data on the estimate branch. basho_bench results
    • Some progress on new FS (MDC4/FS)
  • 3 variants of fullsync discussed

    1. keyspace sync, requiring same N-values for buckets on both ends
    2. keyspace sync, permitting different N-values
    3. clone sync

We will implement #1 above, which requires the same N-values across clusters, but does permit different ring sizes in the two clusters. Buckets with the same N-value are synced together: all N=1 buckets together, all N=3 buckets together, etc. If some nodes are down, lower N-values would make the fullsync fail partially.

Implementing #2 would require all nodes in both clusters to be up during the sync, because syncs need to happen between the primary responsible nodes (which would be guaranteed to host merkle trees for all the relevant data).

We still need to understand the requirements for #3 (clone sync). If performance is the priority, it makes sense to only implement this for equal Q and N for both sides.
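As a purely illustrative sketch of the grouping in variant #1 (not the actual riak_repl code, and the bucket-properties shape is assumed), buckets would be partitioned by N-value and each group fullsynced as a unit:

from collections import defaultdict

def group_buckets_by_nval(bucket_props):
    # bucket_props: {bucket: {"n_val": int, ...}} (shape assumed for illustration)
    groups = defaultdict(list)
    for bucket, props in bucket_props.items():
        groups[props["n_val"]].append(bucket)
    return groups  # e.g. {1: [all N=1 buckets], 3: [all N=3 buckets]}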

The new repl/AAE will be on a branch labeled krab-tagged-aae off the 2.0 branch.

  • Expect mid-January final delivery.

11/21/2014 Week Status

Minutes from 11/14/14

  • We will run a number of smaller tests next week (30M keys)
  • Throw in 1/1000 large objects (1-50MB)
  • Get a baseline for these numbers
  • Micha will find tests that validate RT and FS while nodes come up/down.
  • Start building a 1B key dataset
  • Meeting with Greg to explain the setup (Rune + miklix)
  • Do a writeup on the test and how to run it.

Minutes from 10/31/14

Attendees: Krab, Jon, Michal, Greg, Heather

  • The loaded_upgrade test may be one that can be leveraged. Joe may have additional tests. Action: Jon to get Joe to send any additional tests he may have.
  • Capture stats: baked into basho_bench. Action: Greg to investigate.
  • Jordan had a test to prepopulate data. Action: Kresten to get info from Jordan.
  • Dataset size – 1B objects, replication factor of 3, size ~1K.

Week 45, Status from Trifork - 11/10/14

  • Spent last week figuring out ins and outs of rt_cloud, amazon, tools, etc.
  • We can now run both mainstream (2.0) and specific branches of riak_ee/riak_test on Amazon
  • Mikael (miklix) has been working on a test that includes a background load using basho_bench.
  • Rune (rsltrifork) has been working on loading large datasets onto S3 using Basho's setup.
  • We still need to figure out how to get stats out from the rt_cloud environment
  • All in all, we believe we're now ready to do the "actual test".

REPL Test Plan (L2.5)

The large scale test is to test:

  • large scale repl (fullsync + realtime)
  • realtime rebalance working
  • fullsync not halted by nodes up/down/add/remove
  • realtime not halted by nodes up/down/add/remove

Output artifacts:

  • performance graph #keys / #sync-time

Setup

The basic setup involves two clusters of 5 nodes, plus a test driver.

The test is run with a "base load" and a "test script".

The Base Load

For each test, there is a "base load" (that simulates a real time load), and then an operations script (the actual test).

Whatever the hardware configuration, the base load should:

  • Use ~50% of available Memory, CPU and IOPS

I.e., it's not fun to test something that does not exercise the system.

The Test Script

The test script involves running various operations while the base load is applied:

Active-Active Scenario

In this scenario, the base load involves

  • Taking writes in both clusters.
  • Realtime enabled in both directions.

Clusters may be in the same or different availability zones.

Operations

Now, the interesting tests involve

  • Running fullsync periodically.
  • ... while adding/removing, starting/stopping nodes.

Test outcome

  • Time to complete a fullsync (as a function of the # keys).
  • Does fullsync have to be restarted when nodes are added/removed? Hopefully not.
  • Validate, after stopping the base load generator, that the data sets are equal.

In addition to the graphs/stats generated by basho_bench (used to generate the base load), we also need to capture CPU, memory and IOPS load stats for the Riak nodes.
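If rtcloud does not already provide this, a small sampler along these lines could run on each Riak node during the test (psutil and the output path are assumptions; this is a sketch, not part of rtcloud):

import csv
import time

import psutil

def sample_node_stats(path="/tmp/node-stats.csv", interval=10):
    # Append one row of CPU, memory and disk IO counters per interval,
    # so the numbers can go into the test report next to the basho_bench output.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            io = psutil.disk_io_counters()
            writer.writerow([int(time.time()),
                             psutil.cpu_percent(interval=None),
                             psutil.virtual_memory().percent,
                             io.read_count, io.write_count])
            f.flush()
            time.sleep(interval)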

Test Data

We need a realistic (large) number of keys to simulate real-world performance of bulk ops like fullsync, and also to make sure we don't fit all data in in-memory caches/buffers.

  • Initialize a 5-node Riak cluster with 1B keys (avg. 500 bytes of random data each), with Riak's data dir mounted on 5x600GB EBS volumes.
  • Snapshot the 5 EBS vols to S3.
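For example, the snapshot of each volume can be taken with the ebs.py tool described above (the volume ID is a placeholder):

python bin/ebs.py create_snapshot_eu vol-xxxxxxxx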

Test run

  • Set up two 5-node test clusters connected both ways by RT and FS. The clusters are in different regions: eu-west-1 and us-east-1. Also set up a test node for each cluster for running the test and bench.
  • Create 2x5 600GB EBS provisioned-IOPS (1000 IOPS) volumes from the snapshots.
  • Mount the EBS vols.
  • Start basho_bench instances for both clusters, working all 5 nodes with the background load (~50% of the max achievable with 2-way RT enabled).
  • Perform further test operations like starting/stopping/adding/removing nodes and running fullsyncs.
  • Create a test report with background load bench results, fullsync timings and machine metrics.

Out of scope for first run

  • Does realtime rebalance correctly when nodes are added/removed?
  • Master-Slave repl.