Repl Testing Plan
Attendees: Krab, Jon, Michal, Greg, Heather
- The loaded upgrade test may be one we can leverage. Joe may have additional tests. Action: Jon to get Joe to send any additional tests he has.
- Capture stats: baked into BashoBench. Action: Greg to investigate.
- Jordan had a test to prepopulate data. Action: Kresten to get info from Jordan.
- Dataset size: 1B objects, replication factor of 3, object size ~1K.
- Spent last week figuring out the ins and outs of rt_cloud, Amazon, tools, etc.
- We can now run both mainstream (2.0) and specific branches of riak_ee/riak_test on Amazon
- Mikael (miklix) has been working on a test that includes a background load using basho_bench.
- Rune (rsltrifork) has been working on loading large datasets on S3 using Basho's setup.
- We still need to figure out how to get stats out of the rt_cloud environment.
- All in all, we believe we're now ready to do the "actual test".
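For the dataset size noted above (1B objects, replication factor of 3, ~1K per object), a rough back-of-the-envelope estimate of raw storage is sketched below. It assumes "~1K" means 1 KiB, that each cluster has 5 nodes with data spread evenly, and it ignores backend and metadata overhead; the function names are made up for illustration.

```python
def raw_storage_bytes(num_objects, object_size_bytes, n_val):
    """Raw bytes held by one cluster, ignoring backend/metadata overhead."""
    return num_objects * object_size_bytes * n_val


def per_node_bytes(total_bytes, num_nodes):
    """Bytes per node, assuming data is spread evenly across nodes."""
    return total_bytes / num_nodes


NUM_OBJECTS = 1_000_000_000   # 1B objects (from the plan)
OBJECT_SIZE = 1024            # assumption: "~1K" taken as 1 KiB
N_VAL = 3                     # replication factor (from the plan)
NODES = 5                     # nodes per cluster (from the plan)

total = raw_storage_bytes(NUM_OBJECTS, OBJECT_SIZE, N_VAL)
per_node = per_node_bytes(total, NODES)

print(f"raw data per cluster: {total / 1024**4:.2f} TiB")
print(f"raw data per node:    {per_node / 1024**3:.1f} GiB")
```

This is only a sizing aid for picking instance types; real on-disk usage depends on the backend and is likely higher.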
The large-scale test is meant to verify:
- large scale repl (fullsync + realtime)
- realtime balancing works
- fullsync not halted by nodes up/down/add/remove
- realtime not halted by nodes up/down/add/remove
Output artifacts:
- performance graph: #keys vs. sync time
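One way to turn the #keys vs. sync time measurements into a trend line is a plain least-squares fit. The sketch below assumes the measurements are available as (keys, seconds) pairs; `fit_line` is a hypothetical helper and the sample numbers are placeholders, not results.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept over (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder (keys, fullsync seconds) pairs for illustration only,
# NOT measured values.
samples = [(1_000_000, 100.0), (2_000_000, 200.0), (4_000_000, 400.0)]

slope, intercept = fit_line(samples)
print(f"seconds per key: {slope:.6f}, fixed overhead: {intercept:.1f} s")
```

A roughly constant slope across dataset sizes would suggest fullsync time grows linearly with the number of keys; a growing slope would be worth investigating.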
The basic setup involves two clusters of 5 nodes, plus a test driver.
Each test is run with a "base load" (which simulates a realtime load) and a "test script" (the operations script, i.e. the actual test).
Whatever the hardware configuration, the base load should:
- Use ~50% of available memory, CPU and IOPS
I.e., it's not fun to test something that does not exercise the system.
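A simple check that a measured base load is near the ~50% target could look like the sketch below. The tolerance of 10 percentage points and the resource names are assumptions, and how the utilization numbers are collected is left out.

```python
TARGET = 0.5      # ~50% utilization (from the plan)
TOLERANCE = 0.1   # assumption: accept 40%..60%


def within_target(utilization, target=TARGET, tolerance=TOLERANCE):
    """Map each resource to True if its utilization is close to the target.

    `utilization` is a dict such as {"cpu": 0.47, ...} with values in 0..1.
    """
    return {
        name: abs(value - target) <= tolerance
        for name, value in utilization.items()
    }


# Placeholder readings for illustration only, NOT measured values.
reading = {"cpu": 0.45, "memory": 0.55, "iops": 0.2}
for name, ok in within_target(reading).items():
    print(f"{name}: {'ok' if ok else 'adjust base load'}")
```

If a resource is far below the target, the basho_bench load would need to be raised (or lowered if far above) before running the test script.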
The test script involves running various operations given the base load.
In this scenario, the base load involves:
- Taking writes in both clusters.
- Realtime enabled in both directions.
Now, the interesting tests involve:
- Running fullsync periodically.
- ... while adding/removing, starting/stopping nodes.
- How long does fullsync take (as a function of the # keys)?
- Does fullsync have to be restarted when nodes are added/removed? Hopefully not.
- After stopping the base load generator, validate that the data sets are equal.
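The final validation step (data sets are equal once the base load has stopped) can be sketched as a comparison of per-key digests. This assumes the keys and values of each cluster have already been read into memory, which is only realistic for a sample of the keys; `digest` and `compare_datasets` are hypothetical helpers, and reading the objects from Riak is not shown.

```python
import hashlib


def digest(value: bytes) -> str:
    """Content digest of an object's value."""
    return hashlib.sha256(value).hexdigest()


def compare_datasets(a, b):
    """Compare two {key: digest} dicts.

    Returns (keys only in a, keys only in b, keys whose digests differ).
    """
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    differing = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, differing


# Tiny made-up example: cluster B is missing k3 and has a different k2.
cluster_a = {"k1": digest(b"one"), "k2": digest(b"two"), "k3": digest(b"three")}
cluster_b = {"k1": digest(b"one"), "k2": digest(b"TWO")}

only_a, only_b, differing = compare_datasets(cluster_a, cluster_b)
print("only in A:", only_a)
print("only in B:", only_b)
print("differing:", differing)
```

The data sets count as equal when all three lists are empty.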
In addition to the graphs/stats generated by basho_bench (used to generate the base load), we also need to capture CPU, memory and IOPS load stats for the Riak nodes.
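As one possible starting point for capturing memory stats on the Riak nodes, the sketch below parses `/proc/meminfo`-style text. It assumes the nodes run Linux with a kernel that reports `MemAvailable`; the function names are made up, and CPU and IOPS would need similar collectors.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Values are in kB for most fields; a few are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_used_fraction(meminfo):
    """Fraction of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    return (total - meminfo["MemAvailable"]) / total


# Made-up sample text for illustration; on a node, read /proc/meminfo instead.
SAMPLE = "MemTotal:       1000 kB\nMemFree:         100 kB\nMemAvailable:    400 kB\n"

info = parse_meminfo(SAMPLE)
print(f"memory used: {memory_used_fraction(info):.0%}")
```

Sampling this periodically on each node during a run would give a memory series to plot alongside the basho_bench graphs.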
- Set up two 5-node clusters connected both ways by RT and FS. Clusters are in different regions: eu-west-1 and us-east-1. Also set up a test node for each cluster for running the test and bench.
- Start basho_bench instances for both clusters, working all 5 nodes with the background load (50%).
- Does realtime rebalance correctly when nodes are added/removed?
- Master-Slave repl.