This is a framework for repeatedly running a suite of performance tests for the Spark cluster computing framework.
- Start a Spark cluster for the tests using the Spark EC2 scripts.
- SSH into the Spark master and git clone spark-perf.
- cd spark-perf
- Copy config/config.py.template to config/config.py and modify it as necessary. At a minimum, you must set COMMIT_ID.
- Execute bin/run.
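The copy-and-edit step above can be scripted. The following is a minimal sketch and not part of spark-perf: the helper names prepare_config and commit_id_is_set are hypothetical, and the check assumes config.py assigns COMMIT_ID as a quoted string at the start of a line, which may not match the real template.

```python
import os
import re
import shutil


def prepare_config(repo_dir):
    """Copy config/config.py.template to config/config.py unless it already exists.

    Returns the path to config.py. An existing config.py is never overwritten.
    """
    template = os.path.join(repo_dir, "config", "config.py.template")
    target = os.path.join(repo_dir, "config", "config.py")
    if not os.path.exists(target):
        shutil.copyfile(template, target)
    return target


def commit_id_is_set(config_path):
    """Return True if config.py assigns a non-empty quoted string to COMMIT_ID.

    Assumption: the assignment looks like COMMIT_ID = "..." at the start of a line.
    """
    pattern = re.compile(r"""^COMMIT_ID\s*=\s*["']([^"']+)["']""", re.MULTILINE)
    with open(config_path) as f:
        return pattern.search(f.read()) is not None
```

From the spark-perf directory, calling prepare_config(".") and then commit_id_is_set on the returned path tells you whether config.py still needs editing before you execute bin/run. The check only looks for a non-empty value, so it cannot tell a real commit from a placeholder.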
The default configuration settings are meant to make it easy to run on Amazon using the Spark EC2 scripts. To run in another environment, customize config.py. For example, when developing and testing this framework, we recommend running a master and a slave daemon on your development machine (the test framework will start and stop this cluster for you with the correct config settings). This exercises the production code paths and avoids the need for extra code to support local testing. See DEVELOPER-NOTES.txt for a list of the variables you will probably want to update, along with suggested values.
The script is written for Python 2.7. On earlier versions of Python, argparse is not part of the standard library and may need to be installed, which can be done using easy_install argparse.
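To check for argparse before running the tests, a small script along these lines can help. This is a sketch and not part of spark-perf; the function names argparse_available and argparse_hint are hypothetical.

```python
import sys


def argparse_available():
    """Return True if the argparse module can be imported."""
    try:
        import argparse  # standard library since Python 2.7
        return True
    except ImportError:
        return False


def argparse_hint():
    """Return an install hint if argparse is missing, otherwise None."""
    if argparse_available():
        return None
    return ("argparse not found on Python %d.%d; "
            "install it with: easy_install argparse" % sys.version_info[:2])
```

On Python 2.7 or later, argparse_hint() returns None. On an older interpreter without argparse, it returns a message containing the install command.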
For questions or comments, contact @pwendell or @andyk.
This testing framework started as a port and heavy modification of a predecessor Spark performance testing framework, also called spark-perf, written by Denny Britz.