This repository is a collection of Spark examples and use-case implementations for various components of the Spark ecosystem, including Spark Core, Spark Streaming, Spark SQL, and MLlib.
- Spark core examples
- Spark streaming examples
- Spark core use-cases
- Spark streaming use-cases
- LogAnalytics: a simple Spark Streaming use case for Apache log analysis. It can read data from Kafka and Kinesis, performs some analysis, and persists the results to Cassandra.
- Testing
The simplest way to get the code is to clone the repository:
git clone https://github.com/cloudwicklabs/spark_codebase.git
To run any of these examples or use cases, you have to package them as an uber-jar (most of the examples depend on external libraries and therefore have to be packaged as an assembly jar).
From the project's home directory, run:
sbt assembly
spark-submit is the simplest way to submit a Spark application to a cluster, and it supports all the cluster managers: standalone, YARN, and Mesos.
Each main class has documentation on how to run it.
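
As a rough sketch of what a spark-submit invocation can look like, the snippet below wraps the command in a small helper. The main class name, the jar path, and the host names are hypothetical placeholders, not values taken from this repository; substitute the class you want to run and the assembly jar that sbt assembly produced under target/. The DRY_RUN variable is also just an illustration device: while it is set to echo, the command is only printed so it can be reviewed.

```shell
# Hypothetical placeholders: replace with a real main class from this
# repository and the assembly jar produced by `sbt assembly`.
MAIN_CLASS="com.example.SomeExample"
ASSEMBLY_JAR="path/to/assembly.jar"

# While DRY_RUN is "echo" the command is only printed.
# Set DRY_RUN="" to actually submit the application.
DRY_RUN="echo"

submit() {
  # $1 is the master URL passed to spark-submit via --master.
  $DRY_RUN spark-submit --class "$MAIN_CLASS" --master "$1" "$ASSEMBLY_JAR"
}

# Run locally with two worker threads.
submit "local[2]"

# Other cluster managers (host names and ports are hypothetical):
# submit "spark://master-host:7077"   # standalone cluster
# submit "mesos://mesos-host:5050"    # Mesos
# submit "yarn"                       # YARN; older Spark releases use yarn-client or yarn-cluster
```

Any arguments the application itself expects go after the jar path; the documentation in each main class lists them.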