This sbt plugin provides customizable sbt tasks to fire Spark jobs against local or remote Spark clusters.
It allows you to submit Spark applications without leaving your favorite development environment.
The reactive nature of sbt makes it possible to integrate this with your Spark clusters, whether it is a standalone
cluster, a [YARN cluster](examples/sbt-assembly-on-yarn), or a [cluster running on EC2](examples/sbt-assembly-on-ec2).
## Setup
For sbt 0.13.6+, add sbt-spark-submit to your `project/plugins.sbt` or `~/.sbt/0.13/plugins/plugins.sbt` file:
```scala
addSbtPlugin("com.github.saurfang" % "sbt-spark-submit" % "0.0.4")
```
Naturally you will need to have the Spark dependency in your project itself, such as:
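
A minimal sketch (the Spark version here is only an example; `provided` keeps Spark out of an assembled fat jar, since the cluster supplies it at runtime):

```scala
// build.sbt -- adjust the version to match your cluster
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
```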
More advanced techniques include, but are not limited to:
1. Use one-jar plugins such as `sbt-assembly` to create a fat jar for deployment.
2. While YARN automatically uploads the application jar, this doesn't seem to be the case for a Spark Standalone
   cluster, so you can inject a JAR-uploading step into this key and return the uploaded JAR instead. See
   [sbt-assembly-on-ec2](examples/sbt-assembly-on-ec2) for an example.
### Spark and Application Arguments
`sparkSubmitSparkArgs` and `sparkSubmitAppArgs` represent the arguments for Spark and the application, respectively.

More interesting ones may be:
1. If there is `--help` in `appArgs`, you will want to run as `local` to see the usage information immediately.
2. For YARN deployment, `yarn-cluster` is appropriate, especially if you are submitting to a remote cluster from your IDE.
3. For EC2 deployment, you can use the `spark-ec2` script to figure out the correct address of the Spark master. See
   [sbt-assembly-on-ec2](examples/sbt-assembly-on-ec2) for an example.
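
You can also set build-level defaults for these keys. The following is only a sketch: it assumes both keys hold plain `Seq[String]` values (check the plugin's `autoImport` for the exact key types), and `--verbose` stands in for whatever flags your application actually accepts:

```scala
// build.sbt sketch -- key types assumed to be Seq[String]
sparkSubmitSparkArgs := Seq("--master", "yarn-cluster", "--executor-memory", "2G")
// "--verbose" is a made-up application flag, purely for illustration
sparkSubmitAppArgs := Seq("--verbose")
```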
### Default Properties File
`sparkSubmitPropertiesFile` specifies the default properties file to use if `--properties-file` is not already supplied.

This can be especially useful for YARN deployment: by pointing Spark at an assembly JAR on HDFS via the `spark.yarn.jar`
property, you avoid the overhead of uploading the Spark assembly jar every time an application is submitted. See
[sbt-assembly-on-yarn](examples/sbt-assembly-on-yarn) for an example.

Other interesting settings include driver/executor memory/cores, RDD compression/serialization, etc.
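
For reference, the file uses Spark's standard `spark-defaults.conf` format, so a sketch of its contents might look like this (the HDFS path is a placeholder):

```
spark.yarn.jar        hdfs:///apps/spark/spark-assembly.jar
spark.driver.memory   2g
spark.executor.memory 4g
spark.serializer      org.apache.spark.serializer.KryoSerializer
```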
### Classpath
`sparkSubmitClasspath` sets the classpath to use for Spark application deployment. Currently this is only relevant for
YARN deployment, as I couldn't get `yarn-site.xml` correctly picked up even when `HADOOP_CONF_DIR` is properly set.
In this case, you can add:
```scala
sparkSubmitClasspath := {
  new File(sys.env.getOrElse("HADOOP_CONF_DIR", "")) +:
    data((fullClasspath in Compile).value)
}
```
Note: this is already injected automatically once you `enablePlugins(SparkSubmitYARN)`.
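
For completeness, enabling that plugin is standard sbt syntax (the project name below is illustrative):

```scala
lazy val myApp = (project in file("."))
  .enablePlugins(SparkSubmitYARN)
```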
### SparkSubmit inputKey
`sparkSubmit` is a generic `inputKey` and we will show you how to define additional tasks that have
different default behavior in terms of parameters. As for the inputKey itself, it parses
space-delimited arguments. If `--` is present, the former part gets appended to `sparkSubmitSparkArgs` and
the latter part gets appended to `sparkSubmitAppArgs`. If `--` is missing, then all arguments are assumed
to be application arguments.
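
For example, from the sbt shell (where `--input` is a made-up application flag):

```
> sparkSubmit --executor-memory 2G -- --input /data/events
```

Here `--executor-memory 2G` is appended to `sparkSubmitSparkArgs`, while `--input /data/events` is appended to `sparkSubmitAppArgs`.
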
With `SparkSubmitSetting` you create a single setting object and fuse it with additional settings.

To create multiple tasks, you can wrap them with `SparkSubmitSetting` again like this:
```scala
lazy val settings = SparkSubmitSetting(
  // list the individual SparkSubmitSetting definitions here; see
  // src/sbt-test/sbt-spark-submit/multi-main for a complete example
)
```
There is already an implicit conversion from `SparkSubmitSetting` to `Seq[Def.Setting[_]]` so it can
append itself to your project. When there are multiple settings, the third variant allows you to aggregate all
of them without additional type hinting for the implicit conversion to work.

See [`src/sbt-test/sbt-spark-submit/multi-main`](src/sbt-test/sbt-spark-submit/multi-main) for examples.
## Multi-project builds
Of course, the `sparkB` task won't even trigger a build of `A` unless `B` depends on `A`, thanks to the magic of sbt.
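
For context, the shape of such a build might look roughly like the sketch below; the single-argument `SparkSubmitSetting("sparkB")` call and the project layout are illustrative, and the real wiring is shown in the linked example:

```scala
// Sketch only: assumes SparkSubmitSetting("sparkB") yields the settings for a task named sparkB
lazy val A = project
lazy val B = project.dependsOn(A)
  .settings(SparkSubmitSetting("sparkB"): _*)
```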
See [`src/sbt-test/sbt-spark-submit/multi-project`](src/sbt-test/sbt-spark-submit/multi-project) for examples.
## Resources
For more information and working examples, see the projects under [`examples`](examples) and [`src/sbt-test`](src/sbt-test).