Commit 2752c2c
update README
1 parent 0ab7781 commit 2752c2c

1 file changed: README.md (17 additions & 11 deletions)

@@ -7,14 +7,14 @@
This sbt plugin provides customizable sbt tasks to fire Spark jobs against local or remote Spark clusters.
It allows you to submit Spark applications without leaving your favorite development environment.
The reactive nature of sbt makes it possible to integrate this with your Spark clusters, whether it is a standalone
-cluster, YARN cluster, clusters run on EC2 and etc.
+cluster, a [YARN cluster](examples/sbt-assembly-on-yarn), or [clusters run on EC2](examples/sbt-assembly-on-ec2).

## Setup

For sbt 0.13.6+ add sbt-spark-submit to your `project/plugins.sbt` or `~/.sbt/0.13/plugins/plugins.sbt` file:

```scala
-addSbtPlugin("com.github.saurfang" % "sbt-spark-submit" % "0.0.3")
+addSbtPlugin("com.github.saurfang" % "sbt-spark-submit" % "0.0.4")
```

Naturally you will need the Spark dependency in your project itself, such as:
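
For instance, a minimal sketch (pick the Spark version that matches your cluster; the `provided` scope is common when you build a fat jar with `sbt-assembly`):

```scala
// Assumed example; adjust the Spark version and scope to your cluster.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
```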
@@ -75,8 +75,9 @@ Below we go into details about various keys that controls the default behavior o
More advanced techniques include, but are not limited to:

1. Use one-jar plugins such as `sbt-assembly` to create a fat jar for deployment.
-2. While YARN would automatically upload the application jar, it doesn't seem to be the case for Spark Standalone
-cluster. So you might inject a JAR uploading process inside this key and returns the uploaded JAR instead.
+2. While YARN automatically uploads the application jar, this doesn't seem to be the case for a Spark Standalone
+cluster, so you can inject a JAR-uploading process inside this key and return the uploaded JAR instead (see the
+sketch after this list). See [sbt-assembly-on-ec2](examples/sbt-assembly-on-ec2) for an example.
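
A hedged sketch of idea 2, assuming the plugin exposes the application jar through a task key (called `sparkSubmitJar` here; the real key name and return type may differ) and that the cluster can read a shared mount:

```scala
// Hypothetical sketch: build the fat jar with sbt-assembly, copy it somewhere
// the Standalone cluster can reach, and return that path for spark-submit.
sparkSubmitJar := {
  val localJar = assembly.value                            // fat jar from sbt-assembly
  val uploaded = new File("/mnt/shared", localJar.getName) // assumed shared location
  IO.copyFile(localJar, uploaded)                          // swap in scp/S3/HDFS as needed
  uploaded.getAbsolutePath                                 // hand back the uploaded JAR
}
```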

### Spark and Application Arguments
`sparkSubmitSparkArgs` and `sparkSubmitAppArgs` represent the arguments for Spark and the application, respectively.
@@ -91,30 +92,33 @@ More interesting ones may be:

1. If there is `--help` in `appArgs`, you will want to run as `local` to see the usage information immediately.
2. For YARN deployment, `yarn-cluster` is appropriate, especially if you are submitting to a remote cluster from an IDE.
-3. For EC2 deployment, you can use `spark-ec2` script to figure out the correct address of Spark master.
+3. For EC2 deployment, you can use the `spark-ec2` script to figure out the correct address of the Spark master. See
+[sbt-assembly-on-ec2](examples/sbt-assembly-on-ec2) for an example.
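
For instance, a hedged sketch of point 2, seeding the master through `sparkSubmitSparkArgs` (shown earlier; the plugin may also merge this with arguments you pass at the prompt, and the exact merging behavior may differ):

```scala
// Sketch: default spark-submit invocations from this build to YARN cluster mode.
sparkSubmitSparkArgs := Seq("--master", "yarn-cluster")
```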

### Default Properties File
`sparkSubmitPropertiesFile` specifies the default properties file to use if `--properties-file` is not already supplied.

This can be especially useful for YARN deployment by pointing the Spark assembly to a JAR on HDFS via the `spark.yarn.jar`
-property so as to avoid the overhead of uploading Spark assembly jar everytime application is submitted.
+property, so as to avoid the overhead of uploading the Spark assembly jar every time an application is submitted. See
+[sbt-assembly-on-yarn](examples/sbt-assembly-on-yarn) for an example.
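
For instance, such a defaults file might contain (the HDFS path below is a placeholder for wherever you actually uploaded the assembly):

```
spark.yarn.jar hdfs:///user/spark/share/spark-assembly.jar
```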

Other interesting settings include driver/executor memory/cores, RDD compression/serialization, etc.

### Classpath
`sparkSubmitClasspath` sets the classpath to use for Spark application deployment. Currently this is only relevant for
YARN deployment, as I couldn't get `yarn-site.xml` picked up correctly even when `HADOOP_CONF_DIR` is properly set.
-In this case, you need to add:
+In this case, you can add:
```scala
sparkSubmitClasspath := {
  new File(sys.env.getOrElse("HADOOP_CONF_DIR", "")) +:
    data((fullClasspath in Compile).value)
}
```
+Note: this is already injected automatically once you `enablePlugins(SparkSubmitYARN)`.
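
A minimal usage sketch of that note (the project name is illustrative; `SparkSubmitYARN` is the plugin object named in the text):

```scala
// build.sbt: enabling the YARN flavor injects the classpath shown above for you.
lazy val myApp = (project in file("."))
  .enablePlugins(SparkSubmitYARN)
```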

### SparkSubmit inputKey
`sparkSubmit` is a generic `inputKey`, and we will show you how to define additional tasks that have
-different default behavior in terms of parameters. However as for the inputKey itself, it parses
+different default behavior in terms of parameters. As for the inputKey itself, it parses
space-delimited arguments. If `--` is present, the former part gets appended to `sparkSubmitSparkArgs` and
the latter part gets appended to `sparkSubmitAppArgs`. If `--` is missing, then all arguments are assumed
to be application arguments.
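
For instance, a hedged sketch at the sbt prompt (the flags are illustrative): everything before `--` is appended to `sparkSubmitSparkArgs`, everything after it to `sparkSubmitAppArgs`:

```
> sparkSubmit --master local[2] -- --input data.txt --verbose
```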
@@ -142,8 +146,10 @@ object SparkSubmit {
  )
}
```
+
Here we create a single `SparkSubmitSetting` object and fuse it with additional settings.

+
To create multiple tasks, you can wrap them with `SparkSubmitSetting` again like this:
```scala
lazy val settings = SparkSubmitSetting(
@@ -185,7 +191,7 @@ There is already an implicit conversion from `SparkSubmitSetting` to `Seq[Def.Se
append itself to your project. When there are multiple settings, the third variant allows you to aggregate all
of them without additional type hinting for the implicit to work.
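
A hedged sketch of attaching the aggregated `settings` value from the snippet above to a project (the exact wiring is an assumption):

```scala
// The implicit conversion turns SparkSubmitSetting into Seq[Def.Setting[_]],
// so it can be splatted into .settings as usual.
lazy val root = (project in file("."))
  .settings(settings: _*)
```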

-See `src/sbt-test/sbt-spark-submit/multi-main` for examples.
+See [`src/sbt-test/sbt-spark-submit/multi-main`](src/sbt-test/sbt-spark-submit/multi-main) for examples.

## Multi-project builds

@@ -201,8 +207,8 @@ select any specific project.

Of course, the `sparkB` task won't even trigger a build on `A` unless `B` depends on `A`, thanks to the magic of sbt.
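
A hedged sketch of that layout (project names `A` and `B` follow the prose; the `SparkSubmitSetting` call is assumed from the earlier snippets):

```scala
// sparkB lives on B; sbt rebuilds A first only because of the dependsOn edge.
lazy val A = project
lazy val B = project
  .dependsOn(A)
  .settings(SparkSubmitSetting("sparkB"): _*)
```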

-See `src/sbt-test/sbt-spark-submit/multi-project` for examples.
+See [`src/sbt-test/sbt-spark-submit/multi-project`](src/sbt-test/sbt-spark-submit/multi-project) for examples.

## Resources

-For more information and working examples, see projects under `examples` and `src/sbt-test`.
+For more information and working examples, see projects under [`examples`](examples) and [`src/sbt-test`](src/sbt-test).
