If you are familiar with sbt console, the convenient Scala REPL of sbt, and you are about to develop Spark code with spark-shell, you don’t need to install spark-shell. In fact, you don’t even need to install Spark!
Include Spark in build.sbt
Instead of downloading and installing Spark, you can use Spark by adding the following line to your build.sbt:
```scala
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"
```
Note that the last version number, 2.1.1, is the Spark version. You can check the available Spark modules at the Maven repository:
- https://mvnrepository.com/artifact/org.apache.spark
- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11
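For reference, a minimal build.sbt sketch might look like the following. The project name and exact versions here are assumptions, but Spark 2.1.1 artifacts are published for Scala 2.11 (hence the spark-core_2.11 artifact name above), and the SparkSession API used later in this post lives in the spark-sql module, so that dependency is included as well.

```scala
// Minimal build.sbt sketch; the name and exact versions are illustrative.
name := "spark-console"

// Spark 2.1.1 artifacts are built for Scala 2.11.
scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1",
  // Provides org.apache.spark.sql.SparkSession, used in the next section.
  "org.apache.spark" %% "spark-sql" % "2.1.1"
)
```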
Generate spark and sc in sbt console
spark-shell provides the spark (Spark 2.x) and sc objects as entry points to the Spark API. We can create the same objects in sbt console as follows.
```scala
import org.apache.spark.sql.SparkSession

// Create the SparkSession (a local master is assumed here) and
// expose the SparkContext under its usual name.
val spark = SparkSession.builder().master("local[*]").appName("console").getOrCreate()
val sc = spark.sparkContext
```
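As a quick sanity check, you can run a small job against the new objects; a minimal example:

```scala
// Sum the numbers 1..100 through the RDD API.
sc.parallelize(1 to 100).sum()   // res0: Double = 5050.0

// Count a small range through the SparkSession API.
spark.range(10).count()          // res1: Long = 10
```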
If you enter the above code in sbt console, you will see Spark’s startup messages. Note that the startup messages include the URL of the Spark web UI, as follows; you can open it in a browser (http://localhost:4040 by default) to monitor your jobs.
```
17/06/07 04:18:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
```
Automate the creation of the Spark API objects
Instead of entering the above code every time the Scala REPL starts, you can automate the creation of the Spark API objects with the initialCommands key in build.sbt. By adding the following code to build.sbt, you can access the spark and sc objects as soon as sbt console starts.
```scala
initialCommands in console := """
  import org.apache.spark.sql.SparkSession
  val spark = SparkSession.builder().master("local[*]").appName("console").getOrCreate()
  val sc = spark.sparkContext
  // Optional: Dataset/DataFrame conversions, as provided by spark-shell.
  import spark.implicits._
"""
```
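With this in place, sbt console prints the Spark startup messages and drops you into a REPL where spark and sc are already defined.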
Since creating the spark object starts the Spark system, a shutdown signal should be sent to Spark when the console is closed. By adding the following line to build.sbt, Spark receives the exit signal of sbt console and shuts down gracefully.
```scala
cleanupCommands in console := "spark.stop()"
```
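Since cleanupCommands is just a string of Scala statements executed when the REPL session ends, you can include extra teardown steps as well; a small sketch:

```scala
// Print a note, then stop the SparkSession, when the console exits.
cleanupCommands in console := """
  println("Shutting down Spark ...")
  spark.stop()
"""
```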
Putting it all together: Spark SBT template
(Update: 2017-06-13)
Putting all of the above together, I have created an SBT project template for Spark:
https://github.com/sanori/spark-sbt
I hope this helps your work.