Using sbt instead of spark-shell

If you are familiar with sbt console, a convenient Scala REPL, you don't need to install spark-shell to develop Spark applications.

In fact, you don't even need to install Spark!

Include Spark in build.sbt

Instead of downloading and installing Spark, you can use Spark by adding the following lines to your build.sbt.

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.1"

Note that the last version number, 2.1.1, is the version of Spark. You can check the available Spark modules at the Maven repository.
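
If you prefer to manage the Spark version in one place, you can also factor it out into a value. This is just a sketch; the spark-mllib module shown in a comment is an optional example, not something this post requires.

// Keep the Spark version in one place
val sparkVersion = "2.1.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion
  // Add other Spark modules the same way if you need them, e.g.:
  // "org.apache.spark" %% "spark-mllib" % sparkVersion
)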

Generate spark and sc in sbt console

spark-shell provides the spark (Spark 2.x) and sc objects as entry points to the Spark API. We can create the same objects in sbt console as follows.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark = SparkSession.builder().master("local").appName("spark-shell").getOrCreate()
import spark.implicits._
val sc = spark.sparkContext

If you enter the above code in sbt console, you will see the Spark startup messages. Note that the startup messages include the URL of the Spark UI, as follows.

17/06/07 04:18:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/06/07 04:18:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.2:4040
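
Once spark and sc are defined, a quick sanity check in the console might look like the following. This is a minimal sketch, not part of the original post; it only confirms that the session and context work.

// A tiny DataFrame to confirm the SparkSession works
val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
df.show()

// A quick check of the SparkContext as well
println(sc.parallelize(1 to 100).sum())  // 5050.0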

Automate the creation of the Spark API objects

Instead of entering the above code every time the Scala REPL starts, you can automate the creation of the Spark API objects with the initialCommands key in build.sbt.

By adding the following code to build.sbt, you can access the spark and sc objects as soon as sbt console starts.

initialCommands in console := """
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark = SparkSession.builder()
  .master("local")
  .appName("spark-shell")
  .getOrCreate()
import spark.implicits._
val sc = spark.sparkContext
"""

Since creating the spark object starts the Spark system, a shutdown signal should be sent to Spark when the console is closed.

By adding the following line to build.sbt, Spark receives the exit signal of sbt console and shuts down gracefully.

cleanupCommands in console := "spark.stop()"
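
For reference, a complete build.sbt combining all of the above might look like this. It is a sketch: the project name and the Scala version are assumptions (Spark 2.1.1 is built for Scala 2.11), so adjust them to your environment.

name := "spark-console-example"  // example project name

scalaVersion := "2.11.8"  // Spark 2.1.1 is built against Scala 2.11

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.1"

// Create spark and sc when sbt console starts
initialCommands in console := """
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark = SparkSession.builder()
  .master("local")
  .appName("spark-shell")
  .getOrCreate()
import spark.implicits._
val sc = spark.sparkContext
"""

// Stop Spark gracefully when the console exits
cleanupCommands in console := "spark.stop()"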

Putting It All Together: Spark SBT template

(Update: 2016-06-13)

Gathering all of the above, I have created an SBT project template for Spark.

https://github.com/sanori/spark-sbt

Hope this helps your work.