Setting up a Spark cluster locally
In this page we describe how to set up your local Spark cluster to develop Java applications and run them against it.
Installing Spark on your local machine
Download the latest stable Spark release from
https://spark.apache.org/ . Choose the Spark package pre-built for Hadoop 2.4.
Copy the tar-gz file into some directory and unpack it:
tar xvzf spark-1.1.1-bin-hadoop2.4.tgz
This will create a directory with the same name as the file. Navigate to the conf directory:
cd spark-1.1.1-bin-hadoop2.4/conf
cp spark-env.sh.template spark-env.sh
Then edit the following properties in the spark-env.sh file:
export SPARK_EXECUTOR_CORES=2      # Number of cores for the workers (Default: 1)
export SPARK_EXECUTOR_MEMORY=2000M # Memory per worker (e.g. 1000M, 2G) (Default: 1G)
Put in the values that suit your machine:
- Executor cores: you can use the number of cores of the machine minus 2 (for example, if it has 8 cores, use 5 or 6).
- Executor memory: use about 70% of the memory of the machine.
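For example, on a machine with 8 cores and 8 GB of RAM (an illustrative assumption, adjust to your own hardware), the spark-env.sh entries could look like:
export SPARK_EXECUTOR_CORES=6      # 8 cores minus 2 left for the OS and other processes
export SPARK_EXECUTOR_MEMORY=5600M # roughly 70% of 8 GB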
Now navigate to the sbin directory:
cd ../sbin
./start-master.sh
This should start Spark on the master node (the local one). You can check that it is working by opening an Internet browser and going to:
http://localhost:8080/
This should open the Spark monitoring page, with all tables empty. Navigate to:
cd ../bin/
./spark-shell
This opens the interactive Spark shell:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/12/04 16:13:53 INFO SecurityManager: Changing view acls to: fjulbe
14/12/04 16:13:53 INFO SecurityManager: Changing modify acls to: fjulbe
14/12/04 16:13:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(fjulbe); users with modify permissions: Set(fjulbe)
14/12/04 16:13:53 INFO HttpServer: Starting HTTP Server
14/12/04 16:13:53 INFO Utils: Successfully started service 'HTTP class server' on port 34379.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.1
/_/
...
scala>
You can try several simple operations to see how it works:
scala> val k = (1 to 10)
k: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala>
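You can also try a minimal operation on the SparkContext that the shell creates for you (available as sc); a small illustrative sketch:
scala> val rdd = sc.parallelize(1 to 10)
scala> rdd.filter(_ % 2 == 0).collect()
The second command should return the even numbers, Array(2, 4, 6, 8, 10).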
To close the shell, just call exit:
scala> exit
Navigate back to the sbin directory, stop the master and start in cluster mode:
cd ../sbin
./stop-master.sh
./start-all.sh
This will start the Spark cluster. As we only have one node, it will start a cluster on the same node as before. The nodes of the cluster are defined in conf/slaves, which currently contains only one node, 'localhost'.
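For this single-node setup, conf/slaves simply lists the local machine, one worker hostname per line:
# conf/slaves: one worker hostname per line
localhost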
After starting the cluster, open localhost:8080 in the web browser again. This will again show the Spark monitor, but now there should be one worker in the Workers list, the one where Spark is running.
Go back to the bin directory and start the shell again, this time with the --master option set to the URL of the master of your Spark cluster (this URL can be seen at the top of the local Spark web page at localhost:8080):
cd ../bin
./spark-shell --master spark://<hostname>:7077
In the Spark web page, you should now see an application running in the Running Applications table, one called "Spark Shell".
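Before closing, you can run a quick job from this shell to check that the worker actually executes tasks (an illustrative example; the partition count is arbitrary):
scala> sc.parallelize(1 to 100000, 8).count()
This should return 100000.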
You can close the shell now. Spark is configured locally.
Now, when running your applications, you must run them with the --master option set to spark://<hostname>:7077, or set the master directly inside your code. Scala example:
val conf = new SparkConf()
.setMaster("spark://<hostname>:7077")
.setAppName("Test App")
.set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
or its Java equivalent:
SparkConf conf = new SparkConf()
    .setMaster("spark://<hostname>:7077")
    .setAppName("Test App")
    .set("spark.executor.memory", "1g");
JavaSparkContext sc = new JavaSparkContext(conf);
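Alternatively, a packaged application can be submitted to the cluster with the spark-submit script from the bin directory. A minimal sketch, where the main class and jar name are placeholders for your own build:
./bin/spark-submit \
  --class com.example.TestApp \
  --master spark://<hostname>:7077 \
  --executor-memory 1g \
  test-app.jar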
--
Cesc Julbe - 2014-12-04