07/05/2016: How I Got Apache Spark to Sort Of (Not Really) Work on my PicoCluster of 5 Raspberry PI
I've read several blog posts about people running Apache Spark on a Raspberry PI. It didn't seem too hard, so I thought I'd have a go at it. But the results were disappointing. Bear in mind that I am a Spark novice, so some setting is probably wrong. I ran into two issues - memory and heartbeats.
So, this is what I did.
I based my work on these pages:
* https://darrenjw2.wordpress.com/2015/04/17/installing-apache-spark-on-a-raspberry-pi-2/
* https://darrenjw2.wordpress.com/2015/04/18/setting-up-a-standalone-apache-spark-cluster-of-raspberry-pi-2/
* http://www.openkb.info/2014/11/memory-settings-for-spark-standalone_27.html
I created five SD cards according to my previous blog post (see http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html).
Installation of Apache Spark
* install Oracle Java and Python
for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local sudo apt-get install -y oracle-java8-jdk python2.7 &); done
* download Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz
* Copy Spark to all RPI
for i in `seq 1 5`; do (scp -q -oStrictHostKeyChecking=no -oCheckHostIP=no spark-1.6.2-bin-hadoop2.6.tgz pirate@pi0${i}.local:. && echo "Copy complete to pi0${i}" &); done
* Uncompress Spark
for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local tar xfz spark-1.6.2-bin-hadoop2.6.tgz && echo "Uncompress complete to pi0${i}" &); done
* Remove tgz file
for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local rm spark-1.6.2-bin-hadoop2.6.tgz); done
* Add the following to your .bashrc file on each RPI. I can't figure out how to put this into a loop.
export SPARK_LOCAL_IP="$(ip route get 1 | awk '{print $NF;exit}')"
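For the record, one way to script this (an untested sketch; the quoting dance is needed to keep the $(...) from expanding on the local machine instead of inside the remote .bashrc):

```shell
# Append the SPARK_LOCAL_IP line to each node's .bashrc over ssh.
# Piping the line through stdin avoids double remote-shell expansion.
for i in `seq 1 5`; do
  echo 'export SPARK_LOCAL_IP="$(ip route get 1 | awk '\''{print $NF;exit}'\'')"' | \
    ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local \
      'cat >> ~/.bashrc' || echo "failed for pi0${i}"
done
```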
* Run Standalone Spark Shell
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi01.local
cd spark-1.6.2-bin-hadoop2.6
bin/run-example SparkPi 10
bin/spark-shell --master local[4]
# This takes several minutes to display a prompt.
# While the shell is running, visit http://pi01.local:4040/
scala> sc.textFile("README.md").count
# After the job is complete, visit the monitor page.
scala> exit
* Run PyShark Shell
bin/pyspark --master local[4]
>>> sc.textFile("README.md").count()
>>> exit()
CLUSTER
Now for the clustering...
* Enable password-less SSH between nodes
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi01.local
for i in `seq 1 5`; do avahi-resolve --name pi0${i}.local -4 | awk ' { t = $1; $1 = $2; $2 = t; print; } ' | sudo tee --append /etc/hosts; done
echo "$(ip route get 1 | awk '{print $NF;exit}') $(hostname).local" | sudo tee --append /etc/hosts
ssh-keygen
for i in `seq 1 5`; do ssh-copy-id pirate@pi0${i}.local; done
* Configure Spark for Cluster
cd spark-1.6.2-bin-hadoop2.6/conf
Create a file named slaves with the following contents:
pi01.local
pi02.local
pi03.local
pi04.local
pi05.local
cp spark-env.sh.template spark-env.sh
In spark-env.sh, set SPARK_MASTER_IP to the result of "ip route get 1 | awk '{print $NF;exit}'" and limit the worker memory:
SPARK_WORKER_MEMORY=512m
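Put together, my spark-env.sh ends up looking like this (a sketch; the master IP is whatever the ip route trick returns on your network):

```shell
# spark-env.sh (sketch)
# Bind the master to the node's primary IP and cap worker memory
# to leave room for the OS on the RPI.
SPARK_MASTER_IP="$(ip route get 1 | awk '{print $NF;exit}')"
SPARK_WORKER_MEMORY=512m
```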
* Copy the spark environment script to the other RPI
for i in `seq 2 5`; do scp spark-env.sh pirate@pi0${i}.local:spark-1.6.2-bin-hadoop2.6/conf/; done
* Start the cluster
cd ..
sbin/start-all.sh
* Visit the monitor page
http://192.168.1.8:8080
And everything is working so far! But ...
* Start a Spark Shell
bin/spark-shell --executor-memory 500m --driver-memory 500m --master spark://pi01.local:7077 --conf spark.executor.heartbeatInterval=45s
And this fails: the executors run out of memory and miss heartbeats, the two issues mentioned at the top.
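If I were to dig further, the next knob I'd try (an untested sketch) is raising spark.network.timeout, which is supposed to stay well above the heartbeat interval:

```shell
# Untested sketch: give slow ARM executors more slack by raising the
# network timeout alongside the heartbeat interval.
bin/spark-shell \
  --executor-memory 400m --driver-memory 400m \
  --master spark://pi01.local:7077 \
  --conf spark.executor.heartbeatInterval=45s \
  --conf spark.network.timeout=600s
```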
06/25/2016: How I got Docker Swarm to Run on a Raspberry PI PicoCluster with Consul
At the end of this article, I have a working Docker Swarm running on a five-node PicoCluster. Please flash your SD cards according to http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html. Stop following that article after copying the SSH ids to the RPI.
I am controlling the PicoCluster using my laptop. Therefore, my laptop is the HOST in the steps below.
There is no guarantee these commands are correct. They just seem to work for me. And please don't ever, ever depend on this information for anything non-prototype without doing your own research.
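The getip command used below is not a standard tool; it's a small helper of my own. A minimal sketch, assuming avahi-resolve is available (the same trick as in the Spark article above):

```shell
# getip: resolve a .local hostname to its IPv4 address.
# Hypothetical helper, not a standard command; relies on avahi-resolve.
getip() {
  avahi-resolve --name "$1" -4 2>/dev/null | awk '{print $2; exit}'
}
```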
* On the HOST, create the Docker Machine to hold the consul service.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--generic-ip-address=$(getip pi01.local) \
--generic-ssh-user "pirate" \
consul-machine
* Connect to the consul-machine Docker Machine
eval $(docker-machine env consul-machine)
* Start Consul.
docker run \
-d \
-p 8500:8500 \
hypriot/rpi-consul \
agent -dev -client 0.0.0.0
* Reset docker environment to talk with host docker.
unset DOCKER_TLS_VERIFY DOCKER_HOST DOCKER_CERT_PATH DOCKER_MACHINE_NAME
* Visit the consul dashboard to prove it is working and accessible.
firefox http://$(getip pi01.local):8500
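The same check can be scripted from a terminal against Consul's HTTP API (a sketch; assumes the agent answers on port 8500 and that avahi-resolve can resolve the node name):

```shell
# Resolve the node's IP (same trick as the /etc/hosts step) and ask
# Consul who the current leader is; a non-empty reply means the agent
# is up and answering HTTP requests.
CONSUL_IP="$(avahi-resolve --name pi01.local -4 2>/dev/null | awk '{print $2; exit}')"
curl -s "http://${CONSUL_IP}:8500/v1/status/leader" || echo "consul not reachable"
```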
* Create the swarm-master machine. Note that eth0 is being used instead of eth1.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-master \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi02.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-master
* Create the first slave node.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi03.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-slave01
* List nodes in the swarm. I don't know why, but this command must be run from one of the RPI. Otherwise, I see a "malformed HTTP response" message.
eval $(docker-machine env swarm-master)
docker -H $(docker-machine ip swarm-master):3376 run \
--rm \
hypriot/rpi-swarm:latest \
list consul://$(docker-machine ip consul-machine):8500
* Create the second slave node.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi04.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-slave02
* Create the third slave node.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi05.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-slave03
* Check that docker machine sees all of the nodes
$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
consul-machine - generic Running tcp://192.168.1.8:2376 v1.11.1
swarm-master - generic Running tcp://192.168.1.7:2376 swarm-master (master) v1.11.1
swarm-slave01 - generic Running tcp://192.168.1.2:2376 swarm-master v1.11.1
swarm-slave02 - generic Running tcp://192.168.1.5:2376 swarm-master v1.11.1
swarm-slave03 - generic Running tcp://192.168.1.4:2376 swarm-master v1.11.1
* List the swarm nodes in Firefox using Consul.
firefox http://$(docker-machine ip consul-machine):8500/ui/#/dc1/kv/docker/swarm/nodes/
* Is my cluster working? First, switch to the swarm-master environment. Then view its information. You should see the slaves listed. Next, run the hello-world container. And finally, list the containers.
eval $(docker-machine env swarm-master)
docker -H $(docker-machine ip swarm-master):3376 info
docker -H $(docker-machine ip swarm-master):3376 run hypriot/armhf-hello-world
docker -H $(docker-machine ip swarm-master):3376 ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
456fa23b8c52 hypriot/armhf-hello-world "/hello" 8 seconds ago Exited (0) 5 seconds ago swarm-slave01/nauseous_swartz
e1eb8a790e3f hypriot/rpi-swarm:latest "/swarm join --advert" 3 hours ago Up 3 hours 2375/tcp swarm-slave03/swarm-agent
122b89a2ae5d hypriot/rpi-swarm:latest "/swarm join --advert" 3 hours ago Up 3 hours 2375/tcp swarm-slave02/swarm-agent
449aa7087ecc hypriot/rpi-swarm:latest "/swarm join --advert" 3 hours ago Up 3 hours 2375/tcp swarm-slave01/swarm-agent
6355f31de952 hypriot/rpi-swarm:latest "/swarm join --advert" 3 hours ago Up 3 hours 2375/tcp swarm-master/swarm-agent
05ee666e8662 hypriot/rpi-swarm:latest "/swarm manage --tlsv" 3 hours ago Up 3 hours 2375/tcp, 192.168.1.7:3376->3376/tcp swarm-master/swarm-agent-master
Jump up and down when you see that the hello-world container was launched from swarm-master but actually ran on swarm-slave01!