
07/05/2016: How I Got Apache Spark to Sort Of (Not Really) Work on my PicoCluster of 5 Raspberry PI

I've read several blog posts about people running Apache Spark on a Raspberry PI. It didn't seem too hard, so I thought I'd have a go at it. But the results were disappointing. Bear in mind that I am a Spark novice, so some setting is probably wrong. I ran into two issues - memory and heartbeats.

So, this is what I did.

I based my work on these pages:

* https://darrenjw2.wordpress.com/2015/04/17/installing-apache-spark-on-a-raspberry-pi-2/
* https://darrenjw2.wordpress.com/2015/04/18/setting-up-a-standalone-apache-spark-cluster-of-raspberry-pi-2/
* http://www.openkb.info/2014/11/memory-settings-for-spark-standalone_27.html

I created five SD cards according to my previous blog post (see http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html).

Installation of Apache Spark

* install Oracle Java and Python

for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local sudo apt-get install -y oracle-java8-jdk python2.7 &); done

* download Spark

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz

* Copy Spark to all RPI

for i in `seq 1 5`; do (scp -q -oStrictHostKeyChecking=no -oCheckHostIP=no spark-1.6.2-bin-hadoop2.6.tgz pirate@pi0${i}.local:. && echo "Copy complete to pi0${i}" &); done

* Uncompress Spark

for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local tar xfz spark-1.6.2-bin-hadoop2.6.tgz && echo "Uncompress complete to pi0${i}" &); done

* Remove tgz file

for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local rm spark-1.6.2-bin-hadoop2.6.tgz); done

* Add the following to your .bashrc file on each RPI. I can't figure out how to put this into a loop.

export SPARK_LOCAL_IP="$(ip route get 1 | awk '{print $NF;exit}')"
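One way to script this step anyway is to sidestep the quoting problem: write the line to a local snippet file, then copy and append it on each node. An untested sketch (the file name spark-local-ip.sh is my own invention):

```shell
# Write the export line to a local snippet. The quoted heredoc ('EOF')
# keeps the $(...) from being expanded here on the laptop.
cat > spark-local-ip.sh <<'EOF'
export SPARK_LOCAL_IP="$(ip route get 1 | awk '{print $NF;exit}')"
EOF

# Copy the snippet to each RPI and append it to .bashrc there.
# Wrapped in a function; run push_bashrc once the nodes are reachable.
push_bashrc() {
  for i in $(seq 1 5); do
    scp -q spark-local-ip.sh pirate@pi0${i}.local:.
    ssh pirate@pi0${i}.local 'cat spark-local-ip.sh >> ~/.bashrc && rm spark-local-ip.sh'
  done
}
```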

* Run Standalone Spark Shell

ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi01.local
cd spark-1.6.2-bin-hadoop2.6
bin/run-example SparkPi 10
bin/spark-shell --master local[4]
# This takes several minutes to display a prompt.
# While the shell is running, visit http://pi01.local:4040/
scala> sc.textFile("README.md").count
# After the job is complete, visit the monitor page.
scala> exit

* Run PyShark Shell

bin/pyspark --master local[4]
>>> sc.textFile("README.md").count()
>>> exit()

CLUSTER

Now for the clustering...

* Enable password-less SSH between nodes

ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi01.local
for i in `seq 1 5`; do avahi-resolve --name pi0${i}.local -4 | awk ' { t = $1; $1 = $2; $2 = t; print; } ' | sudo tee --append /etc/hosts; done
echo "$(ip route get 1 | awk '{print $NF;exit}') $(hostname).local" | sudo tee --append /etc/hosts
ssh-keygen
for i in `seq 1 5`; do ssh-copy-id pirate@pi0${i}.local; done

* Configure Spark for Cluster

cd spark-1.6.2-bin-hadoop2.6/conf

Create a slaves file with the following contents:
pi01.local
pi02.local
pi03.local
pi04.local
pi05.local
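
Rather than typing the slaves file by hand, it can be generated with the same seq loop used elsewhere in this post:

```shell
# Emit one worker hostname per line into the slaves file.
for i in $(seq 1 5); do echo "pi0${i}.local"; done > slaves
```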

cp spark-env.sh.template spark-env.sh
In spark-env.sh:
  Set SPARK_MASTER_IP to the result of "ip route get 1 | awk '{print $NF;exit}'"
  Set SPARK_WORKER_MEMORY=512m
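
Both settings can also be appended in one shot. This is a sketch, assuming the same IP-detection trick as above works on your network; the unquoted heredoc expands the $(...) immediately, so run it on the master:

```shell
# Append the master IP and worker memory to spark-env.sh.
# The unquoted EOF lets $(ip route get 1 ...) expand here on the master.
cat >> spark-env.sh <<EOF
SPARK_MASTER_IP=$(ip route get 1 | awk '{print $NF;exit}')
SPARK_WORKER_MEMORY=512m
EOF
```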

* Copy the spark environment script to the other RPI

for i in `seq 2 5`; do scp spark-env.sh pirate@pi0${i}.local:spark-1.6.2-bin-hadoop2.6/conf/; done

* Start the cluster

cd ..
sbin/start-all.sh

* Visit the monitor page

http://192.168.1.8:8080

And everything is working so far! But ...

* Start a Spark Shell

bin/spark-shell --executor-memory 500m --driver-memory 500m --master spark://pi01.local:7077 --conf spark.executor.heartbeatInterval=45s 

And this fails... This is where the memory and heartbeat problems mentioned at the top defeated me.



06/25/2016: How I got Docker Swarm to Run on a Raspberry PI PicoCluster with Consul

At the end of this article, I have a working Docker Swarm running on a five-node PicoCluster. Please flash your SD cards according to http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html. Stop following that article after copying the SSH ids to the RPI.

I am controlling the PicoCluster using my laptop. Therefore, my laptop is the HOST in the steps below.

There is no guarantee these commands are correct. They just seem to work for me. And please don't ever, ever depend on this information for anything non-prototype without doing your own research.
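
The getip helper used below comes from my earlier setup. If you don't have it, here is a minimal stand-in - an assumption on my part, just resolving a .local hostname to an IPv4 address with the same avahi tool used for /etc/hosts in the Spark post above:

```shell
# Hypothetical getip: resolve a .local hostname to its IPv4 address.
# Assumes avahi-utils is installed on the machine running docker-machine.
getip() {
  avahi-resolve --name "$1" -4 2>/dev/null | awk '{ print $2; exit }'
}
```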

* On the HOST, create the Docker Machine to hold the consul service.

docker-machine create \
  -d generic \
  --engine-storage-driver=overlay \
  --generic-ip-address=$(getip pi01.local) \
  --generic-ssh-user "pirate" \
  consul-machine

* Connect to the consul-machine Docker Machine

eval $(docker-machine env consul-machine)

* Start Consul.

docker run \
  -d \
  -p 8500:8500 \
  hypriot/rpi-consul \
  agent -dev -client 0.0.0.0

* Reset docker environment to talk with host docker.

unset DOCKER_TLS_VERIFY DOCKER_HOST DOCKER_CERT_PATH DOCKER_MACHINE_NAME

* Visit the consul dashboard to prove it is working and accessible.

firefox http://$(getip pi01.local):8500

* Create the swarm-master machine. Note that eth0 is being used instead of eth1.

docker-machine create \
  -d generic \
  --engine-storage-driver=overlay \
  --swarm \
  --swarm-master \
  --swarm-image hypriot/rpi-swarm:latest \
  --swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
  --generic-ip-address=$(getip pi02.local) \
  --generic-ssh-user "pirate" \
  --engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
  --engine-opt="cluster-advertise=eth0:2376" \
  swarm-master

* Create the first slave node.

docker-machine create \
  -d generic \
  --engine-storage-driver=overlay \
  --swarm \
  --swarm-image hypriot/rpi-swarm:latest \
  --swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
  --generic-ip-address=$(getip pi03.local) \
  --generic-ssh-user "pirate" \
  --engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
  --engine-opt="cluster-advertise=eth0:2376" \
  swarm-slave01

* List nodes in the swarm. I don't know why, but this command must be run from one of the RPI. Otherwise, I see a "malformed HTTP response" message.

eval $(docker-machine env swarm-master)

docker -H $(docker-machine ip swarm-master):3376 run \
  --rm \
  hypriot/rpi-swarm:latest \
  list consul://$(docker-machine ip consul-machine):8500

* Create the second slave node.

docker-machine create \
  -d generic \
  --engine-storage-driver=overlay \
  --swarm \
  --swarm-image hypriot/rpi-swarm:latest \
  --swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
  --generic-ip-address=$(getip pi04.local) \
  --generic-ssh-user "pirate" \
  --engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
  --engine-opt="cluster-advertise=eth0:2376" \
  swarm-slave02

* Create the third slave node.

docker-machine create \
  -d generic \
  --engine-storage-driver=overlay \
  --swarm \
  --swarm-image hypriot/rpi-swarm:latest \
  --swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
  --generic-ip-address=$(getip pi05.local) \
  --generic-ssh-user "pirate" \
  --engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
  --engine-opt="cluster-advertise=eth0:2376" \
  swarm-slave03

* Check that docker machine sees all of the nodes

$ docker-machine ls
NAME             ACTIVE   DRIVER    STATE     URL                      SWARM                   DOCKER    ERRORS
consul-machine   -        generic   Running   tcp://192.168.1.8:2376                           v1.11.1   
swarm-master     -        generic   Running   tcp://192.168.1.7:2376   swarm-master (master)   v1.11.1   
swarm-slave01    -        generic   Running   tcp://192.168.1.2:2376   swarm-master            v1.11.1   
swarm-slave02    -        generic   Running   tcp://192.168.1.5:2376   swarm-master            v1.11.1   
swarm-slave03    -        generic   Running   tcp://192.168.1.4:2376   swarm-master            v1.11.1   

* List the swarm nodes in Firefox using Consul.

firefox http://$(docker-machine ip consul-machine):8500/ui/#/dc1/kv/docker/swarm/nodes/

* Is my cluster working? First, switch to the swarm-master environment. Then view its information; you should see the slaves listed. Next, run the hello-world container. Finally, list the containers.

eval $(docker-machine env swarm-master)
docker -H $(docker-machine ip swarm-master):3376 info
docker -H $(docker-machine ip swarm-master):3376 run hypriot/armhf-hello-world
docker -H $(docker-machine ip swarm-master):3376 ps -a

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS                     PORTS                                  NAMES
456fa23b8c52        hypriot/armhf-hello-world   "/hello"                 8 seconds ago       Exited (0) 5 seconds ago                                          swarm-slave01/nauseous_swartz
e1eb8a790e3f        hypriot/rpi-swarm:latest    "/swarm join --advert"   3 hours ago         Up 3 hours                 2375/tcp                               swarm-slave03/swarm-agent
122b89a2ae5d        hypriot/rpi-swarm:latest    "/swarm join --advert"   3 hours ago         Up 3 hours                 2375/tcp                               swarm-slave02/swarm-agent
449aa7087ecc        hypriot/rpi-swarm:latest    "/swarm join --advert"   3 hours ago         Up 3 hours                 2375/tcp                               swarm-slave01/swarm-agent
6355f31de952        hypriot/rpi-swarm:latest    "/swarm join --advert"   3 hours ago         Up 3 hours                 2375/tcp                               swarm-master/swarm-agent
05ee666e8662        hypriot/rpi-swarm:latest    "/swarm manage --tlsv"   3 hours ago         Up 3 hours                 2375/tcp, 192.168.1.7:3376->3376/tcp   swarm-master/swarm-agent-master

Jump up and down when you see that the hello-world container was run from swarm-master but run on swarm-slave01!