Running a Single-Node Accumulo Docker container

Based on the work by sroegner, I have a github project at https://github.com/medined/docker-accumulo which lets you run multiple single-node Accumulo instances using Docker.


First, create the image.


git clone https://github.com/medined/docker-accumulo.git
cd docker-accumulo/single_node
./make_image.sh

Now start your first container.


export HOSTNAME=bellatrix
export IMAGENAME=bellatrix
export BRIDGENAME=brbellatrix
export SUBNET=10.0.10
export NODEID=1
export HADOOPHOST=10.0.10.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST yes

And then you can start a second one:

export HOSTNAME=rigel
export IMAGENAME=rigel
export BRIDGENAME=brrigel
export SUBNET=10.0.11
export NODEID=1
export HADOOPHOST=10.0.11.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST no

And a third!

export HOSTNAME=saiph
export IMAGENAME=saiph
export BRIDGENAME=brbellatrix
export SUBNET=10.0.12
export NODEID=1
export HADOOPHOST=10.0.12.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST no

The SUBNET is different for all containers. This isolates the Accumulo containers from each other.

Look at the running containers


$ docker ps
CONTAINER ID        IMAGE                     COMMAND                CREATED             STATUS              PORTS
                    NAMES
41da6f17261f        medined/accumulo:latest   /docker/run.sh saiph   4 seconds ago       Up 2 seconds        0.0.0.0:49179->19888/tcp, 0.0.0.0:49180->2181/tcp, 0.0.0.0:49181->50070/tcp, 0.0.0.0:49182->50090/tcp, 0.0.0.0:49183->8141/tcp, 0.0.0.0:49184->10020/tcp, 0.0.0.0:49185->22/tcp, 0.0.0.0:49186->50095/tcp, 0.0.0.0:49187->8020/tcp, 0.0.0.0:49188->8025/tcp, 0.0.0.0:49189->8030/tcp, 0.0.0.0:49190->8050/tcp, 0.0.0.0:49191->8088/tcp   saiph
23692dfe3f1e        medined/accumulo:latest   /docker/run.sh rigel   10 seconds ago      Up 9 seconds        0.0.0.0:49166->19888/tcp, 0.0.0.0:49167->2181/tcp, 0.0.0.0:49168->50070/tcp, 0.0.0.0:49169->8025/tcp, 0.0.0.0:49170->8088/tcp, 0.0.0.0:49171->10020/tcp, 0.0.0.0:49172->22/tcp, 0.0.0.0:49173->50090/tcp, 0.0.0.0:49174->50095/tcp, 0.0.0.0:49175->8020/tcp, 0.0.0.0:49176->8030/tcp, 0.0.0.0:49177->8050/tcp, 0.0.0.0:49178->8141/tcp   rigel
63f8f1a7141f        medined/accumulo:latest   /docker/run.sh bella   21 seconds ago      Up 20 seconds       0.0.0.0:49153->19888/tcp, 0.0.0.0:49154->50070/tcp, 0.0.0.0:49155->8020/tcp, 0.0.0.0:49156->8025/tcp, 0.0.0.0:49157->8030/tcp, 0.0.0.0:49158->8050/tcp, 0.0.0.0:49159->8088/tcp, 0.0.0.0:49160->8141/tcp, 0.0.0.0:49161->10020/tcp, 0.0.0.0:49162->2181/tcp, 0.0.0.0:49163->22/tcp, 0.0.0.0:49164->50090/tcp, 0.0.0.0:49165->50095/tcp   bellatrix

You can connect to running instances using the public ports. Especially useful is the public zookeeper port. Rather than searching through the ports listed above, here is an easier way.


$  docker port saiph 2181
0.0.0.0:49180
$ docker port rigel 2181
0.0.0.0:49167
$ docker port bellatrix 2181
0.0.0.0:49162

Having '0.0.0.0' in the response means that any IP address can connect.

You can enter the namespace of a container (i.e., access a bash shell) this way.


$ ./enter_image.sh rigel
-bash-4.1# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - accumulo accumulo            0 2014-07-12 09:13 /accumulo
drwxr-xr-x   - hdfs     supergroup          0 2014-07-11 21:06 /user

-bash-4.1# accumulo shell -u root -p secret
Shell - Apache Accumulo Interactive Shell
-
- version: 1.5.1
- instance name: accumulo
- instance id: bb713243-3546-487f-b6d6-cfaa272efb30
-
- type 'help' for a list of available commands
-
root@accumulo> tables
!METADATA

Now let's start an edge node. For my purposes, an edge node can connect to Hadoop, Zookeeper and Accumulo without running any of those processes. All of the edge node's resources are dedicated to client work.


export HOSTNAME=rigeledge
export IMAGENAME=rigeledge
export BRIDGENAME=brrigel
export SUBNET=10.0.11
export NODEID=2
export HADOOPHOST=10.0.11.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST no

As this container is started, the 'no' means that the supervisor configuration files will be deleted. So while supervisor will be running, it won't be managing any processes. This is not a best practice. It's just the way I chose for this prototype.