10/02/2012: How Accumulo Compresses Keys and Values
From the Accumulo User mailing list, Keith T said:

There are two levels of compression in Accumulo. First, redundant
parts of the key are not stored. If the row in a key is the same as
the previous row, then it's not stored again. The same is done for
columns and time stamps. After the relative encoding is done, a block
of key values is then compressed with gzip. As data is read from an RFile, when the row of a key is the same as
the previous key, it will just point to the previous key's row. This is
carried forward over the wire. As keys are transferred, duplicate
fields in the key are not transferred.
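To make the relative encoding a little more concrete, here is a small Java sketch of the idea. To be clear, this is not Accumulo's actual RFile code; the flag bits, class name, and method are invented purely for illustration, and only the row and column family fields are shown.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified sketch of relative key encoding: if a key field matches the
// previous key's field, write only a flag bit instead of the bytes.
// NOT Accumulo's RFile code; flag values and names are invented here.
public class RelativeKeyEncodingSketch {

    private static final byte SAME_ROW = 0x01;    // hypothetical flag bits
    private static final byte SAME_FAMILY = 0x02;

    private byte[] prevRow;
    private byte[] prevFamily;

    public void writeKey(DataOutputStream out, byte[] row, byte[] family) throws IOException {
        byte flags = 0;
        if (Arrays.equals(row, prevRow)) flags |= SAME_ROW;
        if (Arrays.equals(family, prevFamily)) flags |= SAME_FAMILY;

        out.writeByte(flags);
        if ((flags & SAME_ROW) == 0) {       // row changed: write it out
            out.writeInt(row.length);
            out.write(row);
        }
        if ((flags & SAME_FAMILY) == 0) {    // family changed: write it out
            out.writeInt(family.length);
            out.write(family);
        }
        prevRow = row;
        prevFamily = family;
    }

    public static void main(String[] args) throws IOException {
        RelativeKeyEncodingSketch enc = new RelativeKeyEncodingSketch();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        byte[] row = "row_0001".getBytes(StandardCharsets.UTF_8);
        byte[] fam = "cf_a".getBytes(StandardCharsets.UTF_8);
        enc.writeKey(out, row, fam);
        enc.writeKey(out, row, fam); // second key costs only the flag byte
        System.out.println("encoded bytes for two keys: " + bos.size());
    }
}

After a block of keys has been written this way, the whole block is still handed to gzip, which is where the second level of compression comes from.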
General consensus seemed to favor double compression: compress at the application level (i.e., compress the values) and let Accumulo compress as well (i.e., the relative encoding and gzip).
In support of double compression, Ameet K. said:
I've switched to double compression as per previous posts and
it's working nicely. I see about 10-15% more compression over just
application-level Value compression.
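For the application-level half of double compression, the idea is simply to gzip the value bytes yourself before handing them to Accumulo. Here is a rough Java sketch of one way to do that; the row, the "data" column family, and the "payload" qualifier are placeholders, and readers of the table would have to gunzip the Value on the way out.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

// Sketch of the application-level half of "double compression": gzip the
// value bytes yourself, then let Accumulo apply its relative key encoding
// and block compression on top when the data reaches an RFile.
public class DoubleCompressionSketch {

    // Compress raw bytes with gzip.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    // Build a Mutation whose value is compressed at the application level.
    // The "data" family and "payload" qualifier are placeholders.
    static Mutation compressedMutation(String row, byte[] rawValue) throws IOException {
        Mutation m = new Mutation(new Text(row));
        m.put(new Text("data"), new Text("payload"), new Value(gzip(rawValue)));
        return m;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "some highly repetitive value ...".getBytes(StandardCharsets.UTF_8);
        System.out.println("compressed value size: " + gzip(raw).length + " bytes");
        Mutation m = compressedMutation("row_0001", raw);
        // m would then be handed to a BatchWriter as usual.
    }
}

Accumulo still applies its relative key encoding and block-level gzip on top of this, which is the double compression being discussed.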
08/28/2012: Stackscript for Accumulo (on Linode)
I wanted a way to test the very latest Accumulo code. I could not use any of my existing systems because they were multi-use boxes. So I wrote a Stackscript in order to prepare a Linode server. Note that this script pulls a few files from my affy.com server for simplicity.
Users: hadoop, zookeeper, accumulo
Password: password
This stackscript downloads, installs and configures hadoop, zookeeper, and accumulo.
Step one is to create your own stackscript. Mine was called "InitializeAccumulo". The idea is that the starter script pulls the actual script from some server so that you don't need to deal with the 'Manage Stackscript' interface. The starter script is:
#!/bin/bash
wget http://www.affy.com/linode_build/stackscript -O /tmp/stackscript.sh
chmod +x /tmp/stackscript.sh
/tmp/stackscript.sh
After the Linode server boots, it will take about 10 minutes to run this script.
Get the latest copy of the stackscript from https://github.com/medined/accumulo_stackscript.
It's now possible to grab the whole project and then run the stackscript manually.