Using Apache Accumulo as the backing store for Apache Gora - a tutorial
Apache Gora (http://gora.apache.org/) provides an abstraction layer to work with various
data storage engines. In this tutorial, we'll see how to use Gora with Apache Accumulo
as the storage engine.
I like to start projects with the Maven `pom.xml` file. So here is mine. It's important to
use Accumulo 1.4.3 instead of the newly released 1.5.0 because of an API incompatibility.
Otherwise, the `pom.xml` file is straightforward.
Now create a `src/main/resources/gora.properties` file configuring Gora by
specifying how to find Accumulo.
There are some important items to note. Firstly, we'll be using the MockInstance of
Accumulo so that you don't actually need to have it installed. Secondly, the password
needs to be blank if you are depending on Accumulo 1.4.3, change the password to
'''secret''' if using an earlier version.
That's all it takes to configure Gora. Now let's create a json file to define a very
simple object - a Person with just a first name. Create a json file with the
following:
This is the simplest object I could think of. Not very useful for real applications, but
great for a simple proof-of-concept project.
The json file needs to be compiled into a Java file with the Gora compiler. Hopefully, you
have installed Gora and put its ```bin``` directory onto your path. Run the following to
generate the Java code:
One last bit of setup is needed. Create a ```src/main/resources/gora-accumulo-mapping.xml```
file with the following:
Finally we get to the fun part. Actually writing a Java program to create, save, and
read a Person object. The code is straightforward so I won't explain it, just show it. Create
a src/main/java/com/affy/Create_Save_Read_Person_Driver.java file like this:
This program has this output:
Hopefully, I'll be able to post more complex examples in the future.
use Accumulo 1.4.3 instead of the newly released 1.5.0 because of an API incompatibility.
Otherwise, the `pom.xml` file is straightforward.
<project ...> <modelVersion>4.0.0</modelVersion> <groupId>com.affy</groupId> <artifactId>pojos-in-accumulo</artifactId> <version>0.0.1-SNAPSHOT</version> <packaging>jar</packaging> <name>POJOs in Accumulo</name> <url>http://affy.com</url> <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <!-- Dependency Versions --> <accumulo.version>1.4.3</accumulo.version> <gora.version>0.3</gora.version> <slf4j.version>1.7.5</slf4j.version> <!-- Maven Plugin Dependencies --> <maven-compiler-plugin.version>2.3.2</maven-compiler-plugin.version> <maven-jar-plugin.version>2.4</maven-jar-plugin.version> <maven-dependency-plugin.version>2.4</maven-dependency-plugin.version> <maven-clean-plugin.version>2.4.1</maven-clean-plugin.version> </properties> <dependencies> <dependency> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-core</artifactId> <version>${accumulo.version}</version> <type>jar</type> </dependency> <dependency> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-server</artifactId> <version>${accumulo.version}</version> <type>jar</type> </dependency> <dependency> <groupId>org.apache.gora</groupId> <artifactId>gora-core</artifactId> <version>${gora.version}</version> </dependency> <dependency> <groupId>org.apache.gora</groupId> <artifactId>gora-accumulo</artifactId> <version>${gora.version}</version> </dependency> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-api</artifactId> <version>${slf4j.version}</version> </dependency> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-log4j12</artifactId> <version>${slf4j.version}</version> </dependency> <!-- TEST --> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.8.2</version> <scope>test</scope> </dependency> </dependencies> </project>
Now create a `src/main/resources/gora.properties` file configuring Gora by
specifying how to find Accumulo.
gora.datastore.default=org.apache.gora.accumulo.store.AccumuloStore gora.datastore.accumulo.mock=true gora.datastore.accumulo.instance=instance gora.datastore.accumulo.zookeepers=localhost gora.datastore.accumulo.user=root gora.datastore.accumulo.password=
There are some important items to note. Firstly, we'll be using the MockInstance of
Accumulo so that you don't actually need to have it installed. Secondly, the password
needs to be blank if you are depending on Accumulo 1.4.3, change the password to
'''secret''' if using an earlier version.
That's all it takes to configure Gora. Now let's create a json file to define a very
simple object - a Person with just a first name. Create a json file with the
following:
{ "type": "record", "name": "Person", "namespace": "com.affy.generated", "fields": [ {"name": "first", "type": "string"} ] }
This is the simplest object I could think of. Not very useful for real applications, but
great for a simple proof-of-concept project.
The json file needs to be compiled into a Java file with the Gora compiler. Hopefully, you
have installed Gora and put its ```bin``` directory onto your path. Run the following to
generate the Java code:
gora goracompiler src/main/avro/person.json src/main/java
One last bit of setup is needed. Create a ```src/main/resources/gora-accumulo-mapping.xml```
file with the following:
<gora-orm> <class table="people" keyClass="java.lang.String" name="com.affy.generated.Person"> <field name="first" family="f" qualifier="q" /> </class> </gora-orm>
Finally we get to the fun part. Actually writing a Java program to create, save, and
read a Person object. The code is straightforward so I won't explain it, just show it. Create
a src/main/java/com/affy/Create_Save_Read_Person_Driver.java file like this:
package com.affy; import com.affy.generated.Person; import org.apache.avro.util.Utf8; import org.apache.gora.store.DataStore; import org.apache.gora.store.DataStoreFactory; import org.apache.gora.util.GoraException; import org.apache.hadoop.conf.Configuration; public class Create_Save_Read_Person_Driver { private void process() throws GoraException { Person person = new Person(); person.setFirst(new Utf8("David")); System.out.println("Person written: " + person); DataStore<String, Person> datastore = DataStoreFactory.getDataStore(String.class, Person.class, new Configuration()); if (!datastore.schemaExists()) { datastore.createSchema(); } datastore.put("001", person); Person p = datastore.get("001"); System.out.println("Person read: " + p); } public static void main(String[] args) throws GoraException { Create_Save_Read_Person_Driver driver = new Create_Save_Read_Person_Driver(); driver.process(); } }
This program has this output:
Person written: com.affy.generated.Person@20c { "first":"David" } Person read: com.affy.generated.Person@20c { "first":"David" }
Hopefully, I'll be able to post more complex examples in the future.