Accumulo BatchScanner With and Without WholeRowIterator
This note shows how the results of an Accumulo query differ without and with a WholeRowIterator.
The code snippet below picks up the narrative after you've initialized a Connector object. First we can see what a plain scan looks like:
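    import java.util.Collections;
    import java.util.List;
    import java.util.Map.Entry;
    import org.apache.accumulo.core.client.BatchScanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.io.Text;

    // Read from the tEdge table of the D4M schema.
    String tableName = "tEdge";
    // Read from 5 tablets at a time.
    int numQueryThreads = 5;

    // Scan every row ID between "6000" and "6001".
    Text startRow = new Text("6000");
    Text endRow = new Text("6001");
    List<Range> range = Collections.singletonList(new Range(startRow, endRow));

    BatchScanner scanner = connector.createBatchScanner(tableName, new Authorizations(), numQueryThreads);
    scanner.setRanges(range);
    // Print one line per key-value entry in the range.
    for (Entry<Key, Value> entry : scanner) {
        System.out.println(entry.getKey());
    }
    scanner.close();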
The results of this query, using the data loaded by the SOICSVToAccumulo
class from https://github.com/medined/D4M_Schema, are shown below.
    600006a870bb4c8471a27c9bd0f3f064265d062d :a00100|0.0001 [] 1401023353637 false
    600006a870bb4c8471a27c9bd0f3f064265d062d :a00200|0.0001 [] 1401023353637 false
    ...
    600006a870bb4c8471a27c9bd0f3f064265d062d :state|UT [] 1401023353637 false
    600006a870bb4c8471a27c9bd0f3f064265d062d :zipcode|84521 [] 1401023353637 false
    6000338cbf2daede3efd4355165c98771b3e2b66 :a00100|29673.0000 [] 1401023273694 false
    6000338cbf2daede3efd4355165c98771b3e2b66 :a00200|20421.0000 [] 1401023273694 false
    ...
    6000338cbf2daede3efd4355165c98771b3e2b66 :state|OR [] 1401023273694 false
    6000338cbf2daede3efd4355165c98771b3e2b66 :zipcode|97365 [] 1401023273694 false
Hopefully you can see that this output represents two 'standard' RDBMS records with
columns named 'a00100', 'a00200', etc. This organization becomes really obvious
when the WholeRowIterator is used. The scanner part of the code for this is shown below:
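    // Additional imports needed for this step.
    import org.apache.accumulo.core.client.IteratorSetting;
    import org.apache.accumulo.core.iterators.user.WholeRowIterator;

    BatchScanner scanner = connector.createBatchScanner(tableName, new Authorizations(), numQueryThreads);
    scanner.setRanges(range);

    // Attach the WholeRowIterator at priority 1 so each row comes back as a single entry.
    IteratorSetting iteratorSetting = new IteratorSetting(1, WholeRowIterator.class);
    scanner.addScanIterator(iteratorSetting);

    for (Entry<Key, Value> entry : scanner) {
        System.out.println(entry.getKey());
    }
    scanner.close();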
The output for this code is:
    600006a870bb4c8471a27c9bd0f3f064265d062d : [] 9223372036854775807 false
    6000338cbf2daede3efd4355165c98771b3e2b66 : [] 9223372036854775807 false
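What happened to all of the other information? The WholeRowIterator packs every
key-value pair of a row into the single Value it returns, so the key carries only
the row ID. We can find the rest again using the WholeRowIterator.decodeRow method
as shown below: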
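    // Additional imports needed for this step.
    import java.io.IOException;
    import java.util.SortedMap;

    for (Entry<Key, Value> entry : scanner) {
        try {
            // Unpack the encoded Value back into the row's original key-value pairs.
            SortedMap<Key, Value> wholeRow = WholeRowIterator.decodeRow(entry.getKey(), entry.getValue());
            System.out.println(wholeRow);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }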
This code produces:
    {600006a870bb4c8471a27c9bd0f3f064265d062d :a00100|0.0001 [] 1401023353637 false=1,
     600006a870bb4c8471a27c9bd0f3f064265d062d :a00200|0.0001 [] 1401023353637 false=1,
     ...
     600006a870bb4c8471a27c9bd0f3f064265d062d :state|UT [] 1401023353637 false=1,
     600006a870bb4c8471a27c9bd0f3f064265d062d :zipcode|84521 [] 1401023353637 false=1}
    {6000338cbf2daede3efd4355165c98771b3e2b66 :a00100|29673.0000 [] 1401023273694 false=1,
     6000338cbf2daede3efd4355165c98771b3e2b66 :a00200|20421.0000 [] 1401023273694 false=1,
     ...
     6000338cbf2daede3efd4355165c98771b3e2b66 :state|OR [] 1401023273694 false=1,
     6000338cbf2daede3efd4355165c98771b3e2b66 :zipcode|97365 [] 1401023273694 false=1}
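Once a row has been decoded, it may be convenient to turn it into something closer to the RDBMS-style record mentioned earlier. The sketch below is one possible way to do that, not part of the D4M_Schema project: it assumes each column qualifier has the 'name|value' shape seen in the output above, and it works on plain strings so that it runs without an Accumulo cluster. The class name D4MRecordSketch and the methods addColumn and toRecord are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class D4MRecordSketch {

    // Splits a qualifier such as "state|UT" at the first '|' into a
    // column name and a value, and stores the pair in the record.
    // Assumption: qualifiers follow the "name|value" shape shown above.
    static void addColumn(Map<String, String> record, String qualifier) {
        int bar = qualifier.indexOf('|');
        if (bar < 0) {
            // No separator found: keep the qualifier as a column with an empty value.
            record.put(qualifier, "");
        } else {
            record.put(qualifier.substring(0, bar), qualifier.substring(bar + 1));
        }
    }

    // Builds one record (column name -> value) from the qualifiers of a single row.
    // A LinkedHashMap keeps the columns in the order they were read.
    static Map<String, String> toRecord(List<String> qualifiers) {
        Map<String, String> record = new LinkedHashMap<>();
        for (String qualifier : qualifiers) {
            addColumn(record, qualifier);
        }
        return record;
    }

    public static void main(String[] args) {
        // Qualifiers copied from the first row of the output above.
        List<String> qualifiers = Arrays.asList(
                "a00100|0.0001", "a00200|0.0001", "state|UT", "zipcode|84521");
        System.out.println(toRecord(qualifiers));
        // prints {a00100=0.0001, a00200=0.0001, state=UT, zipcode=84521}
    }
}
```

In a real scan, the qualifier strings would presumably come from calling getColumnQualifier().toString() on each Key in the SortedMap returned by WholeRowIterator.decodeRow.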