How Can I Use Reverse Sort On Generic Accumulo Keys?
This note shows how to reverse the sort order of Accumulo keys (more precisely, of the row values). As you might know, the standard sort order is lexical. This first example shows standard usage of a mock Accumulo instance.
Notice that the records are inserted in reverse order (5, 4, 3, 2, 1) but are printed in lexical order.
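    import java.util.Iterator;
    import java.util.Map;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.mock.MockInstance;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.io.Text;

    public static void main(String[] args) throws Exception {
        // connect to a mock Accumulo instance.
        Instance mock = new MockInstance("development");
        Connector connector = mock.getConnector("root", "password".getBytes());
        connector.tableOperations().create("TABLEA");
        BatchWriter wr = connector.createBatchWriter("TABLEA", 10000000, 10000, 5);

        // insert five records in reverse order.
        for (int i = 5; i > 0; --i) {
            byte[] key = ("row_" + String.format("%04d", i)).getBytes();
            Mutation m = new Mutation(new Text(key));
            m.put("cf_" + String.format("%04d", i), "cq_" + 1, "val_" + 1);
            wr.addMutation(m);
        }
        wr.close();

        // display records; notice they are lexically sorted.
        Scanner scanner = connector.createScanner("TABLEA", new Authorizations());
        Iterator<Map.Entry<Key, Value>> iterator = scanner.iterator();
        while (iterator.hasNext()) {
            Map.Entry<Key, Value> entry = iterator.next();
            Key key = entry.getKey();
            System.out.println("ROW ID: " + key.getRow());
        }
    }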
The above code displays:
    ROW ID: row_0001
    ROW ID: row_0002
    ROW ID: row_0003
    ROW ID: row_0004
    ROW ID: row_0005
Reverse sorting is accomplished by subtracting each byte in the row id from 255, as shown in the example below. In Java, the cast back to byte makes this the same as taking the bitwise complement of each byte.
    static byte[] convert(byte[] row) {
        byte[] rv = new byte[row.length * 2];
        // first half: each byte subtracted from 255, which reverses the sort order.
        for (int i = 0; i < row.length; i++) {
            rv[i] = (byte) (255 - row[i]);
        }
        // second half: the original bytes, kept only so the output is readable.
        for (int i = 0; i < row.length; i++) {
            rv[i + row.length] = row[i];
        }
        return rv;
    }

    public static void main(String[] args) throws Exception {
        // connect to a mock Accumulo instance.
        Instance mock = new MockInstance("development");
        Connector connector = mock.getConnector("root", "password".getBytes());
        connector.tableOperations().create("TABLEA");
        BatchWriter wr = connector.createBatchWriter("TABLEA", 10000000, 10000, 5);

        // insert five records in reverse order.
        for (int i = 5; i > 0; --i) {
            byte[] key = ("row_" + String.format("%04d", i)).getBytes();
            byte[] reverseKey = convert(key);
            Mutation m = new Mutation(new Text(reverseKey));
            m.put("cf_" + String.format("%04d", i), "cq_" + 1, "val_" + 1);
            wr.addMutation(m);
        }
        wr.close();

        // display records; notice they now come back in reverse order.
        Scanner scanner = connector.createScanner("TABLEA", new Authorizations());
        Iterator<Map.Entry<Key, Value>> iterator = scanner.iterator();
        while (iterator.hasNext()) {
            Map.Entry<Key, Value> entry = iterator.next();
            Key key = entry.getKey();
            System.out.println("ROW ID: " + key.getRow());
        }
    }
The above code displays:
    ROW ID: ��������row_0005
    ROW ID: ��������row_0004
    ROW ID: ��������row_0003
    ROW ID: ��������row_0002
    ROW ID: ��������row_0001
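The effect of the conversion on sort order can also be checked without Accumulo. The following is a minimal sketch, not part of the original example: the class name ReverseSortSketch and the helpers compareUnsigned and reverseSorted are hypothetical, and the comparator is a hand-written stand-in for the unsigned, byte-by-byte comparison that Accumulo applies to row ids.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReverseSortSketch {

    // Subtract each byte from 255; after the cast this equals (byte) ~b.
    static byte[] convert(byte[] row) {
        byte[] rv = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            rv[i] = (byte) (255 - row[i]);
        }
        return rv;
    }

    // Hypothetical helper: lexical comparison treating bytes as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Hypothetical helper: sort rows by their converted bytes, then convert back.
    static List<String> reverseSorted(List<String> rows) {
        List<byte[]> keys = new ArrayList<>();
        for (String row : rows) {
            keys.add(convert(row.getBytes(StandardCharsets.UTF_8)));
        }
        keys.sort(ReverseSortSketch::compareUnsigned);
        List<String> out = new ArrayList<>();
        for (byte[] key : keys) {
            // convert is its own inverse, so applying it again restores the row id.
            out.add(new String(convert(key), StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row_0001", "row_0002", "row_0003", "row_0004", "row_0005");
        for (String row : reverseSorted(rows)) {
            System.out.println("ROW ID: " + row); // row_0005 first, row_0001 last
        }
    }
}
```

This works cleanly for row ids of equal length, such as the zero-padded ids used here. When one row id is a prefix of another (for example row_1 and row_10), the shorter one still sorts first after conversion, so that pair is not reversed.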
Note that, for teaching purposes, this version of convert stores the key twice: first in converted form and then unchanged. The leading unprintable characters in the output are the converted bytes, and the readable suffix lets you verify that the rows come back in reverse order. Normally the convert method stores only the converted bytes:
    static byte[] convert(byte[] row) {
        byte[] rv = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            rv[i] = (byte) (255 - row[i]);
        }
        return rv;
    }
For some use cases, you can convert the row bytes in place:
    static byte[] convert(byte[] row) {
        for (int i = 0; i < row.length; i++) {
            row[i] = (byte) (255 - row[i]);
        }
        return row;
    }
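Because the conversion is its own inverse, the same method can be used to recover the original row id when reading records back. The sketch below illustrates this and the difference between the copying and in-place variants; the class name ReverseKeyRoundTrip and the method names convertCopy and convertInPlace are hypothetical, chosen only to tell the two variants apart.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReverseKeyRoundTrip {

    // Copying variant: the caller's array is left untouched.
    static byte[] convertCopy(byte[] row) {
        byte[] rv = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            rv[i] = (byte) (255 - row[i]);
        }
        return rv;
    }

    // In-place variant: overwrites and returns the caller's array.
    static byte[] convertInPlace(byte[] row) {
        for (int i = 0; i < row.length; i++) {
            row[i] = (byte) (255 - row[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        byte[] original = "row_0003".getBytes(StandardCharsets.UTF_8);

        // Converting twice restores the original bytes.
        byte[] stored = convertCopy(original);
        byte[] decoded = convertCopy(stored);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // row_0003
        System.out.println(Arrays.equals(original, decoded));            // true

        // The in-place variant returns the very array it was given.
        byte[] buffer = original.clone();
        byte[] result = convertInPlace(buffer);
        System.out.println(result == buffer);              // true
        System.out.println(Arrays.equals(buffer, stored)); // true
    }
}
```

The in-place variant avoids an allocation, but it is only safe when nothing else still needs the unconverted bytes. When decoding a row id read from a scan, keep in mind that Hadoop's Text.getBytes() returns a backing array that is only valid up to getLength(), so copy that range before converting.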