2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2020

11/12/2012: How Can I Use Reverse Sort On Generic Accumulo Keys?

his note shows how to reverse the sorting of Accumulo (actually, the row values). As you might know, the standard sort order is lexical. This first example shows a standard usage of a mock Accumulo instance. Notice that records inserted in reverse order (5, 4, 3, 2, 1) but are printed in lexical order.

public static void main(String[] args) throws Exception {
  // connect to a mock Accumulo instance.
  Instance mock = new MockInstance("development");
  Connector connector = mock.getConnector("root", "password".getBytes());
  connector.tableOperations().create("TABLEA");
  BatchWriter wr = connector.createBatchWriter("TABLEA", 10000000, 10000, 5);

  // insert five records in reverse order.
  for (int i = 5; i > 0; --i) {
    byte[] key = ("row_" + String.format("%04d", i)).getBytes();
    Mutation m = new Mutation(new Text(key));
    m.put("cf_" + String.format("%04d", i), "cq_" + 1, "val_" + 1);
    wr.addMutation(m);
  }
  wr.close();

  // display records; notice they are lexically sorted.
  Scanner scanner = connector.createScanner("TABLEA", new Authorizations());
  Iterator<Map.Entry&lyKey, Value>> iterator = scanner.iterator();
  while (iterator.hasNext()) {
    Map.Entry entry = iterator.next();
    Key key = entry.getKey();
    System.out.println("ROW ID: " + key.getRow());
  }
 }

The above code displays:

ROW ID: row_0001
ROW ID: row_0002
ROW ID: row_0003
ROW ID: row_0004
ROW ID: row_0005

Reverse sorting is accomplished by subtracting each byte in the row id from 255 as shown in the example below.

static byte[] convert(byte[] row) {
  byte[] rv = new byte[row.length * 2];
  for (int i = 0; i < row.length; i++) {
    rv[i] = (byte) (255 - row[i]);
  }
  for (int i = 0; i < row.length; i++) {
    rv[i + row.length] = row[i];
  }
  return rv;
 }

 public static void main(String[] args) throws Exception {
  // connect to a mock Accumulo instance.
  Instance mock = new MockInstance("development");
  Connector connector = mock.getConnector("root", "password".getBytes());
  connector.tableOperations().create("TABLEA");
  BatchWriter wr = connector.createBatchWriter("TABLEA", 10000000, 10000, 5);

  // insert five records in reverse order.
  for (int i = 5; i > 0; --i) {
    byte[] key = ("row_" + String.format("%04d", i)).getBytes();
    byte[] reverse_key = convert(key);
    Mutation m = new Mutation(new Text(reverse_key));
    m.put("cf_" + String.format("%04d", i), "cq_" + 1, "val_" + 1);
    wr.addMutation(m);
  }
  wr.close();

  // display records; notice they are lexically sorted.
  Scanner scanner = connector.createScanner("TABLEA", new Authorizations());
  Iterator<Map.Entry&lyKey, Value>> iterator = scanner.iterator();
  while (iterator.hasNext()) {
    Map.Entry entry = iterator.next();
    Key key = entry.getKey();
    System.out.println("ROW ID: " + key.getRow());
  }
 }

The above code displays:

ROW ID: ��������row_0005
ROW ID: ��������row_0004
ROW ID: ��������row_0003
ROW ID: ��������row_0002
ROW ID: ��������row_0001

It's important to note that for teaching purposes, the key is stored once in reverse format and again normally. Thus when displayed you can verify that the key is stored in reverse order. Normally the convert method is used like this:

static byte[] convert(byte[] row) {
  byte[] rv = new byte[row.length];
  for (int i = 0; i < row.length; i++) {
    rv[i] = (byte) (255 - row[i]);
  }
  return rv;
 }

For some use cases, you can convert the row bytes in place:

static byte[] convert(byte[] row) {
  for (int i = 0; i < row.length; i++) {
    row[i] = (byte) (255 - row[i]);
  }
  return row;
 }

10/30/2012: Considerations for a Manual Partitioning Strategy in Accumulo

Considerations for a Manual Partitioning Strategy in Accumulo On the Accumulo Users mailing list, Adam F. suggested:
1. Parallelism and balance at ingest time. You need to find a happy medium between too few partitions (not enough parallelism) and too many partitions (tablet server resource contention and inefficient indexes). Probably at least one partition per tablet server being actively written to is good, and you'll want to pre-split so they can be distributed evenly. Ten partitions per tablet server is probably not too many -- I wouldn't expect to see contention at that point.

2. Parallelism and balance at query time. At query time, you'll be selecting a subset of all of the partitions that you've ever ingested into. This subset should be bounded similarly to the concern addressed in #1, but the bounds could be looser depending on the types of queries you want to run. Lower latency queries would tend to favor only a few partitions per node.

3. Growth over time in partition size. Over time, you want partitions to be bounded to less than about 10-100GB. This has to do with limiting the maximum amount of time that a major compaction will take, and impacts availability and performance in the extreme cases. At the same time, you want partitions to be as large as possible so that their indexes are more efficient.

One strategy to optimize partition size would be to keep using each partition until it is "full", then make another partition. Another would be to allocate a certain number of partitions per day, and then only put data in those partitions during that day. These strategies are also elastic, and can be tweaked as the cluster grows.

In all of these cases, you will need a good load balancing strategy. The default strategy of evening up the number of partitions per tablet server is probably not sufficient, so you may need to write your own tablet load balancer that is aware of your partitioning strategy.