
04/23/2003: Lucene; Bug; TestPhrasePrefixQuery in 2003.04.21 Build Has Misleading Code?

TestPhrasePrefixQuery looks like it is searching for "blueberry pi*", and at first glance it even seems to work. However, the test data is not extensive enough to show what is really happening.

The searching technique implemented in TestPhrasePrefixQuery will find not only "blueberry pie" but also "blueberry strudel" if both exist in the documents.

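The reason is that the IndexReader.terms(Term termToMatch) method positions the enumeration at the first term equal to or greater than termToMatch and then returns *all* terms from that point in the index to the end. Nothing stops the enumeration once the terms no longer begin with the prefix.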
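The effect can be reproduced without Lucene. The following is only a rough sketch: a TreeSet stands in for the sorted term dictionary, and the class name, method names, and sample terms are all made up for illustration. It shows how a seek to "pi" that is never cut off also picks up "strudel", while a prefix check stops at the right place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PrefixSeekSketch {
    // Hypothetical stand-in for a sorted term dictionary (not Lucene's real structure).
    static final TreeSet<String> TERMS = new TreeSet<>(
        Arrays.asList("apple", "blueberry", "pie", "pizza", "strudel", "tart"));

    // Mimics the behavior described above: every term from the first one
    // that is >= start, all the way to the end of the dictionary.
    static List<String> seekFrom(String start) {
        return new ArrayList<>(TERMS.tailSet(start, true));
    }

    // Same seek, but stops at the first term that no longer has the prefix.
    static List<String> withPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        for (String term : TERMS.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(seekFrom("pi"));   // [pie, pizza, strudel, tart]
        System.out.println(withPrefix("pi")); // [pie, pizza]
    }
}
```

Because the terms are sorted, the loop can break on the first mismatch instead of scanning the rest of the dictionary.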

One potential solution might be something like the following:

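String prefix = "pi";
String pattern = prefix + ".*";    // regex: the prefix followed by anything
// Seek with the bare prefix; "pi*" would be treated as a literal term.
TermEnum te = ir.terms(new Term("body", prefix));
do {
    Term t = te.term();
    // Stop at the end of the index, at the next field, or at the first non-matching term.
    if (t == null || !t.field().equals("body") || !t.text().matches(pattern))
        break;
    termsWithPrefix.add(t);
} while (te.next());
te.close();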

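Of course, the code above only works with JDK 1.4, because String.matches relies on the regular-expression support added in that release. For a plain prefix test, String.startsWith would do the same job on older JDKs.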

Comments?


