04/21/2003: Erik Hatcher's Self-Contained Lucene Example
I was fortunate enough to attend Erik Hatcher’s Lucene presentation at the Northern Virginia Software Symposium. The symposium was organized by . I’ll talk more about Lucene as I explore its abilities.
For now, I’m just documenting the self-contained example program that Erik used as his first example:
    /*
     * Created on Apr 21, 2003
     *
     */
    package com.affy.lucene.tutorial;

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    /**
     * This program indexes three strings using Lucene
     * and then searches for the string that contains
     * the "doc1" string.
     */
    public class ErikHatcherSelfContainedExample {

        public static void main(String[] args) throws IOException {

            String docs[] = {
                "doc1 - present!",
                "doc2 is right here",
                "and do not forget lil ol doc3"
            };

            Directory directory = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriter writer = new IndexWriter(directory, analyzer, true);

            for (int j = 0; j < docs.length; j++) {
                Document d = new Document();
                d.add(Field.Text("contents", docs[j]));
                writer.addDocument(d);
            }
            writer.close();

            Searcher searcher = new IndexSearcher(directory);
            Query query = new TermQuery(new Term("contents", "doc1"));
            Hits hits = searcher.search(query);
            System.out.println("doc1 hits: " + hits.length());
            searcher.close();

            System.out.println("Done.");
        }
    }
04/14/2003: Faceted Classification
I worked to create a product hierarchy at Toyrus.com back in 1999. Little did I know that I was actually creating a system to support Faceted Classification. I love learning new words that codify my real-world experience. My understanding of Faceted Classification is that several mutually exclusive descriptive hierarchies (the facets) are used to describe objects.
At the beginning of a search, each node in each hierarchy shows the number of objects associated with that node. As more specific nodes are selected from any hierarchy, the count shown for every node (in all hierarchies) shrinks, because objects that lack the traits of the selected nodes are excluded. An online demonstration of this type of search can be found at FacetMap.com.
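To make the counting behaviour concrete, here is a minimal sketch in plain Java. Everything in it is my own assumption for illustration: the class name `FacetCountDemo`, the four-item catalogue, and the `facet:value` key format are hypothetical, and this is not how FacetMap.com is implemented. It just filters a list by the selected nodes and recounts every node over what is left.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of faceted counting; names and data are made up.
class FacetCountDemo {

    /** Builds one object described by three facets. */
    static Map<String, String> item(String colour, String material, String purpose) {
        Map<String, String> m = new TreeMap<>();
        m.put("colour", colour);
        m.put("material", material);
        m.put("purpose", purpose);
        return m;
    }

    /** A tiny invented catalogue, used only for this illustration. */
    static List<Map<String, String>> sampleCatalogue() {
        List<Map<String, String>> items = new ArrayList<>();
        items.add(item("red", "silk", "evening"));
        items.add(item("green", "wool", "evening"));
        items.add(item("green", "silk", "sleeping"));
        items.add(item("yellow", "cotton", "evening"));
        return items;
    }

    /** Keeps only the items that carry every selected facet value. */
    static List<Map<String, String>> filter(List<Map<String, String>> items,
                                            Map<String, String> selected) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> candidate : items) {
            boolean matches = true;
            for (Map.Entry<String, String> s : selected.entrySet()) {
                if (!s.getValue().equals(candidate.get(s.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    /** Counts the objects under each node, keyed as "facet:value". */
    static Map<String, Integer> counts(List<Map<String, String>> items) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, String> candidate : items) {
            for (Map.Entry<String, String> e : candidate.entrySet()) {
                result.merge(e.getKey() + ":" + e.getValue(), 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> all = sampleCatalogue();
        System.out.println("Before selection: " + counts(all));

        // Selecting the "green" node reduces the counts in every hierarchy.
        Map<String, String> selected = Collections.singletonMap("colour", "green");
        System.out.println("After colour:green: " + counts(filter(all, selected)));
    }
}
```

Nodes whose count drops to zero simply disappear from the result map here; a real interface might instead keep them visible and greyed out.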
Here are examples of classifications:
Bad Classification
- evening red dress
- morning dress
- green dress
- wool green dress
- silk dress
Notice that you can't build an object such as a "green evening dress" because "evening dress" appears only inside the compound phrase "evening red dress".
Good Classification
- dresses by colour
  - red
  - green
  - yellow
- dresses by material
  - silk
  - wool
  - cotton
- dresses by purpose
  - evening
  - bathrobe
  - sleeping garments
Using the above hierarchies, it is easy to select object traits such as "yellow cotton evening dress".
BTW, I stole this example from an email message on the FacetedClassification group at Yahoo Groups.
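As a rough illustration of why the good classification composes so easily, the three hierarchies can be modelled as three independent Java enums. The class and member names below (`FacetedDressDemo`, `Dress`, `describe`, `allCombinations`) are hypothetical, and `SLEEPING` is my shorthand for "sleeping garments". The point of the sketch is only that nine nodes (three per facet) are enough to describe 3 × 3 × 3 = 27 distinct dresses, whereas a flat list of compound phrases would need one entry per combination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: each facet is an independent enum.
class FacetedDressDemo {

    enum Colour { RED, GREEN, YELLOW }
    enum Material { SILK, WOOL, COTTON }
    enum Purpose { EVENING, BATHROBE, SLEEPING }

    /** A dress described by exactly one value from each facet. */
    static final class Dress {
        final Colour colour;
        final Material material;
        final Purpose purpose;

        Dress(Colour colour, Material material, Purpose purpose) {
            this.colour = colour;
            this.material = material;
            this.purpose = purpose;
        }

        /** Builds the compound phrase from the three traits. */
        String describe() {
            return (colour + " " + material + " " + purpose + " dress")
                    .toLowerCase(Locale.ROOT);
        }
    }

    /** Enumerates every dress the three facets can describe. */
    static List<Dress> allCombinations() {
        List<Dress> dresses = new ArrayList<>();
        for (Colour c : Colour.values()) {
            for (Material m : Material.values()) {
                for (Purpose p : Purpose.values()) {
                    dresses.add(new Dress(c, m, p));
                }
            }
        }
        return dresses;
    }

    public static void main(String[] args) {
        Dress d = new Dress(Colour.YELLOW, Material.COTTON, Purpose.EVENING);
        System.out.println(d.describe());
        System.out.println(allCombinations().size() + " combinations from 9 nodes");
    }
}
```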