Entity Extraction Links (also known as named entity recognition)
  • Websites
  • Software
    • Ruby
    • http://github.com/hypomodern/flex-attributes/ - Originally by Eric Anderson; some of that code still remains. If you’re not into this version, check out his original at rubyforge.org/projects/flex-attributes/. See Hypomodern::FlexAttributes for usage information.
  • http://www.opencalais.com/ - Calais is a rapidly growing toolkit of capabilities that allow you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application.
  • http://www.aktors.org/technologies/annie/ - Open Source Information Extraction from The University of Sheffield; ANNIE is an open-source, robust Information Extraction (IE) system which relies on finite state algorithms. ANNIE consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, named entity recogniser.
  • http://www.searchenginecaffe.com/2007/03/java-open-source-text-mining-and.html - Jeff Dalton's List of Java Open Source NLP and Text Mining tools
  • http://gate.ac.uk/ - GATE, as an architecture, suggests that the elements of software systems that process natural language can usefully be broken down into various types of component, known as resources.
  • http://www.casos.cs.cmu.edu/projects/automap/ - AutoMap is a text mining tool that enables the extraction of network data from texts. AutoMap can extract three types of information: content analytic (words and frequencies), semantic networks, and meta-networks.
  • http://www.netowl.com/ - SRA developed NetOwl®, a suite of rich text mining tools, to discover and extract the knowledge found in free-form text documents and turn it into actionable information. NetOwl has been refined over more than a decade of research and development. Our team of researchers and engineers continues to expand NetOwl’s capabilities to keep pace with evolving information needs.
  • http://www.inxightfedsys.com/products/sdks/tf/default.asp - Out of the box, Inxight ThingFinder automatically identifies and extracts more than 35 key entities - such as people, dates, places, companies or other things - from any text data source, in multiple languages. This ability to automatically identify and classify relevant entities makes ThingFinder one of the most powerful text analysis and extraction tools on the market. Using Inxight ThingFinder, developers can maximize and extend the value of their applications by enabling end-users to quickly find the most important pieces of information within large volumes of documents.
  • http://incubator.apache.org/uima/ - UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)". Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages.
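The component decomposition described in the UIMA entry above can be illustrated with a small sketch. This is not UIMA's API (UIMA components are written in Java or C++ and described by XML descriptor files); the `Document`, `Component`, and `run_pipeline` names are hypothetical, and the two stages are deliberately naive stand-ins for "sentence boundary detection" and "entity detection".

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Shared data passed between components (hypothetical structure)."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


@dataclass
class Component:
    """One pipeline stage plus minimal self-describing metadata (hypothetical)."""
    name: str                          # human-readable stage name
    produces: str                      # annotation key this stage writes
    run: Callable[[Document], list]    # the stage's processing function


def split_sentences(doc: Document) -> List[str]:
    # Naive boundary detection: split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", doc.text) if s]


def detect_entities(doc: Document) -> List[str]:
    # Toy detector: runs of two or more capitalized words in each sentence.
    pattern = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+")
    found = []
    for sentence in doc.annotations["sentences"]:
        found.extend(pattern.findall(sentence))
    return found


def run_pipeline(components: List[Component], doc: Document) -> Document:
    # The "framework" role: run each stage in order and store its output.
    for component in components:
        doc.annotations[component.produces] = component.run(doc)
    return doc


PIPELINE = [
    Component("sentence boundary detection", "sentences", split_sentences),
    Component("entity detection", "entities", detect_entities),
]

doc = run_pipeline(
    PIPELINE,
    Document("Ada Lovelace met Charles Babbage in London. They discussed engines."),
)
print(doc.annotations["entities"])
```

Each stage only reads and writes the shared document, so stages can be swapped or reordered without touching the others, which is the point of the decomposition. A real framework would also validate that a stage's inputs exist before running it.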
Articles
  • http://en.wikipedia.org/wiki/Named_entity_recognition - Named entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
  • http://www.semanticuniverse.com/articles-entity-extraction-and-semantic-web.html - Entity Extraction is the process of automatically extracting document metadata from unstructured text documents. Extracting key entities such as person names, locations, dates, specialized terms and product terminology from free-form text can empower organizations to not only improve keyword search but also open the door to semantic search, faceted search and document repurposing. This article defines the field of entity extraction, shows some of the technical challenges involved, and shows how RDF can be used to store document annotations. It then shows how new tools such as Apache UIMA are poised to make entity extraction much more cost effective to an organization.
  • http://broadcast.oreilly.com/2009/02/how-entity-extraction-is-fueli.html - How Entity Extraction is Fueling the Semantic Web Fire; short commentary on Apache UIMA and a few other tools.
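As a rough illustration of the "locate and classify into predefined categories" task described in the Wikipedia entry above, here is a minimal sketch that combines a lookup list (gazetteer) with regular expressions. It is an assumption-laden toy, not how any of the tools listed here work: the `GAZETTEER` contents, the `PATTERNS`, and the `extract_entities` function are all made up for the example.

```python
import re

# Hypothetical lookup lists for name-like categories.
GAZETTEER = {
    "PERSON": ["Ada Lovelace"],
    "LOCATION": ["London"],
}

# Hypothetical patterns for categories with a regular surface form.
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
}


def extract_entities(text):
    """Return (start, end, label, surface) tuples sorted by position."""
    entities = []
    # Locate known names by exact string match.
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                entities.append((m.start(), m.end(), label, m.group()))
    # Locate pattern-based entities such as money amounts and percentages.
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            entities.append((m.start(), m.end(), label, m.group()))
    return sorted(entities)


text = "Ada Lovelace paid $20 in London, a 5% discount."
for start, end, label, surface in extract_entities(text):
    print(label, surface, start, end)
```

Exact-match lookup misses unseen names and cannot resolve ambiguity (for example, a person named London), which is why the systems above rely on richer grammars or statistical models.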