Building Search Applications: Lucene, LingPipe, and Gate
Lucene, LingPipe, and Gate are popular open source tools for building search applications. Building Search Applications describes the Lucene functions used to build search engines, including indexing, searching, ranking, and spelling correction. With this book you will learn to:
- Extract tokens from text using custom tokenizers and analyzers from Lucene, LingPipe, and Gate.
- Construct a search engine index with an optional backend database to manage large document collections.
- Explore the wide range of Lucene queries to search an index, understand the ranking algorithm for a query, and suggest spelling corrections.
- Find the names of people, places, and other entities in text using LingPipe and Gate.
- Categorize documents by topic using classifiers and build groups of self-organized documents using clustering algorithms from LingPipe.
- Create a Web crawler to scan the Web, an intranet, or a desktop using Nutch.
- Track the sentiment of articles published on the Web with LingPipe.
Indexing Text with Lucene
Searching Text with Lucene
Clustering: 6.1 Applications; 6.2 Creating Clusters; 6.2.1 Clustering Documents; 6.2.2 Similarity Measures; 6.2.3 Comparison of Similarity Measures; 6.2...