Wednesday, April 11, 2007

 

Lucene Optimization Tip: Reuse Searcher

Lucene is a great text search engine, but some aspects of it are not clearly documented. For example, suppose you have multiple simultaneous requests searching for different queries, as in a web application. Should each thread get its own IndexSearcher, or will one shared IndexSearcher do?

Initially we created a separate IndexSearcher for each request: a new instance at the start of the request, closed soon after the search finished. After a lot of experimentation and a few emails on the Lucene mailing list, I found that the efficient approach is to share a single IndexSearcher across all requests. Multiple concurrent threads can safely call the search method on the same IndexSearcher object. A shared searcher reuses its cached data instead of rebuilding it for every request, so the single-searcher approach is much more efficient in both response time and memory usage. In our case, response times dropped by a factor of 2-3 after this change.
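The sharing pattern itself does not depend on Lucene. The following is a minimal sketch, not Lucene code. `SharedSnapshotSearcher` is a hypothetical stand-in for IndexSearcher, and its `search` just counts documents containing a term. One instance is built up front, and every pooled request thread calls it directly instead of opening its own searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical stand-in for an IndexSearcher: built once and read-only afterwards,
 * so any number of threads may call search() on the same instance.
 */
final class SharedSnapshotSearcher {
    private final List<String> docs;

    SharedSnapshotSearcher(List<String> docs) {
        this.docs = List.copyOf(docs); // immutable snapshot, like a searcher over a fixed index
    }

    /** Returns the number of documents containing the term. */
    int search(String term) {
        int hits = 0;
        for (String d : docs) {
            if (d.contains(term)) hits++;
        }
        return hits;
    }

    /** Serves every query from the same shared searcher on a thread pool. */
    static List<Integer> serve(SharedSnapshotSearcher shared, List<String> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String q : queries) {
                // No per-request searcher: every task uses the one shared instance.
                pending.add(pool.submit(() -> shared.search(q)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pending) results.add(f.get());
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With Lucene itself, the equivalent is to open the IndexSearcher once at application startup, keep it in a shared field, and call its search method from each request thread.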

Remember, however, that the IndexSearcher must be closed and recreated after the index is modified or updated. Until the searcher is recreated, it will not reflect the changes made to the index. A good strategy is to periodically close and recreate the shared searcher from a separate Timer thread.
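Here is a sketch of the Timer-based refresh under stated assumptions. It is not Lucene's API: `RefreshingSearcherHolder`, `Lease` and the `VersionedSearcher` stand-in are hypothetical names. The `opener` supplier is assumed to open a new searcher over the current state of the index. A request may still be running against the old searcher when the Timer fires. For that reason, the sketch reference-counts each searcher and closes a retired one only when its last lease is returned.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.function.Supplier;

/** Hypothetical stand-in for an IndexSearcher opened at a given index version. */
final class VersionedSearcher implements AutoCloseable {
    final int version;
    volatile boolean closed;
    VersionedSearcher(int version) { this.version = version; }
    @Override public void close() { closed = true; }
}

/** Shares one searcher across threads and swaps in a fresh one from a Timer thread. */
final class RefreshingSearcherHolder<S extends AutoCloseable> implements AutoCloseable {
    private final class Entry {
        final S searcher;
        int refs = 1; // the holder's own reference; guarded by the holder's lock
        Entry(S searcher) { this.searcher = searcher; }
    }

    /** A borrowed searcher; use with try-with-resources so it is always returned. */
    final class Lease implements AutoCloseable {
        private final Entry entry;
        private Lease(Entry entry) { this.entry = entry; }
        S searcher() { return entry.searcher; }
        @Override public void close() { decRef(entry); }
    }

    private final Supplier<S> opener; // assumed: opens a searcher over the latest index
    private final Timer timer = new Timer("searcher-refresh", true); // daemon thread
    private Entry current; // guarded by this
    private boolean closed; // guarded by this

    RefreshingSearcherHolder(Supplier<S> opener, long periodMillis) {
        this.opener = opener;
        synchronized (this) { current = new Entry(opener.get()); }
        timer.schedule(new TimerTask() {
            @Override public void run() { refresh(); }
        }, periodMillis, periodMillis);
    }

    /** Borrows the current searcher; the caller must close() the lease when done. */
    synchronized Lease acquire() {
        if (closed) throw new IllegalStateException("holder closed");
        current.refs++;
        return new Lease(current);
    }

    /** Opens a fresh searcher and retires the old one (closed once its last lease ends). */
    void refresh() {
        Entry fresh = new Entry(opener.get()); // open outside the lock; may be slow
        Entry old;
        synchronized (this) {
            if (closed) {
                old = fresh; // holder already shut down: discard the new searcher
            } else {
                old = current;
                current = fresh;
            }
        }
        decRef(old); // drop one reference; closes the searcher if nobody else holds it
    }

    private void decRef(Entry e) {
        boolean last;
        synchronized (this) { last = --e.refs == 0; }
        if (last) {
            try {
                e.searcher.close();
            } catch (Exception ex) {
                throw new IllegalStateException(ex);
            }
        }
    }

    /** Stops refreshing and releases the holder's reference to the current searcher. */
    @Override public void close() {
        timer.cancel();
        Entry last;
        synchronized (this) {
            if (closed) return;
            closed = true;
            last = current;
        }
        decRef(last);
    }
}
```

A request would do `try (var lease = holder.acquire()) { ... lease.searcher() ... }`. One caveat of `java.util.Timer` is that an exception thrown by a task kills the timer thread, so a real opener should handle its own failures. Newer Lucene releases ship `SearcherManager`, whose `acquire()`, `release()` and `maybeRefresh()` methods are built on the same reference-counting idea. Check the documentation for the version you use.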
