Wednesday, April 11, 2007

 

Lucene Optimization Tip: Reuse Searcher

Lucene is a great text search engine, but some aspects of it are not clearly documented. For example, if you have multiple simultaneous requests (e.g., in a web application) searching for different queries, should each thread have its own IndexSearcher, or will one shared IndexSearcher do?

Initially we used a different IndexSearcher for each request, creating a new instance at the start of the request and destroying it soon after searching. After a lot of experimentation and an exchange of a few emails on the Lucene mailing list, I discovered that the efficient way is to use a single shared IndexSearcher across all requests. Multiple concurrent threads can safely invoke the search method on a single IndexSearcher object. Reusing cached data makes the single-searcher approach very efficient in terms of both response time and memory usage. In our case, response times decreased by a factor of 2-3 after this change.
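The shared-searcher pattern can be sketched roughly as follows. This is a minimal illustration that runs without Lucene on the classpath: `StubSearcher` is a hypothetical stand-in for IndexSearcher, and `SharedSearcherDemo`, `handleRequests`, and the pool size of 4 are names and values assumed for the example, not part of Lucene or of our application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Lucene's IndexSearcher, so the sketch runs on its own.
class StubSearcher {
    // Counts how many times a searcher was opened.
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    StubSearcher() {
        // In real code, this is where the (expensive) index opening would happen.
        OPEN_COUNT.incrementAndGet();
    }

    // Read-only operation, so many threads can call it at the same time.
    String search(String query) {
        return "results for " + query;
    }
}

class SharedSearcherDemo {
    // One instance for the whole application, created once and reused by every request.
    private static final StubSearcher SHARED = new StubSearcher();

    // Simulates concurrent web requests, each running its own query on the shared searcher.
    static List<String> handleRequests(List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                Callable<String> task = () -> SHARED.search(q);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

However many requests are served, the searcher is opened only once. The per-request approach we started with would have opened one per query.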

Remember, however, that the IndexSearcher object needs to be destroyed and recreated after the index is modified or updated. Unless the searcher is recreated, it will not reflect the changes made to the index. A good strategy is to periodically destroy and recreate the shared IndexSearcher using a separate Timer thread.
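One way the periodic refresh could look is sketched below. It is written generically so that it runs without Lucene: `RefreshingHolder`, `refreshNow`, and `startPeriodicRefresh` are hypothetical names, and the `Supplier` stands in for whatever code opens a new IndexSearcher in your application.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Holds the single shared searcher and swaps in a fresh one on a schedule.
// S would be IndexSearcher in a Lucene application (an assumption of this sketch).
class RefreshingHolder<S> {
    private final AtomicReference<S> current;
    private final Supplier<S> opener; // opens a new searcher over the current index
    private final Timer timer = new Timer("searcher-refresh", true); // daemon thread

    RefreshingHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = new AtomicReference<>(opener.get());
    }

    // Request threads call this to get the searcher currently in use.
    S get() {
        return current.get();
    }

    // Opens a new searcher and atomically replaces the old one.
    // Returns the old instance so the caller can release it once
    // searches still running on it have finished.
    S refreshNow() {
        return current.getAndSet(opener.get());
    }

    // Recreates the searcher every periodMillis on the Timer thread.
    void startPeriodicRefresh(long periodMillis) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                refreshNow();
            }
        }, periodMillis, periodMillis);
    }

    void stop() {
        timer.cancel();
    }
}
```

Request threads always go through `get()`, so they pick up the new searcher on their next search without any locking. The refresh period is a trade-off between how fresh the results are and how often the cost of reopening the index is paid.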



Comments:
Thanks, Nilesh, for the good post.
I am a Java developer and have also worked with the Lucene search API for
my site.
Can you give me some small tips on Lucene that might yield small gains?

Will optimizing the Lucene index help
improve performance?
 
Do you know of some place explaining how to do the same with Zend_Search_Lucene?

Thanks!
 
Good Article on optimization techniques in Lucene...
Lucene Optimization
 
Hi Nilesh,
Rightly said: an IndexSearcher instance can be reused for better performance.
I am facing the same problem. I am using Apache threads to serve requests via CGI. Each CGI script creates an instance of the searcher and gets the results back. Now, I am not sure how every Apache thread handling CGI can use the same searcher instance.
How did you use the same searcher instance across different web requests?
 
@Suman - I would suggest using Tomcat (if you use Java) or some app container that allows you to share objects across requests.
 