Friday, February 19, 2010

Digging up Resources - Fulltext search in BibSonomy

Background:
For a while now we have been redesigning BibSonomy's full text search backend, and we have now decided that it is mature enough to handle all of BibSonomy's search requests.

Our old backend was based on MySQL, using the MyISAM storage engine. But with all your posts enlarging the search index each day, we had nearly reached our server's capacity. Looking for a more efficient way of implementing full text search, we came across Lucene, a highly optimized search engine library, which joined the Apache Jakarta project family in September 2001.

Now all of BibSonomy's full text search queries are handled by two redundant Lucene indexes, which are updated alternately every 5 minutes.
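
To illustrate the idea of two alternately updated indexes, here is a minimal sketch in Java. It is not BibSonomy's actual code: the class `AlternatingIndexes`, its methods, and the plain in-memory lists standing in for Lucene indexes are all assumptions made for illustration. One copy answers searches while the other catches up on new posts, and then the two swap roles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: plain lists stand in for the two redundant Lucene indexes.
final class AlternatingIndexes {
    private final List<String> postLog = new ArrayList<>();      // every post ever added
    private final List<List<String>> copies = new ArrayList<>(); // the two index copies
    private final int[] cursor = {0, 0};                         // how far each copy has read the log
    private int active = 0;                                      // copy that currently answers searches

    AlternatingIndexes() {
        copies.add(new ArrayList<>());
        copies.add(new ArrayList<>());
    }

    // New posts only go to the log; they become searchable after the next update cycle.
    synchronized void addPost(String text) {
        postLog.add(text);
    }

    // One update cycle: bring the standby copy up to date, then make it the active one.
    synchronized void updateCycle() {
        int standby = 1 - active;
        List<String> copy = copies.get(standby);
        for (int i = cursor[standby]; i < postLog.size(); i++) {
            copy.add(postLog.get(i));
        }
        cursor[standby] = postLog.size();
        active = standby;
    }

    // Searches only ever touch the active copy (naive substring match, not real scoring).
    synchronized List<String> search(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String post : copies.get(active)) {
            if (post.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(post);
            }
        }
        return hits;
    }

    synchronized int activeIndex() {
        return active;
    }
}
```

In a running service, `updateCycle()` could be triggered every 5 minutes, for example with `ScheduledExecutorService.scheduleAtFixedRate`. The point of the design is that searches never hit a copy that is in the middle of an update.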

Impact on your daily "BibSonomy-Experience":
First of all, switching to Lucene was an important step in preparing our servers for even more users joining the BibSonomy community, as the search task is now separate and can be distributed among several independent machines. Secondly, we hope to decrease BibSonomy's already small response time. Finally, we now support more sophisticated search queries like "collaborative AND (b*marking OR ressource*)".
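
To make the meaning of such a query concrete, here is a small Java sketch that evaluates the example query quoted above against a piece of text. It does not use Lucene and is only a simplified illustration: the class `WildcardQuerySketch` and its methods are hypothetical, the query is hard-coded rather than parsed, and `*` is assumed to stand for any run of characters within a single word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of boolean + wildcard matching; not how Lucene works internally.
final class WildcardQuerySketch {

    // Turn a term such as "b*marking" into a regex: '*' becomes ".*", the rest is literal.
    static Pattern toPattern(String term) {
        String[] parts = term.toLowerCase(Locale.ROOT).split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Split text into lower-case words on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    // True if at least one whole word matches the (possibly wildcarded) term.
    static boolean hasTerm(List<String> tokens, String term) {
        Pattern pattern = toPattern(term);
        for (String token : tokens) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    // Hard-coded form of: collaborative AND (b*marking OR ressource*), spelled as in the post.
    static boolean matchesExample(String text) {
        List<String> tokens = tokenize(text);
        return hasTerm(tokens, "collaborative")
                && (hasTerm(tokens, "b*marking") || hasTerm(tokens, "ressource*"));
    }
}
```

Under these assumptions, a title like "Collaborative bookmarking systems" matches, while "Social bookmarking" does not, because the required term "collaborative" is missing.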

If you have any suggestions or encounter any problems, please contact us.

Happy Tagging!