Yahoo!Clusty - Adding real-time clustering functionality to the Yahoo! web search engine

Giuseppe Narzisi
Courant Institute of Mathematical Sciences
New York University
email: narzisi@nyu.edu

1 Objective of the project

Yahoo!Clusty¹ is a Clustering Meta-search Engine (MSE) that allows users to send queries to Yahoo! and groups the returned snippets into topically homogeneous clusters. The objective of this project has been to create a flexible MSE for the Yahoo! web search engine. The purpose is to present the results returned for a query in a more structured format, which allows the user to explore them easily by category.

The basic idea, which has recently become a focus of attention in the information retrieval community [6, 7], is to treat the snippet of each returned web page as a consistent representation of that page and to group the snippets into homogeneous clusters by means of clustering and categorization algorithms. The processing must be done on the fly at run-time, so it requires an efficient design and implementation of technologies and algorithms in order to minimize the latency between the issuing of the query and the presentation of the results.

Many different approaches have been presented in the last 10 years (Copernic, Dogpile, iBoogie, Kartoo, Mooter, Vivisimo, etc.), and many academic prototypes have been explored as well. A recent example is the Armil² [1] meta-search engine.

Given the limited amount of time and the complexity of the project, the goal is not to develop a sophisticated MSE that can outperform all previous MSEs, but to create a flexible platform for testing various clustering algorithms and labeling techniques on snippets, and to show that all this can be achieved in a one-semester project. Moreover, the system has been developed in such a way that it can be easily extended with additional functionality.

¹ http://cims.nyu.edu/~gn387/websearchengines/yahoo!clusty.html
² http://armil.iit.cnr.it
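To make the snippet-grouping idea of this section concrete, the following is a minimal sketch in Python. It is not the algorithm used by Yahoo!Clusty: it assumes a simple single-pass scheme in which each snippet is reduced to a set of terms and assigned to the most similar existing cluster under Jaccard similarity, or starts a new cluster if no cluster is similar enough. The names tokenize, jaccard and cluster_snippets, the stopword list and the threshold value are all hypothetical choices made for illustration.

```python
import re

# Assumption: a tiny illustrative stopword list, not the one used by any real system.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "on", "with"}


def tokenize(snippet):
    """Lowercase a snippet and return its set of alphanumeric, non-stopword terms."""
    return {t for t in re.findall(r"[a-z0-9]+", snippet.lower()) if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_snippets(snippets, threshold=0.3):
    """Single-pass clustering: returns a list of clusters, each a list of snippet indices."""
    clusters = []  # each cluster: {"terms": set of terms, "members": list of indices}
    for index, snippet in enumerate(snippets):
        terms = tokenize(snippet)
        best, best_sim = None, 0.0
        # Find the existing cluster whose accumulated terms are most similar.
        for cluster in clusters:
            sim = jaccard(terms, cluster["terms"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["terms"] |= terms  # grow the cluster's term set
        else:
            clusters.append({"terms": set(terms), "members": [index]})
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    demo = [
        "Jaguar cars: luxury sedans and sports cars",
        "The jaguar is a big cat native to the Americas",
        "New Jaguar sports cars and luxury models",
        "Big cat conservation: jaguar habitat in the Americas",
    ]
    for members in cluster_snippets(demo):
        print([demo[i] for i in members])
```

A single pass over the snippets keeps the cost low, which matters given the run-time latency requirement stated above. The result depends on the order of the snippets and on the threshold, which is one reason a platform for comparing several clustering algorithms is useful.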