Modified Rough Set Based Aggregation for Effective Evaluation of Web Search Systems

Rashid Ali
Department of Computer Engineering, A. M. U., Aligarh, U.P., India
rashidaliamu@rediffmail.com

M. M. Sufyan Beg
Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
mmsbeg@hotmail.com

Abstract- Rank aggregation is the problem of generating a single consensus ranking for a given set of rankings. Rough set based rank aggregation is a user feedback based technique for rank aggregation, which learns ranking rules using rough set theory. In this paper, we discuss an improved version of the rough set based rank aggregation technique, which is more suitable for aggregating different Web search evaluation techniques. To learn the ranking rules, we obtain implicit user feedback on the search results returned by a search engine in response to a set of fifteen queries and mine the ranking rules using rough set theory. In the modified rough set based rank aggregation technique, we incorporate the confidence of the rules in predicting a class for a given set of data. That is, we do not assert with certainty that a record belongs to a particular class according to a particular rule. Instead, we associate a score variable with the predicted class of the record, where the value of the variable is equal to the confidence measure of the rule. We validate the mined ranking rules by comparing the predicted user feedback based ranking with the actual user feedback based ranking. We apply the ranking rules to another set of thirty-seven queries to aggregate different rankings of search results obtained on the basis of different evaluation techniques. We show our experimental results pertaining to seven public search engines.

Keywords- Web Search Evaluation, Rank Aggregation, Rough Set, Ranking Rules, Confidence, User Feedback, Vector Space Model, Boolean Similarity Measures, PageRank

I. INTRODUCTION

Rank aggregation is the problem of combining a given set of rankings from different voters into a single ranked list that represents their consensus. It finds applications in various fields. For example, in sports, it may be used to obtain an overall ranking of teams from the rankings given by different judges. In academics, it can be used to obtain an overall ranking of universities from rankings based on different performance parameters. In commerce, it may be used to obtain an overall grading of a set of products from gradings on individual parameters such as cost, weight, and volume. When applied to the Web, it finds applications in meta-searching, spam fighting, evaluating search systems, searching for multiple terms using word association techniques, and combining different ranking functions. In this paper, we discuss rank aggregation in the context of the evaluation of Web search systems. Let us begin with some of the important definitions.

A. Important Definitions

Definition 1. Rank Aggregation Problem: Given a set of n candidates, say $C = (C_1, C_2, C_3, \ldots, C_n)$, a set of m voters, say $V = (V_1, V_2, V_3, \ldots, V_m)$, and a ranked list $l_i$ on C for each voter i, $l_i(j) < l_i(k)$ indicates that voter i prefers candidate j to candidate k. The rank aggregation problem is to combine the m ranked lists $l_1, l_2, l_3, \ldots, l_m$ into a single list of candidates, say l, that represents the collective choice of the voters. The function used to obtain l from $l_1, l_2, l_3, \ldots, l_m$ (i.e. $f(l_1, l_2, l_3, \ldots, l_m) = l$) is known as the rank aggregation function.

Definition 2. Given a universe U and $S \subseteq U$, an ordered list (or simply, a list) l with respect to U is given as $l = [e_1, e_2, \ldots, e_{|S|}]$, with each $e_i \in S$ and $e_1 \succ e_2 \succ \cdots \succ e_{|S|}$, where "$\succ$" is some ordering relation on S. Also, for $j \in U \wedge j \in l$, let $l(j)$ denote the position or rank of j, with a higher rank having a lower numbered position in the list.
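As a concrete reading of Definitions 1 and 2, the following minimal sketch represents each voter's list as a Python sequence, recovers the position function l(j), and tests the preference relation l_i(j) < l_i(k). The aggregation function shown is a simple Borda-style sum of positions, chosen only to illustrate the shape of f(l_1, ..., l_m) = l; it is not the rough set based technique discussed in this paper. The function names (`positions`, `prefers`, `borda_aggregate`) and the sample votes are hypothetical, and the sketch assumes every voter ranks all candidates.

```python
from typing import Dict, List, Sequence


def positions(ranked: Sequence[str]) -> Dict[str, int]:
    """Map each candidate j to its 1-based position l(j) in the list."""
    return {cand: pos for pos, cand in enumerate(ranked, start=1)}


def prefers(ranked: Sequence[str], j: str, k: str) -> bool:
    """True when l(j) < l(k), i.e. the voter ranks j above k."""
    l = positions(ranked)
    return l[j] < l[k]


def borda_aggregate(lists: Sequence[Sequence[str]]) -> List[str]:
    """Illustrative f(l_1, ..., l_m): order candidates by summed positions.

    Assumption: every list ranks the same set of candidates.
    This is a Borda-style stand-in, not the paper's rough set method.
    """
    totals: Dict[str, int] = {}
    for ranked in lists:
        for cand, pos in positions(ranked).items():
            totals[cand] = totals.get(cand, 0) + pos
    # A lower total position means a better consensus rank; ties break by name.
    return sorted(totals, key=lambda c: (totals[c], c))


# Hypothetical example: three voters ranking candidates a, b, c.
votes = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
consensus = borda_aggregate(votes)
```

Summing positions is only one of many possible choices for f; the point of the sketch is that any such function consumes m ranked lists and returns one.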
We may assign a unique identifier to each element in U and thus, without loss of generality, we may take $U = \{1, 2, \ldots, |U|\}$.

Definition 3. Full List: If a list contains all the elements in U, then it is said to be a full list.

Definition 4. Partial List: A list $l_p$ whose elements form a strict subset of the universe U is called a partial list. We have the strict inequality $|l_p| < |U|$.

Definition 5. Spearman Rank Order Correlation Coefficient [1]: Let the full lists $[u_1, u_2, \ldots, u_n]$ and $[v_1, v_2, \ldots, v_n]$ be the two rankings for some query Q. The Spearman rank order correlation coefficient ($r_s$) between these two rankings is defined as follows:

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} \left[ l_f(u_i) - l_f(v_i) \right]^2}{n \left( n^2 - 1 \right)} \qquad (1) $$

978-1-4244-4577-6/09/$25.00 ©2009 IEEE. The 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS2009), Cincinnati, Ohio, USA, June 14-17, 2009

Definition 6. Modified Spearman Rank Order Correlation Coefficient [2]: Without loss of generality, assume that the full list is given as $[1, 2, \ldots, n]$. Let the partial list be given as $[v_1, v_2, \ldots, v_m]$. The modified Spearman rank order correlation coefficient ($r_s'$) between these two rankings is defined as follows