Modified Rough Set Based Aggregation for Effective
Evaluation of Web Search Systems
Rashid Ali
Department of Computer Engineering
A. M. U., Aligarh
U.P., India
rashidaliamu@rediffmail.com
M. M. Sufyan Beg
Department of Computer Engineering
Jamia Millia Islamia
New Delhi, India
mmsbeg@hotmail.com
Abstract—Rank Aggregation is the problem of generating a single
consensus ranking for a given set of rankings. Rough set based
Rank aggregation is a user feedback based technique for rank
aggregation, which learns ranking rules using rough set theory.
In this paper, we discuss an improved version of the Rough set
based Rank aggregation technique, which is more suitable for
aggregation of different Web search evaluation techniques. For
learning the ranking rules, we obtain the implicit user feedback
to the search results returned by a search engine in response to a
set of fifteen queries and mine the ranking rules using rough set
theory. In the modified rough set based rank aggregation
technique, we incorporate the confidence of the rules in
predicting a class for a given set of data. That is, we do not
assert with certainty that a record belongs to the class predicted
by a particular rule. Instead, we associate a score variable with
the predicted class of the record, where the value of the variable
is equal to the confidence measure of the rule. We validate the
mined ranking rules by comparing the predicted user feedback
based ranking with the actual user feedback based ranking. We
apply the ranking rules to another set of thirty-seven queries for
aggregating different rankings of search results obtained on the
basis of different evaluation techniques. We show our
experimental results pertaining to seven public search engines.
Keywords- Web Search Evaluation, Rank Aggregation, Rough
Set, Ranking Rules, Confidence, User Feedback, Vector Space
Model, Boolean Similarity Measures, PageRank
I. INTRODUCTION
Rank aggregation is the problem of combining a given set of
rankings from different voters into a single ranked list that
represents their consensus. It finds applications in various fields.
For example, in sports, it may be used to obtain an overall ranking
of teams from the rankings given by different judges. In academics,
it can be used to obtain an overall ranking of universities from
rankings based on different performance parameters. In commerce,
it may be used to obtain an overall grading of a set of products
from gradings on a number of parameters such as cost, weight, and
volume. When applied to the Web, it finds applications in
meta-searching, spam fighting, evaluating search systems, searching
for multiple terms using word association techniques, and combining
different ranking functions. In this paper, we discuss rank
aggregation in the context of the evaluation of Web search systems.
Let us begin with some important definitions.
A. Important Definitions
Definition 1. Rank Aggregation Problem: Let $C = (C_1, C_2, C_3, \ldots, C_n)$
be a set of n candidates and $V = (V_1, V_2, V_3, \ldots, V_m)$ a set of m
voters, and let $l_i$ be a ranked list on C for each voter i.
Then $l_i(j) < l_i(k)$ indicates that voter i prefers candidate j to
candidate k. The rank aggregation problem is to combine the m ranked
lists $l_1, l_2, l_3, \ldots, l_m$ into a single list of candidates, say l,
that represents the collective choice of the voters. The function f
used to obtain l from $l_1, l_2, l_3, \ldots, l_m$, i.e.
$f(l_1, l_2, l_3, \ldots, l_m) = l$, is known as the rank aggregation function.
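To make Definition 1 concrete, the following minimal sketch shows one possible rank aggregation function f. It uses the classical Borda count, chosen here only for illustration; it is not the rough set based technique discussed in this paper. The function name `borda_aggregate` and the sample voter lists are assumptions introduced for the example, and every voter is assumed to supply a full ranking of the same candidates.

```python
from typing import Dict, List, Sequence


def borda_aggregate(ranked_lists: Sequence[Sequence[str]]) -> List[str]:
    """Combine m full ranked lists into one list using a Borda count.

    Illustrative stand-in for the aggregation function f of Definition 1;
    not the rough set based method of the paper.
    """
    candidates = list(ranked_lists[0])
    n = len(candidates)
    scores: Dict[str, int] = {c: 0 for c in candidates}
    for ranking in ranked_lists:
        if len(ranking) != n or set(ranking) != set(candidates):
            raise ValueError("every voter must rank the same candidates")
        for position, candidate in enumerate(ranking):
            # Position 0 is the most preferred candidate and earns n - 1 points.
            scores[candidate] += n - 1 - position
    # Highest total score first; ties are broken by name for determinism.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical rankings l_1, l_2, l_3 from three voters over C1, C2, C3.
voters = [
    ["C1", "C2", "C3"],
    ["C2", "C1", "C3"],
    ["C1", "C3", "C2"],
]
consensus = borda_aggregate(voters)
print(consensus)  # ['C1', 'C2', 'C3']
```

Here C1 collects 5 points, C2 collects 3, and C3 collects 1, so the aggregated list places C1 first.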
Definition 2. Given a universe U and S ⊆ U, an ordered
list (or simply, a list) l with respect to U is given as
$l = [e_1, e_2, \ldots, e_{|S|}]$, with each $e_i \in S$, and
$e_1 \succ e_2 \succ \ldots \succ e_{|S|}$, where "≻" is
some ordering relation on S. Also, for $j \in U \wedge j \in l$, let l(j)
denote the position or rank of j, with a higher rank having a
lower numbered position in the list. We may assign a unique
identifier to each element in U and thus, without loss of
generality we may get $U = \{1, 2, \ldots, |U|\}$.
Definition 3. Full List: If a list contains all the elements
in U, then it is said to be a full list.
Definition 4. Partial List: A list $l_p$ whose elements form
a strict subset of the universe U is called a partial list.
We have the strict inequality $|l_p| < |U|$.
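Definitions 2 to 4 can be checked mechanically. The sketch below is a minimal illustration, assuming that the universe is a set of integer identifiers and that a list is a Python sequence without duplicates; the helper names `position`, `is_full_list`, and `is_partial_list` are hypothetical and introduced only for this example.

```python
from typing import Sequence, Set


def position(l: Sequence[int], j: int) -> int:
    """Return l(j), the 1-based position of j in list l (1 is the top rank)."""
    return l.index(j) + 1


def is_full_list(l: Sequence[int], universe: Set[int]) -> bool:
    """A full list contains every element of the universe exactly once."""
    return len(set(l)) == len(l) and set(l) == universe


def is_partial_list(l: Sequence[int], universe: Set[int]) -> bool:
    """A partial list covers a strict subset of the universe, so |l_p| < |U|."""
    return len(set(l)) == len(l) and set(l) < universe


universe = {1, 2, 3, 4, 5}       # U = {1, 2, ..., |U|}
full = [3, 1, 2, 5, 4]
partial = [3, 1]

print(is_full_list(full, universe))        # True
print(is_partial_list(partial, universe))  # True
print(position(full, 1))                   # 2
```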
Definition 5. Spearman Rank Order Correlation
coefficient [1]: Let the full lists $[u_1, u_2, \ldots, u_n]$ and
$[v_1, v_2, \ldots, v_n]$ be the two rankings for some query Q. Spearman
rank order correlation coefficient ($r_s$) between these two rankings
is defined as follows

$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} \left[ l_f(u_i) - l_f(v_i) \right]^2}{n(n^2 - 1)} \tag{1} $$
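The following sketch shows how the coefficient in (1) might be computed. It assumes the standard reading of Spearman's coefficient, in which the squared difference between the positions of the same item in the two full lists is summed over all items; the function name `spearman_rank_correlation` is introduced only for this example.

```python
from typing import Hashable, Sequence


def spearman_rank_correlation(u: Sequence[Hashable], v: Sequence[Hashable]) -> float:
    """Spearman rank order correlation between two full lists of the same items.

    Assumption: for each item, the difference between its 1-based positions
    in u and in v is squared and summed, as in the standard formula.
    """
    n = len(u)
    if n < 2:
        raise ValueError("at least two items are required")
    if len(v) != n or len(set(u)) != n or set(u) != set(v):
        raise ValueError("both lists must be full lists over the same items")
    # Map every item to its position in v for constant-time lookup.
    pos_v = {item: rank for rank, item in enumerate(v, start=1)}
    d_squared = sum(
        (rank_u - pos_v[item]) ** 2 for rank_u, item in enumerate(u, start=1)
    )
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


print(spearman_rank_correlation([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(spearman_rank_correlation([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(spearman_rank_correlation([1, 2, 3], [1, 3, 2]))        # 0.5
```

Identical rankings give 1, exactly reversed rankings give -1, and a single adjacent swap among three items gives 0.5, since the squared differences sum to 2 and 6 × 2 / (3 × 8) = 0.5.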
Definition 6. Modified Spearman Rank Order
Correlation coefficient [2]: Without loss of generality, assume
that the full list is given as $[1, 2, \ldots, n]$. Let the partial list be
given as $[v_1, v_2, \ldots, v_m]$. The Modified Spearman rank order
correlation coefficient ($r_s'$) between these two rankings is
defined as follows
978-1-4244-4577-6/09/$25.00 ©2009 IEEE
The 28th North American Fuzzy Information Processing Society Annual Conference (NAFIPS2009)
Cincinnati, Ohio, USA - June 14 - 17, 2009