A Technique for Summarizing Web Reviews
Alpana Dubey
Siemens Corporate Technology
Bangalore, INDIA
alpanad@ieee.org
Atul Kumar
Accenture Technology Labs
Bangalore, INDIA
atulk@ieee.org
Abstract
We propose a technique for summarizing Web reviews.
Information summarization has become an important
problem in today's content-saturated world. One example
is the World Wide Web, which provides a platform to
publish and evaluate information. This collaborative
nature of the Web has enabled users to write their opinions
on certain topics and also to evaluate others' opinions by
assigning ranks. In this paper we show that this aspect of
the Web can be used to generate more useful summaries. We
consider the problem of generating a summary from Web
reviews and the ranks (usefulness) assigned to these reviews
by other users. We study the usefulness of user ranks in the
summarization task. Based on this study, we propose a
technique that takes ranked reviews as input and generates
a summary. We experiment with different variations of the
proposed technique and evaluate them against different
criteria.
1. Introduction
The collaborative nature of the Web has created
information overload, because much of the published
information is redundant or useless. One such instance is
user reviews on topics such as movies, hotels, and
attractions published on web portals (some examples are
http://www.imdb.com and http://www.tripadvisor.com).
These sites also provide a facility for ranking reviews.
E-commerce sites for selling and buying consumer products,
such as amazon.com, also provide customer reviews and the
usefulness ranks assigned to these reviews by other
customers. A 2007 survey of the hotel and restaurant
industry concluded that 80% of UK consumers research
online before booking a hotel, and that half of them have
refrained from booking a hotel as a direct result of a
negative review on a portal [1]. Reviews on such portals
accumulate over time and often contain a lot of redundant
information, leading to the problem of information
overload. Hence, users often browse only a small sample of
reviews to get a general idea about a topic. A small sample
does not necessarily reflect the actual opinion of the mass
(those who visit the portal). Hence, there is a need for a
summarization system that takes into account all the
reviews and the user-provided ranks to generate a useful
summary. In this paper we propose a technique to summarize
web reviews.
A review summarization system can be seen as a document
summarization system in which each review is treated as a
document. Automatic document summarization is the process
of restating the essential idea of a document or
passage [9]. Most classical document summarization
techniques address the summarization of a single document
or passage [2, 3, 4, 9]. A number of approaches use more
than one document to create a summary [5, 8, 10, 12, 13].
Two main approaches are used for document summarization:
a) extraction based, in which some of the sentences are
extracted from the documents and copied into the
summary [5], and b) abstraction based, in which, besides
sentence extraction, the technique also involves
paraphrasing the selected sentences [11]. In both
approaches, sentences are extracted based on certain
features. These techniques are not well suited to
summarizing web-based reviews, because such reviews often
carry additional information, called user-provided ranks,
that is not considered while generating summaries. A
user-provided rank measures how many users found a review
useful. We propose to use this additional information as a
feature to generate more useful summaries. We first study
different aspects of web-based reviews and correlate those
aspects with their ranks in order to assess the usefulness
of ranks. We then experiment with various combinations of
features, along with review ranks, in generating summaries.
The results of the experiments show that user rank is very
useful information for generating summaries.
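As a rough illustration of how a user-provided rank could act as
one feature in an extraction-based summarizer, consider the
following Python sketch. It is not the technique proposed in this
paper: the `Review` fields, the smoothed usefulness ratio in
`rank_weight`, and the word-frequency content score are all
assumptions made only for this example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    text: str
    helpful_votes: int  # hypothetical field: users who found the review useful
    total_votes: int    # hypothetical field: users who voted on the review


def rank_weight(review: Review) -> float:
    """Smoothed fraction of voters who found the review useful (assumed form)."""
    return (review.helpful_votes + 1) / (review.total_votes + 2)


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(reviews: List[Review], max_sentences: int = 2) -> List[str]:
    """Pick the highest-scoring sentences across all reviews."""
    # Word frequency over the whole collection is the only content feature here.
    freq = Counter(w for r in reviews for w in tokenize(r.text))
    scored = []
    for r in reviews:
        for s in split_sentences(r.text):
            words = tokenize(s)
            if not words:
                continue
            content = sum(freq[w] for w in words) / len(words)
            # The review's rank scales the score of every sentence it contains.
            scored.append((content * rank_weight(r), s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    summary: List[str] = []
    for _, s in scored:
        if s not in summary:  # skip verbatim duplicates
            summary.append(s)
        if len(summary) == max_sentences:
            break
    return summary
```

In this sketch, two sentences with the same content score are
ordered by the rank of the reviews they come from, so a sentence
from a review that most voters found useful is preferred over a
similar sentence from a review that few voters found useful.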
The rest of the paper is organized as follows. Section 2
discusses related work. Section 3 presents the proposed
approach along with the background and required
definitions. Section 4 discusses the implementation of the
proposed approach and the results. Section 5 concludes the
paper.
2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
978-0-7695-3496-1/08 $25.00 © 2008 IEEE
DOI 10.1109/WIIAT.2008.205