A Technique for Summarizing Web Reviews

Alpana Dubey, Siemens Corporate Technology, Bangalore, INDIA, alpanad@ieee.org
Atul Kumar, Accenture Technology Labs, Bangalore, INDIA, atulk@ieee.org

Abstract

We propose a technique for summarizing Web reviews. Information summarization has become an important problem in the current content-saturated world. One such example is the World Wide Web, which provides a platform to publish and evaluate information. This collaborative nature of the Web has enabled users to write their opinions on certain topics and also to evaluate others’ opinions by assigning ranks. In this paper we show that this aspect of the Web can be utilized to generate a more useful summary. We consider the problem of generating a summary from Web reviews and the ranks (usefulness) assigned to these reviews by other users. We study the usefulness of user ranks in the summarization task. Based on the study, we propose a technique which takes ranked reviews as input and generates a summary. We experiment with different variations of the proposed technique and evaluate them based on different criteria.

1. Introduction

The collaborative nature of the Web has created information overload due to a lot of redundant and useless information. One such instance is user reviews on topics such as movies, hotels, and attractions published on web portals (for example, http://www.imdb.com and http://www.tripadvisor.com). These sites also provide a facility for ranking reviews. E-commerce sites for buying and selling consumer products, such as amazon.com, also provide customer reviews and the usefulness ranks assigned to these reviews by other customers. A 2007 survey of the hotel and restaurant industry concluded that 80% of UK consumers research online before booking a hotel and that half of them refrained from booking a hotel as a direct result of a negative review on portals [1].
Reviews on such portals accumulate over time and often contain a lot of redundant information, leading to the problem of information overload. Hence, users often browse only a small sample of reviews to get a general idea about a topic. A small sample does not reflect the actual opinion of the mass (those who visit the portal). Hence, there is a need for a summarization system that takes into account all the reviews and the user-provided ranks to generate a useful summary. In this paper we propose a technique to summarize web reviews.

A review summarization system can be seen as a document summarization system where a review is considered as a document. Automatic document summarization is the process of restating the essential idea of a document or passage [9]. Most of the classical document summarization techniques address summarization of a single document or a passage [2, 3, 4, 9]. There are a number of attempts which use more than one document to create a summary [5, 8, 10, 12, 13]. Two main approaches are used for document summarization: a) extraction based, in which some of the sentences are extracted from documents and copied into the summary [5], and b) abstraction based, where, besides sentence extraction, techniques also involve paraphrasing of the selected sentences [11]. In both approaches, sentences are extracted based on certain features.

The above techniques are not quite suitable for summarizing web-based reviews because such reviews often carry additional information, called user-provided ranks, which is not considered while generating summaries. A user-provided rank measures how many users found a review useful. We propose to use this additional information as a feature to generate a more useful summary. We first study different aspects of web-based reviews and correlate those aspects with their ranks in order to assess the usefulness of ranks.
Later, we experiment with various combinations of features along with review ranks in generating summaries. Results of the experiments show that user rank is very useful information for generating summaries.

The rest of the paper is organized as follows: Section 2 discusses the related work. Section 3 presents the proposed approach with the background and required definitions. Section 4 discusses the implementation of the proposed approach and results. Section 5 concludes the paper.

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology 978-0-7695-3496-1/08 $25.00 © 2008 IEEE DOI 10.1109/WIIAT.2008.205
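To illustrate the idea stated in the introduction, that a user-provided rank can serve as an extraction feature alongside content features, the following is a minimal Python sketch. It is not the technique evaluated in this paper: the word-frequency content score, the smoothed helpfulness ratio, the (text, helpful_votes, total_votes) review schema, and all function names are assumptions made only for this example.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption; real reviews need more care)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(reviews, k=2):
    """Return the k highest-scoring sentences.

    reviews: list of (text, helpful_votes, total_votes) tuples.
    This schema is hypothetical and chosen for illustration.
    """
    # Corpus-wide word frequencies act as a simple content feature.
    freq = Counter(w for text, _, _ in reviews for w in tokenize(text))
    scored = []
    for text, helpful, total in reviews:
        # Smoothed helpfulness ratio: a review with no votes gets a neutral 0.5.
        rank = (helpful + 1) / (total + 2)
        for sentence in split_sentences(text):
            words = tokenize(sentence)
            if not words:
                continue
            # Average word frequency, weighted by the review's rank.
            content = sum(freq[w] for w in words) / len(words)
            scored.append((content * rank, sentence))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


if __name__ == "__main__":
    sample = [
        ("The room was clean. Staff were rude.", 9, 10),
        ("The room was clean.", 0, 10),
    ]
    print(summarize(sample, k=2))
```

In this sketch a sentence from a review that many users found useful outranks an equally informative sentence from a review that few found useful, which is the intuition behind using rank as a feature. Multiplying the two scores is only one possible way to combine them.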