Graph-based Informative-Sentence Selection for Opinion Summarization

Linhong Zhu
Information Sciences Institute, University of Southern California
linhong@isi.edu

Sheng Gao, Sinno Jialin Pan, Haizhou Li
Institute for Infocomm Research, A-STAR, Singapore
{gaosheng,jspan,hli}@i2r.a-star.edu.sg

Dingxiong Deng and Cyrus Shahabi
Computer Science Department, University of Southern California
{dingxiong.deng, shahabi}@usc.edu

Abstract
In this paper, we propose a new framework for opinion summarization based on sentence selection. Our goal is to help users obtain useful opinions from reviews by reading only a short summary made up of a few informative sentences, where the quality of the summary is evaluated in terms of both aspect coverage and viewpoint preservation. More specifically, we formulate the informative-sentence selection problem in opinion summarization as a community-leader detection problem, where a community consists of a cluster of sentences about the same aspect of an entity. The detected leaders of the communities can be considered the most informative sentences for the corresponding aspect, where the informativeness of a sentence is defined with respect to both its community and the document it belongs to. Review data from six product domains on Amazon.com are used to verify the effectiveness of our method for opinion summarization.

I. INTRODUCTION

The proliferation of online opinions makes it challenging to digest the sheer volume of available information. For instance, on Amazon, some popular products may receive hundreds or even thousands of reviews, which makes it hard for potential customers to go through all the reviews and make an informed purchase decision. Furthermore, some reviews are uninformative and may mislead customers. To address these issues, most online portals provide two services: aspect summaries and review helpfulness ratings.
Accordingly, a considerable amount of research has been conducted on aspect-based opinion summarization [1], [2], [3], [4], [5], [6] and review quality evaluation [7], [8], [9], [10].

Aspect-based opinion summarization aims to identify the aspects of a given entity and to summarize the overall sentiment orientation towards each aspect. This kind of summarization is useful for consumers. However, it may lose detailed information that is also important for consumers when making decisions. For example, travelers may prefer detailed information on suggested travel routes rather than only a summary of which tourist spots are good or bad.

In some scenarios, opinion summarization by selecting informative reviews is more desirable. Several approaches [11], [12], [13], [14] have been proposed for this task. A common idea behind them is to predict a score for each review that estimates its helpfulness, and to select the top-scoring reviews as the informative ones. However, most of them do not take the following two issues into account: 1) redundancy: the reviews with the highest helpfulness scores may contain redundant information; 2) coverage: the reviews with the highest helpfulness scores may not cover all aspects of the entity, so some important aspects may be missing.

In this paper, we propose a new opinion summarization framework, named sentence-based opinion summarization, to address these issues. Given a set of reviews for a specific entity, our goal is to generate a summary by extracting a small number of sentences from those reviews, such that the coverage of the entity's aspects and the polarity distribution of the aspects are preserved as much as possible. Note that our proposed framework is not intended to replace aspect-based opinion summarization approaches.
In contrast, since the selected informative sentences preserve the coverage and sentiment polarity distribution of the entity aspects, aspect-based opinion summarization techniques can subsequently be applied to the selected sentences to generate a summary for each aspect without information loss.

Based on the new opinion summarization framework, we propose a graph-based method to identify informative sentences. More specifically, we formulate the informative-sentence selection problem in opinion summarization as a community-leader detection problem in social computing. We first construct a sentence graph by adding an edge between a pair of sentences if they are similar in both word distribution and sentiment polarity distribution. Each node of the graph represents a sentence and can be regarded as a user in a social network. We then propose an algorithm that detects leaders and communities simultaneously on the sentence graph, where a community consists of a set of sentences about the same aspect and the leaders of a community can be considered its most informative sentences.

In summary, the contributions of this work are as follows:

• We have introduced a new sentence-based summarization framework that generates summaries which preserve aspect coverage as much as possible and are representative of aspect-level viewpoints.

• We have bridged the areas of sentiment analysis and social computing by applying community and leader detection algorithms to the informative-sentence selection problem.

• We have presented an effective leader-community detection algorithm based on both the structure of the graph and the context information of the review documents.

2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
ASONAM'13, August 25-29, 2013, Niagara, Ontario, CAN
Copyright 2013 ACM 978-1-4503-2240-9/13/08 ...$15.00
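
The sentence-graph construction outlined in the introduction can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: word similarity is measured by cosine similarity over bag-of-words counts, each sentence carries a single polarity label instead of a polarity distribution, and the threshold `word_threshold` is a hypothetical parameter. The function `communities_and_leaders` is only a simple stand-in (connected components with the highest-degree node as leader) and is not the leader-community detection algorithm proposed in this paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sentence_graph(sentences, polarities, word_threshold=0.3):
    """Return an adjacency dict: sentence index -> set of neighbour indices.

    Assumption: two sentences are linked when their polarity labels match
    and their word-level cosine similarity is at least `word_threshold`.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            similar_words = cosine(bags[i], bags[j]) >= word_threshold
            if polarities[i] == polarities[j] and similar_words:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def communities_and_leaders(graph):
    """Stand-in only: connected components as communities, and the
    highest-degree node (lowest index on ties) as each community's leader."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        leader = max(component, key=lambda n: (len(graph[n]), -n))
        result.append((leader, sorted(component)))
    return result


# Hypothetical toy review sentences with hand-assigned polarity labels.
sentences = [
    "the battery life is great",
    "battery life is really great",
    "great battery life overall",
    "the screen is too dim",
    "screen is dim outdoors",
]
polarities = ["pos", "pos", "pos", "neg", "neg"]

graph = build_sentence_graph(sentences, polarities)
summary = [(sentences[leader], members)
           for leader, members in communities_and_leaders(graph)]
```

In this toy input the battery sentences and the screen sentences fall into separate communities, so picking one leader per community yields a two-sentence summary that touches both aspects. This is the coverage property that a top-k helpfulness ranking does not guarantee.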