IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 17, NO. 12, DECEMBER 2015 2281

Word-of-Mouth Understanding: Entity-Centric Multimodal Aspect-Opinion Mining in Social Media

Quan Fang, Changsheng Xu, Fellow, IEEE, Jitao Sang, M. Shamim Hossain, Senior Member, IEEE, and Ghulam Muhammad, Member, IEEE

Abstract—Most existing approaches to aspect-opinion mining focus on the text domain and cannot be applied to social media, where the aspects are essentially multimodal and the opinions depend on the specific aspects. To address multimodal aspect-opinion mining for entities by leveraging multiple cross-collection sources in social media, we propose a multimodal aspect-opinion model (mmAOM) that considers both user-generated photos and textual documents and simultaneously captures correlations between the textual and visual modalities as well as associations between aspects and opinions. By identifying the aspects and the corresponding opinions related to entities, we apply the mmAOM to entity association visualization and multimodal aspect-opinion retrieval. We have conducted extensive experiments on real-world datasets of entities including Flickr photos, TripAdvisor reviews, and news articles. Qualitative and quantitative evaluation results have validated the effectiveness of the multimodal aspect-opinion mining model, and demonstrated the utility of the aspects and opinions derived from mmAOM in applications of entity association visualization and aspect-opinion retrieval.

Index Terms—Application, knowledge mining, probabilistic topic model.

Manuscript received May 08, 2015; revised August 08, 2015; accepted October 03, 2015. Date of publication October 14, 2015; date of current version November 13, 2015. This work was supported in part by the National Basic Research Program of China under Grant 2012CB316304, in part by the National Natural Science Foundation of China under Grant 61225009, Grant 61432019, Grant 61332016, and Grant 61303176, in part by the Beijing Natural Science Foundation under Grant 4131004, and in part by the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia, under research group project RGP-1436-023. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Vasileios Mezaris.
Q. Fang, C. Xu, and J. Sang are with the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: qfang@nlpr.ia.ac.cn; csxu@nlpr.ia.ac.cn; jtsang@nlpr.ia.ac.cn).
M. S. Hossain is with the Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia (e-mail: mshossain@ksu.edu.sa).
G. Muhammad is with the Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia (e-mail: ghulam@ksu.edu.sa).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2015.2491019

Fig. 1. Toy example of multimodal aspects and opinions for “Beijing.”

I. INTRODUCTION

THE prevalence of social media services has reshaped the way in which people access and share information. People are now able to conveniently generate and consume rich social multimedia content, including multimedia documents, social links, etc. As a result, social media platforms have gathered a huge repository of people’s opinions and sentiments towards a vast spectrum of entities,¹ such as brands, geo-locations, and celebrities. For example, people could share photos on Flickr² and post reviews on TripAdvisor³ about Beijing after travel.
Mining opinions [1] from the vast amounts of user-generated content is an important task in knowledge mining as it aims at discovering collective and subjective information, which may be more beneficial to users than factual information in many applications such as human decision making [1], brand monitoring [2], and collective information retrieval [3].

¹ Please note that entity here refers to any concept that is well defined and described in a Wikipedia page, such as persons, products, geo-locations, landmarks, etc.
² [Online]. Available: http://www.Flickr.com
³ [Online]. Available: http://www.tripadvisor.com

1520-9210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In real-world scenarios, an entity inherently has multiple aspects that describe certain characteristics or attributes of the entity. Fig. 1 illustrates a toy example for “Beijing”. Three kinds of aspects are presented: landmarks, haze, and economy. People express different opinions towards different aspects of an entity. The sentiment orientation of an opinion depends on the specific semantic aspect. For example, people hold positive views of the famous scenic spots, with opinion words like “great, famous”, while criticizing the poor air quality of Beijing with opinion words like “bad, harmful”. This calls for fine-grained opinion analysis at the aspect level, rather than a general opinion analysis over all aspects of the entity, which could help people consume information more efficiently and effectively.

So far, most existing work on aspect-opinion mining concentrates on textual content [1], such as aspect-based opinion mining for products from online customer reviews [4], [5] and topic-oriented opinion summarization in news articles and tweets [6], [7]. Few efforts have been devoted to mining