IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 17, NO. 12, DECEMBER 2015 2281
Word-of-Mouth Understanding: Entity-Centric Multimodal Aspect-Opinion Mining in Social Media
Quan Fang, Changsheng Xu, Fellow, IEEE, Jitao Sang, M. Shamim Hossain, Senior Member, IEEE, and Ghulam Muhammad, Member, IEEE
Abstract—Most existing approaches to aspect-opinion mining focus on the text domain and cannot be applied to social media, where aspects are essentially multimodal and opinions depend on the specific aspects. To address the problem of multimodal aspect-opinion mining for entities by leveraging multiple cross-collection sources in social media, in this paper we propose a multimodal aspect-opinion model (mmAOM) that considers both user-generated photos and textual documents to simultaneously capture correlations between the textual and visual modalities, as well as associations between aspects and opinions. By identifying the aspects related to entities and the corresponding opinions, we apply mmAOM to entity association visualization and multimodal aspect-opinion retrieval. We have conducted extensive experiments on real-world datasets of entities, including Flickr photos, TripAdvisor reviews, and news articles. Qualitative and quantitative evaluation results have validated the effectiveness of the multimodal aspect-opinion mining model and have demonstrated the utility of the aspects and opinions derived from mmAOM in the applications of entity association visualization and aspect-opinion retrieval.

Index Terms—Application, knowledge mining, probabilistic topic model.
Manuscript received May 08, 2015; revised August 08, 2015; accepted October 03, 2015. Date of publication October 14, 2015; date of current version November 13, 2015. This work was supported in part by the National Basic Research Program of China under Grant 2012CB316304, in part by the National Natural Science Foundation of China under Grant 61225009, Grant 61432019, Grant 61332016, and Grant 61303176, in part by the Beijing Natural Science Foundation under Grant 4131004, and in part by the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia, under research group project RGP-1436-023. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Vasileios Mezaris.
Q. Fang, C. Xu, and J. Sang are with the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: qfang@nlpr.ia.ac.cn; csxu@nlpr.ia.ac.cn; jtsang@nlpr.ia.ac.cn).
M. S. Hossain is with the Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia (e-mail: mshossain@ksu.edu.sa).
G. Muhammad is with the Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia (e-mail: ghulam@ksu.edu.sa).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2015.2491019

I. INTRODUCTION
The prevalence of social media services has reshaped the way in which people access and share information. People are now able to conveniently generate and consume rich social multimedia content, including multimedia documents, social links, etc. As a result, social media platforms have gathered a huge repository of people’s opinions and sentiments towards a vast spectrum of entities,¹ such as brands, geo-locations, and celebrities. For example, after traveling to Beijing, people may share photos of the city on Flickr² and post reviews about it on TripAdvisor.³ Mining opinions [1] from the vast amounts of user-generated content is an important task in knowledge mining, as it aims at discovering collective and subjective information, which may be more beneficial to users than factual information in many applications such as human decision making [1], brand monitoring [2], and collective information retrieval [3].

Fig. 1. Toy example of multimodal aspects and opinions for “Beijing.”
In real-world scenarios, an entity inherently has multiple aspects that describe certain characteristics or attributes of the entity. Fig. 1 illustrates a toy example for “Beijing,” in which three kinds of aspects are presented: landmarks, haze, and economy. People express different opinions towards different aspects of an entity, and the sentiment orientation of an opinion depends on the specific aspect it targets. For example, people express positive views of the famous scenic spots with opinion words such as “great” and “famous,” while criticizing Beijing’s poor air quality with opinion words such as “bad” and “harmful.” This calls for fine-grained opinion analysis at the aspect level, rather than a general opinion analysis over all aspects of the entity; aspect-level analysis could help people consume information more efficiently and effectively.
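The contrast between aspect-level and entity-level opinion analysis can be made concrete with a minimal sketch. It assumes hypothetical (aspect, opinion word, polarity) tuples built from the opinion words of the Fig. 1 example and a simple ±1 polarity sum; this scoring scheme is an illustrative assumption, not the mmAOM model. Aggregating over the whole entity lets the praise for landmarks cancel the criticism of haze, whereas per-aspect aggregation keeps both signals.

```python
from collections import defaultdict

# Hypothetical toy data modeled on the Fig. 1 example for "Beijing":
# each tuple is (aspect, opinion word, polarity), with +1 for positive
# and -1 for negative. These labels are illustrative assumptions only,
# not output of mmAOM.
OPINIONS = [
    ("landmarks", "great", +1),
    ("landmarks", "famous", +1),
    ("haze", "bad", -1),
    ("haze", "harmful", -1),
]


def entity_level_score(opinions):
    """Net polarity summed over all aspects of the entity (coarse-grained)."""
    return sum(polarity for _, _, polarity in opinions)


def aspect_level_scores(opinions):
    """Net polarity summed separately for each aspect (fine-grained)."""
    scores = defaultdict(int)
    for aspect, _, polarity in opinions:
        scores[aspect] += polarity
    return dict(scores)


if __name__ == "__main__":
    print("entity-level:", entity_level_score(OPINIONS))
    print("aspect-level:", aspect_level_scores(OPINIONS))
```

Running the script prints a net entity-level score of 0, which looks neutral, alongside per-aspect scores of 2 for landmarks and -2 for haze, which expose the split that the entity-level view hides.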
So far, most existing work on aspect-opinion mining concentrates on textual content [1], such as aspect-based opinion mining for products from online customer reviews [4], [5] and topic-oriented opinion summarization in news articles and tweets [6], [7]. Few efforts have been devoted to mining
¹ Please note that entity here refers to any concept that is well defined and described in a Wikipedia page, such as persons, products, geo-locations, landmarks, etc.
² [Online]. Available: http://www.Flickr.com
³ [Online]. Available: http://www.tripadvisor.com
1520-9210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.