Isanette: A Common and Common Sense Knowledge Base for Opinion Mining
Erik Cambria*, Yangqiu Song†, Haixun Wang† and Amir Hussain‡
*Temasek Laboratories, National University of Singapore, cambria@nus.edu.sg
†Microsoft Research Asia, {yangqiu.song, haixun.wang}@microsoft.com
‡COSIPRA Lab, University of Stirling, ahu@cs.stir.ac.uk
Abstract—The ability to understand natural language text is
far from being emulated in machines. One of the main hurdles
to overcome is that computers lack both the common and the
common sense knowledge humans normally acquire during the
formative years of their lives. If we want machines to really
understand natural language, we need to provide them with
this kind of knowledge rather than relying on the valence of
keywords and word co-occurrence frequencies. In this work, we
blend the largest existing taxonomy of common knowledge with
a natural-language-based semantic network of common sense
knowledge, and use multi-dimensionality reduction techniques
on the resulting knowledge base for opinion mining and
sentiment analysis.
Keywords—Knowledge-Based Systems; Semantic Networks;
Natural Language Processing; Opinion Mining.
I. INTRODUCTION
The ever-growing amount of information available on the
Social Web has fostered the proliferation of many business and
research activities around the relatively new fields of opinion
mining and sentiment analysis. The automatic analysis of
user-generated content such as online news, reviews, blogs
and tweets can be extremely valuable for tasks
such as mass opinion estimation, corporate reputation
measurement, political orientation categorization, stock market
prediction, and the study of customer preferences and public opinion.
Distilling useful information from such unstructured data,
however, is a multi-faceted and multi-disciplinary problem,
as opinions and sentiments can be expressed in a multitude
of forms and combinations in which it is extremely difficult
to find any kind of regular behavior. Many conceptual
rules govern the expression of opinions and sentiments,
and there exist even more clues that can convey these
concepts from realization to verbalization in the human mind.
Most current approaches to opinion mining and sentiment
analysis rely on fairly unambiguous affective keywords
extracted from an existing knowledge base (e.g.,
WordNet [1]) or from a purpose-built lexicon based on a
domain-dependent corpus [2], [3], [4], [5]. Such approaches
are still far from being able to fully extract the cognitive
and affective information associated with natural language
and, hence, often fail to meet the gold standard set by human
annotators.
Especially when dealing with social media, content is often
very diverse and noisy, and the use of a limited
number of affect words or a domain-dependent training
corpus is simply not enough (see Table I). In order to enable
computers to intelligently process open-domain textual
resources, we need to provide them with both the common and
common sense knowledge humans normally acquire during
the formative years of their lives, as relying only on the valence
of keywords and word co-occurrence frequencies does not
allow a deep understanding of natural language. Common
knowledge is the general knowledge humans acquire about
the world, e.g., “canine distemper is a domestic animal
disease”. Common sense knowledge consists of obvious facts
that people normally know but usually leave unstated, e.g.,
“cat can hunt mice” and “cat is cute”.
It is through the combined use of common and common
sense knowledge that we can grasp both low- and
high-level concepts in natural language sentences and, hence,
communicate effectively with other people without having to
continuously ask for definitions and explanations. Common
sense knowledge, moreover, enables the propagation of
sentiment from affect words, e.g., ‘happy’ and ‘sad’, to general
concepts, e.g., ‘birthday gift’, ‘school graduation’, ‘cancer’
and ‘canine distemper’, which is useful for tasks such as
sentiment elicitation and polarity detection. In this work, we
blend ProBase [6], the largest existing taxonomy of common
knowledge, with ConceptNet [7], a natural-language-based
semantic network of common sense knowledge, and use
multi-dimensionality reduction techniques on the resulting
knowledge base for opinion mining and sentiment analysis.
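To make the idea of propagating sentiment from affect words to general concepts more concrete, here is a toy sketch. It is not the method used in this work, which relies on dimensionality reduction over the blended knowledge base. It assumes a small, hypothetical concept graph whose edges are invented for illustration rather than taken from ProBase or ConceptNet. It fixes the valence of the affect words ‘happy’ (+1) and ‘sad’ (−1) and then repeatedly sets every other concept to a damped average of its neighbors' scores, in the style of simple label propagation. The function name propagate and the decay and iterations parameters are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy assertions linking concepts; invented for illustration,
# not extracted from ProBase or ConceptNet.
EDGES = [
    ("birthday gift", "happy"),
    ("school graduation", "happy"),
    ("canine distemper", "disease"),  # common knowledge: a kind of disease
    ("cancer", "disease"),
    ("disease", "sad"),               # assumed common sense link
]

# Affect words whose valence is fixed (the seeds of the propagation).
SEEDS = {"happy": 1.0, "sad": -1.0}


def propagate(edges, seeds, decay=0.5, iterations=20):
    """Spread seed valence over an undirected concept graph.

    Each non-seed concept takes the damped mean of its neighbors'
    scores from the previous iteration; seeds keep their fixed value.
    """
    neighbors = defaultdict(list)
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Start every concept at neutral (0.0) except the affect words.
    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]
            else:
                mean = sum(scores[n] for n in nbrs) / len(nbrs)
                updated[node] = decay * mean
        scores = updated
    return scores


if __name__ == "__main__":
    for concept, score in sorted(propagate(EDGES, SEEDS).items()):
        print(f"{concept:18s} {score:+.3f}")
```

With these toy edges, ‘birthday gift’ and ‘school graduation’ end up with positive scores. ‘cancer’ and ‘canine distemper’ end up negative, because their only path to an affect word runs through ‘disease’ and ‘sad’. The decay factor makes scores shrink with distance from the seeds, so concepts far from any affect word stay close to neutral.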
The structure of the paper is as follows: Section II presents
related work in the field of opinion mining; Section III
discusses how and why blending common and common
sense knowledge is important for the development of
domain-independent sentiment analysis systems; Section IV explains
in detail the strategies adopted to build the common and
common sense knowledge base; Section V illustrates the
dimensionality reduction techniques employed to perform
reasoning on the newly built knowledge base; Section VI
presents the development of an opinion mining engine and
its evaluation; and Section VII offers concluding
remarks and future directions.
2011 11th IEEE International Conference on Data Mining Workshops
978-0-7695-4409-0/11 $26.00 © 2011 IEEE
DOI 10.1109/ICDMW.2011.106