Isanette: A Common and Common Sense Knowledge Base for Opinion Mining
Erik Cambria*, Yangqiu Song, Haixun Wang and Amir Hussain
*Temasek Laboratories, National University of Singapore, cambria@nus.edu.sg
Microsoft Research Asia, {yangqiu.song, haixun.wang}@microsoft.com
COSIPRA Lab, University of Stirling, ahu@cs.stir.ac.uk
Abstract—The ability to understand natural language text is far from being emulated in machines. One of the main hurdles to overcome is that computers lack both the common and the common sense knowledge that humans normally acquire during the formative years of their lives. If we want machines to really understand natural language, we need to provide them with this kind of knowledge rather than relying on the valence of keywords and word co-occurrence frequencies. In this work, we blend the largest existing taxonomy of common knowledge with a natural-language-based semantic network of common sense knowledge, and apply multi-dimensionality reduction techniques to the resulting knowledge base for opinion mining and sentiment analysis.
Keywords-Knowledge-Based Systems; Semantic Networks; Natural Language Processing; Opinion Mining.
I. INTRODUCTION
The ever-growing amount of information available on the Social Web has fostered the proliferation of business and research activities around the relatively new fields of opinion mining and sentiment analysis. The automatic analysis of user-generated content such as online news, reviews, blogs and tweets can be extremely valuable for tasks such as mass opinion estimation, corporate reputation measurement, political orientation categorization, stock market prediction, customer preference analysis and public opinion study. Distilling useful information from such unstructured data, however, is a multi-faceted and multi-disciplinary problem, as opinions and sentiments can be expressed in a multitude of forms and combinations in which it is extremely difficult to find any kind of regular behavior.
Many conceptual rules govern the expression of opinions and sentiments, and there exist even more clues that can convey these concepts from realization to verbalization in the human mind. Most current approaches to opinion mining and sentiment analysis rely on rather unambiguous affective keywords extracted from an existing knowledge base (e.g., WordNet [1]) or from a purpose-built lexicon based on a domain-dependent corpus [2], [3], [4], [5]. Such approaches are still far from perfectly extracting the cognitive and affective information associated with natural language and, hence, often fail to meet the gold standard of human annotators.
Especially when dealing with social media, content is often very diverse and noisy, and the use of a limited number of affect words or a domain-dependent training corpus is simply not enough (see Table I). To enable computers to intelligently process open-domain textual resources, we need to provide them with both the common and the common sense knowledge humans normally acquire during the formative years of their lives, as relying only on the valence of keywords and word co-occurrence frequencies does not allow a deep understanding of natural language.
Common knowledge represents general human knowledge acquired from the world, e.g., “canine distemper is a domestic animal disease”. Common sense knowledge consists of obvious things that people normally know but usually leave unstated, e.g., “cat can hunt mice” and “cat is cute”. It is through the combined use of common and common sense knowledge that we grasp both low- and high-level concepts in natural language sentences and, hence, communicate effectively with other people without continuously asking for definitions and explanations.
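To make the gap between keyword valence and concept-level knowledge concrete, the following minimal Python sketch contrasts the two on a sentence that contains no affect word. It rests on toy assumptions: AFFECT_LEXICON, IS_A and EVOKES are hypothetical hand-written tables (not WordNet, ProBase or ConceptNet data), and the rule of inheriting polarity along isA links is an illustrative simplification, not the method proposed in this paper.

```python
# Hypothetical toy affect lexicon: word -> polarity in [-1, 1].
AFFECT_LEXICON = {"happy": 1.0, "love": 0.8, "sad": -1.0, "hate": -0.8}

# Hypothetical common knowledge: concept -> more general concept (isA).
IS_A = {
    "canine distemper": "domestic animal disease",
    "domestic animal disease": "disease",
}

# Hypothetical common sense knowledge: concept -> affect words it evokes.
EVOKES = {"disease": ["sad"], "birthday gift": ["happy"]}


def keyword_polarity(text):
    """Keyword spotting: mean valence of the affect words that occur in the text."""
    hits = [AFFECT_LEXICON[w] for w in text.lower().split() if w in AFFECT_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def concept_polarity(concept):
    """Climb the isA chain until a concept with common sense affect links is found."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        if concept in EVOKES:
            values = [AFFECT_LEXICON[w] for w in EVOKES[concept]]
            return sum(values) / len(values)
        concept = IS_A.get(concept)
    return 0.0  # no affective evidence found along the chain


sentence = "my dog has canine distemper"
print(keyword_polarity(sentence))            # 0.0: no affect keyword in the sentence
print(concept_polarity("canine distemper"))  # -1.0: inherited via "disease" -> "sad"
```

Keyword spotting scores the sentence as neutral, while the concept-level lookup reaches a negative polarity only because a common knowledge link ("canine distemper" isA disease) is combined with a common sense link (disease evokes "sad").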
Common sense knowledge, moreover, enables the propagation of sentiment from affect words, e.g., ‘happy’ and ‘sad’, to general concepts, e.g., ‘birthday gift’, ‘school graduation’, ‘cancer’ and ‘canine distemper’, which is useful for tasks such as sentiment elicitation and polarity detection.
In this work, we blend ProBase [6], the largest existing taxonomy of common knowledge, with ConceptNet [7], a natural-language-based semantic network of common sense knowledge, and apply multi-dimensionality reduction techniques to the resulting knowledge base for opinion mining and sentiment analysis.
The paper is structured as follows: Section II presents related work in the field of opinion mining; Section III discusses how and why blending common and common sense knowledge is important for developing domain-independent sentiment analysis systems; Section IV explains in detail the strategies adopted to build the common and common sense knowledge base; Section V illustrates the dimensionality reduction techniques employed to perform reasoning on the newly built knowledge base; Section VI presents the development of an opinion mining engine and its evaluation; finally, Section VII offers concluding remarks and future directions.
2011 11th IEEE International Conference on Data Mining Workshops 978-0-7695-4409-0/11 $26.00 © 2011 IEEE DOI 10.1109/ICDMW.2011.106 315
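The idea of blending a common knowledge taxonomy with a common sense network and then reducing the dimensionality of the result can be illustrated with a minimal Python sketch. It is not the authors' pipeline: PROBASE_LIKE and CONCEPTNET_LIKE are hypothetical toy assertion sets, blending is assumed to be a weighted sum of concept-by-feature matrices over the union of their labels, and the reduction is a plain truncated SVD computed by power iteration with deflation, standing in for the techniques described in Section V.

```python
import math
import random

# Hypothetical toy assertions; real ProBase/ConceptNet data are far larger.
PROBASE_LIKE = {
    ("dog", "isA:pet"): 1.0, ("cat", "isA:pet"): 1.0,
    ("cancer", "isA:disease"): 1.0, ("canine distemper", "isA:disease"): 1.0,
}
CONCEPTNET_LIKE = {
    ("cat", "CapableOf:hunt mice"): 1.0, ("cat", "HasProperty:cute"): 1.0,
    ("dog", "HasProperty:cute"): 1.0, ("cancer", "Causes:sad"): 1.0,
    ("canine distemper", "Causes:sad"): 1.0, ("birthday gift", "Causes:happy"): 1.0,
}


def blend(sources, weights):
    """Weighted sum of sparse (concept, feature) -> value matrices over the union of labels."""
    merged = {}
    for source, weight in zip(sources, weights):
        for key, value in source.items():
            merged[key] = merged.get(key, 0.0) + weight * value
    concepts = sorted({c for c, _ in merged})
    features = sorted({f for _, f in merged})
    rows = [[merged.get((c, f), 0.0) for f in features] for c in concepts]
    return concepts, features, rows


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def truncated_svd(matrix, k, iters=200, seed=0):
    """Top-k singular triplets (sigma, u, v) by power iteration plus deflation."""
    rng = random.Random(seed)
    residual = [row[:] for row in matrix]
    triplets = []
    for _ in range(k):
        residual_t = [list(col) for col in zip(*residual)]
        v = [rng.random() for _ in range(len(residual_t))]
        for _ in range(iters):
            v = matvec(residual_t, matvec(residual, v))  # v <- R^T R v
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                return triplets  # residual is zero: no further components
            v = [x / norm for x in v]
        u = matvec(residual, v)
        sigma = math.sqrt(sum(x * x for x in u))
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-1 component sigma * u * v^T just found.
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
                    for i in range(len(u))]
    return triplets


def concept_vectors(concepts, triplets):
    """Embed each concept as its row of U_k * Sigma_k."""
    return {c: [s * u[i] for s, u, _ in triplets] for i, c in enumerate(concepts)}


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


concepts, features, matrix = blend([PROBASE_LIKE, CONCEPTNET_LIKE], [1.0, 1.0])
vectors = concept_vectors(concepts, truncated_svd(matrix, k=2))
print(cosine(vectors["cancer"], vectors["canine distemper"]))  # close to 1
print(cosine(vectors["cancer"], vectors["cat"]))               # close to 0
```

In the reduced space, concepts that share both taxonomic (isA) and common sense features end up close together, which is what allows affective information to be transferred between related concepts. The equal blending weights are an arbitrary choice for the toy data; with real resources the weights would need tuning so that neither source dominates.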