STUDIA UNIV. BABEȘ–BOLYAI, INFORMATICA, Volume LVIII, Number 2, 2013

ON THE STUDY OF REDUCING THE LEXICAL DIFFERENCES BETWEEN SOCIAL KNOWLEDGE SOURCES AND TWITTER FOR TOPIC CLASSIFICATION

ANDREA VARGA (1), AMPARO CANO (2), FABIO CIRAVEGNA (1), AND YULAN HE (2)

Abstract. State-of-the-art approaches to cross-source topic classification (TC) of Tweets rely on building a supervised machine learning classifier on Social Knowledge Sources (KSs) (such as DBpedia and Freebase) for detecting topics of Tweets. These approaches typically employ various lexical, syntactic or semantic features derived from the content of these documents or Tweets, often ignoring indicators that point to external data sources (e.g. URLs), which can provide additional background information for cross-source TC. To address this limitation, in this paper we analyse various such indicators and evaluate their impact on cross-source TC. Our experiments, which evaluate the proposed TC in the context of Violence Detection (VD) and Emergency Response (ER) tasks, indicate that these Twitter-specific indicators contain valuable information, and thus that incorporating them into a topic classifier can improve performance over previous approaches that do not consider them.

Received by the editors: April 15, 2013.
2010 Mathematics Subject Classification. 68T50, 03H65.
1998 CR Categories and Descriptors. I.2.7 [Artificial Intelligence]: Natural Language Processing – Text Analysis.
Key words and phrases. cross-source topic classification, linked knowledge sources, violence detection, emergency response.
This paper has been presented at the International Conference KEPT2013: Knowledge Engineering Principles and Techniques, organized by Babeș-Bolyai University, Cluj-Napoca, July 5-7 2013.

1. Introduction

Topic classification (TC) of Tweets has only recently started to gain attention. It provides an efficient and effective way of organising and searching Tweets, which can then be useful for various tasks, such as relating topics to events (e.g. an airplane crash, the Egyptian revolution, the Mexican drug war) ([11]), summarisation ([12]), question answering ([6]) and content filtering ([16]).

State-of-the-art approaches to cross-source topic classification (TC) of Tweets rely on building a supervised machine learning classifier on Social Knowledge