Knowledge-based Approach for Event Extraction from Arabic Tweets

Mohammad AL-Smadi
Computer Science Department, Jordan University of Science and Technology, P.O.Box: 3030, Irbid 22110, Jordan

Omar Qawasmeh
Computer Science Department, Jordan University of Science and Technology, P.O.Box: 3030, Irbid 22110, Jordan

Abstract—Tweets provide a continuous update on current events. However, tweets are short, personalized, and noisy, which raises additional challenges for event extraction and representation. Extracting events from Arabic tweets is a new research domain where few examples, if any, of previous work can be found. This paper describes a knowledge-based approach for fostering event extraction from Arabic tweets. The approach uses an unsupervised rule-based technique for event extraction and provides named entity disambiguation of event-related entities (i.e. person, organization, and location). Extracted events and their related entities are populated into the event knowledge base, where the tagged entities of each tweet are linked to their corresponding entities represented in the knowledge base. The proposed approach was evaluated on a dataset of 1K Arabic tweets covering different types of events (i.e. instant events and interval events). Results show that the approach has an accuracy of 75.9% for event trigger extraction, 87.5% for event time extraction, and 97.7% for event type identification.

Keywords—Event Extraction; Knowledge base; Entity linking; Named entity disambiguation; Arabic NLP.

I. Introduction

Social media sites such as Facebook and Twitter provide the most up-to-date events by leveraging socially generated content. Hundreds of millions of tweets are posted every day covering a variety of events and news. Extracting structured information about events from these tweets holds great promise, especially when it comes to visualizing events in a more appealing way according to users' interests.
Moreover, linking entity mentions in tweets, together with their events, to their corresponding entities in a knowledge base fosters many research fields such as knowledge base population, question answering, and information integration.

Much of the previous research on event extraction [1]–[4] has focused on document-level extraction from sources such as news articles and blogs, whereas few examples can be found of event extraction from noisy text such as tweets [5]–[10]. Moreover, research targeting event extraction from Arabic text is limited [11]–[13], and to the best of our knowledge there is only one concurrent study reported on event extraction from Arabic tweets [14].

In general, extracting information from noisy text such as social media posts is challenging. Such posts are disorganized and require automated approaches for information extraction and categorization. For instance, tweets are short and self-contained, which makes them lack useful discourse information such as contextual information. According to [10], Twitter poses a set of challenges for event extraction: (a) tweets are personalized and mainly hold information about their owners' daily activities that is of interest only to their close social network; (b) tweets are short and self-contained and usually lack information about their context, which causes NLP tools to perform poorly; and (c) Twitter users informally contribute to a variety of topics and domains, which makes tweets complex to categorize. On the other hand, such challenges hold great promise for enhancing and adapting state-of-the-art NLP tools accordingly.

With the advances of the Semantic Web and the so-called Web 3.0 folksonomy-based social environments, interoperability of knowledge management is a key challenge in which semantics play an important role [15].
However, this cannot be achieved without bridging Web data with knowledge bases by linking named entity mentions appearing in Web material to their corresponding entities in a knowledge base [16].

Entity linking plays a major role when it comes to populating information into the knowledge base or integrating extracted information from the Web. Adding newly extracted information to the knowledge base requires an entity linking step between entity mentions (in the text) and their corresponding entities in the knowledge base [17], [18]. However, none of the related work on event extraction has focused on disambiguating or linking event entities.

In this research, an unsupervised approach for event extraction from Arabic tweets is discussed. The approach tags the event expression and the related entities and links them to the knowledge base entities and events. To the best of our knowledge, there is no research that links event entity mentions to the Linked Open Data (LOD) as part of the event extraction process. This research links event entity mentions (i.e. Person, Location, and Organization) to their corresponding entities in Wikipedia or DBpedia. This process is handled through an ontology-based knowledge base that has been designed to represent event entities and link them to LOD. Moreover, newly extracted events (not available in the knowledge base) are

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 7, No. 6, 2016
www.ijacsa.thesai.org