International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 4, August 2016, pp. 1897~1906
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i4.10403
Journal homepage: http://iaesjournal.com/online/index.php/IJECE

Inferring Student's Chat Topic in Colloquial Arabic Text Using Semantic Representation

Faisal T. Khamayseh
Department of Information Technology and Computer Engineering, Palestine Polytechnic University, Palestine

Article Info
Article history: Received Mar 6, 2016; Revised May 22, 2016; Accepted Jun 6, 2016

ABSTRACT
Because colloquial Arabic is now widespread in written communication, there is a need to collect and classify multi-dialectal corpora of Arabic. Colloquial Arabic today comes in largely country-based dialects such as Egyptian, Iraqi, Levantine, and Tunisian. This paper presents a new method for analyzing conversations in an educational chat room using a Corpus for Palestinian Arabic and the Stanford Tagger tool. The method represents keywords in a semantic net-like structure in order to identify the main subjects of the conversation. The proposed method identifies the main subject of the chat with high accuracy, and using the Arabic corpus, the Stanford Tagger, and the percentage of keywords together further improves that accuracy. The study also examines the effect of the distribution of pivot words, based on the occurrence counts and betweenness values of the pivots throughout the text. In addition, it examines some characteristics of texts written in colloquial Arabic dialect and analyzes free, expressive Arabic statements. The results show that the core subject of the chat can be determined by combining both the occurrences of a word and its distribution throughout the conversation.

Keywords: Arabic chat; Arabic corpora; Colloquial analysis; Palestine Arabic corpus; Semantic net

Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Faisal T. Khamayseh,
Department of Computer Science, College of Information Technology and Computer Engineering,
Palestine Polytechnic University, P.O. Box 198, Hebron, Palestine.
Email: faisal@ppu.edu.ps

1. INTRODUCTION
Social networking and social media platforms are growing rapidly, both in variety and in number of users. Their usage, and the volume of content they generate, are increasing as well; examples include Skype, WhatsApp, Twitter, Facebook, Viber, IRC, blogs, and Myspace, to mention just a few. Each of these networks provides a chat platform for a large number of users. Some platforms serve specific purposes, such as study and research, while others, such as open conversation rooms, are shared among followers across various social media platforms. Some groups use open platforms to form their own closed social networks, while others benefit more from purpose-configured platforms such as learning management system (LMS) e-classes on Moodle, Illuminate, etc.

Nowadays, conversation on social media disregards standard grammatical rules in almost all languages. As with most current languages, Arabic has two forms: standard and colloquial. The standard form is governed by firm syntactic rules that cover all forms of written and spoken statements. Colloquial Arabic is widely used as a spoken language and has recently become widely used as a written language as well, especially in mobile messaging and on web-based social media. Some recent work focuses on analyzing such rule-free text and deriving rules for it (rooting). Considerable work has been done by [1]-[4] on developing an Arabic ontology that formally specifies the concepts of Arabic words related to Palestinian spoken and written conversations. People may use their social colloquial text while chatting on