International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 4, August 2016, pp. 1897~1906
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i4.10403
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Inferring Student's Chat Topic in Colloquial Arabic Text Using Semantic Representation
Faisal T. Khamayseh
Department of Information Technology and Computer Engineering
Palestine Polytechnic University, Palestine
Article Info
Article history:
Received Mar 6, 2016
Revised May 22, 2016
Accepted Jun 6, 2016
ABSTRACT
Because colloquial Arabic is now widespread, there is a need to describe the collection and
classification of a multi-dialectal Arabic corpus. Colloquial Arabic today comes in largely
country-based dialects such as Egyptian, Iraqi, Levantine, and Tunisian. This paper presents a new
method for analyzing conversations in an educational chat room using a corpus of Palestinian Arabic
and the Stanford Tagger. The method represents keywords in a semantic-net-like structure to identify
the main subjects of the conversation, and it determines the main subject of a chat with high accuracy.
Combining the Arabic corpus, the Stanford Tagger, and keyword percentages further improves accuracy.
The study also examines the effect of the distribution of pivot words, based on their occurrence counts
and their betweenness values throughout the text. In addition, it examines some characteristics of texts
written in colloquial Arabic dialects and analyzes free-form expressive Arabic statements. The results
show that the core subject of a chat can be determined by combining the number of occurrences of a
word with its distribution throughout the conversation.
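The idea of ranking a keyword by both how often it occurs and how central it is in a semantic net can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the method evaluated in this paper: the chat is assumed to be already tokenized and reduced to keywords (for example, nouns kept after POS tagging), two keywords are linked when they appear in the same message, centrality is Brandes' betweenness on that co-occurrence graph, and the function names, the weight `alpha`, and the English keyword glosses are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def build_cooccurrence_graph(messages):
    """Hypothetical net: link two keywords whenever they share a message."""
    graph = {}
    for tokens in messages:
        unique = sorted(set(tokens))
        for word in unique:
            graph.setdefault(word, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: value / 2 for v, value in bc.items()}


def infer_topic(messages, alpha=0.5):
    """Score keywords by normalized frequency and normalized betweenness.

    `alpha` is a hypothetical weight: 1.0 means frequency only,
    0.0 means betweenness only.
    """
    counts = Counter(t for tokens in messages for t in tokens)
    if not counts:
        return None, {}
    graph = build_cooccurrence_graph(messages)
    bc = betweenness(graph)
    max_count = max(counts.values())
    max_bc = max(bc.values()) or 1.0  # avoid division by zero
    scores = {
        w: alpha * counts[w] / max_count + (1 - alpha) * bc[w] / max_bc
        for w in graph
    }
    topic = max(scores, key=lambda w: (scores[w], w))
    return topic, scores


# Hypothetical chat, already reduced to keyword glosses (one list per message).
chat = [
    ["exam", "math"],
    ["exam", "homework"],
    ["homework", "deadline"],
    ["exam", "grade"],
    ["math", "lecture"],
]
topic, scores = infer_topic(chat)
print(topic)  # exam
```

With the default `alpha = 0.5`, `exam` ranks first: it is the most frequent keyword and also lies on the most shortest paths between other keywords (betweenness 8, against 4 for `math` and `homework`). A keyword that occurs often but only within one isolated sub-thread gains little from the betweenness term, which is the intuition behind combining occurrence with distribution.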
Keywords:
Arabic chat
Arabic corpora
Colloquial analysis
Palestinian Arabic corpus
Semantic net
Copyright © 2016 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Faisal T. Khamayseh,
Department of Computer Science, College of Information Technology and Computer Engineering,
Palestine Polytechnic University,
Hebron, P.O. Box 198, Palestine.
Email: faisal@ppu.edu.ps
1. INTRODUCTION
Social networking and social media platforms are growing rapidly, both in variety and in number of
users. Their usage and the volume of content they generate are also increasing; examples include Skype,
WhatsApp, Twitter, Facebook, Viber, IRC, blogs, and Myspace, to name a few. Each of these networks
provides a chat platform for a large number of users. Some platforms serve specific purposes such as
study and research, while others, such as open conversation rooms, are shared with followers across
various social media platforms. Some groups use open platforms to form their own closed social
networks, while others benefit more from purpose-built platforms such as LMS e-classes on Moodle,
Illuminate, etc.
Today, conversations on social media often ignore standard grammatical rules in almost all languages.
As with most modern languages, Arabic has two forms: standard and colloquial. The standard form
follows firm syntactic rules that cover all forms of written and spoken statements. Colloquial Arabic
is widely used as a spoken language and, more recently, as a written language as well, especially in
mobile messaging and on web-based social media. Some recent work focuses on analyzing such rule-free
text and deriving rules for it (rooting). Considerable work has been done in [1]-[4] on developing an
Arabic ontology that formally specifies the concepts of Arabic words used in Palestinian spoken and
written conversations. People may use their social colloquial text while chatting on