I.J. Intelligent Systems and Applications, 2017, 8, 11-24
Published Online August 2017 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijisa.2017.08.02
Copyright © 2017 MECS
Parsing Arabic Nominal Sentences Using Context
Free Grammar and Fundamental Rules of
Classical Grammar
Nabil Ababou and Azzeddine Mazroui
University Mohammed First, Faculty of Sciences, Oujda, Morocco
E-mail: nabilaababou@gmail.com, azze.mazroui@gmail.com
Rachid Belehbib
University Mohammed First, Faculty of Arts and Humanities, Oujda, Morocco
E-mail: racbel59@hotmail.com
Received: 06 March 2017; Accepted: 06 July 2017; Published: 08 August 2017
Abstract—This work falls within the framework of Arabic
natural language processing, and in particular the parsing
of Arabic texts. Existing parsers generate parse trees that
convey the structure of the sentence without considering
the syntactic functions specific to the Arabic language.
Their results therefore remain insufficient in terms of
syntactic information. The system presented in this article
takes all these syntactic functions into consideration. It
begins with a morphological analysis in context. Then, it
uses a context-free grammar (CFG) to extract the phrases,
and finally exploits the formalism of unification grammar
together with traditional grammar to combine these phrases
and generate the final sentence structure.
Index Terms—POS tagger, Parser, Arabic phrase,
grammar, syntax tree, syntactic functions.
I. INTRODUCTION
Parsing is a fundamental step in the design of several
applications in Arabic natural language processing, such
as spelling and grammar checking, information retrieval,
automatic sentence generation, machine translation,
conversion information systems and database querying
[1,2].
Parsing a sentence is usually a difficult task. It is even
more complex for languages whose morphology and syntax are
very rich, as is the case for Arabic. This explains the
challenges facing the development of automatic systems for
syntactic analysis.
Arabic parsers have been reported in [3,4]. All these
initiatives use manually created grammars. More recently,
the Arabic Treebank (ATB) has been used to improve the
performance of syntactic analysis, since it offers wide
coverage of the Arabic language [5].
Similarly, approaches based on statistical processing
have been developed [6]. However, these analyzers adopt
techniques designed for English and do not take into
account the specificities of the Arabic language. Thus, if
we consider the outputs of the Stanford parser (footnote 1)
for the four simple sentences of Table 1, we notice that
they provide no information about the subject
(المبتدأ \Almbtd>\, Buckwalter transliteration, footnote 2)
or the predicate (الخبر \Alxbr\) of the
first two sentences of the table. The analyzer does not
distinguish between the words قادم \qAdm\ (coming) and
سعيدا \sEydA\ (happy), although they play two different
syntactic roles: predicate for the first and circumstantial
phrase (الحال \AlHAl\) for the second. For the last two
examples, the system generates the same tree, consisting
of a single phrase, despite the difference between them.
Indeed, the third example is a complete sentence
composed of two phrases, the subject الولد \Alwld\
(the boy) and the predicate مبتسم \mbtsm\ (smiling), whereas
the last example is not a complete sentence but only a
phrase composed of a noun الولد and its adjective المبتسم
\Almbtsm\ (the smiling).
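The contrast between the last two examples can be made concrete with a toy check on their Buckwalter transliterations. The sketch below is only an illustration and not the system described in this paper. It assumes that definiteness is signalled solely by the prefix Al, which a real system would establish through morphological analysis, and the function names is_definite and classify_pair are hypothetical. It encodes one simplified rule of classical grammar: a definite noun followed by an indefinite adjective forms a nominal sentence, whereas a noun and an adjective that agree in definiteness form a noun phrase.

```python
def is_definite(word: str) -> bool:
    """Assumption: a transliterated word is definite iff it starts with 'Al'."""
    return word.startswith("Al") and len(word) > 2


def classify_pair(noun: str, adjective: str) -> str:
    """Classify a two-word noun + adjective sequence with a toy rule."""
    if is_definite(noun) and not is_definite(adjective):
        # Definite noun + indefinite adjective: subject and predicate.
        return "sentence: subject + predicate"
    if is_definite(noun) == is_definite(adjective):
        # Agreement in definiteness: the adjective modifies the noun.
        return "phrase: noun + adjective"
    return "undetermined"


for text in ("Alwld mbtsm", "Alwld Almbtsm"):
    noun, adjective = text.split()
    print(f"{text!r} -> {classify_pair(noun, adjective)}")
```

The prefix test misfires on words whose stem happens to begin with Al, which is one reason the distinction cannot be drawn reliably without a morphological analysis in context.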
Table 1. Results of the analysis of four examples by the Stanford parser
No. | Sentence | Result
1 | الولد قادم سعيدا \Alwld qAdm sEydA\ (The boy is coming happy) | (ROOT (S (NP (DTNN الولد)) (ADJP (JJ قادم) (JJ سعيدا))))
2 | الولد سعيدا قادم \Alwld sEydA qAdm\ (The boy is coming happy) | (ROOT (S (NP (DTNN الولد)) (ADJP (JJ سعيدا) (JJ قادم))))
3 | الولد مبتسم \Alwld mbtsm\ (The boy is smiling) | (ROOT (NP (DTNN الولد) (DTJJ مبتسم)))
4 | الولد المبتسم \Alwld Almbtsm\ (The smiling boy) | (ROOT (NP (DTNN الولد) (DTJJ المبتسم)))
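The observation that the parser output does not separate these cases can be checked mechanically. The following sketch, given under stated assumptions, reads the bracketed trees of Table 1 and compares them after removing the words, so that only the category labels remain. The leaves are written in Buckwalter transliteration for readability, and the helper names tokenize, parse, skeleton and skeleton_of are hypothetical; they are not part of the Stanford parser.

```python
import re


def tokenize(tree: str) -> list[str]:
    """Split a bracketed tree into parentheses, labels and words."""
    return re.findall(r"\(|\)|[^\s()]+", tree)


def parse(tokens: list[str], pos: int = 0):
    """Parse one constituent starting at tokens[pos]; return (node, next_pos)."""
    assert tokens[pos] == "("
    node = [tokens[pos + 1]]          # category label
    pos += 2
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            node.append(child)
        else:
            node.append(tokens[pos])  # leaf word
            pos += 1
    return node, pos + 1


def skeleton(node: list) -> tuple:
    """Keep only the category labels, dropping the words."""
    label, *children = node
    return (label, *(skeleton(c) for c in children if isinstance(c, list)))


def skeleton_of(tree: str) -> tuple:
    return skeleton(parse(tokenize(tree))[0])


trees = {
    1: "(ROOT (S (NP (DTNN Alwld)) (ADJP (JJ qAdm) (JJ sEydA))))",
    2: "(ROOT (S (NP (DTNN Alwld)) (ADJP (JJ sEydA) (JJ qAdm))))",
    3: "(ROOT (NP (DTNN Alwld) (DTJJ mbtsm)))",
    4: "(ROOT (NP (DTNN Alwld) (DTJJ Almbtsm)))",
}

print(skeleton_of(trees[1]) == skeleton_of(trees[2]))
print(skeleton_of(trees[3]) == skeleton_of(trees[4]))
```

Both comparisons hold: examples 1 and 2 share one label structure, and examples 3 and 4 share another, so the syntactic functions discussed above cannot be read off the trees.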
Unlike the other parsers, which have adopted
annotations derived from those introduced by English
1 https://nlp.stanford.edu/software/lex-parser.html
2 Buckwalter transliteration: http://www.qamus.org/transliteration.htm