Research Article
Exploring the Performance of Tagging for the Classical and
the Modern Standard Arabic
Dia AbuZeina (1) and Taqieddin Mostafa Abdalbaset (2)
(1) College of Information Technology and Computer Engineering, Palestine Polytechnic University, Hebron, State of Palestine
(2) Palestine Technical University–Kadoorie, AL-Aroub Branch, Hebron, State of Palestine
Correspondence should be addressed to Dia AbuZeina; abuzeina@ppu.edu
Received 7 August 2018; Accepted 23 October 2018; Published 23 January 2019
Guest Editor: Omar Abu Arqub
Copyright © 2019 Dia AbuZeina and Taqieddin Mostafa Abdalbaset. This is an open access article distributed under the Creative
Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
Part-of-speech (PoS) tagging is a core component in many natural language processing (NLP) applications. PoS taggers
serve as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine
translation, and speech synthesis. In this paper, we examine the performance of a modern standard Arabic (MSA) based tagger
on classical (i.e., traditional or historical) Arabic. Specifically, we employed the Stanford Arabic model tagger to tag
the imperative verbs in the Holy Quran. The Stanford tagger contains 29 tags; however, this work experimentally evaluates
just one, VB (imperative verb). The testing set contains 741 imperative verbs, which appear in 1,848 positions in the Holy
Quran. Although the previously reported accuracy of the Arabic model of the Stanford tagger is 96.26% for all tags and 80.14%
for unknown words, the experimental results show an accuracy of only 7.28% for the imperative verbs. This result calls
for further research into why the tagging is so inaccurate for classical Arabic. The performance decline might
indicate the need to distinguish between classical and MSA training data in NLP tasks.
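The single-tag evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the tagger output has already been aligned with a gold standard as (gold tag, predicted tag) pairs, and the function name `tag_accuracy` and the toy data are hypothetical. Only the tag name VB comes from the text; the other tags are placeholders.

```python
from typing import Iterable, Tuple


def tag_accuracy(pairs: Iterable[Tuple[str, str]], target: str = "VB") -> float:
    """Fraction of tokens with gold tag `target` that the tagger also labelled `target`.

    `pairs` holds (gold_tag, predicted_tag) tuples, one per token position.
    Tokens whose gold tag differs from `target` are ignored.
    """
    gold_total = 0
    correct = 0
    for gold, predicted in pairs:
        if gold == target:
            gold_total += 1
            if predicted == target:
                correct += 1
    if gold_total == 0:
        raise ValueError(f"no tokens with gold tag {target!r}")
    return correct / gold_total


# Hypothetical toy data (not taken from the paper): four gold imperative
# verbs, of which the tagger labels one as VB, plus one unrelated noun.
toy_pairs = [
    ("VB", "VB"),
    ("VB", "NN"),
    ("VB", "VBP"),
    ("VB", "NN"),
    ("NN", "NN"),
]

toy_accuracy = tag_accuracy(toy_pairs)
print(f"VB accuracy on toy data: {toy_accuracy:.2%}")
```

Restricting the denominator to positions whose gold tag is VB matches an evaluation that looks at one tag only; a score over all 29 tags would instead count every token position.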
1. Introduction
Part-of-speech (PoS) tagging, also known as word-category
disambiguation, is the process of determining the tag
of each word in a given input text. The tagging process
uses the context to label words with syntactic tags, such as
noun, adjective, verb, or preposition, which are also known as
parts of speech, word classes, grammatical categories, lexical
class markers, or syntactic categories. Tagging is performed
either manually by linguistic experts or automatically by
machine learning algorithms; this work considers
the automatic (computational) approach. Word tags are mainly used to
describe words and their functions in context for
further processing. That is, each word has a particular role
based on its position and the adjacent words in the sentence.
The tagset is a predefined list that generally includes
categories such as nouns, pronouns, adjectives, verbs, adverbs,
prepositions, conjunctions, and the definite and indefinite
articles (sometimes called “determiners”). The
tagset is prepared by linguists of the language to
describe its word classes. The size
of the tagset varies and depends on the requirements or
the capacity of the applications being developed. In any case, the tagset
should fit and efficiently serve the intended purposes.
Hence, there is no single predefined tagset for all languages, and
there is no standard (i.e., unique) tagset for any given language.
Rather, the choice remains a matter of debate.
PoS tagging is increasingly becoming a vital factor in
related natural language processing (NLP) applications.
Creating knowledge base resources (e.g., tag relationships)
is one objective of PoS tagging, and these resources can later
be used in other NLP tools. PoS tagging also has many roles
in the field of NLP as a basic preprocessing step. For instance,
NLP applications that build on PoS tagging include syntactic
parsing, information extraction, machine translation, speech
synthesis, and named entity recognition (NER). This work
aims to explore the performance of PoS tagging for
classical Arabic using a modern standard Arabic (MSA)
tagger, namely the Stanford tagger [1]. Since it is difficult to
evaluate the Stanford tagger for all tags (29 tags) as it requires
Hindawi
Advances in Fuzzy Systems
Volume 2019, Article ID 6254649, 10 pages
https://doi.org/10.1155/2019/6254649