INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 3, ISSUE 12, December 2014 ISSN 2277-8616
118
IJSTR©2014
www.ijstr.org
Design And Implementation Of Morphology Based Spell Checker
Gaddisa Olani Ganfure, Dr. Dida Midekso
Abstract: Entering text into word processing tools may introduce spelling errors, which is why text processing
applications commonly include spell checkers. Integrating a spell checker into a word processor reduces the time and
effort spent finding and correcting misspelled words. However, such tools are not available for Afaan Oromo, a
language of the Cushitic family spoken in Ethiopia. In this paper, we describe the design and implementation of a
non-word spell checker for Afaan Oromo. The system is based on dictionary look-up combined with morphological analysis
(i.e., it is a morphology-based spell checker). Developing a morphology-based spell checker requires knowledge of the
language's morphology; accordingly, we studied the morphological properties of Afaan Oromo. To the best of our
knowledge, this work is the first of its kind for Afaan Oromo. The methodology described in this paper can be
replicated for other languages whose morphology is similar to that of Afaan Oromo.
Index Terms: Spell checker, Non-word error, Error detection, Error correction, Morphology, Morphological analyzer,
Morphological generator, Afaan Oromo, Typographic errors, Cognitive errors
————————————————————
1 INTRODUCTION
A spell checker is a tool that checks the spelling of the
words in a text file: it validates each word, i.e.,
determines whether it is correctly or incorrectly spelled,
and, when a word appears to be misspelled, suggests
possible alternatives. The two core functions of a spell
checker are therefore spelling error detection and spelling
error correction. 'Error detection' verifies that a word is
valid in the language, while 'error correction' suggests
corrections for a misspelled word. A spell checker may be a
stand-alone program capable of operating on a block of
text, or part of a larger application such as a word
processor, email client, electronic dictionary, or search
engine [1].

Considerable research has been done on spell checking for
languages such as English, Arabic, and Chinese, and only a
few studies exist for Amharic, but none for Afaan Oromo.
Afaan Oromo (literally, "the Oromo language") is one of the
major African languages; it is widely spoken and used in
most parts of Ethiopia and in parts of neighboring
countries such as Kenya and Somalia. Afaan Oromo belongs to
the Lowland East Cushitic sub-family of the Afroasiatic
super-phylum, and among the Cushitic languages it ranks
first by number of speakers [2]. It is currently an
official language of the Oromiya regional state. Despite
its popularity and its status as a regional language,
Afaan Oromo language processing is still in its infancy.

According to Damerau [3] and Peterson [4], spelling errors
are generally divided into two types: typographic errors
and cognitive errors.
Typographic errors occur when the writer knows the correct
spelling of a word but mistypes it by mistake. Cognitive
errors occur when the writer does not know, or has
forgotten, the correct spelling of a word. A study by
Damerau reports that 80% of misspelled words in English are
non-word errors caused by a single error, that is, one
wrong, missing, or extra letter, or one transposition of
two adjacent letters [3].

Before implementation, we conducted a simple study of the
spelling error patterns of Afaan Oromo. For this purpose,
we selected a module prepared for teaching Afaan Oromo
courses and analyzed its text. The study confirmed the
presence of spelling errors: 1,342 words were misspelled,
of which 1,287 (about 96%) were non-word errors. Although a
more comprehensive study is required to reach a firm
conclusion, this was enough to show that non-word error
detection is the first step towards a truly professional
spell checker.

The rest of the paper is organized as follows. Section 2
discusses the challenges of building a spell checker for
Afaan Oromo and the related work done so far. Section 3
describes the design of the system. Section 4 presents and
discusses the results. Finally, the paper ends with some
concluding remarks.
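The two core functions described at the start of this
section, error detection and error correction, can be
illustrated with a minimal Python sketch. It assumes a
plain word list as the dictionary (a hypothetical toy
lexicon, not the dictionary used in this work), treats any
word absent from that list as a non-word error, and,
following Damerau's single-error observation, proposes only
dictionary words one deletion, adjacent transposition,
substitution, or insertion away. Restricting the alphabet
to lowercase a-z is a simplifying assumption; a real
checker would use the full character set of the language's
orthography.

```python
"""Dictionary look-up spell checking with single-edit suggestions (sketch)."""

import string

# Assumption: lowercase a-z only; a real checker needs the language's full alphabet.
ALPHABET = string.ascii_lowercase


def is_valid(word: str, lexicon: set[str]) -> bool:
    """Error detection: a word is valid only if it is in the lexicon."""
    return word in lexicon


def single_edits(word: str) -> set[str]:
    """All strings one deletion, adjacent transposition, substitution,
    or insertion away from `word` (Damerau's four single-error types)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutes = {
        left + c + right[1:] for left, right in splits if right for c in ALPHABET
    }
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | transposes | substitutes | inserts


def suggest(word: str, lexicon: set[str]) -> list[str]:
    """Error correction: lexicon words reachable by a single edit, sorted."""
    if is_valid(word, lexicon):
        return []  # correctly spelled: nothing to suggest
    return sorted(single_edits(word) & lexicon)


if __name__ == "__main__":
    toy_lexicon = {"mana", "nama", "barataa"}  # hypothetical toy word list
    print(suggest("nmaa", toy_lexicon))
```

With the toy lexicon above, `suggest("nmaa", toy_lexicon)`
returns `["nama"]` (one adjacent transposition), while a
listed word such as "mana" produces no suggestions. Note
that this plain look-up only works if every inflected form
is stored in the word list, which is exactly what becomes
impractical for a morphologically rich language.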
2 CHALLENGES AND RELATED WORK
As stated in [5], Afaan Oromo, like a number of other
African languages, has a very rich morphology. In
agglutinative languages, most grammatical information is
conveyed through affixes and other structures, so the
grammar of such a language is largely described in terms
of its morphology. Because Afaan Oromo is agglutinative and
morphologically rich, each root word can combine with
multiple morphemes to generate a huge number of word forms.
To support such an inflectionally rich language, the
structure of each word has to be identified. Afaan Oromo
has simple, derived, and compound nouns, verbs, and
adjectives. It also has first-person, second-person, and
derived pronouns. Nouns are inflected for number, while
verbs are inflected for gender, number, tense, voice,
aspect, and mood. Often, only the context decides whether a
word is a noun, an adjective, an adverb, or a postposition,
which further complicates parsing Afaan Oromo. For all
these reasons, developing a spell checker for Afaan Oromo
is a challenging task.
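Because storing every inflected form is impractical, a
morphology-based checker can instead accept a word if it
can be analyzed into a known stem followed by known
affixes. The minimal Python sketch below illustrates that
idea using suffix stripping only. The stem list, suffix
list, and resulting segmentations are hypothetical
placeholders chosen to exercise the code; they are not a
linguistic analysis of Afaan Oromo and not the analyzer
used in this work.

```python
"""Morphology-aware word validation by recursive suffix stripping (sketch).

STEMS and SUFFIXES are hypothetical placeholders, not real linguistic data.
"""

# Hypothetical stem lexicon: only stems are stored, not every inflected form.
STEMS = frozenset({"nam", "man"})

# Hypothetical suffix inventory; several suffixes may stack (agglutination).
SUFFIXES = ("oota", "a")


def analyze(word: str) -> list[tuple[str, list[str]]]:
    """Return every (stem, suffixes) segmentation of `word`.

    A segmentation is a stem from STEMS followed by zero or more suffixes
    from SUFFIXES, listed in left-to-right order.
    """
    analyses: list[tuple[str, list[str]]] = []
    if word in STEMS:
        analyses.append((word, []))
    for suffix in SUFFIXES:
        # Strip the suffix only if a non-empty remainder is left.
        if word.endswith(suffix) and len(word) > len(suffix):
            for stem, inner in analyze(word[: -len(suffix)]):
                analyses.append((stem, inner + [suffix]))
    return analyses


def is_valid(word: str) -> bool:
    """Non-word detection: accept a word only if it has some analysis."""
    return bool(analyze(word))


if __name__ == "__main__":
    for w in ["nama", "namoota", "mana", "namx"]:
        print(w, is_valid(w), analyze(w))
```

Here `analyze("namoota")` yields `[("nam", ["oota"])]`,
while "namx" has no analysis and is flagged as a non-word.
Since nothing restricts which stems take which suffixes, or
in what order suffixes may stack, the sketch over-generates;
a practical analyzer would add morphotactic rules (allowed
affix sequences) and rules for sound changes at morpheme
boundaries.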
____________________________
Gaddisa Olani Ganfure is a lecturer in the Computer
Science Department, Dire Dawa University, Ethiopia.
Email: gaddisaolex@gmail.com
Dr. Dida Midekso is an Associate Professor in the
Computer Science Department, Addis Ababa
University, Ethiopia. Email: mideksod@yahoo.com