INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 3, ISSUE 12, December 2014 ISSN 2277-8616 118 IJSTR©2014 www.ijstr.org

Design And Implementation Of Morphology Based Spell Checker

Gaddisa Olani Ganfure, Dr. Dida Midekso

Abstract: Entering text into word processing tools may introduce spelling errors. Hence, text processing applications include spell checkers. Integrating a spell checker into a word processor reduces the time and effort spent finding and correcting misspelled words. However, such tools are not available for Afaan Oromo, a language of the Cushitic family spoken in Ethiopia. In this paper, we describe the design and implementation of a non-word spell checker for Afaan Oromo. The system is based on dictionary look-up combined with morphological analysis (i.e. it is a morphology-based spell checker). Developing a morphology-based spell checker requires knowledge of the language's morphology; accordingly, the morphological properties of Afaan Oromo have been studied. To the best of our knowledge, this work is the first of its kind for Afaan Oromo. The methodology described in the paper can be replicated for other languages whose morphology is similar to that of Afaan Oromo.

Index Terms: Spell checker, non-word error, Error detection, Error correction, Morphology, Morphological Analyzer, Morphological generator, Afaan Oromo, typographic errors, cognitive errors

1 INTRODUCTION

A spell checker is a tool that checks the spelling of the words in a text file, validates them (i.e. determines whether each word is spelled correctly or not) and, when it doubts the spelling of a word, suggests possible alternatives. The two core functions provided by a spell checker are spelling error detection and spelling error correction. "Error detection" verifies the validity of a word in the language, while "error correction" suggests corrections for a misspelled word.
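To make the two core functions concrete, here is a minimal sketch of non-word detection by plain dictionary look-up and correction by ranking dictionary words by edit distance. It is an illustration under stated assumptions, not the system described in this paper: the lexicon is a hypothetical toy word list, tokenization is simple whitespace splitting, and suggestions use Levenshtein distance with an assumed cut-off of two edits.

```python
"""Minimal dictionary-lookup spell checker (illustrative sketch only).

Assumptions: the lexicon is a hypothetical toy word list, and suggestions
are ranked by plain Levenshtein distance; the paper's method may differ.
"""


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def is_valid(word: str, lexicon: set[str]) -> bool:
    """Error detection: a word is valid if it appears in the lexicon."""
    return word.lower() in lexicon


def suggest(word: str, lexicon: set[str], max_dist: int = 2, limit: int = 5) -> list[str]:
    """Error correction: lexicon words within max_dist edits, closest first."""
    scored = []
    for candidate in lexicon:
        d = levenshtein(word.lower(), candidate)
        if d <= max_dist:
            scored.append((d, candidate))
    scored.sort()  # by distance, then alphabetically
    return [c for _, c in scored[:limit]]


def check_text(text: str, lexicon: set[str]) -> dict[str, list[str]]:
    """Map each misspelled whitespace-separated token to its suggestions."""
    report = {}
    for token in text.split():
        if not is_valid(token, lexicon):
            report[token] = suggest(token, lexicon)
    return report


if __name__ == "__main__":
    toy_lexicon = {"spell", "checker", "word", "error"}  # hypothetical
    print(check_text("spel checker wrod", toy_lexicon))
    # {'spel': ['spell'], 'wrod': ['word']}
```

A flat lexicon like this must list every surface form of every word; for a highly inflected language that list grows very quickly, which is the motivation for adding morphological analysis to the look-up.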
A spell checker may be a stand-alone program capable of operating on a block of text, or part of a larger application such as a word processor, email client, electronic dictionary, or search engine [1]. Much research has been done for languages such as English, Arabic and Chinese, and some for Amharic, but none for Afaan Oromo.

Afaan Oromo (literally "the Oromo language") is one of the major African languages; it is widely spoken and used in most parts of Ethiopia and in parts of neighboring countries such as Kenya and Somalia. Afaan Oromo belongs to the Lowland East Cushitic sub-family of the Afro-Asiatic super-phylum. Among the Cushitic languages, Afaan Oromo ranks first by number of speakers [2]. It is currently an official language of the Oromiya regional state. Despite its popularity and its status as a regional language, Afaan Oromo language processing is still in its infancy.

According to Damerau [3] and Peterson [4], spelling errors are generally divided into two types: typographic errors and cognitive errors. Typographic errors occur when the writer knows the correct spelling of a word but mistypes it. Cognitive errors occur when the writer does not know, or has forgotten, the correct spelling of a word. Damerau reports that 80% of misspelled words in English are non-word errors caused by a single error [3].

Before implementation, we conducted a simple study of the spelling error patterns of Afaan Oromo. For this purpose, a module prepared for teaching Afaan Oromo courses was selected and examined using a text analysis data-gathering technique. The study confirmed the existence of spelling errors: 1,342 words were found to be misspelled, of which 1,287 (about 96%) were non-word errors.
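Damerau's single-error finding suggests a cheap way to generate correction candidates: list every string that is one insertion, deletion, substitution, or transposition of adjacent letters away from the misspelled word, and keep those found in the lexicon. The Python sketch below illustrates that general technique; it is not taken from the paper, and both the alphabet (Latin letters plus an apostrophe) and the toy lexicon are assumptions.

```python
import string

# Assumption: lowercase Latin letters plus the apostrophe; replace with the
# actual character inventory of the target language.
ALPHABET = string.ascii_lowercase + "'"


def single_edits(word: str) -> set[str]:
    """Return every string exactly one Damerau edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    substitutes = {left + c + right[1:]
                   for left, right in splits if right
                   for c in ALPHABET if c != right[0]}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    # Swapping two identical adjacent letters gives back the word itself; drop it.
    return (deletes | transposes | substitutes | inserts) - {word}


def single_error_candidates(word: str, lexicon: set[str]) -> list[str]:
    """Lexicon words reachable from `word` by a single typing error."""
    return sorted(single_edits(word.lower()) & lexicon)


if __name__ == "__main__":
    toy_lexicon = {"spell", "spelt", "sped", "word", "world"}  # hypothetical
    print(single_error_candidates("spel", toy_lexicon))  # ['sped', 'spell', 'spelt']
    print(single_error_candidates("wrod", toy_lexicon))  # ['word']
```

For a word of length n over the 27 symbols assumed here, this produces at most n + (n - 1) + 26n + 27(n + 1) = 55n + 26 strings before duplicates are removed, so the search stays small; misspellings involving two or more errors, however, are not covered.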
Although a more comprehensive study is required to reach a firm conclusion, this result was enough to show that non-word error detection is the first step towards a truly professional spell checker.

The paper is organized into the following sections. Section 2 discusses the challenges in building a spell checker for Afaan Oromo and the work done so far. Section 3 discusses the design of the system. Results and discussion are presented in Section 4. Finally, the paper ends with some concluding remarks.

____________________________
Gaddisa Olani Ganfure is a lecturer in the Computer Science Department, Dire Dawa University, Ethiopia. Email: gaddisaolex@gmail.com
Dr. Dida Midekso is an Associate Professor in the Computer Science Department, Addis Ababa University, Ethiopia. Email: mideksod@yahoo.com

2 CHALLENGES AND RELATED WORK

As stated in [5], like a number of other African languages, Afaan Oromo has a very rich morphology. In agglutinative languages, most grammatical information is conveyed through affixes and other structures; therefore, the grammar of such a language is largely described in terms of its morphology. Because Afaan Oromo is agglutinative and morphologically rich, each root word can combine with multiple morphemes to generate a huge number of word forms. To support such inflectionally rich languages, the structure of each word has to be identified.

Afaan Oromo has compound, derived and simple nouns, verbs, and adjectives. It also has first person, second person, and derived pronouns. Nouns are inflected for number, while verbs are inflected for gender, number, tense, voice, aspect and mood. Often it is the context that decides whether a word is a noun, adjective, adverb or postposition, which increases the complexity of parsing Afaan Oromo. For all these reasons, developing a spell checker for Afaan Oromo is a challenging task.
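The design combines dictionary look-up with morphological analysis, but this section does not describe the analyzer itself, so the following is only a minimal sketch of the idea: a word is accepted if it can be segmented into a known root followed by a chain of known suffixes, stripped recursively from the right. The root list, the suffix inventory and the limit of three suffixes are hypothetical placeholders, not real Afaan Oromo morphemes, and real morphotactics constrain suffix order far more than this toy does.

```python
# Hypothetical toy data: placeholder morphemes, NOT real Afaan Oromo forms.
ROOTS = frozenset({"kit", "dub"})
SUFFIXES = ("oota", "tti", "aa", "ee")


def analyze(word: str, roots: frozenset[str] = ROOTS,
            suffixes: tuple[str, ...] = SUFFIXES,
            max_suffixes: int = 3) -> list[list[str]]:
    """Return every [root, suffix1, suffix2, ...] segmentation of `word`.

    Suffixes are stripped from the right, recursively, so stacked
    (agglutinated) suffixes are handled; an empty list means no analysis.
    """
    analyses: list[list[str]] = []

    def strip(rest: str, stripped: list[str]) -> None:
        if rest in roots:
            # `stripped` holds the outermost suffix first; reverse to surface order.
            analyses.append([rest] + stripped[::-1])
        if len(stripped) == max_suffixes:
            return
        for suffix in suffixes:
            if rest.endswith(suffix) and len(rest) > len(suffix):
                strip(rest[:-len(suffix)], stripped + [suffix])

    strip(word.lower(), [])
    return analyses


def is_known_word(word: str) -> bool:
    """Non-word detection: the word is valid if at least one analysis exists."""
    return bool(analyze(word))


if __name__ == "__main__":
    for w in ["kit", "kitootatti", "dubaa", "kitxa"]:
        print(w, is_known_word(w), analyze(w))
```

With m candidate suffixes and up to k suffix slots, a single root can yield up to m^0 + m^1 + ... + m^k surface strings (85 for the toy values m = 4, k = 3). Analysing words into a root plus suffixes therefore lets the dictionary store roots and affixes rather than every inflected form.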