AN INTELLIGENT SPELLING ERROR CORRECTOR

E. J. YANNAKOUDAKIS and D. FAWTHROP
Postgraduate School of Computing, University of Bradford, Bradford, W. Yorkshire BD7 1DP, England

(Received for publication 4 August 1982)

Abstract: This paper describes an intelligent spelling error correction system for use in a word processing environment. The system employs a dictionary of 93,769 words and, provided the intended word is in the dictionary, it identifies 80 to 90% of spelling and typing errors.

1. INTRODUCTION

The techniques of Artificial Intelligence have been applied in many fields, notably medical diagnosis[7] and mineral prospecting[5]. Spelling error correction, however, has not benefited from these techniques because, while people use intelligence and expertise when performing this task, no systematic formulation of this expertise had appeared. An analysis of 1377 spelling and typing errors made by adults established a number of empirical rules[1] which are obeyed consistently. This detailed formulation is used as a basis for the present work.

This paper describes a spelling error correction system which, to some extent, emulates the action of the human brain. While it has no knowledge of context or semantic structure, it contains information about possible pronunciations of the word and also about the nature of spelling and typing errors as made by adults. The system knows which portion of the dictionary probably contains the word intended, and its efficiency and accuracy are enhanced by a number of considerations. For example, when it tries to correct the word AB-LITY the system rapidly abandons its examination of the word ANNUAL because the large difference (5 characters) between these words shows that AB-LITY is unlikely to be a misspelling of ANNUAL. Similarly, when it tries to correct the error form ALOW it can decide that the word intended is probably ALLOW and not AGLOW.

2.
DEFINITIONS AND TERMINOLOGY

Several conventions supplementary to previous work[1] are used throughout this paper:

(i) An "error form" is an example of a misspelled or mistyped word.

(ii) A "dictionary form" is a word which appears in a dictionary. Both American and English dictionaries are considered to be relevant for words which are written in the appropriate dialect. The term refers to the word intended by the misspeller.

(iii) A "dictionary word" is that word in the dictionary currently being examined by the system.

(iv) The "character form" of a word is its "picture" in characters.

3. THE ALGORITHM

The algorithm is designed for use in word processing environments, and only the words which are not found in the dictionary are passed to it for correction. In essence, the algorithm inverts the methodology of the previous work[1], which showed, by comparing error forms and dictionary forms, that spelling errors follow specific patterns or rules and that only certain sections of the dictionary are likely to contain the word intended. The algorithm therefore searches a small part of the dictionary word by word. If one or two differences are found between the error form and the dictionary word, and the differences follow any of the rules, then the error form may be a misspelling of that dictionary word, which is transferred to a "choice list". Finally, a choice is made from that list using Bayesian statistics (see section 8).
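The word-by-word search described in section 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that plain Levenshtein edit distance is an acceptable stand-in for the count of "differences", it omits the rule check and the Bayesian choice entirely, and the names edit_distance, build_choice_list and dictionary_section are hypothetical. The early exit mirrors the behaviour noted in the introduction, where the examination of a distant word such as ANNUAL is abandoned quickly.

```python
def edit_distance(a: str, b: str, limit: int) -> int:
    """Levenshtein distance between a and b, abandoned early.

    Returns limit + 1 as soon as the distance is known to exceed limit.
    Assumption: edit distance stands in for the paper's "differences".
    """
    # A length gap larger than limit already forces too many edits.
    if abs(len(a) - len(b)) > limit:
        return limit + 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        # Row minima never decrease, so the search can stop here.
        if min(curr) > limit:
            return limit + 1
        prev = curr
    return min(prev[-1], limit + 1)


def build_choice_list(error_form, dictionary_section, max_differences=2):
    """Collect dictionary words within one or two differences of error_form.

    dictionary_section is a hypothetical iterable holding the small part
    of the dictionary selected for the search; the rule check is omitted.
    """
    choices = []
    for word in dictionary_section:
        d = edit_distance(error_form, word, max_differences)
        if 1 <= d <= max_differences:
            choices.append((word, d))
    return choices


if __name__ == "__main__":
    section = ["ALLOW", "AGLOW", "ANNUAL", "ABILITY"]
    print(build_choice_list("ALOW", section))
```

In this sketch both ALLOW and AGLOW reach the choice list for the error form ALOW, since each is one insertion away. Preferring ALLOW over AGLOW is the job of the rules and the Bayesian step, which the sketch leaves out.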