Intelligent Key Prediction by N-grams and Error-correction Rules

Kanokwut Thanadkran, Virach Sornlertlamvanich and Tanapong Potipiti

National Electronics and Computer Technology Center, National Science and Technology Development Agency, Ministry of Science and Technology Environment, 22nd Floor Gypsum Metropolitan Tower, 539/2 Sriayudhya Rd., Rajthevi, Bangkok 10400, Thailand

Email: kanokwutt@notes.nectec.or.th, tanapong@nectec.or.th and virach@nectec.or.th

Abstract

In this paper, we propose an intelligent text-input aid that makes typing on a conventional keyboard easier. We use a character n-gram model and error-correction rules to identify the language being typed and to predict the most probable character string without extra keystrokes. The character n-gram model requires only a small amount of memory when bi-grams and tri-grams are used. We also propose a rule-reduction algorithm that applies mutual information to reduce the number of error-correction rules. Our algorithm achieves more than 99% accuracy in both language identification and key prediction.

Keywords: key prediction, error-correction rule, language identification.

1 Introduction

Thai users face two recurring annoyances when typing Thai-English bilingual documents, which are common for Thais. First, to switch from typing Thai to English, the user has to press a special key that tells the operating system to change the language mode. If the user forgets to press the language-switching key, they have to delete the token just typed and re-type it after switching languages. Second, Thai has more than 100 characters, and about half of them can be input only with a combination of two keys (the shift key plus another key). Users of some other Asian languages face the same problem.
It would be ideal to have an intelligent keyboard system that performs these two tasks, language switching and key shifting, automatically. This paper proposes a practical solution to both problems by applying a character tri-gram probabilistic model and error-correction rules. To reduce the number of generated error-correction rules, we propose a rule-reduction approach based on mutual information. We report key prediction accuracy of more than 99 percent.

2 Related Works

There are two related works on intelligent aids for text input. First, Microsoft Word 2000 provides simple automatic language detection that considers only the existing surrounding text. This feature cannot detect the language the user intends to input during typing. As a result, the input character sequence cannot switch back and forth to the proper input language. Second, Zheng et al. [1] applied lexical trees and Chinese word n-grams to word prediction for inputting Chinese sentences with digit keys. They reported 94.4% prediction accuracy. However, they did not address automatic language identification, and the lexical trees they employed required a large amount of memory.

3 The Approach

3.1 Overview

In the traditional Thai keyboard input system, a single key button, with the help of the language-switching key and the shift key, can output 4 different characters. For example, on the Thai keyboard the ‘a’-key button can represent 4 different characters in different modes, as shown in Table 1.

                without Shift    with Shift
English Mode    ‘a’              ‘A’
Thai Mode       ‘ฟ’              ‘ฤ’

Table 1: A Key Button can Represent Different Characters in Different Modes.

However, with NLP techniques, it should be possible to efficiently implement a Thai-English keyboard system that predicts the key the user intends to type without requiring the language-selection key or the shift key.
We propose an intelligent keyboard system that solves this problem, and we have implemented it with successful results.
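The four-way ambiguity of a key button in Table 1 can be pictured as a lookup from a key to its candidate characters. The following Python sketch is illustrative only: the names KEY_TABLE, candidates and resolve are hypothetical, and the Thai characters assume the common Kedmanee layout. It is not the implementation reported in this paper.

```python
# Characters that the 'a' key button can produce, following Table 1.
# Assumption: the Thai characters follow the common Kedmanee layout.
# KEY_TABLE, candidates and resolve are hypothetical names for illustration.
KEY_TABLE = {
    "a": {
        ("english", False): "a",  # English mode, without Shift
        ("english", True): "A",   # English mode, with Shift
        ("thai", False): "ฟ",     # Thai mode, without Shift
        ("thai", True): "ฤ",      # Thai mode, with Shift
    },
}


def resolve(key, mode, shift):
    """Traditional input: the user states the mode and shift explicitly."""
    return KEY_TABLE[key][(mode, shift)]


def candidates(key):
    """Key prediction setting: every character the key button may stand for."""
    return list(KEY_TABLE[key].values())


print(resolve("a", "thai", True))
print(candidates("a"))
```

In the traditional system the user supplies the mode and shift arguments through extra keystrokes. A key prediction system would instead have to choose among the characters returned by candidates using the surrounding context.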