Automatic Whiteout: Discovery and Correction of Typographical Errors in Mobile Text Input

James Clawson, Alex Rudnick, Kent Lyons, Thad Starner
College of Computing and GVU Center
Georgia Institute of Technology
Atlanta, GA 30332
{jamer,alexr,kent,thad}@cc.gatech.ed

ABSTRACT
We detect and correct typing errors made on mini–QWERTY keyboards by analyzing features of the typing itself. Examining a database of mini–QWERTY typing data reveals that many errors made by typists are “off–by–one” errors. One likely cause of these errors is the relative size difference between the user’s thumb and the small, densely packed keys of the mini–QWERTY keyboard. Our goal with this work is to improve expert typing speeds and accuracy by automatically correcting the user’s typing errors before they are displayed on the screen. Using pattern recognition methods, we reduced the number of off–by–one errors by 39.78% and the total errors by 26.41%. This paper discusses the problem, the features used to detect errors, the techniques used to train the system, and future steps to generalize the algorithm for other keyboards and situations.

Keywords
Keyboard Input, Error Correction, Mobile Phones, Mini–QWERTY, Mobile Text Entry.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces, Input devices and strategies

1. INTRODUCTION
Miniature keyboards and miniature keypads are currently used extensively on mobile devices such as mobile phone handsets and personal digital assistants. The mini–QWERTY keyboard (Figure 1) is a common handheld mobile two–handed keyboard which contains at least one key for each letter plus a space bar and is configured in the same manner as a desktop QWERTY keyboard. While the layout is analogous to desktop keyboards, mini–QWERTY keyboards contain keys that are densely packed to save space and are usually operated by a user’s two thumbs. Often the keys are smaller than the digit used to manipulate them (the thumb), resulting in difficulty of use.
The user’s digit also occludes the keys, introducing ambiguity as to which key was actually pressed. Furthermore, Fitts’ Law, which describes the relationship between speed of movement, target size, and accuracy (adjusted target size) [8], implies that users will type less accurately as they type faster. Together these effects lead to typing errors where one digit may press multiple keys at once or a key adjacent to the intended key. These types of errors occur often, especially at rapid typing rates.

Figure 1: Commercial mobile phones with mini–QWERTY keyboards such as the Danger/T–Mobile Sidekick

In this paper we examine a set of mini–QWERTY keyboard text input data and identify a common type of error, the off–by–one error, that accounts for 61.28% of the total errors in the dataset. We then use pattern recognition techniques to automatically recognize and correct these types of errors. We evaluate the effect of the correction on overall keystroke accuracy in an existing database and discuss how our algorithm can be employed to improve mobile text input on mini–QWERTY keyboards.

2. GENERATION OF THE DATA SET
We generated the data set used for our analysis from two longitudinal studies of mini–QWERTY keyboard use [1, 3]. In the first study we recruited 14 participants who had no prior experience with mini–QWERTY keyboard typing. They were randomly assigned to one of two subject groups, each using one of two different keyboard models.
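
To make the notion of an off–by–one error from the introduction concrete, the following is a minimal sketch, not the method used in this paper. It assumes that "adjacent" means the immediate left or right neighbor in the same keyboard row and that the letter keys follow a standard three–row QWERTY arrangement; the names ROWS, row_neighbors, and is_off_by_one are hypothetical, and the keyboard models used in the studies may differ in layout.

```python
# Assumed three-row letter layout; actual mini-QWERTY models may differ.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def row_neighbors(key):
    """Return the set of keys immediately left and right of `key` in its row."""
    if len(key) != 1:
        return set()
    for row in ROWS:
        i = row.find(key)
        if i != -1:
            # Keep only neighbor indices that fall inside the row.
            return {row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)}
    # Keys outside the assumed layout (e.g. the space bar) have no neighbors here.
    return set()


def is_off_by_one(intended, typed):
    """True if `typed` is a same-row neighbor of `intended` (assumed definition)."""
    return typed in row_neighbors(intended)
```

Under these assumptions, typing "h" when "g" was intended counts as off–by–one, while typing "t" (the key above "g") does not. A definition that also counted vertical or diagonal neighbors would need per–model key coordinates, which are not given here.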
Subjects used the same keyboard throughout the experiment, which consisted of twenty 20–minute typing sessions. In each session, subjects typed several trial blocks of 10 phrases each. The phrases were taken from MacKenzie and Soukoreff’s set of 500 phrases designed for use in text entry studies [7]. The phrases use only lowercase letters and spaces with no punctuation. The canonical set was altered to use American English spellings. The test software prompts the user with the target phrase, displays the text produced by the user,