Using Digital Cameras for Text Input on Mobile Devices

Frank Siegemund and Muhammad Haroon
European Microsoft Innovation Center
Ritterstrasse 23, D-52074 Aachen, Germany
{franksie|i-mharoo}@microsoft.com

Abstract

This paper presents novel text input methods for mobile devices. In particular, it shows how users can capture text from books, newspapers, and other objects by using the digital cameras that are integrated into an increasing number of smartphones. The idea is to select text from a live video stream generated by these digital cameras. Three interaction patterns that allow users to select text are discussed: (1) automatic detection of the word closest to a fixed position on the mobile device’s screen, (2) delayed movement of the text selection to improve the precision of the selection procedure, and (3) using the stylus to select words and word groups. The proposed interaction patterns are evaluated based on the results of a user study, and their application is shown in an example scenario. The presented research has practical relevance because it can improve the usability of mobile applications such as online searching.

1. Introduction

In Pervasive Computing, a variety of tagging technologies, from simple bar codes to active and passive RFID tags, are used to augment everyday objects. While these technologies are well suited to link physical objects with virtual services, they often require a post-hoc augmentation of these objects. However, there is one “natural” tagging technology that is already present on many everyday things: printed text. Printed text can be found not only in books and newspapers but also on posters and product packages.
Also, with the increasing number of cameras integrated into cell phones and PDAs, these devices can serve as “tag readers.” In other words, mobile devices can be used to capture text from the physical world, to transform this information, and to use it as input for a range of applications. For example, a user reading a newspaper article about Roman history could capture words in the article with her mobile phone. The mobile phone could then automatically start an Internet search for more information about the captured words. The same mechanism applies when capturing text from posters or products in a supermarket.

As an example of this vision, this paper focuses on capturing text from printed documents. In the presented approaches, users point their mobile device at a printed text. The camera integrated in the mobile device then generates a live video stream that is analyzed in real time for words that appear in the text. Users can then select a word and subsequently extend their text selection over multiple words. The result of the text selection is an identified area in a video frame that contains the selected words. This sub-image is preprocessed on the phone and can then be sent to an application such as an online search engine. Using such methods for text input, a mobile device’s digital camera can serve as an additional input channel for mobile applications.

The main contributions of the paper are as follows: First, it presents a thorough analysis of user interaction patterns for selecting text with a handheld-embedded digital camera. It is shown, for example, how to cope with unintentional camera movements and how to improve the precision of text selections. Second, the paper proposes adaptive algorithms for selecting words in a live video stream. Third, it presents a concrete application that shows how the proposed concepts can be used in real-world scenarios.
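As a rough illustration of the selection step outlined above (pick the word closest to a fixed screen position, then extend the selection over several words to obtain one area in the video frame), the following minimal Python sketch may help. It is not the implementation discussed in the later sections. It assumes that word detection has already produced one bounding box per word for the current frame, and it measures closeness to the center of each box, which is a simplifying assumption. The names WordBox, closest_word, and selection_area are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WordBox:
    """Axis-aligned bounding box of one detected word, in frame pixels (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> Tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def closest_word(boxes: List[WordBox], anchor: Tuple[int, int]) -> int:
    """Return the index of the box whose center lies nearest to the fixed anchor point."""
    if not boxes:
        raise ValueError("no word boxes detected in this frame")
    ax, ay = anchor

    def squared_distance(i: int) -> float:
        cx, cy = boxes[i].center()
        return (cx - ax) ** 2 + (cy - ay) ** 2

    return min(range(len(boxes)), key=squared_distance)


def selection_area(boxes: List[WordBox], first: int, last: int) -> WordBox:
    """Return the rectangle that encloses boxes[first..last] (inclusive, in either order)."""
    lo, hi = sorted((first, last))
    chosen = boxes[lo:hi + 1]
    return WordBox(
        left=min(b.left for b in chosen),
        top=min(b.top for b in chosen),
        right=max(b.right for b in chosen),
        bottom=max(b.bottom for b in chosen),
    )


# Three words on one text line; the anchor is an assumed fixed position on the screen.
boxes = [WordBox(10, 10, 50, 30), WordBox(60, 10, 120, 30), WordBox(130, 10, 170, 30)]
start = closest_word(boxes, anchor=(88, 22))   # index 1, the middle word
area = selection_area(boxes, start, 2)         # WordBox(60, 10, 170, 30)
```

The resulting rectangle corresponds to the sub-image that would be cropped from the frame and passed on for preprocessing. Treating the selection as a contiguous index range assumes the boxes are listed in reading order.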
The remainder of this paper is structured as follows: Sect. 2 summarizes related work. Sect. 3 describes user interaction patterns for selecting printed text with a camera phone, while Sect. 4 discusses their implementation. Sect. 5 presents algorithmic improvements. Sect. 6 evaluates our work based on a detailed user study, and Sect. 7 presents a selected application. Sect. 8 concludes the paper.

Proceedings of the Fifth Annual IEEE International Conference on Pervasive Computing and Communications (PerCom'07) 0-7695-2787-6/07 $20.00 © 2007