Using Digital Cameras for Text Input on Mobile Devices
Frank Siegemund and Muhammad Haroon
European Microsoft Innovation Center
Ritterstrasse 23, D-52074 Aachen, Germany
{franksie|i-mharoo}@microsoft.com
Abstract
This paper presents novel text input methods for mobile devices. In particular, it shows how users can capture text from books, newspapers, and other objects by using the digital cameras that are integrated into an increasing number of smartphones. The idea is to select text from a live video stream generated by these digital cameras. Different interaction patterns that allow users to select text are discussed: (1) automatic detection of the word closest to a fixed position on the mobile device's screen, (2) delayed movement of the text selection to improve the precision of the selection procedure, and (3) using the stylus to select words and word groups. The proposed interaction patterns are evaluated based on the results of a user study, and their application is shown in an example scenario. The presented research has practical relevance because it can improve the usability of mobile applications such as online searching.
1. Introduction
In Pervasive Computing, a variety of tagging technologies – from simple bar codes to active and passive RFID tags – are used to augment everyday objects. While these technologies are well suited to link physical objects with virtual services, they often require a post-hoc augmentation of these objects. However, there is one "natural" tagging technology that is already present on many everyday things: printed text. Printed text can be found not only in books and newspapers but also on posters and product packages. Also, with the increasing number of cameras integrated into cell phones and PDAs, these devices can serve as "tag readers." In other words, mobile devices can be used to capture text from the physical world, transform this information, and use it as input for a range of applications. For example, a user reading a newspaper article about Roman history could capture words in the article with her mobile phone. The mobile phone could then automatically start an Internet search for more information about the captured words. The same mechanism applies when capturing text from posters or products in a supermarket.
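The final step of this vision – forwarding captured words to an online search engine – can be illustrated with a minimal sketch. The base URL and the `q` query parameter are assumptions for illustration; the paper does not name a specific search service.

```python
from urllib.parse import urlencode

# Hypothetical helper: turn a list of captured words into a search-engine
# request URL. The base URL is an assumed placeholder, not part of the paper.
def search_url(words, base="https://www.example-search.com/search"):
    # urlencode escapes the joined words for safe use in a URL query string
    return base + "?" + urlencode({"q": " ".join(words)})
```

A phone application could open this URL in the device's browser after the user confirms the captured words (e.g., `search_url(["Roman", "history"])`).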
As an example of this vision, this paper focuses on capturing text from printed documents. In the presented approaches, users point their mobile device at a printed text. The camera integrated in the mobile device then generates a live video stream that is analyzed in real time for words that appear in the text. Users can then select a word and subsequently extend their text selection over multiple words. The result of the text selection is an identified area in a video frame that contains the selected words. This sub-image is preprocessed on the phone and can then be sent to an application such as an online search engine. Using such methods for text input, a mobile device's digital camera can serve as an additional input channel for mobile applications.
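The first interaction pattern mentioned in the abstract – automatically selecting the word closest to a fixed position on the screen – can be sketched as follows. Word detection itself (segmenting bounding boxes from a video frame) is assumed to happen elsewhere; here boxes are simple `(x, y, width, height)` tuples, and the function names are illustrative, not from the paper.

```python
# Hypothetical sketch of interaction pattern (1): given the word bounding
# boxes detected in the current video frame, pick the one whose center is
# closest to a fixed crosshair position on the screen.

def box_center(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def closest_word(boxes, crosshair):
    """Return the index of the bounding box nearest the crosshair point."""
    cx, cy = crosshair
    def dist2(i):
        bx, by = box_center(boxes[i])
        # squared Euclidean distance; no sqrt needed for comparison
        return (bx - cx) ** 2 + (by - cy) ** 2
    return min(range(len(boxes)), key=dist2)
```

With the crosshair fixed at the center of a QVGA preview (160, 120), the user simply aims the camera so that the desired word drifts under the crosshair; the selection then follows from the minimum-distance test above.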
The main contributions of the paper are as follows: First, it presents a thorough analysis of user interaction patterns for selecting text with a handheld-embedded digital camera. It is shown, for example, how to cope with unintentional camera movements and how to improve the precision of text selections. Second, the paper proposes adaptive algorithms for selecting words in a live video stream. Third, it presents a concrete application that shows how the proposed concepts can be used in real-world scenarios.
The remainder of this paper is structured as follows: Sect. 2 summarizes related work. Sect. 3 describes user interaction patterns for selecting printed text with a camera phone, while Sect. 4 discusses their implementation. Sect. 5 presents algorithmic improvements. Sect. 6 evaluates our work based on a detailed user study, and Sect. 7 presents a selected application. Sect. 8 concludes the paper.
Proceedings of the Fifth Annual IEEE International
Conference on Pervasive Computing and Communications (PerCom'07)
0-7695-2787-6/07 $20.00 © 2007