SiMPE: 2nd Workshop on Speech in Mobile and Pervasive Environments

A. A. Nanavati, N. Rajput
IBM India Research Lab, 4, Block C, Vasant Kunj
New Delhi - 110070, India.
namit,rnitendra@in.ibm.com

A. I. Rudnicky
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213, USA.
air@cs.cmu.edu

M. Turunen
Dept. of Computer Science
University of Tampere
Tampere, 33014, Finland.
mturunen@cs.uta.fi

ABSTRACT
Traditionally, voice-based applications have been accessed using unintelligent telephone devices through Voice Browsers that reside on the server. With the proliferation of pervasive devices and the increase in their processing capabilities, client-side speech processing is emerging as a viable alternative. As in SiMPE 2006 [2], we will further explore the possibilities and issues that arise when enabling speech processing on resource-constrained, possibly mobile devices.

In particular, this year's theme will be SiMPE for developing regions. The workshop will highlight the many open areas that require research attention, identify key problems that need to be addressed, and discuss a few approaches for solving some of them. The aim is not only to build the next generation of conversational systems, but also to help create the next generation of IT users, thus extending the benefits of technology to a much wider populace.

Categories and Subject Descriptors
I.2.7 [Artificial Intelligence]: Natural Language Processing - Speech recognition and synthesis; C.3 [Computer Systems Organisation]: Special-purpose and Application-based Systems - Real-time and embedded systems

General Terms
Algorithms, Performance, Design, Reliability, Human Factors, Standardization, Languages, Theory

1. BACKGROUND AND MOTIVATION
The growth of mobile devices has outpaced Internet penetration by a significant margin. The difference is even larger in developing regions, where many people cannot afford to own a PC and/or are not literate enough to use a computer.
At the same time, mobile devices have become more pervasive owing to their continuous reduction in size and the steady increase in the features they offer. However, as the form factor of the device shrinks, its available input mechanisms become extremely limited. For such devices, speech provides a natural and ideal input mechanism that requires no increase in device size. Moreover, such devices are often used in settings where the user's hands or eyes are occupied with other activities. Speech therefore also provides an easier means of rendering information to the user, without requiring the attention of the other human senses. In developing regions, speech provides a far more user-friendly interface for users who are not literate.

The proliferation of mobile devices has stimulated the development of applications that support ubiquitous access via multiple modalities. Since the processing capabilities of pervasive devices differ vastly, device-specific application adaptation becomes essential. How does one adapt speech applications for pervasive devices with different resource (memory, power) constraints? How does one devise efficient algorithms for speech recognition and synthesis on resource-constrained devices operating in noisy environments?

To provide high-quality speech recognition on a hand-held device, Distributed Speech Recognition is used as an alternative. In this setting, the initial speech processing of the user utterance is performed on the client device, and the processed signals are then passed to the Voice Server. This approach is quite restrictive; it does not adapt to a particular client's capabilities. Device adaptation of speech applications seems to be a viable approach, but how does one do flexible and efficient speech application adaptation? What efficient architectures, protocols and standards should be developed to support application flexibility and the variation in client capabilities?
A mobile user accesses a pervasive device in various environments, requiring her to use multiple modalities. What kind of interfaces offer a seamless experience to the user?

The questions above give only a hint of the various issues that arise. Enabling conversational systems on pervasive devices will require new models, algorithms and systems that are robust across a variety of mobile and ubiquitous devices and across dynamic, noisy environments. This multi-disciplinary problem invites the attention of software architects, algorithm designers, speech recognition and synthesis experts, interface designers and modellers. Designing evaluation measures and benchmarks, and modelling the performance of mobile speech systems, will be important for supporting advances in the above technologies. This workshop aims to provide answers to some of these questions by invit-