IMPLEMENTING SRI’S PASHTO SPEECH-TO-SPEECH TRANSLATION SYSTEM ON A SMART PHONE

Jing Zheng, Arindam Mandal, Xin Lei¹, Wen Wang, Murat Akbacak, Kristin Precoda, Michael Frandsen, Necip Fazil Ayan, Dimitra Vergyri

Speech Technology and Research Laboratory, SRI International, Menlo Park, USA

¹ Xin Lei is currently with Google Inc.

ABSTRACT

We describe our recent effort to implement SRI’s UMPC-based Pashto speech-to-speech (S2S) translation system on a smart phone running the Android operating system. To keep system response latency very low on computationally limited smart phone platforms, we developed efficient algorithms and data structures and optimized model sizes for various system components. Compared to a laptop-based platform, our current Android-based S2S system requires less than one-fourth the system memory and a significantly lower processor speed, at the cost of a 15% relative loss in system accuracy.

Index Terms: speech-to-speech translation, mobile computing, smart phone, Android

1. INTRODUCTION

The new generation of smart phones, such as the Apple iPhone and Google Nexus One, is extremely popular and has revolutionized the use of mobile computing for everyday tasks. Compared to their predecessors, the new smart phones have more powerful processors, larger screens, and better touch-enabled graphic interfaces, which provide new functionality and result in a superior user experience. In addition, both Apple and Google provide an open software development toolkit (SDK) and application programming interfaces (APIs), allowing third-party developers to quickly build applications (apps) for these phones. As a result, hundreds of thousands of apps are available for these platforms, giving consumers a rich mobile computing experience.

As with many other applications, speech-to-speech (S2S) translation apps are good candidates for deployment on smart phone platforms because of their extreme portability, very large and rapidly growing customer base, and very affordable price points. The main challenge in developing S2S apps for smart phones is achieving acceptable system performance given the computational constraints of such platforms. To date, even high-end smart phones still have limited processing power and physical memory. For example, the Google Nexus One, a smart phone running the Android operating system, is configured with 512 MB of random access memory (RAM) and a 1 GHz Qualcomm Snapdragon™ processor, while a low-end laptop or UMPC typically has 1 to 2 GB of memory and a much more powerful CPU (typically 1.6 GHz) that may also be dual-core. A state-of-the-art two-way, large-vocabulary S2S translation system typically uses multiple computing-intensive components and requires a large amount of memory to store models in dynamic data structures, and therefore poses significant challenges to developers.

This paper describes our recent work implementing SRI’s Pashto-English S2S translation system on the Google Nexus One smart phone. We used various techniques to reduce memory usage, including memory-efficient algorithms and data structures and optimized model sizes for system components. The paper is organized as follows: Section 2 briefly introduces the Pashto-English S2S system; Section 3 discusses the challenge posed by limited hardware and our engineering solutions; Section 4 describes work on the algorithms, data structures, and system architecture; Section 5 shows results of model size optimization; Section 6 summarizes the user interface design; and Section 7 presents our conclusions and plans for future work.

2. PASHTO S2S SYSTEM

Figure 1 illustrates the architecture of SRI’s two-way Pashto-English S2S translation system, which is similar to our previously reported systems [11].
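As a rough illustration of the two-way architecture, the following Python sketch wires one ASR engine, one SMT engine, and one TTS voice per translation direction behind a single control component. It is a minimal sketch under stated assumptions, not SRI's implementation: every class, function, and language code here (`Direction`, `S2SController`, `make_stub_system`, `"en"`, `"ps"`) is hypothetical, and the engines are trivial stubs that only show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical component types; real engines would replace these stubs.
Recognizer = Callable[[bytes], str]    # ASR: audio -> source-language text
Translator = Callable[[str], str]      # SMT: source text -> target text
Synthesizer = Callable[[str], bytes]   # TTS: target text -> audio


@dataclass(frozen=True)
class Direction:
    """One translation direction: ASR, then SMT, then TTS."""
    asr: Recognizer
    smt: Translator
    tts: Synthesizer

    def run(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        translated = self.smt(text)
        return self.tts(translated)


class S2SController:
    """Control component that routes an utterance by its source language."""

    def __init__(self, directions: Dict[str, Direction]) -> None:
        self._directions = directions

    def translate(self, source_lang: str, audio: bytes) -> bytes:
        if source_lang not in self._directions:
            raise ValueError(f"unsupported source language: {source_lang}")
        return self._directions[source_lang].run(audio)


def make_stub_system() -> S2SController:
    """Build a two-way system from stubs (assumed behavior, for illustration)."""
    def stub_asr(audio: bytes) -> str:
        return audio.decode("utf-8")       # pretend the audio is its transcript

    def stub_tts(text: str) -> bytes:
        return text.encode("utf-8")        # pretend the text is the waveform

    return S2SController({
        # English speaker: English ASR, English-to-Pashto SMT, Pashto TTS
        "en": Direction(stub_asr, lambda t: f"[en->ps] {t}", stub_tts),
        # Pashto speaker: Pashto ASR, Pashto-to-English SMT, English TTS
        "ps": Direction(stub_asr, lambda t: f"[ps->en] {t}", stub_tts),
    })
```

Calling `make_stub_system().translate("en", b"hello")` passes the utterance through the English-to-Pashto chain. Keeping each direction as an independent triple makes it straightforward to swap in smaller models for one component without touching the others.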
The system has seven main components: the user interface and control; two automatic speech recognition (ASR) engines, one each for Pashto and English; two statistical machine translation (SMT) engines, one each for the Pashto-to-English and English-to-Pashto directions; and two text-to-speech (TTS) voices, one each for Pashto and English. The system was originally designed for laptop and ultra mobile PC (UMPC) platforms, and requires 2 GB of physical memory and a modern CPU (such as a 1.6 GHz Intel
978-1-4244-7903-0/10/$26.00 ©2010 IEEE SLT 2010