IMPLEMENTING SRI’S PASHTO SPEECH-TO-SPEECH TRANSLATION SYSTEM ON A
SMART PHONE
Jing Zheng, Arindam Mandal, Xin Lei¹, Michael Frandsen, Necip Fazil Ayan, Dimitra Vergyri,
Wen Wang, Murat Akbacak, Kristin Precoda
Speech Technology and Research Laboratory, SRI International, Menlo Park, USA
¹ Xin Lei is currently with Google Inc.
ABSTRACT
We describe our recent effort implementing SRI’s UMPC-
based Pashto speech-to-speech (S2S) translation system on a
smart phone running the Android operating system. In order
to maintain very low latencies of system response on
computationally limited smart phone platforms, we
developed efficient algorithms and data structures and
optimized model sizes for various system components. Our
current Android-based S2S system requires less than one-fourth
the system memory and significantly lower processor speed
than a laptop-based platform, at the cost of a 15% relative
loss in system accuracy.
Index Terms— speech-to-speech translation, mobile
computing, smart phone, Android
1. INTRODUCTION
The new generation of smart phones, such as the Apple
iPhone and Google Nexus One, is extremely popular and has
revolutionized the use of mobile computing for everyday
tasks. Compared to their predecessors, the new smart phones
have more powerful processors, larger screens, and better
touch-enabled graphic interfaces, which provide new
functionalities and result in a superior user experience. In
addition, both Apple and Google provide an open software
development toolkit (SDK) and application programming
interfaces (APIs), allowing third-party developers to quickly
build applications (apps) for these phones. As a result,
hundreds of thousands of apps are available for these
platforms that allow consumers a rich mobile computing
experience.
As with many other applications, speech-to-speech (S2S)
translation is a good candidate for deployment on smart
phone platforms because of their extreme portability, very
large and rapidly growing customer base, and very affordable
price points. The main challenge in developing S2S apps on
smart phones is achieving acceptable system performance
given the computational constraints of such platforms. To
date, even high-end smart phones still have limited
processing power and physical memory. For example, the
Google Nexus One, a smart phone running the Android
operating system, is configured with 512 MB of random
access memory (RAM) and a 1 GHz Qualcomm
Snapdragon™ processor, while a low-end laptop or UMPC
typically has 1 to 2 GB of memory and a much more
powerful CPU (typically 1.6 GHz) that may also be
dual-core. A state-of-the-art two-way, large-vocabulary S2S
translation system typically uses multiple
computing-intensive components and requires a large amount
of memory to store models in dynamic data structures, which
poses significant challenges to developers.
This paper describes our recent work implementing
SRI’s Pashto-English S2S translation system on the Google
Nexus One smart phone. We used various techniques to
reduce memory usage, including memory-efficient
algorithms and data structures and optimized model sizes
for system components. The paper is organized as follows:
Section 2 briefly introduces the Pashto-English S2S system;
Section 3 discusses the challenge posed by limited hardware
and our engineering solutions; Section 4 describes work on
the algorithms, data structures and system architecture;
Section 5 shows results of model size optimization; Section
6 summarizes the user interface design; and Section 7
presents our conclusions and plans for future work.
2. PASHTO S2S SYSTEM
Figure 1 illustrates the architecture of SRI’s two-way
Pashto-English S2S translation system, which is similar to
our previously reported systems [11]. The system has seven
main components: the user interface and control; two
automatic speech recognition (ASR) engines, one each for
Pashto and English; two statistical machine translation
(SMT) engines, for the Pashto-to-English and
English-to-Pashto directions; and two text-to-speech (TTS)
voices, for Pashto and English. The system was originally
designed for laptop and ultra mobile PC (UMPC) platforms,
and requires 2 GB of physical memory and a modern CPU
(such as a 1.6 GHz Intel
121 978-1-4244-7903-0/10/$26.00 ©2010 IEEE SLT 2010