Design and Optimization of a Distributed, Embedded Speech Recognition System

Chung-Ching Shen, William Plishker, and Shuvra S. Bhattacharyya
Dept. of Electrical and Computer Engineering, and Institute for Advanced Computer Studies
University of Maryland at College Park, USA
{ccshen, plishker, ssb}@umd.edu

978-1-4244-1694-3/08/$25.00 ©2008 IEEE
Abstract
In this paper, we present the design and implementation of a distributed sensor network application for embedded, isolated-word, real-time speech recognition. In our system design, we adopt a parameterized-dataflow-based modeling approach to model the functionality associated with sensing and processing of acoustic data, and we implement the associated embedded software on an off-the-shelf sensor node platform that is equipped with an acoustic sensor. The sensor network deployed in this work is organized as a clustered hierarchy, and a customized time division multiple access (TDMA) protocol is developed to manage the wireless channel. We analyze the distribution of the overall computation workload across the network to improve energy efficiency. In our experiments, we measure the recognition accuracy of our speech recognition system to verify its functionality and utility, and we evaluate improvements in network lifetime to demonstrate the effectiveness of our energy-aware optimization techniques.
1. Introduction
Speech recognition involves converting acoustic signals, captured by a microphone or an acoustic sensor, into a sequence of words. These words are then compared against a set of pre-defined words, and an indication is given when a match is found. The recognized words can then be analyzed and used by back-end applications such as command and control, commercial information retrieval, and linguistic processing for speech understanding. Figure 1 presents a design flow for basic speech recognition systems.
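As an illustration of the matching step described above, the following sketch compares an utterance's feature-vector sequence against stored word templates. This is not the implementation developed in this paper; the function names, the dynamic-time-warping (DTW) matcher, and the distance threshold are assumptions chosen for illustration, and feature extraction is abstracted away.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping distance between two sequences of
    feature vectors (each vector a tuple of floats)."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames
            d = sum((a - b) ** 2 for a, b in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            # Extend the cheapest warping path reaching this cell
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(features, templates, threshold=10.0):
    """Return the best-matching word from a {word: template} table,
    or None if no template is close enough (no known word spoken)."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        d = dtw_distance(features, template)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= threshold else None
```

Here a word is reported only when the best template distance falls below a threshold, which provides the "indication" of a match (or of its absence).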
From an embedded system design perspective, a major challenge for speech recognition is ensuring that large amounts of data are processed in real time. Various prior efforts on embedded speech recognition (e.g., [1, 12, 13]) focused on implementing speech recognition algorithms on embedded platforms, such as programmable digital signal processors (PDSPs), and comparing their performance. These efforts typically have not explored optimization for real-time operation and energy usage beyond what is already available through a standard PDSP-based design flow. Such design approaches are therefore not fully suited for heavily resource-limited, distributed systems, such as wireless sensor networks.
A wireless (distributed) sensor network (WSN) system is composed of resource-limited sensor nodes, which consist of components for sensing, data processing, and wireless communication (e.g., see [7]). WSN systems have a variety of potential applications [9], such as environmental monitoring and intrusion detection. Sensor nodes are often deployed in inaccessible or, in the case of certain military and security-related applications, dangerous areas, and communicate with each other through self-organizing protocols. To maximize the useful life of these systems, power consumption must be considered carefully during sensor node design.
Integrating speech recognition into a WSN system enables a new class of speech recognition applications that have distributed configurations. We refer to such speech-recognition-equipped WSN systems as distributed automatic speech recognition (DASR) systems. The DASR system developed in this paper is an isolated-word, speaker-dependent speech processing system, in which templates of extracted coefficients for each word must be created and stored at a central node. All sensor nodes collect speech data within their sensing ranges and periodically transmit this data, in the form of recognized words (or simple indicators for the absence of any words), to the central node. Any application-specific analysis and usage of the recognized words is handled as back-end processing on the central node.
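The per-node behavior described above can be sketched as a simple periodic reporting cycle. The names `sense_window`, `recognize_word`, and `radio_send` below are hypothetical stand-ins for the platform's sensing, recognition, and transmission routines; they do not come from the system described in this paper.

```python
# Compact indicator sent when no word was detected in the window;
# the constant's name and encoding are illustrative assumptions.
NO_WORD = "NO_WORD"

def reporting_cycle(sense_window, recognize_word, radio_send):
    """One periodic cycle on a sensor node: sample the acoustic
    channel, run local recognition, and send either the recognized
    word or a compact absence indicator to the central node."""
    samples = sense_window()            # collect acoustic data
    word = recognize_word(samples)      # local isolated-word recognition
    radio_send(word if word is not None else NO_WORD)
    return word
```

Transmitting recognized words (or a short absence indicator) rather than raw audio keeps the per-cycle radio payload small, which matters for the energy budget of battery-powered nodes.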
Based on different requirements on recognition accuracy, we describe two practical application scenarios in which our DASR system can be applied. The first scenario involves using a DASR system as a speech-based command and control system in
[Figure 1. Basic design flow for automatic speech recognition systems: input speech passes through front-end signal processing (sensing/sampling, start detection/framing, and feature analysis) to produce feature vectors, which are passed to recognition and then to back-end applications.]
In Proceedings of the International Workshop on Parallel and Distributed Real-Time Systems,
Miami, Florida, April 2008.