Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Chapter 6
DOI: 10.4018/978-1-4666-0954-9.ch006
1. INTRODUCTION
Human-computer interaction through a natural language conversational
interface plays an important role in making computers more usable for the
common man. The success of such a speech-enabled man-machine communication
interface depends mainly on the performance of the automatic speech
recognition (ASR) system. State-of-the-art ASR systems use a statistical
pattern classification approach with two well-known phases: feature
extraction and pattern classification.
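As a rough illustration of the feature extraction phase, the sketch below slices a waveform into overlapping frames at a fixed shift and computes one log-magnitude spectrum per frame. It is a minimal sketch under stated assumptions, not the front-end used in this chapter: the 16 kHz sampling rate, the 25 ms Hamming window, the 10 ms frame shift, and the function names frame_signal and log_spectrum_features are all chosen for illustration only.

```python
import numpy as np


def frame_signal(signal, sample_rate=16000, win_ms=25.0, shift_ms=10.0):
    """Split a 1-D waveform into overlapping frames, one frame per row.

    The sampling rate, window length and frame shift are illustrative
    assumptions, not values prescribed by the chapter.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * win_ms / 1000.0))      # samples per frame
    shift = int(round(sample_rate * shift_ms / 1000.0))  # samples between frame starts
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(signal) - win) // shift
    # Row i holds samples [i * shift, i * shift + win).
    idx = np.arange(win)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]


def log_spectrum_features(signal, sample_rate=16000, win_ms=25.0, shift_ms=10.0):
    """Return one log-magnitude spectrum (a simple feature vector) per frame."""
    frames = frame_signal(signal, sample_rate, win_ms, shift_ms)
    frames = frames * np.hamming(frames.shape[1])    # taper each frame
    spectrum = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrum per frame
    return np.log(spectrum + 1e-10)                  # small offset avoids log(0)


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz (synthetic test input).
    t = np.arange(16000) / 16000.0
    features = log_spectrum_features(np.sin(2 * np.pi * 440 * t))
    print(features.shape)
```

Each row of the returned array is one feature vector. Practical front-ends usually go further, for example by applying a perceptually motivated filter bank and a decorrelating transform, but the framing step stays the same.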
In the architecture of an ASR system, the feature extraction phase belongs
to the front-end, which converts the recorded waveform into an acoustic
representation known as feature vectors. The back-end covers the
statistical models, such as acoustic models and language models, along
with the search methods and adaptation techniques used for classification.
The features are based on a time-frequency representation of the acoustic
signal and are computed at regular intervals (e.g., every 10 ms). The
feature vectors are decoded into linguistic units such as words,
syllables, and phones
R. K. Aggarwal
National Institute of Technology Kurukshetra, India
M. Dave
National Institute of Technology Kurukshetra, India
Recent Trends in Speech Recognition Systems
ABSTRACT
Improving the accuracy and efficiency of automatic speech recognition (ASR) systems has been a long-term
goal of researchers developing natural language man-machine communication interfaces. In the widely
used statistical framework of ASR, a feature extraction technique is used at the front-end for speech
signal parameterization, and a hidden Markov model (HMM) is used at the back-end for pattern
classification. This chapter reviews classical and recent approaches to Markov modeling and presents an
empirical study of a few well-known methods in the context of a Hindi speech recognition system.
Performance issues such as the number of Gaussian mixtures, tied states, and feature reduction
procedures are also analyzed for a medium-size vocabulary. The experimental results show that, using
advanced acoustic modeling techniques, more than 90% accuracy can be achieved. The recent advanced
models outperform the conventional methods and are fit for HCI applications.