1664 | International Journal of Current Engineering and Technology, Vol.4, No.3 (June 2014)
Research Article
International Journal of Current Engineering and Technology
E-ISSN 2277-4106, P-ISSN 2347-5161
©2014 INPRESSCO®, All Rights Reserved
Available at http://inpressco.com/category/ijcet
A Systematic Analysis of Automatic Speech Recognition: An Overview
Taabish Gulzar^A*, Anand Singh^A, Dinesh Kumar Rajoriya^B and Najma Farooq^A
^A Department of Electronics and Communication, Dehradun Institute of Technology, Mussourie Diversion Road, Makkwala Dehradun, India
^B Department of Electronics and Communication, Sagar Institute of Science, Technology and Engineering, Bhopal, M.P, India
Accepted 18 May 2014, Available online 01 June 2014, Vol.4, No.3 (June 2014)
Abstract
Speech is the most prominent and primary means of communication among humans. Despite extensive research and development in the field of automatic speech recognition, achieving high recognition accuracy is still a research challenge. This paper reviews past work comparing modern speech recognition systems and humans to determine how far the recent dramatic progress in technology has advanced towards the goal of human-like performance. An overview of the sources of knowledge used in recognition is introduced, and the use of this knowledge to create and verify hypotheses is discussed.
Keywords: Automatic speech recognition, Feature extraction, Utterance, Dynamic time warping, Matching.
1. Introduction
For several decades, researchers have tried to create technologies that can recognize speech correctly. Humans distinguish speech very easily, but in doing so they make use of a great deal of acoustic, linguistic and contextual information. The relation between the physical speech signal and the corresponding words has been found to be very complex and hard to understand. Both automatic speech recognition (ASR) and human speech recognition (HSR) research examine the recognition process that leads from the acoustic signal to a series of recognized units. For ASR, the objective is to automatically transcribe the speech signal as a sequence of items that is as close as possible to a reference transcription (L. Rabiner et al, 1993; F. Jelinek, 1997). In HSR, the focus is on understanding how human listeners recognize spoken utterances. Building on advances in the statistical modelling of speech, ASR systems are used extensively in tasks that require a human-machine interface, such as automatic call processing in telephone networks, query-based information systems that provide up-to-date travel information, stock price quotations and weather reports, and embedded systems.
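Closeness to a reference transcription is commonly measured by the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference, divided by the number of words in the reference. The following is a minimal sketch of that measure using the standard edit-distance dynamic program; the function name `word_error_rate` and the example sentences are illustrative only and are not taken from this paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# One deletion ("the") and one substitution ("report" -> "reports") over 4 words.
print(word_error_rate("show the weather report", "show weather reports"))  # → 0.5
```

A WER of 0 means the output matches the reference exactly; because insertions are counted, WER can exceed 1 for very noisy output.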
1.1 Definition and Basic Model of Speech Recognition
Speech recognition, also known as automatic speech recognition (ASR), is defined as the process of converting an acoustic speech signal, captured by a microphone or a telephone, into a sequence of words by means of an algorithm implemented as a computer program (V. Zue et al, 1996; Z. Mengjie, 2001). ASR is one of the fastest growing areas of speech science and engineering. Research in speech processing and communication was, for the most part, motivated by people's desire to build mechanical models that emulate human verbal communication capabilities. The primary aim of ASR research is to develop new techniques and systems for speech input to machines. Fig. 1 shows a mathematical representation of a speech recognition system in terms of straightforward equations covering the front-end unit, the acoustic model unit, the language model unit and the search unit.
*Corresponding author: Taabish Gulzar
[Fig. 1: Basic model of speech recognition; input: speech]
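The division into a front-end unit, an acoustic model unit, a language model unit and a search unit can be illustrated with a deliberately tiny Python sketch. Everything in it is a hypothetical toy rather than the system described in this paper: the front end computes one log-energy feature per frame, the acoustic model assumes each word emits exactly one frame scored by a unit-variance Gaussian around a per-word mean, the language model is a bigram table, and the search enumerates every word sequence and keeps the one with the highest combined acoustic and language-model log-probability. Real recognizers use richer features, acoustic models that align each word to many frames, and efficient search instead of brute-force enumeration.

```python
import math
from itertools import product


def front_end(samples, frame_len):
    """Front-end unit (toy): split the waveform into non-overlapping frames
    and return one log-energy feature per frame."""
    n_frames = len(samples) // frame_len
    feats = []
    for t in range(n_frames):
        frame = samples[t * frame_len:(t + 1) * frame_len]
        energy = sum(x * x for x in frame)
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats


def acoustic_log_prob(feats, words, means):
    """Acoustic model unit (toy assumption): each word emits exactly one frame,
    whose feature is Gaussian with unit variance around a per-word mean."""
    return sum(-0.5 * (y - means[w]) ** 2 - 0.5 * math.log(2 * math.pi)
               for y, w in zip(feats, words))


def lm_log_prob(words, bigram):
    """Language model unit (toy): bigram probabilities, starting from '<s>'."""
    total, prev = 0.0, "<s>"
    for w in words:
        total += math.log(bigram[prev][w])
        prev = w
    return total


def search(feats, means, bigram):
    """Search unit: brute-force every word sequence with one word per frame and
    keep the one with the highest acoustic + language model log-probability."""
    best_words, best_score = None, -math.inf
    for words in product(sorted(means), repeat=len(feats)):
        score = acoustic_log_prob(feats, words, means) + lm_log_prob(words, bigram)
        if score > best_score:
            best_words, best_score = list(words), score
    return best_words


def recognize(samples, frame_len, means, bigram):
    """Chain the units: front end -> acoustic and language model scores -> search."""
    return search(front_end(samples, frame_len), means, bigram)


# Hypothetical two-word vocabulary: "yes" is quiet (log-energy ~0), "no" is loud (~4).
means = {"yes": 0.0, "no": 4.0}
uniform = {"yes": 0.5, "no": 0.5}
bigram = {"<s>": uniform, "yes": uniform, "no": uniform}
samples = [1.0, 0.0, math.exp(2), 0.0]  # two frames of length 2
print(recognize(samples, 2, means, bigram))  # ['yes', 'no']
```

With a uniform bigram table the acoustic scores alone decide the result; making the transition "yes" to "no" very unlikely in the bigram table shifts the decision towards "yes yes", which shows how the language model unit constrains the search.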
One standard approach to large vocabulary continuous speech recognition is to assume a simple probabilistic model of speech production whereby a specified word sequence, W, generates an acoustic observation sequence Y with probability P(W,Y). The objective is then to decode the