Mohan Dholvan and K.Anitha Sheela / Elixir Elec. Engg. 100 (2016) 43403-43409 43403
Introduction
IVR and transaction-processing applications in use today
provide a user interface based on dual-tone multi-frequency
(DTMF), or touch-tone, input. Speech Enabled Interactive
Voice Response systems (SEIVRS) are those that allow callers
to complete their transactions with their own voice rather than
with DTMF input. This provides ease of use and a better user
interface, and such systems are rapidly emerging as one of the
latest advances in telephony-based remote self-service. A SEIVR
system connects the telephone network with a predefined set
of instructions, thereby serving as a bridge between customer
and computer. A user can access information from anywhere
and at any time by dialing a specified number. Once the
connection is established, the IVR system responds to the
caller's input with computer-generated voice messages. The
input is given as a spoken (voice) signal, and the output
message is determined dynamically from the user input and an
internal menu structure, which is maintained within the SEIVR
application program itself. The SEIVR system is highly
efficient and economical compared with a Dialogic card, which
is very costly and also requires periodic upgrades and
maintenance.
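The way a response is selected from the internal menu structure and the recognized user input can be sketched as follows. This is a minimal illustration only, not the implementation used in this work: the `MenuNode` class, the `respond` function and the railway-enquiry prompts are hypothetical, and the recognized text is assumed to arrive already transcribed by the ASR module.

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One node of a hypothetical internal menu structure."""
    prompt: str                                   # text the TTS module would speak
    options: dict = field(default_factory=dict)   # recognized word -> next MenuNode


# Hypothetical railway-enquiry menu, used only for illustration.
MENU = MenuNode(
    prompt="Say 'arrivals' or 'departures'.",
    options={
        "arrivals": MenuNode(prompt="Please say the train name for arrival times."),
        "departures": MenuNode(prompt="Please say the train name for departure times."),
    },
)


def respond(node: MenuNode, recognized_text: str):
    """Return (next_node, message) for one caller turn."""
    key = recognized_text.strip().lower()
    next_node = node.options.get(key)
    if next_node is None:
        # Unrecognized input: stay on the same node and repeat its prompt.
        return node, "Sorry, I did not understand. " + node.prompt
    return next_node, next_node.prompt


if __name__ == "__main__":
    node, message = respond(MENU, "arrivals")
    print(message)
```

Keeping the menu as data rather than as branching code means the dialogue can be changed without touching the response logic, which is one plausible reading of a menu "maintained within the application program".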
Speech-based IVRs have several advantages over keypad-based
IVRs. It is easier for callers to speak their requirements
than to punch in numbers. Recalling the names of people,
trains, places, etc. is also easier than recalling code
numbers. Statistically, speech-enabled systems take
comparatively less time per call and lead to a larger
number of completed calls. According to recent reviews, the
economics of a call that uses speech are also more
favourable. For example:
Manually handled call: $1.75 per minute
Speech call: $0.20 per minute
If even a part of the calls handled by human agents, say
20%, could be converted to speech-based IVR, the RoI
(Return on Investment) would be remarkable.
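The effect of the 20% conversion on the per-minute cost can be worked out directly from the two rates quoted above. The sketch below assumes that converted calls last as long as manually handled ones and ignores the investment needed to deploy the system, so it gives a cost reduction rather than a full RoI figure. The function names are illustrative. With a 20% speech share the blended cost is 0.8 × 1.75 + 0.2 × 0.20 = $1.44 per minute, a reduction of about 17.7%.

```python
MANUAL_COST_PER_MIN = 1.75   # USD per minute, manually handled call (from the text)
SPEECH_COST_PER_MIN = 0.20   # USD per minute, speech call (from the text)


def blended_cost_per_minute(speech_share: float) -> float:
    """Average cost per call minute when a share of calls uses speech IVR.

    Assumes converted calls have the same duration as manual calls.
    """
    if not 0.0 <= speech_share <= 1.0:
        raise ValueError("speech_share must be between 0 and 1")
    return (1.0 - speech_share) * MANUAL_COST_PER_MIN + speech_share * SPEECH_COST_PER_MIN


def saving_fraction(speech_share: float) -> float:
    """Relative cost reduction compared with handling every call manually."""
    return (MANUAL_COST_PER_MIN - blended_cost_per_minute(speech_share)) / MANUAL_COST_PER_MIN


if __name__ == "__main__":
    share = 0.20
    print(f"Blended cost: ${blended_cost_per_minute(share):.2f} per minute")
    print(f"Cost reduction: {saving_fraction(share):.1%}")
```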
Construction of a Speech Enabled IVR system involves three
modules: an ASR module for speech recognition, a TTS module
for speech synthesis, and a speech coder module, which plays
a vital role at the client-server ends. We consider the SEIVR
to be a client-server based enquiry system, so an ASR system
is essential to it. There exist three approaches for the
implementation of an ASR for client-server based
applications, for example answering remote queries over
communication channels. The first is Embedded Speech
Recognition (ESR), the second is Distributed Speech
Recognition (DSR) and the third is Network Speech
Recognition (NSR). The ESR and DSR configurations require
large amounts of computational power on the client, either
to decode (ESR) or to extract features (DSR). Hence most
present deployments prefer the server-based NSR model for
the recognition process.
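The difference between the three approaches lies in where each recognition stage runs and what is sent over the channel. The sketch below encodes the usual textbook division of labour. The stage names, the `CLIENT_STAGES` and `TRANSMITTED` tables and the `server_stages` helper are illustrative assumptions, not definitions taken from this paper.

```python
from enum import Enum

# Processing stages of a recognizer, in pipeline order (illustrative names).
STAGES = ("capture", "feature_extraction", "decoding")


class Approach(Enum):
    ESR = "Embedded Speech Recognition"
    DSR = "Distributed Speech Recognition"
    NSR = "Network Speech Recognition"


# Stages assumed to run on the client; the remaining stages run on the server.
CLIENT_STAGES = {
    Approach.ESR: ("capture", "feature_extraction", "decoding"),
    Approach.DSR: ("capture", "feature_extraction"),
    Approach.NSR: ("capture",),
}

# What is assumed to cross the communication channel in each approach.
TRANSMITTED = {
    Approach.ESR: "recognition result",
    Approach.DSR: "feature vectors",
    Approach.NSR: "codec-compressed speech",
}


def server_stages(approach: Approach) -> tuple:
    """Return the stages left for the server under the given approach."""
    client = CLIENT_STAGES[approach]
    return tuple(stage for stage in STAGES if stage not in client)


if __name__ == "__main__":
    for approach in Approach:
        print(approach.name, "| client:", CLIENT_STAGES[approach],
              "| server:", server_stages(approach),
              "| sends:", TRANSMITTED[approach])
```

Under NSR the client only captures and encodes speech, which is why the choice of speech codec can affect recognition accuracy at the server.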
Hence the main focus of this work is to study the effect
of different narrowband codecs on ASR accuracy. The paper
is organized as follows.
Section II describes the design of speech recognition. We
deal with the selection of the communication network and the
use of the source code of various narrowband codecs to generate
ARTICLE INFO
Article history:
Received: 3 October 2016;
Received in revised form: 2 November 2016;
Accepted: 6 November 2016;
Keywords
SEIVR, SPHINX, TIMIT, ITU-T, ETSI, TTS, ESR, DSR, NSR, RoI,
VoIP, GSM, DTMF, Context-Independent (CI),
Context-Dependent (CD), AM, LM.
Performance Analysis of Speech Enabled IVR Using Narrowband Codec
Mohan Dholvan and Dr. K. Anitha Sheela
ECM Department, SNIST, Hyderabad, Telangana, India. ECE Department, JNTUCEH, Telangana, India.
ABSTRACT
The ultimate goal of deploying any voice-centric application is to provide a
natural way of human-machine interaction in end-to-end communication, and the majority
of voice-centric applications in today's world promise the same. In this scenario, it
is essential to investigate the performance of Speech-Enabled IVR (SEIVR) under the
effect of different narrowband codecs. In this paper, the performance of SEIVR has been
analyzed by utilizing an ASR engine and speech codecs. SPHINX-3, CMU's ASR toolkit
for speech recognition, has been used as the ASR engine, and executable files of the
various narrowband codecs are generated from source code obtained from
standards organizations such as ITU-T, ETSI and ISO/IEC. The results of this paper are
based entirely on speech data from the TIMIT speech database. The major contribution
of this paper is to show that the recognition accuracy of SEIVR increases with the
number of Gaussian mixtures, from Context-Independent (CI) to Context-Dependent
(CD) models, under the influence of various narrowband codecs.
© 2016 Elixir All rights reserved.
Electrical Engineering
Available online at www.elixirpublishers.com (Elixir International Journal)
E-mail address: mohan.aryan19@sreenidhi.edu.in