Mohan Dholvan and K. Anitha Sheela / Elixir Elec. Engg. 100 (2016) 43403-43409

Introduction

IVR and transaction processing applications in use today provide a user interface based on dual-tone multi-frequency (DTMF), or touch-tone, input. Speech Enabled Interactive Voice Response systems (SEIVRS) are those that allow callers to complete their transactions using their own voice rather than DTMF input. This makes the system easier to use and gives a better user interface, and it is rapidly emerging as one of the most recent innovations in telephony-based remote self-service.

A SEIVR system connects the telephone network with a predefined set of instructions, thereby serving as a bridge between customer and computer. A user can access information from anywhere and at any time by dialing a specified number; interaction begins once the connection is established. The IVR system responds to input from a telephone caller with computer-generated voice responses. The input is given as a spoken (voice) signal, and the output response message is determined dynamically from the user input and an internal menu structure, which is maintained within the SEIVR application program itself. The SEIVR system is highly efficient and economical compared with a Dialogic-card-based system, which is very costly and also requires periodic upgrades and maintenance.

Speech-based IVRs have several advantages over keypad-based IVRs. It is easier for callers to speak their requirements than to punch in numbers. Recalling the names of people, trains, places, etc. is also easier than recalling code numbers. Statistically, speech-enabled systems take comparatively less time for call completion and lead to more completed calls. According to recent reviews, the economics of a call that uses speech also appear to be more favorable.
For example:
Manual handling of a call: $1.75 per minute
Speech call: $0.20 per minute
If even a part of the calls handled by humans, say 20 %, could be converted to speech-based IVR, the return on investment (RoI) would be substantial.

Construction of a Speech Enabled IVR system involves three modules: an ASR module for the speech recognition task, a TTS module for the speech synthesis task, and a speech coder module, which plays a vital role at the client-server end. We consider SEIVR to be a client-server based enquiry system, and ASR is essential for such systems. There are three approaches to implementing ASR for client-server based applications, for example answering remote queries over communication channels. The first is Embedded Speech Recognition (ESR), the second is Distributed Speech Recognition (DSR), and the third is Network Speech Recognition (NSR). The ESR and DSR configurations require very large amounts of computational power, either to decode or to extract features. Hence most present deployments prefer the server-based NSR model for the recognition process, and our main focus in this project is to study the effect of different narrowband codecs on ASR accuracy.

We have organized the paper in the following way. Section-II describes the design of speech recognition. We are dealing with selection of communication network and usage of source code of various narrowband codec to generate

ARTICLE INFO
Article history:
Received: 3 October 2016;
Received in revised form: 02 November 2016;
Accepted: 06 November 2016;

Keywords
SEIVR, SPHINX, TIMIT, ITU-T, ETSI, TTS, ESR, DSR, NSR, RoI, VoIP, GSM, DTMF, Context-Independent (CI), Context-Dependent (CD), AM, LM.
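The call-cost comparison quoted in the Introduction ($1.75 per minute for a manually handled call versus $0.20 per minute for a speech call, with 20 % of calls moved to speech) can be worked through with a short calculation. The sketch below is illustrative only: it assumes that the share of calls converted equals the share of call-minutes converted, and the function names and the monthly volume of 100,000 minutes are hypothetical values chosen for the example, not figures from the paper.

```python
# Per-minute handling costs quoted in the text (US dollars).
MANUAL_COST_PER_MIN = 1.75
SPEECH_COST_PER_MIN = 0.20


def blended_cost_per_minute(speech_share: float) -> float:
    """Average cost per call-minute when a fraction `speech_share`
    of call-minutes is handled by speech IVR instead of an agent."""
    if not 0.0 <= speech_share <= 1.0:
        raise ValueError("speech_share must be between 0 and 1")
    return ((1.0 - speech_share) * MANUAL_COST_PER_MIN
            + speech_share * SPEECH_COST_PER_MIN)


def saving_per_minute(speech_share: float) -> float:
    """Reduction in average cost per call-minute versus all-manual handling."""
    return MANUAL_COST_PER_MIN - blended_cost_per_minute(speech_share)


def total_saving(total_minutes: float, speech_share: float) -> float:
    """Total saving for a given (hypothetical) volume of call-minutes."""
    return total_minutes * saving_per_minute(speech_share)


if __name__ == "__main__":
    share = 0.20               # 20 % of calls moved to speech, as in the text
    minutes = 100_000          # hypothetical monthly volume, not from the paper
    print(f"Blended cost per minute: ${blended_cost_per_minute(share):.2f}")
    print(f"Saving per minute:       ${saving_per_minute(share):.2f}")
    print(f"Saving on {minutes} minutes: ${total_saving(minutes, share):,.2f}")
```

Under these assumptions, each converted minute saves the difference between the two rates, so the overall saving scales linearly with the converted share. Deployment and maintenance costs of the speech system are not modeled here and would have to be subtracted before stating an actual RoI.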
Performance Analysis of Speech Enabled IVR Using Narrowband Codec
Mohan Dholvan and Dr. K. Anitha Sheela
ECM Department, SNIST, Hyderabad, Telangana, India.
ECE Department, JNTUCEH, Telangana, India.

ABSTRACT
The ultimate goal of deploying any voice-centric application is to provide a natural way of human-machine interaction in end-to-end communication, and the majority of today's voice-centric applications promise the same. In this scenario, it is essential to investigate the performance of Speech-Enabled IVR (SEIVR) under the effect of different narrowband codecs. In this paper, the performance of SEIVR is analyzed using an ASR engine and speech codecs. SPHINX-3, CMU's ASR toolkit for speech recognition, is used as the ASR engine, and executable files of various narrowband codecs are generated from source code taken from standards organizations such as ITU-T, ETSI and ISO/IEC. The results of this paper are based entirely on speech data from the TIMIT speech database. The major contribution of this paper is to show that the recognition accuracy of SEIVR increases with the number of Gaussian mixtures, from Context-Independent (CI) to Context-Dependent (CD) models, under the influence of various narrowband codecs.

© 2016 Elixir All rights reserved.

Elixir Elec. Engg. 100 (2016) 43403-43409
Electrical Engineering
Available online at www.elixirpublishers.com (Elixir International Journal)
E-mail address: mohan.aryan19@sreenidhi.edu.in