Niusha, the first Persian speech-enabled IVR platform

M.H. Bokaei, H. Sameti, H. Eghbal-zadeh††, B. BabaAli, KH. Hosseinzadeh††, M. Bahrani, H. Veisi, A. Sanian††
Speech Processing Lab, Sharif University of Technology, Tehran, Iran
{Bokaei, Babaali, Bahrani, Veisi}@ce.sharif.edu, Sameti@sharif.edu
††ASR-Gooyesh Pardaz Company, Tehran, Iran
{h.eghbalzadeh, kh.hosseinzadeh, a.sanian}@asr-gooyesh.com

Abstract: This paper introduces Niusha, the first Persian speech-enabled IVR platform. The platform uses a Persian speech recognizer and a Persian text-to-speech synthesizer to interact with users. It is designed so that it can easily be customized for various domains, and its components can be adapted to new words.

Keywords: speech-enabled IVR systems; VoiceXML; dialogue system

I. Introduction

Since the invention of the computer, human-computer interaction has been one of the most interesting areas from both academic and industrial viewpoints. Ease of communication is a basic need for the user of any computer system. As a consequence of the data explosion, information systems, such as information kiosks that a user consults to obtain information in a specific domain, have become some of the most commonly used computer systems.

The simplest way to communicate with an information system is through natural language. For this purpose, spoken dialogue systems have been developed that interact with a user in order to provide him/her with suitable information. A typical dialogue system consists of five distinct modules: an automatic speech recognizer, a spoken language understanding module, a dialogue manager, a text generator and a text-to-speech synthesizer. None of these modules is perfect; each produces some errors in its output. Because of these errors, no commercial dialogue system has been developed yet, and academic studies aim to improve the accuracy of each module separately.
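The five-module chain can be sketched as a minimal Python pipeline, assuming each module is a simple function from its input to its output. The class `DialoguePipeline` and the `toy_*` stubs are hypothetical illustrations, not components of Niusha or of any real engine; they only show that each stage consumes the previous stage's output, which is why the errors of individual modules accumulate.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Frame = Dict[str, str]  # semantic frame produced by the understanding module


@dataclass
class DialoguePipeline:
    """One user turn through the five modules of a typical spoken dialogue system.

    Each stage is a plain callable, so a real recognizer or synthesizer could be
    plugged in; the stage boundaries, not the stubs, are the point here.
    """
    recognize: Callable[[bytes], str]     # automatic speech recognizer: audio -> text
    understand: Callable[[str], Frame]    # spoken language understanding: text -> frame
    decide: Callable[[Frame], str]        # dialogue manager: frame -> system act
    generate: Callable[[str], str]        # text generator: act -> response text
    synthesize: Callable[[str], bytes]    # text-to-speech synthesizer: text -> audio

    def run_turn(self, audio: bytes) -> bytes:
        # Each module consumes the previous module's output, so an error made
        # early (e.g. by the recognizer) propagates to every later stage.
        text = self.recognize(audio)
        frame = self.understand(text)
        act = self.decide(frame)
        reply = self.generate(act)
        return self.synthesize(reply)


# Hypothetical toy stubs standing in for real engines.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already a transcript


def toy_slu(text: str) -> Frame:
    return {"intent": "weather"} if "weather" in text else {"intent": "unknown"}


def toy_dm(frame: Frame) -> str:
    return "inform_weather" if frame["intent"] == "weather" else "ask_repeat"


def toy_nlg(act: str) -> str:
    return {"inform_weather": "It is sunny.", "ask_repeat": "Please repeat."}[act]


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are synthesized audio


if __name__ == "__main__":
    pipeline = DialoguePipeline(toy_asr, toy_slu, toy_dm, toy_nlg, toy_tts)
    print(pipeline.run_turn(b"what is the weather"))  # b'It is sunny.'
    print(pipeline.run_turn(b"whether"))              # b'Please repeat.'
```

The second call simulates a recognition error ("whether" instead of "weather"): the later stubs behave correctly, yet the turn still fails, which mirrors the argument that every module's errors limit the whole system.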
To partially meet the need for dialogue systems, Interactive Voice Response (IVR) systems have emerged instead. They consist of the same five modules, but each module is implemented at a more limited level, which improves accuracy. Traditionally, touch-tone IVR systems have been used, in which a menu is read to the user and he/she presses buttons on the phone keypad to interact with the system according to that menu. With improvements in speech recognition, especially in limited domains, a distinct kind of IVR system has emerged: the speech-enabled IVR system, in which the user speaks his/her choice and the system recognizes the speech and acts accordingly. With the development of such systems, IVR systems are getting closer to the ultimate dialogue system.

In this paper, we introduce Niusha, the first Persian speech-enabled IVR platform. The main module of an IVR system is its “interaction process manager”. By using the VoiceXML (VXML) standard to implement this unit, the whole system can easily be adapted to different domains.

The rest of this paper is organized as follows. Section II introduces speech-enabled IVR systems and the VoiceXML standard. Section III introduces Niusha and examines its distinct parts. Section IV presents the main features of Niusha, and finally Section V concludes the paper and outlines future work.

II. Concepts

In this section, we briefly introduce the two most important concepts: interactive voice response systems and the VoiceXML standard.

A. Interactive Voice Response systems

Interactive voice response (IVR) is an automated telephony system that interacts with callers, gathers information and provides the requested information to the caller. An IVR system accepts a combination of voice input and touch-tone keypad selections and provides appropriate responses in the form of voice, fax, callback, e-mail and possibly other media.
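To illustrate how a single IVR menu can accept both voice input and touch-tone selections, the following minimal sketch builds a VoiceXML 2.0 `<menu>` with Python's standard `xml.etree.ElementTree`. Each `<choice>` carries a spoken phrase (from which a VoiceXML browser derives a speech grammar) and a `dtmf` key. The function `build_menu`, the English phrases and the target documents `balance.vxml` and `transfer.vxml` are hypothetical; this is not Niusha's actual scenario, and a Persian deployment would use Persian phrases.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
# Serialize VoiceXML elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", VXML_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified ElementTree name for a VoiceXML tag."""
    return f"{{{VXML_NS}}}{tag}"


def build_menu(prompt_text: str, choices: list[tuple[str, str, str]]) -> str:
    """Build a minimal VoiceXML 2.0 document containing one <menu>.

    Each choice is (spoken_phrase, dtmf_key, next_uri): the caller may either
    say the phrase or press the key, and the browser then jumps to next_uri.
    """
    root = ET.Element(q("vxml"), version="2.0")
    menu = ET.SubElement(root, q("menu"), id="main")
    prompt = ET.SubElement(menu, q("prompt"))
    prompt.text = prompt_text
    for phrase, key, next_uri in choices:
        # The text content of <choice> is the phrase the caller can speak;
        # the dtmf attribute binds the same choice to a keypad key.
        choice = ET.SubElement(menu, q("choice"), dtmf=key, next=next_uri)
        choice.text = phrase
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical two-item menu; the target documents are not defined here.
    print(build_menu(
        "Say balance or press 1. Say transfer or press 2.",
        [("balance", "1", "balance.vxml"), ("transfer", "2", "transfer.vxml")],
    ))
```

Because the menu is declarative data rather than program logic, changing the domain mostly means changing the phrases, keys and targets, which is the kind of adaptability the VoiceXML standard is meant to provide.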
An IVR system interacts with its user according to a predefined scenario designed as a tree structure. The user is moved between states according to his/her answers to the questions asked by the system. The first generation of IVR systems were touch-tone IVR systems, which read a menu aloud while the caller selects an appropriate choice by pressing a number on the phone keypad. Evidently, this kind of IVR system cannot handle some scenarios. An important limitation of touch-tone IVR systems is that the number of choices must be fewer than 9. Moreover, listening to a menu with many choices exhausts the caller; usually, a menu with 3 or 4 choices is acceptable. Given this limitation and along with performance improvements in
2010 5th International Symposium on Telecommunications (IST'2010) 978-1-4244-8185-9/10/$26.00 ©2010 IEEE 591