Niusha, the first Persian speech-enabled IVR platform
M.H. Bokaei†, H. Sameti†, H. Eghbal-zadeh††, B. BabaAli†, KH. Hosseinzadeh††, M. Bahrani†, H. Veisi†, A. Sanian††
† Speech Processing Lab, Sharif University of Technology, Tehran, Iran
{Bokaei, Babaali, Bahrani, Veisi}@ce.sharif.edu, Sameti@sharif.edu
†† ASR-Gooyesh Pardaz Company, Tehran, Iran
{h.eghbalzadeh, kh.hosseinzadeh, a.sanian}@asr-gooyesh.com
Abstract—This paper introduces Niusha, the first Persian
speech-enabled IVR platform. The platform uses a Persian
speech recognizer and a Persian text-to-speech synthesizer to
interact with users. It is designed so that it can easily be
customized for various domains, and its components can be
adapted to new words.
Keywords: speech-enabled IVR systems; VoiceXML; dialogue system
I. Introduction
Since the invention of the computer, human-computer
interaction has been one of the most interesting areas from
both academic and industrial viewpoints. Ease of
communication is a basic need for the user of any computer
system. Owing to the data explosion, information systems are
among the most commonly used computer systems; an example
is an information kiosk, which a user consults to obtain
information in a specific domain. The simplest way to
communicate with an information system is to use natural
language. For this purpose, spoken dialogue systems have been
developed that communicate with a user interactively in order
to provide suitable information.
A typical dialogue system consists of five distinct
modules: an automatic speech recognizer, a spoken language
understanding module, a dialogue manager, a text generator and
a text-to-speech synthesizer. These modules are not perfect and
introduce errors into their outputs. Because of these errors, no
commercial dialogue system has been developed yet, and
academic studies are being conducted to improve the accuracy
of each module separately. To meet the need for dialogue
systems in the meantime, Interactive Voice Response (IVR)
systems have emerged instead. They consist of the same five
modules, but each module is implemented at a more limited
level, and thus the accuracy is improved.
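The five-module chain described above can be sketched as a simple pipeline. This is only an illustrative sketch, not Niusha's implementation: the class name `DialoguePipeline`, the `toy_*` stub functions, and the bank-balance domain are all assumptions introduced for the example, and the stubs treat "audio" as UTF-8 text so that the code runs without any speech engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialoguePipeline:
    """Hypothetical container chaining the five modules of a dialogue system."""
    recognize: Callable[[bytes], str]    # automatic speech recognizer: audio -> text
    understand: Callable[[str], dict]    # spoken language understanding: text -> meaning
    decide: Callable[[dict], dict]       # dialogue manager: meaning -> system action
    generate: Callable[[dict], str]      # text generator: action -> response text
    synthesize: Callable[[str], bytes]   # text-to-speech synthesizer: text -> audio

    def turn(self, audio_in: bytes) -> bytes:
        """Process one user turn by passing data through all five modules."""
        text = self.recognize(audio_in)
        meaning = self.understand(text)
        action = self.decide(meaning)
        reply = self.generate(action)
        return self.synthesize(reply)


# Toy stand-ins (assumptions): real modules would wrap ASR and TTS engines.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


def toy_understand(text: str) -> dict:
    intent = "balance" if "balance" in text.lower() else "unknown"
    return {"intent": intent}


def toy_decide(meaning: dict) -> dict:
    if meaning["intent"] == "balance":
        return {"act": "inform"}
    return {"act": "reprompt"}


def toy_generate(action: dict) -> str:
    if action["act"] == "inform":
        return "Your balance is available."
    return "Sorry, please repeat."


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the text is synthesized audio


pipeline = DialoguePipeline(
    toy_recognize, toy_understand, toy_decide, toy_generate, toy_synthesize
)
```

Because each stage consumes the previous stage's output, an error in any one module propagates to the final response, which is why restricting each module to a limited domain, as IVR systems do, improves overall accuracy.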
Traditionally, touch-tone IVR systems have been used, in
which a menu is read to the user and he/she uses the buttons on
the phone keypad to interact with the system according to that
menu. With the improvement of speech recognition,
specifically in limited domains, a distinct kind of IVR system
has emerged: the speech-enabled IVR system, in which the user
can say his/her choice and the system recognizes the speech
and acts accordingly. With the development of such systems,
IVR systems are getting closer to the ultimate dialogue system.
In this paper we introduce Niusha, the first Persian
speech-enabled IVR platform. The main module of an IVR
system is its “interaction process manager”. Because this unit is
implemented using the VoiceXML (VXML) standard, the
whole system can easily be adapted to different domains. The
rest of this paper is organized as follows. Section II introduces
speech-enabled IVR systems and the VoiceXML standard.
Section III introduces Niusha and examines the distinct parts of
the system. Section IV presents the main features of Niusha,
and Section V concludes the discussion and outlines future
work.
II. Concepts
In this section we briefly introduce the two most important
concepts: interactive voice response systems and the VoiceXML
standard.
A. Interactive Voice Response systems
Interactive voice response (IVR) is an automated
telephony system that interacts with callers, gathers
information and provides the requested information to the
caller. An IVR system accepts a combination of voice input
and touch-tone keypad selection and provides appropriate
responses in the form of voice, fax, callback, e-mail and
perhaps other media. An IVR system interacts with its user
according to a pre-defined scenario designed as a tree structure.
The user is moved to different states according to his/her
answers to the questions asked by the system.
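A tree-structured scenario of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual scenario format: the `Node` class, the `run_scenario` function, the example menu and the policy of repeating the prompt on an unrecognized answer are all hypothetical. Answers are plain strings, so they could equally stand for keypad digits or recognized spoken choices.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One state of the scenario: a prompt plus the answers that lead onward."""
    prompt: str
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.children


def run_scenario(root: Node, answers: List[str]) -> List[str]:
    """Walk the tree with the caller's answers; return the prompts played."""
    played = [root.prompt]
    state = root
    for answer in answers:
        if state.is_leaf():
            break  # the scenario has ended; remaining answers are ignored
        next_state = state.children.get(answer)
        if next_state is None:
            played.append(state.prompt)  # unrecognized answer: repeat the prompt
            continue
        state = next_state
        played.append(state.prompt)
    return played


# Hypothetical two-level menu, used only for illustration.
menu = Node("Main menu: 1 for accounts, 2 for support.", {
    "1": Node("Accounts: 1 for balance, 2 for recent transactions.", {
        "1": Node("Playing balance information."),
        "2": Node("Playing recent transactions."),
    }),
    "2": Node("Transferring to support."),
})
```

For example, `run_scenario(menu, ["1", "2"])` moves from the main menu to the accounts menu and then to the recent-transactions leaf, returning the three prompts played along that path.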
The first generation of IVR systems consists of touch-tone
IVR systems, which read a menu from which the caller selects
a choice by pressing a number on the phone keypad.
Clearly, this kind of IVR system cannot deal with
some scenarios. An important limitation of touch-tone IVR
systems is that the number of choices must be less than 9.
Moreover, listening to a menu with many choices exhausts the
caller; usually, a menu with 3 or 4 choices is acceptable. According
to this limitation and along with performance improvements in
2010 5th International Symposium on Telecommunications (IST'2010)
978-1-4244-8185-9/10/$26.00 ©2010 IEEE 591