Speech technology on mobile devices for solving the digital divide

Samson Lupembe, Daniel Mashao
Department of Electrical Engineering, University of Cape Town
Rondebosch, Cape Town, South Africa
slupembe@crg.ee.uct.ac.za Daniel@eng.uct.ac.za

ABSTRACT

Speech technology is a very attractive alternative to other methods of helping people access information. In South Africa, many people cannot access computer-based information because of language and educational barriers, yet over 15 million of these people own or use cellphones. This paper discusses the work we are doing to enable these users to access information using speech. We looked at three ways in which the system can be implemented and chose the implementation that places the fewest requirements on the cellphones users can use. We used the HTK toolkit to experiment with the system and found that the idea is practical. The next step would be to design the complete system.

1. INTRODUCTION

This is a work-in-progress paper covering the work that I will be doing towards my Master's degree. I will be looking into a technology that will assist in solving the digital divide. One of the goals of the South African government is to provide services to its citizens via Internet technology. Everyone agrees on one thing: someday, in the future, we'll be talking with computers as easily as we do with humans [1].

For the past couple of years, the South African government has voiced the intention of offering the services of an e-government organization, a body where information can be accessed at any time by phone or Internet, with public Internet kiosks provided for universal access [2]. This is seen as a process that will enrich and benefit the citizens. SITA, the State Information Technology Agency, is tasked with the development of such a service.

According to SITA, the current model of state services requires that users be the integrators of services.
That is, a user wanting to start a business has to visit more than three departments to get the go-ahead. For example, the user might need to visit the South African Revenue Service (SARS) to ensure that their tax status is correct, visit a police station or the justice department to confirm that they have no criminal record, go to the Department of Trade and Industry to register a close corporation, and perhaps go to yet another department for zoning issues where the business is going to be located. The citizen must do all of this, which is inefficient and wastes time. In the model where the government is the integrator, the user will be presented with a single point of contact where all the forms can be filled in, saving the citizen transport costs and time. It will also be easier for the government to keep records of the citizen's activities.

All this is wonderful if citizens can access the information and make use of it. In South Africa, where the majority of people have no access to computer technology and are not fluent in English, this may present a problem and contribute to widening the digital divide between the haves and the have-nots.

SITA's method of enabling citizens to access this information is to build kiosks in public places such as schools, libraries and shopping malls. This system will provide a center of excellence in SITA for the skills needed to support clients in the e-Government effort. It will also facilitate the integrative deployment and use of the e-Government concept throughout the Government [3]. The kiosk approach carries the cost of building the kiosks. In addition, if the kiosk idea is implemented, people in rural areas will most probably not be reached by the services the kiosks offer, because kiosks will not be built everywhere.
Some individuals who lack experience will also be afraid to experiment with a kiosk, because they will not want to look inexperienced in front of others.

In this study we propose to use speech technology to help enable SITA's plans. We note that cellular technology has grown considerably in the country. Given the availability of cellphones to the citizens of South Africa, we see an opportunity to facilitate e-government using a technology that already exists, that most people are comfortable with, and that could even allow them to speak in their own languages.

Speech technology has now matured to the point where it can be considered for practical applications. The two areas of speech technology that we need in order to satisfy the requirements for speech access are automatic speech recognition (ASR) and text-to-speech (TTS) systems. ASR is still under active development and works well in noise-free environments. TTS is used to speak text aloud. TTS is more mature than ASR, although its output still sounds like a computer. Work is being done to make systems that sound natural.

Our major challenge is the set-up of the system. The system can be set up in at least three different ways; this is discussed in detail in Section 2. The other challenge is the performance of the systems. We have done a