On the Realization of AnSpeechCollector, System for Creating Transcribed Speech Database

Siniša Suzić, Darko Pekar, and Vlado Delić

This work was supported by the Ministry of Education, Science and Technological Development of Serbia within the Project "Development of Dialogue Systems in Serbian and other South Slavic Languages" (TR-32035).
Siniša Suzić, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, Serbia (phone 381-21-475-0204, e-mail: sinisa.suzic@uns.ac.rs)
Darko Pekar, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, Serbia (phone 381-21-475-0204, e-mail: darko.pekar@uns.ac.rs)
Vlado Delić, Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, Serbia (phone 381-21-475-0204, e-mail: vlado.delic@uns.ac.rs)

Abstract — In this paper we present a system for creating a transcribed speech database in a fast and efficient manner. The system consists of a client application running on Android-based mobile phones and a dedicated server. We also present a database of approximately 55 hours of speech created with this system.

Keywords — Android, speech database, transcription.

I. INTRODUCTION

Good acoustic models are one of the key prerequisites for the successful application of speech recognition. Training quality acoustic models requires an appropriate speech database, but creating such a database demands considerable resources. First, appropriate recording equipment must be provided. Second, and no less demanding, a sufficient number of speakers with an appropriate age and gender structure must be recruited. Finally, the recorded speech has to be transcribed.

For speech recognition to achieve the best results, the speech database used for training acoustic models must correspond to the real conditions in which the recognition technology is applied. This means that the vocabulary, the noise level and the microphone quality have to be as similar as possible to those of the concrete application. Many existing speech databases are recorded in studio environments or over telephone channels and do not correspond to real-world conditions. For this reason there is a tendency to create speech corpora for specific applications; some examples are given in [1] and [2]. An additional problem is that databases have to be collected for each language separately.

Our idea was to create a system that makes the whole process of collecting speech samples and transcribing them as automated and as undemanding in resources as possible. In creating the system we were inspired by two recent trends. The first is the growing popularity of crowdsourcing, defined as “the act of providing a way for non-experts to complete a task that would normally be reserved for experts” [3]. The second is the constantly increasing number of smartphone users. Forecasts suggest that in many countries more than 40% of the population will soon use smartphones [4].

The rest of the paper is organized as follows. In Section II we give a detailed description of our system. Section III describes the usage scenario for the system. Section IV presents a database collected with the system. In Sections V and VI some final remarks are given.

II. SYSTEM DESCRIPTION

Our system has a client-server architecture. The client is an application designed for Android-based mobile phones, while the server is written in the C# programming language. Communication between the client and the server is done over TCP sockets, and messages are exchanged using an internally designed XML-based protocol. A constant internet connection on the phone side is not required. More details on the client application are given in the following subsection.

A. Client application

The client application is a standard Android application which can be installed on all Android phones running Android version 2.0 or higher.

The application consists of a few screens. The one used to register a new subject is shown in Fig. 1. Every subject has to enter the following information:
- Gender
- Age
- Subject name, which must be unique and is used for identifying the subject in the corresponding server database

In addition to this information, the following is collected automatically:
- Phone IMEI (a unique identifier of every phone)
- Phone model
- Whether the phone supports stereo recording (explained in detail in subsection A-1)
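To make the registration data more concrete, the following minimal sketch shows how such a record could be serialized to XML on the client side and parsed back on the server side. It is an illustration only: the paper does not specify the internal protocol, so the element and attribute names (`register`, `subject`, `phone`, `stereo`, and so on), the `Registration` class, and the sample values are all assumptions, and Python is used here purely for brevity (the actual client is an Android application and the server is written in C#).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Registration:
    """Hypothetical container for the data gathered on the registration screen."""
    # Entered by the subject
    subject_name: str   # must be unique; identifies the subject on the server
    gender: str
    age: int
    # Collected automatically by the client
    imei: str
    phone_model: str
    stereo: bool        # whether the phone supports stereo recording


def to_xml(reg: Registration) -> bytes:
    """Serialize a registration record (element names are assumed, not the real protocol)."""
    root = ET.Element("register")
    ET.SubElement(root, "subject",
                  name=reg.subject_name, gender=reg.gender, age=str(reg.age))
    ET.SubElement(root, "phone",
                  imei=reg.imei, model=reg.phone_model,
                  stereo=str(reg.stereo).lower())
    # UTF-8 bytes, ready to be written to a TCP socket
    return ET.tostring(root, encoding="utf-8")


def from_xml(payload: bytes) -> Registration:
    """Parse the message back into a record, as a server might do on receipt."""
    root = ET.fromstring(payload)
    subject = root.find("subject")
    phone = root.find("phone")
    return Registration(
        subject_name=subject.get("name"),
        gender=subject.get("gender"),
        age=int(subject.get("age")),
        imei=phone.get("imei"),
        phone_model=phone.get("model"),
        stereo=phone.get("stereo") == "true",
    )


if __name__ == "__main__":
    # Sample values are placeholders, not data from the collected database
    reg = Registration("speaker_001", "female", 27,
                       "000000000000000", "ExamplePhone", True)
    message = to_xml(reg)
    print(message.decode("utf-8"))
    print(from_xml(message) == reg)
```

Keeping the subject-entered fields and the automatically collected fields in separate elements mirrors the two groups of information described above; a real implementation would also need to define how message boundaries are marked on the TCP stream, which is not covered in this sketch.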