International Journal of Electrical and Computer Engineering (IJECE)
Vol. 9, No. 4, August 2019, pp. 3194-3202
ISSN: 2088-8708, DOI: 10.11591/ijece.v9i4.pp3194-3202

UCSY-SC1: A Myanmar Speech Corpus for Automatic Speech Recognition

Aye Nyein Mon¹, Win Pa Pa², Ye Kyaw Thu³
¹,² Natural Language Processing Lab, University of Computer Studies, Yangon
³ Language and Speech Science Research Lab., Waseda University, Japan

Article Info
Article history:
Received Dec 18, 2018
Revised Feb 15, 2019
Accepted Mar 21, 2019

Keywords:
Automatic Speech Recognition
Myanmar Language
Speech Corpus
Convolutional Neural Network (CNN)

ABSTRACT
This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. Researchers around the world are conducting ASR research to improve the language technologies for their own languages. Speech corpora are essential for developing ASR systems, and creating them is especially necessary for low-resourced languages. Myanmar can be regarded as a low-resourced language because it lacks pre-built resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) is created for Myanmar ASR research. The corpus covers two domains: news and daily conversations. Its total size is over 42 hours, made up of 25 hours of web news and 17 hours of recorded conversational data. The news data were collected from 177 female and 84 male speakers, and the conversational data from 42 female and 4 male speakers. This corpus was used as training data for developing Myanmar ASR. Three types of acoustic models were built and their results compared: Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models.
Experiments were conducted on different data sizes, and evaluation was done on two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). The results show that Myanmar ASR systems trained on this corpus perform satisfactorily on both test sets, reaching word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Aye Nyein Mon
Natural Language Processing Lab, University of Computer Studies, Yangon, Myanmar.
Email: ayenyeinmon@ucsy.edu.mm

1. INTRODUCTION
Speech is the most natural form of communication among humans, and numerous spoken languages are used throughout the world. Because people communicate with each other mostly by voice, it is natural for them to expect speech interfaces to computers. Automatic speech recognition (ASR) is the conversion of spoken words into computer text. Researchers around the world are currently conducting a great deal of ASR research for their own languages [1] [2]. Current ASR systems use statistical models trained on speech data. A speech corpus is therefore essential for statistical-model-based ASR, and it directly affects the performance of a speech recognizer. For well-resourced languages, speech researchers can use resources that are publicly available online. For low-resourced languages, however, researchers have to build

Journal homepage: http://iaescore.com/journals/index.php/IJECE