Frequency Analysis of Spoken Urdu Numbers Using MATLAB and Simulink

S K Hasnain *, Azam Beg ** and Muhammad Samiullah Awan ***
Pakistan Navy Engineering College (NUST), Karachi-75350 (Pakistan)

Abstract

This paper describes the frequency analysis of spoken Urdu numbers from ‘sifr’ (zero) to ‘nau’ (nine). Sound samples from multiple speakers were used to extract different features. Initial processing of the data, i.e., normalizing and time-slicing, was done using a combination of Simulink and MATLAB. Afterwards, the same tools were used to calculate Fourier descriptors and correlations. The correlation allowed comparison of the same words spoken by the same and by different speakers. The analysis presented in this paper is seen as the first step in creating an Urdu speech recognition system. Such a system could potentially be used to implement a voice-driven help setup at call centers of commercial organizations operating in the Pakistan/India region.

Keywords: Spoken Urdu number processing, Fourier descriptors, Correlation, Speaker independent system, Feature extraction, Simulation.

1. INTRODUCTION

Automatic speech recognition has been an active research topic for more than four decades. With the advent of digital computing and signal processing, the problem of speech recognition was clearly posed and thoroughly studied. These developments were complemented by an increased awareness of the advantages of conversational systems. The range of possible applications is wide and includes voice-controlled appliances, fully featured speech-to-text software, automation of operator-assisted services, and voice recognition aids for the handicapped [1]. The speech recognition problem has sometimes been treated as a speech-to-text conversion problem. Many researchers have worked in this area. Some commercial speech recognition software is also available on the market, but mainly for English and other European languages.
Correlation exists between objects, phenomena, or signals and occurs in such a way that it cannot be by chance alone. Unconsciously, correlation is used in everyday life. When one looks at a person, car, or house, one’s brain tries to match the incoming image with hundreds (or thousands) of images that are already stored in memory [2]. We based our current work on the premise that the same word spoken by different speakers is correlated in the frequency domain.

In the speech recognition research literature, no work has been reported on Urdu speech processing, so we consider our work to be the first attempt in this direction. The analysis has been limited to number recognition. The process involves extracting some distinct characteristics of individual words by using discrete (Fourier) transforms and their correlations. The system is speaker-independent and is moderately tolerant of background noise.

* Author for correspondence. E-mail: <hasnain@pnec.edu.pk>
** College of Information Technology, UAE University, Al-Ain, UAE. E-mail: <abeg@uaeu.ac.ae>
*** Iqra University, Karachi. E-mail: <msuawan@yahoo.com>

2. REVIEW OF DISCRETE TRANSFORMATION & ITS MATLAB IMPLEMENTATION

The discrete Fourier transform (DFT) is itself a sequence rather than a function of a continuous variable, and it corresponds to equally spaced frequency samples of the discrete-time Fourier transform of a signal. The Fourier series representation of a periodic sequence corresponds to the DFT of a finite-length sequence. The DFT therefore transforms a discrete-time sequence x(n) of finite length into a corresponding discrete-frequency sequence X[k] of finite length [2]. The DFT is a complex-valued function of frequency. Usually the data sequence being transformed is real.
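As a small illustration of the transform described above, the following sketch evaluates the DFT directly from its standard definition, X[k] = sum over n of x(n) exp(-j 2 pi k n / N). It is written in plain Python rather than MATLAB so that it is self-contained, and it is not the authors' implementation: the function name `dft`, the sign convention of the exponent, and the eight-sample cosine test signal are all assumptions made for illustration. MATLAB's built-in `fft` computes the same quantity far more efficiently.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT of a finite-length sequence x(n).

    Hypothetical helper for illustration only; assumes the common
    convention X[k] = sum_n x(n) * exp(-j*2*pi*k*n/N).
    """
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


# Assumed test signal: one full cycle of a cosine sampled at N = 8 points.
N = 8
x = [math.cos(2 * math.pi * n / N) for n in range(N)]

# X is a finite-length sequence of complex values, one per harmonic k.
X = dft(x)

# Magnitude of each harmonic. For this signal the energy sits in
# bins k = 1 and k = N - 1 (each of magnitude N/2 = 4); the rest are ~0.
magnitudes = [abs(value) for value in X]

# Because x(n) is real, the spectrum is conjugate-symmetric:
# X[N - k] equals the complex conjugate of X[k].
symmetric = all(
    abs(X[N - k] - X[k].conjugate()) < 1e-9 for k in range(1, N)
)
```

Even though the input is real, each X[k] is in general complex; the conjugate symmetry checked at the end means that only about half of the harmonics carry independent information for real data.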
A waveform is sampled at regular time intervals $T$ to produce a sequence of $N$ sample values, where $n$ is the sample number, from $n = 0$ to $n = N-1$:

\[ \{x(nT)\} = \left[\, x(0),\ x(T),\ \ldots,\ x\big((N-1)T\big) \,\right] \]

The data values $x(nT)$ will be real only when representing the values of a time series such as a voltage waveform. The DFT of $x(nT)$ is then defined as the sequence of complex values

\[ \{X[k\omega]\} = \left[\, X(0),\ X(\omega),\ \ldots,\ X\big((N-1)\omega\big) \,\right] \]

in the frequency domain, where $\omega$ is the first harmonic frequency, given by $\omega = 2\pi/NT$. Thus $X[k\omega]$ has real and imaginary components in general, so that for the $k$th harmonic