Analysis Modification Synthesis Based Optimized Modulation Spectral Subtraction for Speech Enhancement

Pavan D. Paikrao 1*, Sanjay L. Nalbalwar 2

Abstract—The traditional analysis modification synthesis (AMS) framework is commonly applied to spectral subtraction together with the short-time Fourier transform. Based on this AMS method, we propose an approach for modified modulation spectral subtraction. Results reported in previous studies show that modulation spectral subtraction improves speech quality for speech corrupted by additive white Gaussian noise. It gives improved speech quality scores in stationary noise, but it fails to improve speech quality in real-time noise environments. In addition, the computational cost of existing modulation-domain spectral subtraction methods is high. We therefore propose applying the minimum statistics noise estimation technique to the real modulation magnitude spectrum, together with an optimized noise suppression factor and spectral floor, to improve speech quality in real-time noise environments. Finally, objective, subjective, and intelligibility evaluation metrics indicate that the proposed method achieves better performance than existing spectral subtraction algorithms across different input SNRs and noise types, along with improved computation time. Computation time is improved by 57.13% compared with the traditional modulation-domain spectral subtraction method. A modulation frame duration of 128 ms is found to be a good compromise between shorter and longer frame durations and gives improved results.

Keywords—Optimized modulation spectral subtraction, speech enhancement, analysis modification synthesis, noise.

I. INTRODUCTION

Speech enhancement has spurred great interest in many fields, such as speech recognition, feature extraction, and hearing aid devices. Humans exhibit a great capability to differentiate various sounds in noisy environments.
Unfortunately, the performance of these speech enhancement systems degrades when speech is corrupted by stationary or non-stationary background distortions. Speech enhancement is the process of improving the quality of noisy speech: a speech enhancement system reduces the additive noise that corrupts the original speech and makes it annoying to the listener. Thus, in noisy conditions there is a crucial need to improve the performance of these systems. Several researchers have proposed classical speech enhancement techniques [1,2,3,4,5] that remove additive noise.

1 Pavan D. Paikrao is with the Department of Electronics & Tele Comm. Engg., Dr. Babasaheb Ambedkar Technological University, Lonere, Dist. Raigad, MS, India (corresponding author; e-mail: pavan242batu@gmail.com).
2 Sanjay L. Nalbalwar is with the Department of Electronics & Tele Comm. Engg., Dr. Babasaheb Ambedkar Technological University, Lonere, Dist. Raigad, MS, India.

The general approach of a speech enhancement algorithm is to modify or enhance the spectral components and reduce background noise. The spectral subtraction methods proposed by Berouti [1] and in [2] are classical noise suppression methods. These methods use a spectral floor threshold and a noise suppression factor that governs the amount of over-subtraction in accordance with the SNR level of the noisy input signal. Different values of the noise suppression factor have been reported, each yielding a different noise suppression behavior. Adjusting these parameters for different noisy conditions to enhance speech quality remains a subject of research.

Over the last few decades, many speech enhancement methods have been investigated, including time-domain and frequency-domain modifications. According to Kamath's multi-band spectral subtraction (MBSS) [6], additive noise does not affect the speech signal uniformly over the entire spectrum.
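The over-subtraction rule described above, with a noise suppression factor tied to the input SNR and a spectral floor, can be sketched as follows. This is a minimal illustration and not the optimized method of this paper. The function names are hypothetical, the linear schedule (factor 4 at 0 dB, slope 3/20 per dB, clamped to the range from -5 dB to 20 dB) is a commonly quoted choice assumed here, and the floor value 0.01 is an arbitrary placeholder.

```python
def oversubtraction_factor(snr_db, alpha0=4.0, slope=3.0 / 20.0):
    """Noise suppression factor that falls linearly as the input SNR rises.

    alpha0, slope and the clamping range are assumed illustrative values.
    """
    snr_clamped = max(-5.0, min(20.0, snr_db))
    return alpha0 - slope * snr_clamped


def spectral_subtract(noisy_power, noise_power, alpha, beta=0.01):
    """Apply over-subtraction with a spectral floor to one frame.

    noisy_power and noise_power are per-bin power spectra of equal length.
    alpha is the noise suppression factor, beta the spectral floor (assumed value).
    """
    enhanced = []
    for y, d in zip(noisy_power, noise_power):
        residual = y - alpha * d   # subtract a scaled noise estimate
        floor = beta * d           # lower bound, a fraction of the noise estimate
        enhanced.append(residual if residual > floor else floor)
    return enhanced
```

A low input SNR yields a larger factor and therefore more aggressive subtraction, while the floor keeps bins from dropping to zero or below, which limits isolated spectral peaks in the residual noise.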
Low-frequency components, which contain most of the speech signal energy, are affected by noise more easily than high-frequency components. In this method, the speech signal is divided into a number of non-overlapping bands, and spectral subtraction is carried out independently in each band. More recently, a phase-aware multi-band complex spectral subtraction (MBCSS) method introduced in [7] addresses single-channel speech enhancement by improving the phase at low input SNR. MBCSS computes the spectral amplitude of the clean speech signal using the phases of the clean and noisy speech signals, and uses the estimated phase of the clean speech signal for signal reconstruction in the time domain. The MBCSS method can dynamically adapt to varying levels of non-stationary noise and to the phase components of speech. Noise is separated by a single-channel source separation technique based on group-delay deviation, which is effectively utilized in the spectral subtraction method.

Many single-channel speech enhancement methods employ the analysis modification synthesis (AMS) technique [8,9,10,11]. The AMS framework is applied in acoustic-domain spectral subtraction to reduce additive noise.

INTERNATIONAL JOURNAL OF CIRCUITS, SYSTEMS AND SIGNAL PROCESSING Volume 11, 2017 ISSN: 1998-4464 343

Here, we deal with the enhancement of speech corrupted by additive noise. In the speech enhancement process, this additive noise falls into two categories: stationary noise, i.e., additive white Gaussian noise (AWGN), and non-stationary noise (real-time background noise). AWGN has time-invariant statistics, whereas real-time background noise is produced by dynamic environments. For example, car noise, train noise, airport noise, and many other man-made noises are non-stationary. In a non-stationary environment, noise estimation is a difficult task if the noise power