Abstract: A novel unsupervised machine learning algorithm for single channel source separation (SCSS) is presented. The proposed method is based on nonnegative matrix factorization, which is optimized under the framework of maximum a posteriori (MAP) probability and the Itakura-Saito (IS) divergence. The method enables a generalized criterion for variable sparseness to be imposed on the solution and prior information to be explicitly incorporated through the basis vectors. In addition, the method is scale invariant, so that both low- and high-energy components of a signal are treated with equal importance. The proposed algorithm is a more complete and efficient approach for matrix factorization of signals that exhibit temporal dependency of the frequency patterns. Experimental tests have been conducted and the results compared with those of other algorithms to verify the efficiency of the proposed method.

I. INTRODUCTION

NONNEGATIVE Matrix Factorization (NMF) is an emerging machine learning technique [1–5] for data mining, dimensionality reduction, pattern recognition, object detection, classification, and blind source separation (BSS) [6–9]. In recent times, single channel source separation (SCSS) has become increasingly important, especially using matrix factorization methods [10–28]. The SCSS problem can be treated as one observation and several unknown sources, namely:

$$y(t) = \sum_{i=1}^{I} x_i(t) \tag{1}$$

where $i = 1, \ldots, I$ indexes the sources, $I$ is the number of sources, $t = 1, 2, \ldots, T$ denotes the time index, and the goal is to estimate the sources $x_i(t)$ when only the observation signal $y(t)$ is available. NMF-based methods apply an appropriate time-frequency (TF) analysis to the mono input recording, yielding a TF representation. The decomposition is usually sought through the minimization problem

$$\min_{\mathbf{D},\mathbf{H}} D\left(|\mathbf{Y}|^{.2}, \mathbf{D}\mathbf{H}\right) \quad \text{subject to } \mathbf{D} \ge 0,\ \mathbf{H} \ge 0 \tag{2}$$

where $|\mathbf{Y}|^{.2} \in \Re_+^{F \times T_s}$ is the power TF representation of the mixture $y(t)$ (the superscript $.2$ denoting element-wise squaring), while $\mathbf{D} \in \Re_+^{F \times I}$ and $\mathbf{H} \in \Re_+^{I \times T_s}$ are two nonnegative matrices.
$F$ and $T_s$ represent the total frequency units and time slots, respectively, in the TF domain. The matrix $\mathbf{D}$ can be compressed and reduced to its integral components such that it contains a set of spectral basis vectors, and $\mathbf{H}$ is a code matrix that describes the amplitude of each basis vector at each time point. The distance function $D\left(|\mathbf{Y}|^{.2}, \mathbf{D}\mathbf{H}\right)$ is a separable measure of fit. Commonly used cost functions for NMF are the generalized Kullback-Leibler (KL) divergence and the least-squares (LS) distance [12]. The NMF decomposition is not unique [14]; to overcome this limitation, a sparseness constraint [15, 16] can be added to the cost function, which can be achieved by regularization using the $L_1$-norm.

Bin Gao is with the School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China. W. L. Woo is with the School of Electrical and Electronic Engineering, Newcastle University, England, United Kingdom. Bingo W-K Ling is with the Faculty of Engineering, Guangdong University of Technology, China. Corresponding author email address: w.l.woo@ncl.ac.uk

Over the past few years, several types of priors over $\mathbf{D}$ and $\mathbf{H}$ have been proposed, with the maximum a posteriori (MAP) criterion used to optimize the spectral basis, code, and prior parameters. These methods include the following. NMF with Temporal Continuity and Sparseness Criteria (NMF-TCS) [15] factorizes the magnitude spectrogram of the mixed signal into a sum of components and incorporates temporal continuity and sparseness criteria into the separation framework. Automatic Relevance Determination NMF (NMF-ARD) [27, 28] exploits a hierarchical Bayesian framework for sparse NMF that amounts to imposing an exponential prior for pruning, thereby enabling estimation of the NMF model order. Bayesian NMF methods using a Gamma distribution prior have also been proposed in [25].
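As an illustration of the cost functions and the $L_1$ sparseness constraint discussed above, the following minimal Python sketch implements the LS, KL, and IS measures of fit together with a basic sparse NMF under the LS cost. It is not the MAP algorithm proposed in this paper: the helper name `sparse_nmf_ls`, the parameters `sparsity`, `n_iter`, and `seed`, the constant `EPS`, and the unit-norm column normalization of the basis are all assumptions made for the example, and the input is a synthetic matrix rather than a real spectrogram.

```python
import numpy as np

EPS = 1e-12  # small constant guarding against division by zero and log(0)


def ls_distance(V, V_hat):
    """Least-squares (LS) distance, summed over all TF cells."""
    return 0.5 * np.sum((V - V_hat) ** 2)


def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler (KL) divergence, summed over all TF cells."""
    return np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat)


def is_divergence(V, V_hat):
    """Itakura-Saito (IS) divergence, summed over all TF cells."""
    ratio = (V + EPS) / (V_hat + EPS)
    return np.sum(ratio - np.log(ratio) - 1.0)


def sparse_nmf_ls(V, n_components, sparsity=0.01, n_iter=200, seed=0):
    """Hypothetical helper: V ~= D @ H under the LS cost with an L1 penalty on H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    D = rng.random((n_freq, n_components)) + EPS
    H = rng.random((n_components, n_frames)) + EPS
    for _ in range(n_iter):
        # Multiplicative update of the spectral basis for the LS cost.
        D *= (V @ H.T) / (D @ H @ H.T + EPS)
        # Assumed heuristic: unit-norm columns keep D from growing to
        # offset the sparseness penalty on H.
        D /= np.linalg.norm(D, axis=0, keepdims=True) + EPS
        # Multiplicative update of the code; the L1 penalty adds `sparsity`
        # to the denominator, which shrinks the activations.
        H *= (D.T @ V) / (D.T @ D @ H + sparsity + EPS)
    return D, H


# Synthetic stand-in for a power spectrogram: an exactly rank-2 nonnegative matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
D, H = sparse_nmf_ls(V, n_components=2)
rel_error = np.linalg.norm(V - D @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)

# Scale behaviour of the three measures when both arguments are multiplied by c.
A = rng.random((4, 5)) + 0.1
B = rng.random((4, 5)) + 0.1
c = 100.0
print("IS ratio:", is_divergence(c * A, c * B) / is_divergence(A, B))
print("KL ratio:", kl_divergence(c * A, c * B) / kl_divergence(A, B))
print("LS ratio:", ls_distance(c * A, c * B) / ls_distance(A, B))
```

The last three lines illustrate the scale-invariance property mentioned in the abstract: multiplying both arguments by $c$ leaves the IS divergence unchanged, whereas the KL divergence grows by a factor of $c$ and the LS distance by $c^2$, so high-energy TF cells dominate the latter two costs.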
Regardless of the cost function and the prior constraints used, the standard NMF or MAP NMF models [27, 28, 31] are satisfactory for source separation only if the spectral frequencies of the audio signal do not change over time. However, this is not the case for many realistic audio signals. As a result, the spectral basis obtained via the NMF or MAP NMF decomposition is not adequate to capture the temporal dependency of the frequency patterns within the signal. In addition, most methods developed so far work only for music separation and have the important limitation that they explicitly rely on prior knowledge about the sources. As a consequence, those methods are able to deal only with a very specific set of signals and situations.

Machine Learning Source Separation using Maximum A Posteriori Nonnegative Matrix Factorization
Bin Gao, Member, IEEE, W.L. Woo, Senior Member, IEEE, and Bingo W-K. Ling, Senior Member, IEEE
Published as Bin Gao, W.L. Woo and Bingo W-K. Ling, “Machine Learning Source Separation using Maximum A Posteriori Nonnegative Matrix Factorization,” IEEE Trans. on Cybernetics, vol. 44, no. 7, pp. 1169-1179, 2014.

In recent years, research has been undertaken to extend sparse NMF to a two-dimensional convolution of $\mathbf{D}$ and $\mathbf{H}$, which culminated in the SNMF2D [16]. This allows the SNMF2D to capture both the temporal structure and the pitch change of a source. However, the drawbacks of SNMF2D originate from its lack of a generalized criterion for controlling the sparsity of $\mathbf{H}$. In practice, the sparsity parameter is set manually. SNMF2D imposes uniform sparsity on all temporal codes, which is equivalent to forcing every temporal code to follow the same fixed distribution determined by the selected sparsity parameter. In addition, assigning this fixed distribution to each individual code inevitably constrains all codes to be stationary. However, audio signals are