Versatile sparse matrix factorization: Theory and applications

Yifeng Li a,*, Alioune Ngom b

a Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
b School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
* Corresponding author. E-mail addresses: yifeng@cmmt.ubc.ca (Y. Li), angom@uwindsor.ca (A. Ngom).

Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.05.076

Article history: Received 24 December 2013; Received in revised form 17 May 2014; Accepted 17 May 2014.

Keywords: Versatile sparse matrix factorization; Non-negative matrix factorization; Sparse representation; Feature extraction; Feature selection; Biological process identification

Abstract

In recent years, non-negative matrix factorization and sparse representation models have been successfully applied to high-throughput biological data analysis owing to their interpretability and robustness to noise. In this paper, we propose a unified matrix factorization model, coined the versatile sparse matrix factorization (VSMF) model, for biological data analysis. We discuss the modelling, optimization, and applications of VSMF. We show that many well-known sparse matrix factorization models are special cases of VSMF. By tuning its parameters, sparsity, smoothness, and non-negativity can be easily controlled in VSMF. Our computational experiments on feature extraction, feature selection, and clustering corroborate the advantages of VSMF.

1. Introduction

Non-negative matrix factorization (NMF) [14] and dictionary learning in sparse representation (SR) [6] are important members of the family of low-rank sparse matrix factorization models. They decompose a matrix into a basis matrix (also called a dictionary) and a coefficient matrix. Each column of the basis matrix is called a basis vector (or meta-sample, or dictionary atom); similarly, each column of the coefficient matrix is a coefficient vector. These models have been applied in various analyses of high-throughput biological data.

A prevalent application of NMF is data clustering [3]. The basic idea is that the largest coefficient in a coefficient vector indicates the class to which the corresponding sample belongs; thus, NMF can be applied as either a crisp or a soft clustering method. The columns of the input matrix are grouped according to the columns of the coefficient matrix, whereas the rows of the input matrix can be clustered according to the rows of the basis matrix. This leads to an NMF-based biclustering methodology [4]. A minimal clustering sketch is given below.
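The following is a minimal sketch of this NMF-based clustering idea on toy data. It uses scikit-learn's generic NMF solver purely for illustration (it is not the authors' implementation); the gene-by-sample shape and the variable names are our assumptions:

    # Minimal NMF-based clustering sketch (illustrative; not the authors' code).
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    X = rng.random((100, 12))       # toy non-negative matrix: 100 genes x 12 samples

    k = 3                           # number of basis vectors (= clusters here)
    model = NMF(n_components=k, init='nndsvd', max_iter=500)
    W = model.fit_transform(X)      # basis matrix, 100 x k; columns are basis vectors
    H = model.components_           # coefficient matrix, k x 12; one column per sample

    # Crisp clustering: assign each sample to the row of its largest coefficient.
    # H[:, j] is also the k-dimensional representation of sample j.
    labels = H.argmax(axis=0)       # cluster label for each of the 12 samples
    print(labels)

For soft clustering, the columns of H can instead be normalized and read as cluster membership degrees.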
Like factor analysis, principal component analysis, and independent component analysis, NMF and SR can be used as dimensionality reduction techniques [19]. The principle is that a coefficient vector is the representation of a sample (a column of the input matrix) in the feature space spanned by the basis vectors. Furthermore, analyzing the basis vectors and coefficient vectors helps us identify candidate patterns for further investigation. In this vein, NMF has been applied to biological process identification [13] and transcriptional regulatory network inference [21].

Many variants of NMF and SR have been devised for various purposes; representative variants are compared in detail in Section 2. Semi-NMF was proposed in [5] for data of mixed signs, which extends the applicability of NMF. Sparse NMF was introduced to guarantee sparse results in either the basis matrix or the coefficient matrix [11]. We proposed kernel NMF in [17] to deal with nonlinearity in microarray data; its dimension-free property is beneficial for optimization, and it thus also works for relational and interaction data. While NMF uses non-negativity to induce sparsity, the l1-norm is also applied to produce sparse results, as in the l1-regularized dictionary learning (DL-l1LS) models [19]. The difference between NMF and DL-l1LS is that the latter allows negative values (a minimal sparse-coding sketch is given at the end of this section).

However, the following challenges have not yet been well addressed. First, a unified model is needed for these variants, from both theoretical and practical perspectives. Second, sparsity is usually imposed on the coefficient matrix, while sparsity of the basis matrix is not guaranteed in most sparse models. Third, the l1-norm is the most popular way to induce sparsity; however, it does not guarantee that a group of correlated variables is selected or discarded simultaneously.

In this paper, in order to address these challenges, we propose a versatile sparse matrix factorization (VSMF) model. The contributions of this study include:

1. With its six parameters, VSMF can easily control sparsity, smoothness, and non-negativity on both the basis matrix and the coefficient matrix (one plausible form of such an objective is sketched below).
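To make the six-parameter description concrete, the following display is one plausible form of such an objective, written under our own assumptions; the authoritative VSMF formulation is the one given in Section 2. Writing the factorization as X ≈ WH, the parameters α1 and α2 act on the basis matrix W, λ1 and λ2 on the coefficient matrix H, and t1, t2 ∈ {0, 1} switch the non-negativity constraints on and off:

\[
\min_{W,H}\ \frac{1}{2}\lVert X - WH\rVert_F^2
+ \alpha_1 \sum_{i=1}^{k} \lVert \mathbf{w}_i \rVert_1
+ \frac{\alpha_2}{2}\lVert W\rVert_F^2
+ \lambda_1 \sum_{j=1}^{n} \lVert \mathbf{h}_j \rVert_1
+ \frac{\lambda_2}{2}\lVert H\rVert_F^2
\quad \text{s.t. } W \ge 0 \text{ if } t_1 = 1,\ H \ge 0 \text{ if } t_2 = 1,
\]

where w_i and h_j denote the columns of W and H. Under this reading, the l1 terms induce sparsity on either factor, the squared Frobenius (l2) terms induce smoothness, and combining the two yields an elastic-net-like penalty that tends to select or discard groups of correlated variables together, which speaks to the third challenge above.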
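Finally, for the sparse-coding sketch promised above: the following minimal example illustrates l1-regularized dictionary learning of the DL-l1LS kind, where, unlike in NMF, the learned dictionary and coefficients may carry negative values. It uses scikit-learn's generic dictionary learner for illustration only (not the authors' solver) and follows scikit-learn's row-sample convention, the transpose of the column-sample convention used in this paper:

    # Minimal l1-regularized dictionary learning sketch (illustrative only).
    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    X = rng.standard_normal((12, 100))  # 12 samples x 100 features, mixed signs

    dl = DictionaryLearning(n_components=3,
                            alpha=1.0,                       # l1 penalty weight
                            transform_algorithm='lasso_lars',
                            random_state=0)
    codes = dl.fit_transform(X)         # sparse coefficients, 12 x 3, mixed signs
    D = dl.components_                  # dictionary atoms, 3 x 100
    print((codes == 0).mean())          # fraction of exactly-zero coefficients

Because the l1 penalty zeroes coefficients exactly, most entries of codes are zero, while nothing constrains either codes or D to be non-negative.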