A New Kernel Non-Negative Matrix Factorization and Its Application in Microarray Data Analysis

Yifeng Li and Alioune Ngom

Abstract—Non-negative matrix factorization (NMF) has been a popular machine learning method for analyzing microarray data. Kernel approaches can capture more non-linear discriminative features than linear ones. In this paper, we propose a novel kernel NMF (KNMF) approach for feature extraction and classification of microarray data. Our approach is also generalized to kernel high-order NMF (KHONMF). Extensive experiments on eight microarray datasets show that our approach generally outperforms the traditional NMF and existing KNMFs. A preliminary experiment on a high-order microarray dataset shows that our KHONMF is a promising approach given a suitable kernel function.

Index Terms—Kernel Non-Negative Matrix Factorization, Microarray Data, Classification, Feature Extraction.

I. INTRODUCTION

Non-negative matrix factorization (NMF) has been an important machine learning approach since the work of Lee and Seung [1]. It generally decomposes a non-negative matrix $X \in \mathbb{R}^{m \times n}$ into two rank-$r$ ($r \le m, n$) non-negative factors $A \in \mathbb{R}^{m \times r}$ and $Y \in \mathbb{R}^{r \times n}$, as formulated in Equation 1:

$$X_+ \approx A_+ Y_+, \tag{1}$$

where $M_+$ indicates that matrix $M$ is non-negative. Each column of $X$ is approximated by a non-negative linear combination of the columns of $A$, where the coefficients are given by the corresponding column of $Y$; therefore $A$ is called the basis matrix and $Y$ is called the coefficient matrix. NMF sometimes generates sparse factors, which is very useful for interpretation. Optimization algorithms, such as multiplicative update rules [2] and non-negative least squares [3], have been devised to solve the non-convex problem in Equation 1. Many variants, including sparse-NMF [4], semi-NMF [5], convex-NMF [5], orthogonal-NMF [6], and weighted-NMF [7], have been proposed in the literature. Two kernel NMF (KNMF) extensions have been proposed in [17] and [5]. We shall introduce these two approaches in Section II.
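To make the factorization in Equation 1 concrete, the following is a minimal sketch of NMF using the widely known multiplicative updates for the squared Frobenius error. It is an illustration under stated assumptions, not the implementation used in this paper: the names `X` (data), `A` (basis), `Y` (coefficients), `r` (rank), and the function `nmf_multiplicative` are labels chosen here, and the iteration count, random initialization, and small constant `eps` are arbitrary choices.

```python
import numpy as np


def nmf_multiplicative(X, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative X (m x n) by A @ Y, with A (m x r) and Y (r x n).

    Hypothetical helper for illustration; uses multiplicative updates that
    keep A and Y non-negative when they start non-negative.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Random non-negative starting points (assumed initialization scheme).
    A = rng.random((m, r)) + eps
    Y = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Update the coefficient matrix, then the basis matrix.
        # eps guards against division by zero.
        Y *= (A.T @ X) / (A.T @ A @ Y + eps)
        A *= (X @ Y.T) / (A @ Y @ Y.T + eps)
    return A, Y


# Synthetic non-negative data with an exact rank-3 structure (made up for the demo).
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 12))

A, Y = nmf_multiplicative(X, r=3)
rel_err = np.linalg.norm(X - A @ Y) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

Each column of `X` is reconstructed as `A @ Y[:, j]`, that is, a non-negative weighting of the basis columns. Because the problem is non-convex, a different seed may lead to a different pair of factors.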
NMF can be applied as a clustering [8], feature extraction [9], feature selection [10], or classification [11] approach. NMF has also been generalized to high-order NMF (HONMF) to factorize tensor data in [12]. The definition of a tensor will be given later.

Yifeng Li and Alioune Ngom are with the School of Computer Science, University of Windsor, Windsor, Ontario, Canada (email: {li11112c, angom}@uwindsor.ca). This research has been supported by IEEE CIS Walter Karplus Summer Research Grant 2010, Ontario Graduate Scholarship 2011-2012, and Canadian NSERC Grants #RGPIN228117-2011.

Microarray technology has been under development for over a decade [13]. It can conveniently monitor the activities of thousands of genes by measuring the abundance of the corresponding mRNA. Numerous microarray datasets have been produced from diverse tissues and species under different conditions for various purposes. We categorize them into three types. If the gene expression levels of different samples are measured once, the result is static gene-sample data. If snapshots of the gene activities of one or multiple similar samples are taken at a sequence of time points, a gene-time-series dataset is produced. The third type, high-order tensor data, is much more complicated. In tensor (multilinear) algebra, a tensor is the generalization of the matrix and vector of matrix (linear) algebra [14]. The order of a tensor is the number of axes needed to hold it. A vector is a 1-order tensor, and a matrix is a 2-order tensor. The aforementioned gene-sample and gene-time data are hence 2-order tensors. A gene-sample-time (GST) dataset is a 3-order tensor. GST data combine gene-sample and gene-time data: the gene expression levels of different samples are measured across time, so each sample forms a gene-time matrix.
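The tensor orders described above can be illustrated with NumPy arrays, where the order corresponds to the number of array axes. This is only a sketch: the dimension sizes are made up, and storing the GST tensor with axes ordered as (gene, time, sample) is an assumption made here for illustration, not a layout prescribed by the text.

```python
import numpy as np

# Hypothetical sizes, chosen only for the demonstration.
n_genes, n_samples, n_times = 5, 3, 4

gene_vector = np.zeros(n_genes)                # 1-order tensor (vector)
gene_sample = np.zeros((n_genes, n_samples))   # 2-order tensor (gene-sample matrix)
gene_time = np.zeros((n_genes, n_times))       # 2-order tensor (gene-time matrix)

# 3-order gene-sample-time tensor; the axis order (gene, time, sample) is assumed.
gst = np.arange(n_genes * n_times * n_samples, dtype=float).reshape(
    n_genes, n_times, n_samples
)

# The order of each tensor is its number of axes.
orders = [t.ndim for t in (gene_vector, gene_sample, gene_time, gst)]

# Fixing one sample yields that sample's gene-time matrix.
first_sample = gst[:, :, 0]

print(orders)
print(first_sample.shape)
```

Slicing along the sample axis recovers one gene-time matrix per sample, which matches the description of GST data as a stack of gene-time matrices.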
Microarray technology has been widely applied in laboratories for genomic studies and medical diagnosis. Machine learning is the main computational tool for analyzing microarray data. Clustering samples or genes can reveal subtypes of a disease and genomic patterns. Feature selection can be applied to biomarker identification. Feature extraction can generate new discriminative features as combinations of existing features. Classification approaches coupled with feature selection or feature extraction are applied to predict diseases. However, microarray data pose many issues, including high noise, missing values, high dimensionality, and sparse and few sampling time points, to name a few. These issues lead to many challenging computational problems such as low accuracy, expensive computational cost, mathematical difficulty, and poor scalability.

NMF has been applied as an important machine learning tool for clustering [8], feature extraction [10], feature selection [10], and classification [15] in microarray data analysis. HONMF has also been used as a novel feature extraction method for GST data in drug/dose response prediction [16]. Generally speaking, kernel approaches can capture more nonlinear information than their linear counterparts, and therefore might improve the performance of applications. In this paper, we propose a new kernel approach that extends semi-NMF, and apply it to feature extraction and classification for gene-sample data. We also propose a kernel HONMF approach, and use it as a feature extraction method for GST data.

978-1-4673-1191-5/12/$31.00 ©2012 IEEE