SPARSE LOADING NOISY PCA USING AN $\ell_0$ PENALTY

M.O. Ulfarsson and V. Solo

University of Iceland, Dept. Electrical Eng., Reykjavik, ICELAND
University of New South Wales, School of Electrical Eng., Sydney, AUSTRALIA

ABSTRACT

In this paper we present a novel model-based sparse principal component analysis method based on the $\ell_0$ penalty. We develop an estimation method based on the generalized EM algorithm and iterative hard thresholding, and an associated model selection method based on the Bayesian information criterion (BIC). The method is compared to a previous sparse PCA method using both simulated data and DNA microarray data.

Index Terms: principal component analysis, sparsity, estimation.

1. INTRODUCTION

Sparse principal component analysis is a relatively recent extension of traditional principal component analysis (PCA). The idea is to seek sparse component loadings while explaining as much of the variance of the data set as possible. There are two kinds of sparse PCA: sparse variable PCA (svPCA) [1, 2] and sparse loading PCA (slPCA) [3, 4, 5, 6, 7]. svPCA removes some of the original variables completely by simultaneously zeroing out all of their loadings. slPCA keeps all of the original variables but zeroes out some of their loadings. Sparse PCA has, for example, been found useful for genomic data sets [4, 5, 6, 7] and medical imaging data [1, 2].

In this paper we develop a novel model-based slPCA method using the $\ell_0$ penalty, which we call sparse loading noisy PCA (slnPCA). An estimation method based on the generalized EM algorithm is constructed. We modify BIC [8] to choose the associated tuning parameters. The method is compared to the sparse loading PCA method in [4] using simulations and a DNA expression microarray data set.

In section 2 we review noisy PCA (nPCA). The slnPCA method is described in section 3, where we develop an estimation and model selection method. We also give formulas for computing the explained variance of slnPCA. Section 4 gives simulation examples and a DNA microarray data example. Finally, in section 5, conclusions are drawn.

1.1. Notation

Vectors and matrices are denoted by boldface letters. $x \sim N(m, C)$ means that the random vector $x$ is Gaussian distributed with mean $m$ and covariance $C$. $I(x \neq 0)$ is the indicator function that equals 1 if $x$ is not equal to zero and 0 otherwise. If $G$ is an $M \times r$ matrix we denote its column vectors by $g_{(j)}$, $j = 1, \ldots, r$, and its row vectors by $g_i^T$, $i = 1, \ldots, M$. $\odot$ is the Hadamard product.

2. NOISY PCA

Noisy PCA (nPCA) [9] is a structured covariance model whose maximum likelihood estimator yields PCA. The model is given by
$$y_t = G u_t + \epsilon_t, \quad t = 1, \ldots, T, \qquad (1)$$
where $y_t$ is a zero-mean $M \times 1$ vector, $G$ is an $M \times r$ loading matrix, $\epsilon_t \sim N(0, \sigma^2 I_M)$, $u_t \sim N(0, I_r)$, and $\epsilon_t$ and $u_t$ are uncorrelated. The normalized (divided by $T$) log-likelihood function is given by
$$l_\theta(Y) = -\frac{1}{2}\,\mathrm{tr}(S_y \Omega^{-1}) - \frac{1}{2} \log |\Omega|,$$
where $\Omega = G G^T + \sigma^2 I_M$, $Y = [y_1, \ldots, y_T]^T$, $S_y = \frac{1}{T} Y^T Y$, and $\theta = (\mathrm{vec}(G)^T, \sigma^2)^T$. It can be shown that the maximum likelihood estimates of $G$ and $\sigma^2$ are given by
$$\hat{G} = P_r (L_r - \hat{\sigma}^2 I_r)^{1/2} R^T, \qquad \hat{\sigma}^2 = \frac{1}{M - r} \sum_{j=r+1}^{M} l_j,$$
where $L_r = \mathrm{diag}(l_1, \ldots, l_r)$ is a diagonal matrix containing the $r$ largest eigenvalues of $S_y$; $P_r$ contains the first $r$ eigenvectors of $S_y$ in its columns; and $R$ is an arbitrary rotation matrix. The estimated noisy principal components (nPCs) $\hat{U} = [\hat{u}_1, \ldots, \hat{u}_T]^T$ are given by $\hat{U} = Y \hat{G} W^{-1}$, where $W = \hat{G}^T \hat{G} + \hat{\sigma}^2 I_r$.
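The closed-form nPCA estimates above are easy to compute from an eigendecomposition of $S_y$. The following NumPy sketch is a minimal illustration, assuming the paper's convention that $Y$ is $T \times M$ with rows $y_t^T$, and taking the arbitrary rotation as $R = I_r$; the function name is ours, not from the paper.

```python
import numpy as np

def noisy_pca(Y, r):
    """ML estimates for the nPCA model y_t = G u_t + eps_t.

    Y : (T, M) zero-mean data matrix with rows y_t (Y = [y_1, ..., y_T]^T).
    r : number of components.  The rotation matrix R is taken as I_r.
    """
    T, M = Y.shape
    S_y = Y.T @ Y / T                           # sample covariance (M x M)
    evals, evecs = np.linalg.eigh(S_y)          # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending
    sigma2 = evals[r:].mean()                   # noise variance from the M - r trailing eigenvalues
    G = evecs[:, :r] * np.sqrt(evals[:r] - sigma2)  # P_r (L_r - sigma2 I_r)^(1/2), with R = I
    W = G.T @ G + sigma2 * np.eye(r)
    U = Y @ G @ np.linalg.inv(W)                # nPCs: rows of U estimate u_t
    return G, sigma2, U
```

With data generated according to model (1), `noisy_pca(Y, r)` returns $\hat{G}$ (up to rotation), $\hat{\sigma}^2$, and $\hat{U}$.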
3. SPARSE LOADING NPCA

We obtain sparse solutions by penalizing the number of nonzero entries in $G$. The penalized likelihood is then
$$J_\theta(Y) = -l_\theta(Y) + \frac{h}{2} \|G\|_0,$$
where $\|G\|_0 = \sum_{i,j} I(g_{ij} \neq 0)$ is the number of nonzero entries of $G$.
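Since the estimation scheme combines the generalized EM algorithm with iterative hard thresholding, it may help to see the penalized objective and a generic elementwise hard-thresholding operator written out. The sketch below is only an illustration under those assumptions (the function names are ours); the exact update rule, and how the penalty parameter $h$ maps to a threshold, are developed in the remainder of the section.

```python
import numpy as np

def penalized_objective(S_y, G, sigma2, h):
    """Evaluate J_theta(Y) = -l_theta(Y) + (h/2) * ||G||_0 (illustrative only)."""
    M = S_y.shape[0]
    Omega = G @ G.T + sigma2 * np.eye(M)
    _, logdet = np.linalg.slogdet(Omega)        # stable log-determinant
    ll = -0.5 * np.trace(S_y @ np.linalg.inv(Omega)) - 0.5 * logdet
    return -ll + 0.5 * h * np.count_nonzero(G)

def hard_threshold(G, thresh):
    """Generic elementwise hard thresholding: zero out loadings below the threshold."""
    return np.where(np.abs(G) > thresh, G, 0.0)
```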