SPARSE LOADING NOISY PCA USING AN $\ell_0$ PENALTY

M.O. Ulfarsson† and V. Solo‡

† University of Iceland, Dept. Electrical Eng., Reykjavik, ICELAND
‡ University of New South Wales, School of Electrical Eng., Sydney, AUSTRALIA
ABSTRACT
In this paper we present a novel model-based sparse principal component analysis method based on the $\ell_0$ penalty. We develop an estimation method based on the generalized EM algorithm and iterative hard thresholding, and an associated model selection method based on the Bayesian information criterion (BIC). The method is compared to a previous sparse PCA method using both simulated data and DNA microarray data.
Index Terms— principal component analysis, sparsity,
estimation.
1. INTRODUCTION
Sparse principal component analysis is a relatively recent extension of traditional principal component analysis (PCA). The idea is to seek sparse component loadings while explaining as much of the variance of the data set as possible.
There are two kinds of sparse PCA: sparse variable PCA
(svPCA) [1, 2] and sparse loading PCA (slPCA) [3, 4, 5, 6, 7].
svPCA removes some of the original variables completely by
simultaneously zeroing out all their loadings. slPCA keeps all
of the original variables but zeroes out some of their loadings.
Sparse PCA has, for example, been found useful for genomic
data sets [4, 5, 6, 7] and medical imaging data [1, 2].
In this paper we develop a novel model-based slPCA method using the $\ell_0$ penalty, which we call sparse loading noisy PCA (slnPCA). An estimation method based on the generalized EM algorithm is constructed. We modify BIC [8] to choose the associated tuning parameters. The method is compared to the sparse loading PCA method of [4] using simulations and a DNA expression microarray data set.
In section 2 we review noisy PCA (nPCA). The slnPCA method is described in section 3, where we develop an estimation and model selection method. We also give formulas for computing the explained variance of slnPCA. Section 4 gives simulation examples and a DNA microarray data example. Finally, conclusions are drawn in section 5.
1.1. Notation
Vectors and matrices are denoted by boldface letters. $x \sim N(m, C)$ means that the random vector $x$ is Gaussian distributed with mean $m$ and covariance $C$. $I(x \neq 0)$ is the indicator function that is equal to 1 if $x$ is not equal to zero and equal to 0 otherwise. If $G$ is an $M \times r$ matrix we denote its column vectors by $g^{(j)}$, $j = 1, \ldots, r$, and its row vectors by $g_i^T$, $i = 1, \ldots, M$. $\odot$ is the Hadamard product.
2. NOISY PCA
Noisy PCA (nPCA) [9] is a structured covariance model
whose maximum likelihood estimator yields PCA. The model
is given by
$$y_t = G u_t + \epsilon_t, \quad t = 1, \ldots, T, \qquad (1)$$
where $y_t$ is a zero-mean $M \times 1$ vector, $G$ is an $M \times r$ loading matrix, $\epsilon_t \sim N(0, \sigma^2 I_M)$, $u_t \sim N(0, I_r)$, and $\epsilon_t$ and $u_t$ are uncorrelated. The normalized (divided by $T$) log-likelihood function is given by
$$l_\theta(Y) = -\frac{1}{2}\,\mathrm{tr}(S_y \Omega^{-1}) - \frac{1}{2} \log |\Omega|,$$
where $\Omega = G G^T + \sigma^2 I_M$, $Y = [y_1, \ldots, y_T]^T$, $S_y = \frac{1}{T} Y^T Y$, and $\theta = (\mathrm{vec}(G)^T, \sigma^2)^T$. It can be shown that the maximum likelihood estimates of $G$ and $\sigma^2$ are given by
$$\hat{G} = P_r (L_r - \hat{\sigma}^2 I_r)^{1/2} R^T, \qquad \hat{\sigma}^2 = \frac{1}{M - r} \sum_{j=r+1}^{M} l_j,$$
where $L_r = \mathrm{diag}(l_1, \ldots, l_r)$ is a diagonal matrix containing the $r$ largest eigenvalues of $S_y$; $P_r$ contains the first $r$ eigenvectors of $S_y$ in its columns; and $R$ is an arbitrary rotation matrix.
The estimated noisy principal components (nPCs) $\hat{U} = [\hat{u}_1, \ldots, \hat{u}_T]^T$ are given by $\hat{U} = Y \hat{G} W^{-1}$, where $W = \hat{G}^T \hat{G} + \hat{\sigma}^2 I_r$.
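As a concrete illustration, the closed-form nPCA estimates above can be sketched in a few lines of NumPy. This is a minimal sketch under the stated model, taking the arbitrary rotation $R$ as the identity; the function and variable names are ours, not from the paper.

```python
import numpy as np

def npca_ml(Y, r):
    """Closed-form nPCA maximum likelihood estimates from the T x M data matrix Y.

    Returns the loading matrix G_hat (M x r), the noise variance sigma2_hat,
    and the estimated nPCs U_hat (T x r), following the eigendecomposition
    formulas for the nPCA model (with R = I).
    """
    T, M = Y.shape
    S_y = Y.T @ Y / T                      # sample covariance (M x M)
    l, P = np.linalg.eigh(S_y)             # eigenvalues in ascending order
    l, P = l[::-1], P[:, ::-1]             # reorder to descending
    sigma2_hat = l[r:].mean()              # average of the M - r smallest eigenvalues
    # G_hat = P_r (L_r - sigma2 I_r)^{1/2} R^T, with R = I_r
    G_hat = P[:, :r] * np.sqrt(l[:r] - sigma2_hat)
    W = G_hat.T @ G_hat + sigma2_hat * np.eye(r)
    U_hat = Y @ G_hat @ np.linalg.inv(W)   # estimated nPCs
    return G_hat, sigma2_hat, U_hat
```

Note that $l_j - \hat{\sigma}^2 \ge 0$ for $j \le r$ automatically, since $\hat{\sigma}^2$ is an average of the smaller eigenvalues, so the square root is well defined.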
3. SPARSE LOADING NPCA
We obtain sparse solutions by penalizing the number of nonzero entries in $G$. The penalized likelihood is then
$$J_\theta(Y) = -l_\theta(Y) + \frac{h}{2} \|G\|_0,$$
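An $\ell_0$ penalty of this form leads to a hard-thresholding step in the resulting iterations: a loading entry is kept only when the improvement in fit outweighs the per-entry penalty $h/2$. As a minimal sketch (the threshold rule shown is for a unit-curvature quadratic fit term; the paper's exact update is derived from the generalized EM surrogate):

```python
import numpy as np

def hard_threshold(Z, h):
    """Elementwise hard thresholding associated with the (h/2)*||G||_0 penalty.

    For a fit term (1/2)(g - z)^2, keeping g = z costs h/2 while setting
    g = 0 costs z^2 / 2, so an entry survives only when z**2 > h
    (equivalently |z| > sqrt(h)); otherwise it is set exactly to zero.
    """
    return np.where(Z**2 > h, Z, 0.0)
```

Unlike soft thresholding (the $\ell_1$ case), surviving entries are not shrunk, which is why the $\ell_0$ penalty zeroes loadings without biasing the retained ones.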
978-1-4673-0046-9/12/$26.00 ©2012 IEEE ICASSP 2012