1018 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 7, JULY 2016
Feature Selection Embedded Subspace Clustering
Chong Peng, Zhao Kang, Ming Yang, and Qiang Cheng, Senior Member, IEEE
Abstract—We propose a new subspace clustering method that
integrates feature selection into subspace clustering. Rather than
using all features to construct a low-rank representation of the data,
we find such a representation using only relevant features, which
helps reveal more accurate data relationships. Two variants
are proposed, using convex and nonconvex rank approximations,
respectively. Extensive experimental results confirm the
effectiveness of the proposed models.
I. INTRODUCTION
HIGH-DIMENSIONAL data draw increasing attention in
numerous applications of machine learning and data mining.
Because such data often lie on low-dimensional structures rather
than being uniformly distributed, it is helpful to reveal and
preserve the latent structures of the data by recovering
low-dimensional subspaces. An important topic in low-dimensional
subspace recovery is subspace clustering, which groups data
points into different clusters in different low-dimensional
subspaces.
Recently, different approaches have been proposed for subspace
clustering. For example, sparse subspace clustering (SSC) [1],
[2] seeks a sparse representation of the data, while low-rank
representation (LRR) [3], [4] and low-rank subspace clustering
(LRSC) [5], [6] seek low-rank subspaces with low-rank
representations. These subspace clustering methods build data
relationships in the original feature space by representing an
arbitrary observation as a linear combination of all data points.
They thus explore the structures of high-dimensional data using
all features, which is potentially problematic: high-dimensional
data usually contain many irrelevant features that interfere
with and degrade the learning performance [7], [8]. In fact,
feature selection has proven important for learning on
high-dimensional data and is thus often applied as a
preprocessing step. Note that existing unsupervised feature
selection methods, such as multiclass feature selection [9],
often transform the unsupervised problem into a supervised one,
which requires clustering as a preprocessing step. Moreover,
such a two-step procedure has been found inappropriate for some
problems [10], [11]. In this letter, we propose to perform
feature selection and subspace clustering simultaneously, in a
single, seamlessly integrated framework, to enhance learning
performance.
For low-rank learning, as shown in [12] and [13], replacing the
nuclear norm by nonconvex approximations of the rank function
can improve subspace clustering performance. In this letter, we
show that, by integrating a feature selection capability,
nuclear norm-based subspace clustering can achieve significantly
improved performance, often comparable to that obtained with
nonconvex rank approximations.

Manuscript received April 20, 2016; revised May 16, 2016; accepted May 16,
2016. Date of publication May 26, 2016; date of current version June 23, 2016.
This work was supported by NSF under Grant IIS-1218712. The associate editor
coordinating the review of this manuscript and approving it for publication was
Prof. Marco Duarte.
The authors are with the Department of Computer Science, Southern Illinois
University Carbondale, Carbondale, IL 62901 USA (e-mail: pchong@siu.edu;
zhao.kang@siu.edu; ming.yang@siu.edu; qcheng@cs.siu.edu).
Color versions of one or more of the figures in this letter are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2016.2573159
The main contributions of this letter are summarized as follows:
1) Feature selection is integrated with subspace clustering
into a seamless framework, enabling the recovery of
subspaces using the most relevant features.
2) Our framework allows for efficient optimization, each step
of which admits a closed-form solution.
3) Extensive experimental results demonstrate the effectiveness
of the integrated framework and show significant
improvements over state-of-the-art algorithms.
4) With feature selection integrated, nuclear norm-based
subspace clustering achieves promising performance
comparable to that of nonconvex rank approximations.
II. LOW-RANK REPRESENTATION
Recently, spectral clustering-based methods have been shown
to be effective for subspace clustering applications [1], [5],
[12]–[20]; see [21] for a review. Briefly, given the data
$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ sampled
from a union of $K$ subspaces in $\mathbb{R}^d$, typical subspace
clustering methods such as SSC and LRR model the data to be
self-expressive with $X = XZ + E$. Here $E$ represents the noise
or outliers, and $Z \in \mathbb{R}^{n \times n}$ is a weight
matrix, where SSC requires $Z$ to be sparse while LRR requires
$Z$ to be low-rank. The problem of LRR is formulated as
$$\min_{Z}\ \|Z\|_{*} + \lambda \|E\|_{\ell} \quad \text{s.t.}\quad X = XZ + E \tag{1}$$
where $\|Z\|_{*}$ is the nuclear norm, defined as
$\sum_{i=1}^{n} \sigma_i^{Z}$ with $\sigma_i^{Z}$ being the $i$th
largest singular value of $Z$, $\|E\|_{\ell}$ can be different
norms, and $\lambda$ is a balancing parameter. In the literature,
it has been shown that the nuclear norm cannot approximate the
rank function well when there are dominant singular values [12],
[20], and some nonconvex rank approximations have been developed
to enhance the performance of model (1). A typical rank
approximation is the log-determinant rank approximation employed
in [12], [13], and [20], defined as $\log\det(I + Z^{T}Z)$,
where $I \in \mathbb{R}^{n \times n}$ is the identity matrix.
III. FEATURE SELECTION EMBEDDED SUBSPACE
CLUSTERING (FSC)
Given $X$, we use a feature selection vector $p \in \{0,1\}^{d}$
to find relevant features by $\operatorname{diag}(p)X$, which
zeros out the irrelevant features. Following [4], the selected
data $\operatorname{diag}(p)X$ can be modeled as
$\operatorname{diag}(p)X = \operatorname{diag}(p)XZ + E$, where
$Z$ is low-rank. Therefore, the problem of recovering a low-rank
subspace from the selected data with $M$ features can be
formulated as
$$\min_{Z,\,p}\ \frac{1}{2}\left\|\operatorname{diag}(p)X - \operatorname{diag}(p)XZ\right\|_{F}^{2} + \lambda\|Z\|_{*} \quad \text{s.t.}\quad \|p\|_{0} = M \tag{2}$$
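To make the roles of $p$, $Z$, and $\lambda$ in model (2) concrete, the sketch below (an illustration under assumed toy data, not the letter's optimization algorithm) evaluates the objective for a fixed binary feature mask $p$ with $\|p\|_0 = M$ and a candidate coefficient matrix $Z$:

```python
import numpy as np

def fsc_objective(X, Z, p, lam):
    """Objective of model (2): 0.5 * ||diag(p)X - diag(p)XZ||_F^2 + lam * ||Z||_*."""
    Xp = p[:, None] * X                       # diag(p) X: keep only selected feature rows
    residual = Xp - Xp @ Z                    # self-expression error on selected features
    nuclear = np.linalg.svd(Z, compute_uv=False).sum()
    return 0.5 * np.linalg.norm(residual, 'fro') ** 2 + lam * nuclear

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))              # d = 6 features, n = 10 samples
p = np.array([1, 1, 1, 0, 0, 0], dtype=float) # select M = 3 features, so ||p||_0 = 3
Z = np.zeros((10, 10))                        # with Z = 0, objective = 0.5 * ||diag(p)X||_F^2
print(fsc_objective(X, Z, p, lam=0.1))
```

Because $\operatorname{diag}(p)$ zeros out unselected rows, only the $M$ selected features contribute to the reconstruction error, while the nuclear norm term still penalizes the rank of $Z$ over all $n$ samples.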
1070-9908 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.