1018 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 7, JULY 2016
Feature Selection Embedded Subspace Clustering
Chong Peng, Zhao Kang, Ming Yang, and Qiang Cheng, Senior Member, IEEE
Abstract—We propose a new subspace clustering method that
integrates feature selection into subspace clustering. Rather than
using all features to construct a low-rank representation of the data,
we find such a representation using only relevant features, which
helps reveal more accurate data relationships. Two variants
are proposed, using convex and nonconvex rank approximations,
respectively. Extensive experimental results confirm the
effectiveness of the proposed models.
I. INTRODUCTION
HIGH-DIMENSIONAL data draw increasing attention in
numerous applications of machine learning and data mining.
Because such data often lie on low-dimensional structures rather
than being uniformly distributed, it is helpful to reveal and
preserve the latent structures of the data by recovering
low-dimensional subspaces. An important topic in low-dimensional
subspace recovery is subspace clustering, which groups data
points into different clusters in different low-dimensional
subspaces.
Recently, different approaches have been proposed for subspace
clustering. For example, sparse subspace clustering (SSC) [1],
[2] seeks a sparse representation of the data, while low-rank
representation (LRR) [3], [4] and low-rank subspace clustering
(LRSC) [5], [6] seek low-rank subspaces with low-rank
representations. These subspace clustering methods build data
relationships in the original feature space by representing an
arbitrary observation as a linear combination of all data points.
They thus explore the structures of high-dimensional data using
all features, which is potentially problematic: high-dimensional
data usually contain many irrelevant features that interfere
with and degrade the learning performance [7], [8]. In fact,
feature selection has proven important for learning on
high-dimensional data and is thus often applied as a
preprocessing step. Note that existing unsupervised feature
selection methods, such as multiclass feature selection [9],
often transform the unsupervised problem into a supervised one,
which requires clustering as a preprocessing step. Moreover,
such a two-step procedure has been found inappropriate for some
problems [10], [11]. In this letter, we propose to perform
feature selection and subspace clustering simultaneously, in a
single, seamlessly integrated framework, to enhance learning
performance.
For low-rank learning, as shown in [12] and [13], replacing the
nuclear norm by nonconvex approximations of the rank function
can improve subspace clustering performance. In this letter, we
show that, by integrating a feature selection capability,
nuclear norm-based subspace clustering can achieve significantly
improved performance, often comparable to that obtained with
nonconvex rank approximations.

Manuscript received April 20, 2016; revised May 16, 2016; accepted May 16,
2016. Date of publication May 26, 2016; date of current version June 23, 2016.
This work was supported by NSF under Grant IIS-1218712. The associate editor
coordinating the review of this manuscript and approving it for publication was
Prof. Marco Duarte.
The authors are with the Department of Computer Science, Southern Illinois
University Carbondale, Carbondale, IL 62901 USA (e-mail: pchong@siu.edu;
zhao.kang@siu.edu; ming.yang@siu.edu; qcheng@cs.siu.edu).
Color versions of one or more of the figures in this letter are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2016.2573159
The main contributions of this letter are summarized as follows:
1) Feature selection is integrated with subspace clustering
into a seamless framework, enabling the recovery of
subspaces using the most relevant features.
2) Our framework allows for efficient optimization, each step
of which admits a closed-form solution.
3) Extensive experimental results demonstrate the effectiveness
of the integrated framework and show significant
improvements over state-of-the-art algorithms.
4) With feature selection integrated, nuclear norm-based
subspace clustering achieves promising performance
comparable to that of nonconvex rank approximations.
II. LOW-RANK REPRESENTATION
Recently, spectral clustering-based methods have been shown
to be effective for subspace clustering applications [1], [5],
[12]–[20]; see [21] for a review. Briefly, given the data
$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ sampled
from a union of $K$ subspaces in $\mathbb{R}^d$, typical subspace
clustering methods such as SSC and LRR model the data to be
self-expressive with $X = XZ + E$. Here $E$ represents the noise
or outliers, and $Z \in \mathbb{R}^{n \times n}$ is a weight
matrix, where SSC requires $Z$ to be sparse while LRR requires
$Z$ to be low-rank. The problem of LRR is formulated as
$$\min_{Z}\ \|Z\|_{*} + \lambda \|E\|_{\ell} \quad \text{s.t.}\quad X = XZ + E \tag{1}$$
where $\|Z\|_{*}$ is the nuclear norm, defined as
$\sum_{i=1}^{n} \sigma_i^{Z}$ with $\sigma_i^{Z}$ being the $i$th
largest singular value of $Z$, $\|E\|_{\ell}$ can be different
norms, and $\lambda$ is a balancing parameter. In the literature,
it has been shown that the nuclear norm cannot approximate the
rank function well when there are dominant singular values [12],
[20], and some nonconvex rank approximations have been developed
to enhance the performance of model (1). A typical rank
approximation is the log-determinant rank approximation employed
in [12], [13], and [20], defined as $\log\det(I + Z^{T}Z)$,
where $I \in \mathbb{R}^{n \times n}$ is the identity matrix.
III. FEATURE SELECTION EMBEDDED SUBSPACE
CLUSTERING (FSC)
Given $X$, we use a feature selection vector $p \in \{0,1\}^{d}$
to find relevant features by $\operatorname{diag}(p)X$, which
zeros out the irrelevant features. Following [4], the selected
data $\operatorname{diag}(p)X$ can be modeled as
$\operatorname{diag}(p)X = \operatorname{diag}(p)XZ + E$, where
$Z$ is low-rank. Therefore, the problem of recovering a low-rank
subspace from the selected data with $M$ features can be
formulated as
$$\min_{Z,\,p}\ \frac{1}{2}\left\|\operatorname{diag}(p)X - \operatorname{diag}(p)XZ\right\|_{F}^{2} + \lambda\|Z\|_{*} \quad \text{s.t.}\quad \|p\|_{0} = M \tag{2}$$
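To make the roles of $p$, $Z$, and $\lambda$ in model (2) concrete, the sketch below (an illustration under assumed toy data, not the letter's optimization algorithm) evaluates the objective for a fixed binary feature mask $p$ with $\|p\|_0 = M$ and a candidate coefficient matrix $Z$:

```python
import numpy as np

def fsc_objective(X, Z, p, lam):
    """Objective of model (2): 0.5 * ||diag(p)X - diag(p)XZ||_F^2 + lam * ||Z||_*."""
    Xp = p[:, None] * X                       # diag(p) X: keep only selected feature rows
    residual = Xp - Xp @ Z                    # self-expression error on selected features
    nuclear = np.linalg.svd(Z, compute_uv=False).sum()
    return 0.5 * np.linalg.norm(residual, 'fro') ** 2 + lam * nuclear

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))              # d = 6 features, n = 10 samples
p = np.array([1, 1, 1, 0, 0, 0], dtype=float) # select M = 3 features, so ||p||_0 = 3
Z = np.zeros((10, 10))                        # with Z = 0, objective = 0.5 * ||diag(p)X||_F^2
print(fsc_objective(X, Z, p, lam=0.1))
```

Because $\operatorname{diag}(p)$ zeros out unselected rows, only the $M$ selected features contribute to the reconstruction error, while the nuclear norm term still penalizes the rank of $Z$ over all $n$ samples.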
1070-9908 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.