Sparse semi-supervised support vector machines by DC programming and DCA

Hoai Minh Le a,b, Hoai An Le Thi *,b, Manh Cuong Nguyen b

a Modeling and Optimization of Complex Systems Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
b Laboratory of Theoretical and Applied Computer Science EA 3097, University of Lorraine, Ile du Saulcy, 57045 Metz, France

Article history: Received 9 June 2014; Received in revised form 22 November 2014; Accepted 23 November 2014. Communicated by T. Heskes.

Keywords: Semi-supervised SVM; Feature selection; Non-convex optimization; DC approximation; Exact penalty; DC programming and DCA

Abstract

This paper studies the problem of feature selection in the context of the Semi-Supervised Support Vector Machine (S3VM). The zero norm, a natural concept for dealing with sparsity, is used for feature selection. Because of two nonconvex terms (the loss function of the unlabeled data and the ℓ0 term), we face an NP-hard optimization problem. Two continuous approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithms) are developed. The first is a DC approximation approach that approximates the ℓ0-norm by a DC function. The second is an exact reformulation approach based on exact penalty techniques in DC programming. All the resulting optimization problems are DC programs for which DCA is investigated. Several usual sparsity-inducing functions are considered, and six versions of DCA are developed. Numerical experiments on several benchmark datasets show the efficiency of the proposed algorithms, in both feature selection and classification.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In machine learning, supervised learning is the task of inferring a predictor function (classifier) from a labeled training dataset. Each example in the training set consists of an input object and a label.
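As a minimal numerical illustration of the zero norm (the number of nonzero components of a vector) and of a DC-type surrogate for it, the sketch below uses the capped-ℓ1 function min(1, θ|t|), one common sparsity-inducing approximation; the specific surrogate and the names `zero_norm`, `capped_l1`, and `theta` are illustrative, not necessarily the exact functions used in the paper:

```python
import numpy as np

def zero_norm(w):
    """ell_0 'norm': number of nonzero components of w."""
    return int(np.count_nonzero(w))

def capped_l1(w, theta=5.0):
    """Capped-ell_1 surrogate sum_i min(1, theta*|w_i|), a common DC
    approximation of ||w||_0 (illustrative choice)."""
    return float(np.sum(np.minimum(1.0, theta * np.abs(w))))

w = np.array([0.0, 0.5, -0.01, 2.0])
print(zero_norm(w))   # -> 3
print(capped_l1(w))   # min(1,2.5)+min(1,0.05)+min(1,10) -> 2.05
```

As θ grows, θ|t| exceeds 1 for ever-smaller nonzero t, so the surrogate approaches the exact count of nonzeros while staying continuous.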
The objective is to build a predictor function that can identify the label of new examples with the highest possible accuracy. Nevertheless, in most real-world applications a large portion of the training data is unlabeled, and supervised learning cannot be used in these contexts. To deal with this difficulty, semi-supervised learning methods have recently attracted increasing attention in the machine learning community. In contrast to supervised methods, semi-supervised learning methods take both labeled and unlabeled data into account to construct prediction models.

We are interested in semi-supervised classification, more precisely in the so-called Semi-Supervised Support Vector Machine (S3VM). Among semi-supervised classification methods, the large-margin approach S3VM, which extends the Support Vector Machine (SVM) to the semi-supervised setting, is certainly the most popular [5–13,37,39,41,47,54,59]. An extensive overview of semi-supervised classification methods can be found in [61].

S3VM was originally proposed by Vapnik and Sterin in 1977 [57] under the name of transductive support vector machine. Later, in 1999, Bennett and Demiriz [2] proposed the first optimization formulation of S3VM, described as follows. Given a training set consisting of $m$ labeled points $\{(x_i, y_i) \in \mathbb{R}^n \times \{-1, 1\} : i = 1, \dots, m\}$ and $p$ unlabeled points $\{x_i \in \mathbb{R}^n : i = m+1, \dots, m+p\}$, we seek a separating hyperplane $P = \{x \in \mathbb{R}^n : x^T w = b\}$ far away from both the labeled and unlabeled points. Hence, the optimization problem of S3VM takes the form

\[
\min_{w,b}\ \|w\|_2^2 + \alpha \sum_{i=1}^{m} L\big(y_i(\langle w, x_i\rangle + b)\big) + \beta \sum_{i=m+1}^{m+p} L\big(|\langle w, x_i\rangle + b|\big). \tag{1}
\]

Here, the first two terms define a standard SVM, while the third incorporates the loss function of the unlabeled data points. The losses of the labeled and unlabeled data points are weighted by penalty parameters $\alpha > 0$ and $\beta > 0$.
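The objective in (1) can be evaluated directly once a loss $L$ is fixed; the sketch below uses the standard hinge loss $\max\{0, 1-u\}$ and illustrative variable names (`X_lab`, `X_unl`, etc. are assumptions for this example, not notation from the paper):

```python
import numpy as np

def hinge(u):
    """Hinge loss L(u) = max(0, 1 - u), applied elementwise."""
    return np.maximum(0.0, 1.0 - u)

def s3vm_objective(w, b, X_lab, y_lab, X_unl, alpha, beta):
    """Objective of formulation (1): ||w||_2^2 plus alpha-weighted hinge
    losses on labeled points and beta-weighted symmetric hinge losses
    on unlabeled points. Names and signature are illustrative."""
    margins_lab = y_lab * (X_lab @ w + b)   # y_i(<w, x_i> + b)
    margins_unl = np.abs(X_unl @ w + b)     # |<w, x_i> + b|
    return (np.dot(w, w)
            + alpha * hinge(margins_lab).sum()
            + beta * hinge(margins_unl).sum())

# Tiny example: two correctly classified labeled points (zero hinge loss)
# and one unlabeled point lying exactly on the hyperplane (loss 1).
w = np.array([1.0, -1.0]); b = 0.0
X_lab = np.array([[2.0, 0.0], [0.0, 2.0]]); y_lab = np.array([1.0, -1.0])
X_unl = np.array([[0.5, 0.5]])
print(s3vm_objective(w, b, X_lab, y_lab, X_unl, alpha=1.0, beta=1.0))  # -> 3.0
```

The third term pushes unlabeled points away from the decision boundary, which is exactly what makes it nonconvex: $u \mapsto \max\{0, 1-|u|\}$ has two flat wings and a peak at $u = 0$.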
Usually, in classical SVM one uses the hinge loss function $L(u) = \max\{0, 1-u\}$, which is convex. On the contrary, problem (1) is nonconvex, due to the nonconvexity of its third term.

There are two broad strategies for solving the optimization problem (1) of S3VM: combinatorial methods (Mixed Integer Programming [2], Branch and Bound [6]) and continuous optimization methods such as the self-labeling heuristic S3VM^light [15], gradient descent [5], deterministic annealing [53], semi-definite programming [4], and DC programming [9]. Combinatorial methods are not available for massive data sets in real

* Corresponding author. Tel.: +33 3 87 31 54 41. E-mail address: hoai-an.le-thi@univ-lorraine.fr (H.A. Le Thi).

H.M. Le, et al., Sparse semi-supervised support vector machines by DC programming and DCA, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.11.051