A Simple Hybrid Method for Semi-Supervised Learning

Hernán C. Ahumada 1,2,⋆ and Pablo M. Granitto 1

1 CIFASIS, French Argentine International Center for Information and Systems Sciences, UPCAM, France / UNR-CONICET, Argentina
Bv. 27 de Febrero 210 Bis, 2000, Rosario, Argentina
2 Facultad de Tecnología y Ciencias Aplicadas - Universidad Nacional de Catamarca
Maximio Victoria 55, 4700, Catamarca, Argentina
{ahumada,granitto}@cifasis-conicet.gov.ar

Abstract. We introduce and describe the Hybrid Semi-Supervised Method (HSSM) for learning. This is the first hybrid method aimed at solving problems with both labeled and unlabeled data. The new method uses an unsupervised stage to decompose the full problem into a set of simpler subproblems. HSSM applies simple stopping criteria during the unsupervised stage, which allows the method to concentrate on the difficult portions of the original problem. The new algorithm also uses a simple strategy to select, at each subproblem, a small subset of unlabeled samples that are relevant for modifying the decision surface. To this end, HSSM trains a linear SVM on the available labeled samples and selects the unlabeled samples that lie within the margin of the trained SVM. We evaluated the new method using a previously introduced setup, which includes datasets with very different properties. Overall, the error levels produced by the new HSSM are similar to those of other SSL methods, but HSSM is shown to be more efficient than all previous methods, using only a small fraction of the available unlabeled data.

Keywords: Semi-supervised learning, Hybrid methods, Classification.

1 Introduction

Semi-supervised learning (SSL) is a learning paradigm that has recently gained interest among researchers [3]. The main feature of SSL is its ability to use a few labeled examples together with many unlabeled examples.
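
As a rough illustration of the sample-selection strategy summarized in the abstract (train a linear SVM on the labeled samples, then keep only the unlabeled samples that fall within its margin), the following minimal sketch may help. It is not the authors' implementation: the function names (`train_linear_svm`, `select_within_margin`), the full-batch subgradient training loop, and the hyperparameters `lam`, `epochs`, and `lr` are assumptions made for this example, and "within the margin" is taken here to mean |w·x + b| < 1.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(
    X: List[Vector],
    y: List[int],
    lam: float = 0.01,   # assumed regularization strength
    epochs: int = 200,   # assumed number of full-batch passes
    lr: float = 0.1,     # assumed step size
) -> Tuple[List[float], float]:
    """Stand-in trainer: full-batch subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples violating the margin contribute to the hinge term.
            if yi * decision_value(w, b, xi) < 1.0:
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def select_within_margin(w: Vector, b: float, unlabeled: List[Vector]) -> List[Vector]:
    """Keep the unlabeled samples with |f(x)| < 1, i.e. inside the margin."""
    return [x for x in unlabeled if abs(decision_value(w, b, x)) < 1.0]


# Toy usage: four labeled points, then margin-based selection of unlabeled ones.
X_lab = [(-2.0, 0.0), (-3.0, 1.0), (2.0, 0.0), (3.0, -1.0)]
y_lab = [-1, -1, 1, 1]
w, b = train_linear_svm(X_lab, y_lab)
X_unl = [(0.1, 0.0), (-0.3, 0.5), (5.0, 0.0), (-6.0, 1.0)]
relevant = select_within_margin(w, b, X_unl)
```

Under these assumptions, only unlabeled points close to the current decision surface are retained, which is what allows a method of this kind to work with a small fraction of the unlabeled data.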
SSL has high practical value in many real-world applications where labeling an example is an expensive and time-consuming task [14].

The goal of SSL methods is to improve on the performance of supervised methods when labeled data is scarce or expensive [14]. However, SSL methods usually need a large number of unlabeled examples to obtain results similar to or better than those of supervised methods. Therefore, SSL methods are normally

⋆ Author to whom all correspondence should be addressed. Authors acknowledge grant support from ANPCyT PICT 237.
L. Alvarez et al. (Eds.): CIARP 2012, LNCS 7441, pp. 138–145, 2012. © Springer-Verlag Berlin Heidelberg 2012