A Simple Hybrid Method
for Semi-Supervised Learning
Hernán C. Ahumada 1,2,⋆ and Pablo M. Granitto 1
1 CIFASIS, French Argentine International Center for Information and Systems
Sciences, UPCAM, France / UNR-CONICET, Argentina
Bv. 27 de Febrero 210 Bis, 2000, Rosario, Argentina
2 Facultad de Tecnología y Ciencias Aplicadas - Universidad Nacional de Catamarca
Maximio Victoria 55, 4700, Catamarca, Argentina
{ahumada,granitto}@cifasis-conicet.gov.ar
Abstract. We introduce and describe the Hybrid Semi-Supervised Method
(HSSM) for learning. This is the first hybrid method aimed at solving
problems with both labeled and unlabeled data. The new method uses
an unsupervised stage in order to decompose the full problem into a set
of simpler subproblems. HSSM applies simple stopping criteria during
the unsupervised stage, which allows the method to concentrate on the
difficult portions of the original problem. The new algorithm also makes
use of a simple strategy to select at each subproblem a small subset of
unlabeled samples that are relevant to modify the decision surface. To
this end, HSSM trains a linear SVM on the available labeled samples, and
selects the unlabeled samples that lie within the margin of the trained
SVM. We evaluated the new method using a previously introduced setup,
which includes datasets with very different properties. Overall, the error
levels produced by the new HSSM are similar to those of other SSL methods,
but HSSM is shown to be more efficient than all previous methods, using
only a small fraction of the available unlabeled data.
Keywords: Semi-supervised learning, Hybrid methods, Classification.
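The margin-based selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes labels in {-1, +1} and takes "within the margin" to mean |w·x + b| < 1 for a linear decision function. The training routine is a stand-in (full-batch subgradient descent on the regularized hinge loss), not the solver used by the authors, and all function names and parameter values (train_linear_svm, select_in_margin, lam, lr, epochs) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by full-batch subgradient descent on the L2-regularized
    hinge loss. A simple stand-in for a linear SVM solver (assumption)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples violating the margin contribute to the subgradient.
            if yi * (dot(w, xi) + b) < 1:
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def select_in_margin(w, b, X_unlabeled):
    """Keep the unlabeled samples that fall strictly inside the margin."""
    return [x for x in X_unlabeled if abs(dot(w, x) + b) < 1.0]


if __name__ == "__main__":
    # Toy labeled set (hypothetical data, labels in {-1, +1}).
    X_lab = [[-2.0, 0.0], [-3.0, 1.0], [2.0, 0.0], [3.0, -1.0]]
    y_lab = [-1, -1, 1, 1]
    X_unl = [[0.1, 0.5], [4.0, 0.0], [-0.3, -1.0], [-5.0, 2.0]]

    w, b = train_linear_svm(X_lab, y_lab)
    print(select_in_margin(w, b, X_unl))
```

Only the samples returned by select_in_margin would be passed on to the semi-supervised stage; points far from the decision surface are discarded, which is how the method can work with a small fraction of the unlabeled data.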
1 Introduction
Semi-supervised learning (SSL) is a learning paradigm that has recently gained
interest among researchers [3]. The main feature of SSL is its ability to use a few
labeled examples together with many unlabeled examples. SSL has high practical
value in many real-world applications where labeling an example
is an expensive and time-consuming task [14].
The goal of SSL methods is to improve performance with respect to supervised
methods when labeled data is scarce or expensive [14]. However, SSL
methods usually need a large number of unlabeled examples to obtain similar or
better results than supervised methods. Therefore, SSL methods are normally
⋆ Author to whom all correspondence should be addressed. Authors acknowledge grant
support from ANPCyT PICT 237.
L. Alvarez et al. (Eds.): CIARP 2012, LNCS 7441, pp. 138–145, 2012.
© Springer-Verlag Berlin Heidelberg 2012