Regularized Multi-Class Semi-Supervised Boosting∗

Amir Saffari, Christian Leistner, Horst Bischof
Institute for Computer Graphics and Vision, Graz University of Technology
{saffari,leistner,bischof}@icg.tugraz.at

∗This work has been supported by the Austrian Joint Research Project Cognitive Vision under projects S9103-N04 and S9104-N04, the FFG project EVis (813399) under the FIT-IT program, and the Austrian Science Fund (FWF) under the doctoral program Confluence of Vision and Graphics W1209.

Abstract

Many semi-supervised learning algorithms deal only with binary classification. Their extension to the multi-class problem is usually obtained by repeatedly solving a set of binary problems. Additionally, many of these methods do not scale well with respect to a large number of unlabeled samples, which limits their applicability to large-scale problems with many classes and many unlabeled samples. In this paper, we directly address the multi-class semi-supervised learning problem with an efficient boosting method. In particular, we introduce a new multi-class margin-maximizing loss function for the unlabeled data and use generalized expectation regularization to incorporate cluster priors into the model. Our approach enables efficient usage of very large data sets. The performance and efficiency of our method are demonstrated on standard machine learning data sets as well as on challenging object categorization tasks.

1. Introduction

Supervised learning algorithms require a huge amount of labeled data, which is often hard or costly to obtain. Semi-supervised methods offer an interesting solution to this requirement by learning from both labeled and unlabeled data. In the literature, one can find three main semi-supervised learning paradigms: 1) Some algorithms learn the cluster or manifold structure of the feature space from unlabeled samples and use it as an additional cue for the
supervised learning process; examples include cluster kernels [8], label propagation [28], and Laplacian SVMs [3]. 2) Some methods, such as Transductive Support Vector Machines (TSVMs) [12, 24], try to maximize the margin of the unlabeled samples by pushing the decision boundary away from dense regions of the feature space. 3) In co-training [4], two initial classifiers are trained on labeled data, and each then labels unlabeled data for re-training of the other.

The computational complexity of many state-of-the-art semi-supervised methods limits their application to large-scale problems [17]. This is especially counter-productive for computer vision tasks, such as object recognition or categorization, where a huge amount of unlabeled data is very easy to obtain, for example via Web downloads.

Most recent research has focused on binary classification problems, and multi-class problems are often tackled by applying the same algorithm to a set of decomposed binary tasks. Typical approaches are 1-vs.-all, 1-vs.-1, and error-correcting output codes [1]. However, repeatedly running an already heavy-duty algorithm is not an attractive option for solving problems with many classes and samples. Moreover, the 1-vs.-all approach introduces additional problems, such as unbalanced datasets and uncalibrated classifiers.

Hence, an inherently multi-class semi-supervised algorithm with low computational complexity is very attractive for large-scale applications. Methods addressing both of these issues are rare in the literature. Xu and Schuurmans [27] introduce a multi-class extension to the TSVM which, as stated in the paper, is computationally more intensive than the original TSVM formulation. Song et al. [25], and Rogers and Girolami [20] propose the use of Gaussian Processes, while Azran [2] uses Markov random walks over a graph for solving the multi-class semi-supervised