Learning with few examples for binary and multiclass classification using regularization of randomized trees

Erik Rodner*, Joachim Denzler
Chair for Computer Vision, Friedrich Schiller University of Jena, Germany

Article history: Received 30 March 2009; available online 18 September 2010. Communicated by T.K. Ho.

Keywords: Object categorization; Randomized trees; Few examples; Interclass transfer; Transfer learning

Abstract: The human visual system is often able to learn to recognize difficult object categories from only a single view, whereas automatic object recognition with few training examples is still a challenging task. This is mainly due to the human ability to transfer knowledge from related classes. Therefore, an extension to Randomized Decision Trees is introduced for learning with very few examples by exploiting interclass relationships. The approach consists of a maximum a posteriori estimation of classifier parameters using a prior distribution learned from similar object categories. Experiments on binary and multiclass classification tasks show significant performance gains.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

During the last few decades, research in machine learning and computer vision has led to many new object representations and improved algorithms for numerical classification. Despite the success of this development, there is still an unanswered question: how does one learn object models from few training examples? On the one hand, this question is motivated by industrial demand. In many applications, gathering hundreds or thousands of training images is either expensive or nearly impossible (Platzer et al., 2008). Building robust classification systems in those settings therefore requires complex specialized methods that indirectly incorporate human prior knowledge about the task.
On the other hand, progress on learning with few examples is an important challenge and an essential step towards closing the gap between human and computer vision abilities. The human visual recognition system is often able to easily learn a new object category, such as a new animal class, from just a single view. At first glance, this observation seems to contradict classical learning theory: the number of parameters of an object model often far exceeds the number of available training examples. From a mathematical point of view, this results in an ill-posed optimization problem, especially in cases with only a few training examples. Therefore, the only way to solve this problem is to regularize the optimization using prior knowledge. In previous algorithms, this prior knowledge was often derived from abstract assumptions or was manually tuned during development. However, psychological studies (Jones et al., 1993) suggest that a key component of the human ability to recognize a class from a limited number of examples is the concept of interclass transfer. This paradigm is also known as knowledge transfer, learning to learn, or transfer learning. It states that prior knowledge from previously learned object categories is the most important additional information source when learning object models from weak representations (Fei-Fei, 2006). To give an illustrative example of this idea, consider the recognition of a new animal class such as an okapi. With the aid of our prior knowledge from related animal classes (giraffe, zebra, antelope, etc.), we are able to generalize quickly from a single view. In this work, a concept is presented for how prior knowledge of related classes (often also called support classes) can be used to increase the generalization ability of a discriminative classifier. The underlying idea is a maximum a posteriori (MAP) estimation of parameters using a prior distribution estimated from similar object categories.
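The MAP idea above can be sketched for a single decision-tree leaf: the class histogram of the few target-task examples reaching the leaf is combined with a histogram gathered from related support classes, which acts as a Dirichlet prior on the leaf's class probabilities. The following is a minimal illustration under assumed details, not the authors' exact formulation; the function name, the `strength` smoothing parameter, and the example counts are hypothetical.

```python
import numpy as np

def map_leaf_posterior(counts, prior_counts, strength=1.0):
    """MAP estimate of leaf class probabilities with a Dirichlet prior.

    counts       -- class histogram of target-task examples reaching the leaf
    prior_counts -- class histogram from related (support) classes
    strength     -- how strongly the prior regularizes the estimate
                    (hypothetical knob, not from the paper)
    """
    counts = np.asarray(counts, dtype=float)
    prior = np.asarray(prior_counts, dtype=float)
    # Dirichlet parameters alpha_k = 1 + strength * (prior fraction), so the
    # MAP mode (n_k + alpha_k - 1) / (N + sum(alpha) - K) is well defined.
    pseudo = strength * prior / prior.sum()
    post = counts + pseudo
    return post / post.sum()
```

With `strength = 0` the estimate reduces to the maximum-likelihood relative frequencies; a larger `strength` pulls the estimate toward the support-class distribution, which stabilizes leaves that have seen only one or two target examples.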
Furthermore, the application of this idea to Randomized Decision Trees, as introduced by Geurts et al. (2006), is demonstrated. The paper is based on our previous work (Rodner and Denzler, 2008), which concentrates on multiclass classification. These studies are extended by showing the applicability of the approach to binary classification. An additional experiment also emphasizes that the transferred information is not generic prior knowledge unrelated to interclass relationships. The remainder of the paper is organized as follows. After previous work in the field of learning with weak representations is briefly reviewed, it is shown that Bayesian estimation using a prior

Pattern Recognition Letters 32 (2011) 244–251. doi:10.1016/j.patrec.2010.08.009
* Corresponding author. E-mail addresses: Erik.Rodner@uni-jena.de (E. Rodner), joachim.denzler@uni-jena.de (J. Denzler). URL: http://www.inf-cv.uni-jena.de (J. Denzler).