A novel learning approach to multiple tasks based on boosting methodology

Pipei Huang a,*, Gang Wang b, Shiyin Qin a

a School of Automation Science and Electrical Engineering, Beihang University, 100191 Beijing, China
b Machine Learning Group, Microsoft Research Asia, Sigma Center, 49 Zhichun Road, 100190 Beijing, China

Article history: Received 24 November 2009; available online 27 May 2010. Communicated by R.C. Guido.

Keywords: Boosting; Multi-task learning; Inductive transfer learning; Multiple tasks; Text classification

Abstract

Boosting has become one of the state-of-the-art techniques in many supervised and semi-supervised learning applications. In this paper, we develop a novel boosting algorithm, MTBoost, for the multi-task learning problem. Many previous multi-task learning algorithms can only handle problems in low- or moderate-dimensional spaces. The MTBoost algorithm, by contrast, is capable of working with very high-dimensional data, such as in text mining, where the number of features can exceed tens of thousands. The experimental results illustrate that the MTBoost algorithm provides significantly better classification performance than supervised single-task learning algorithms. Moreover, MTBoost outperforms several other typical multi-task learning methods.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

In a traditional supervised learning problem, plentiful labeled data is often required for the current task in order to produce a classifier that generalizes well to unseen data. If the labeled data is not adequate, semi-supervised learning can be applied, which also makes use of unlabeled data to obtain a well-generalized model. These two learning problems are also referred to as single task learning (STL), since such approaches solve one classification task at a time, using only the corresponding data set. STL typically assumes that samples are drawn independently from an identical distribution.
In real applications, a classification problem can often be viewed as consisting of multiple correlated subtasks, each of which is a separate STL problem. Some of the tasks may be highly correlated with each other. Since the data in different tasks may come from different distributions, the strategy of isolating each task and learning the classifiers independently does not exploit the potential information available from the other classification tasks. On the other hand, since the tasks are not identical, it is not appropriate to simply pool them and treat them as a single new task. The intuition behind multi-task learning (MTL) is to take advantage of knowledge from the related tasks and obtain a better model, enhancing the overall classification performance. Hence, MTL algorithms are essentially distinct from standard STL algorithms.

The fact that some of the classification tasks are correlated implies the possibility of knowledge transfer from one task to another. Multi-task learning is an inductive transfer learning approach whose central purpose is to improve the generalization performance on the current task with the help of other tasks. By extracting shared information among related tasks and integrating the transferable knowledge into the classifier, the effective training data for each task is strengthened and the generalization of the resulting classifier is improved. This transferable knowledge is particularly important when only a limited amount of training data is available for learning each classifier.

In this paper, we develop a novel boosting algorithm, MTBoost, to handle the multi-task learning problem. A boosting algorithm predicts samples using a weighted vote over a set of weak classifiers. Freund and Schapire (1996) proposed the AdaBoost algorithm, which is the most widely known version of boosting.
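To make the "weighted vote over a set of weak classifiers" concrete, the following is a minimal sketch of AdaBoost (Freund and Schapire, 1996) using decision stumps as the weak learners. The function names, the exhaustive stump search, and the toy interface are our own illustrative choices, not part of the paper's method.

```python
import numpy as np

def adaboost_train(X, y, n_rounds=10):
    """Minimal AdaBoost sketch with decision stumps; y must be in {+1, -1}.

    Returns a list of weighted weak classifiers (alpha, feature, threshold, sign).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # uniform sample weights to start
    ensemble = []
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        # Exhaustive stump search: h(x) = s * sign(x[j] - t)
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (+1.0, -1.0):
                    pred = s * np.where(X[:, j] > t, 1.0, -1.0)
                    err = w[pred != y].sum()    # weighted training error
                    if err < best_err:
                        best_err, best = err, (j, t, s)
        err = max(best_err, 1e-10)              # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak classifier
        j, t, s = best
        pred = s * np.where(X[:, j] > t, 1.0, -1.0)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def adaboost_predict(ensemble, X):
    """Predict by a weighted vote over the weak classifiers."""
    score = np.zeros(len(X))
    for alpha, j, t, s in ensemble:
        score += alpha * s * np.where(X[:, j] > t, 1.0, -1.0)
    return np.where(score >= 0, 1, -1)
```

On a small separable data set, a few rounds suffice to fit the training labels; the point of the sketch is the reweighting step, which focuses later rounds on the samples earlier weak classifiers got wrong.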
AdaBoost has been extended and refined in numerous studies (Schapire and Singer, 1999, 2000; Friedman et al., 2000). Most importantly, Friedman et al. (2000) interpreted AdaBoost as fitting an additive logistic regression model, giving the boosting algorithm a precise statistical explanation. From the perspective of an additive regression model with an exponential loss function, Friedman et al. (2000) proposed several new boosting algorithms, including "GentleBoost" and "LogitBoost". Our proposed algorithm, MTBoost, is based on the GentleBoost algorithm. GentleBoost uses the exponential loss function J(·) (Friedman et al., 2000):

J = E[e^(-zH(x))],   (1)

where x and z denote an input training sample and an output label in {+1, -1}, respectively. H(x) represents the additive model such as

* Corresponding author. E-mail address: huangpipei@gmail.com (P. Huang).
Pattern Recognition Letters 31 (2010) 1693-1700. doi:10.1016/j.patrec.2010.05.019
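The loss in Eq. (1) can be minimized stagewise: each GentleBoost round fits a regression function f_m(x) by weighted least squares, updates H(x) <- H(x) + f_m(x), and reweights samples by e^(-z f_m(x)) (Friedman et al., 2000). The sketch below uses regression stumps as the weak regressors; the helper names and the stump parameterization are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def gentleboost_train(X, z, n_rounds=10):
    """Minimal GentleBoost sketch; labels z in {+1, -1}.

    Each round fits a regression stump by weighted least squares and
    adds it to the additive model H(x); returns the list of stumps.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        for j in range(d):
            for t in np.unique(X[:, j]):
                hi = X[:, j] > t
                wa, wb = w[hi].sum(), w[~hi].sum()
                # Weighted least squares for a stump = weighted mean of z
                # on each side of the split.
                a = (w[hi] * z[hi]).sum() / wa if wa > 0 else 0.0
                b = (w[~hi] * z[~hi]).sum() / wb if wb > 0 else 0.0
                f = np.where(hi, a, b)
                err = (w * (z - f) ** 2).sum()
                if err < best_err:
                    best_err, best = err, (j, t, a, b)
        j, t, a, b = best
        f = np.where(X[:, j] > t, a, b)
        w *= np.exp(-z * f)     # reweight per the exponential loss in Eq. (1)
        w /= w.sum()
        stumps.append(best)
    return stumps

def gentleboost_predict(stumps, X):
    """Classify by the sign of the additive model H(x)."""
    H = np.zeros(len(X))
    for j, t, a, b in stumps:
        H += np.where(X[:, j] > t, a, b)
    return np.where(H >= 0, 1, -1)
```

Note the contrast with discrete AdaBoost: the weak learner outputs real-valued constants a and b fitted by weighted least squares rather than hard {+1, -1} votes, which is what makes the updates "gentle".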