Multi-task proximal support vector machine

Ya Li a,1, Xinmei Tian a,*, Mingli Song b, Dacheng Tao c

a University of Science and Technology of China, Hefei, Anhui 230026, PR China
b Zhejiang University, Hangzhou, Zhejiang, China
c University of Technology, Sydney, Australia

Article history: Received 17 October 2014; Received in revised form 15 January 2015; Accepted 16 January 2015; Available online 3 February 2015

Keywords: Multi-task learning; Support vector machines; Proximal classifiers

Abstract: With the explosive growth of the use of imagery, visual recognition plays an important role in many applications and attracts increasing research attention. Given several related tasks, single-task learning learns each task separately and ignores the relationships among these tasks. In contrast, multi-task learning exploits the relationships among the tasks to learn all of them jointly, and can therefore draw on more information. In this paper, we propose a novel multi-task learning model based on the proximal support vector machine. The proximal support vector machine uses the same large-margin idea as the standard support vector machine, but with looser constraints and a much lower computational cost. Our multi-task proximal support vector machine inherits the merits of the proximal support vector machine and achieves better performance than other popular multi-task learning models. Experiments are conducted on several multi-task learning datasets, including two classification datasets and one regression dataset. All results demonstrate the effectiveness and efficiency of our proposed multi-task proximal support vector machine.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Given the explosive growth of the use of imagery in the era of big data, visual recognition has become an important problem. Various image classification and recognition methods have been proposed and have achieved much success [1–9].
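The abstract's claim that the proximal SVM keeps the large-margin idea while using looser (equality, or "proximal") constraints can be made concrete with a small sketch. The formulation below follows the standard linear proximal SVM of Fung and Mangasarian, which replaces the quadratic program of a conventional SVM with a single linear system; the function names and the regularization parameter `nu` are our own illustrative choices, not the paper's notation.

```python
import numpy as np

def train_psvm(X, y, nu=1.0):
    """Linear proximal SVM (Fung-Mangasarian style), illustrative sketch.

    Minimizes (nu/2)*||D(Xw - gamma*e) - e||^2 + (1/2)*(w'w + gamma^2),
    where D = diag(y) and y in {-1, +1}. Setting the gradient to zero
    gives one (d+1)x(d+1) linear system -- no quadratic program needed.
    """
    n, d = X.shape
    E = np.hstack([X, -np.ones((n, 1))])         # augmented data [X, -e]
    # (I/nu + E'E) z = E'y, with z = [w; gamma]
    z = np.linalg.solve(np.eye(d + 1) / nu + E.T @ E, E.T @ y)
    return z[:d], z[d]                           # weights w, offset gamma

def predict_psvm(X, w, gamma):
    """Classify points by the sign of the decision function."""
    return np.sign(X @ w - gamma)
```

Because training reduces to solving one small dense linear system, the cost scales with the feature dimension rather than with a QP over all training points, which is the "much lower computational cost" the abstract refers to.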
Some feature learning methods have also been proposed to improve the performance of image classification and recognition [10–13]. A visual recognition task can often be viewed as a combination of multiple correlated subtasks [14]. In multi-label image classification, for example, one image may contain multiple objects corresponding to different labels, and there are clearly correlations among these labels. Traditional single-task learning methods, such as SVMs and Bayesian models, learn to classify these labels separately and ignore the correlations among them. It would be desirable to discover the information shared across subtasks and use it to learn all the subtasks jointly. Inspired by this idea, various methods have been proposed to learn multiple tasks jointly rather than separately. This approach is often called multi-task learning (MTL) [15], learning to learn [16] or inductive bias learning [17]. All these methods learn multiple tasks together and improve on the performance of single-task learning models.

The most important and difficult problem in multi-task learning is to discover the information shared among tasks while maintaining the independence of each task. Consider the classification of vehicles (see Fig. 1). We have various types of vehicles, such as sports cars, family cars and buses, corresponding to different classification tasks. These vehicles have shared features as well as unique characteristics. For example, all of them have four wheels and two headlights, but sports cars usually have a low, racing-style body, family cars are often medium-sized, and buses have a larger body. Single-task learning uses only the information of each individual task, whereas multi-task learning exploits the information shared among all the tasks.
If a multi-task learning method can find the shared features of these vehicles and distinguish the differences among them, each learning task will gain additional information from the other tasks. Otherwise, noise will be added to the current learning task. Existing multi-task learning methods mainly take two approaches to discovering relationships among different tasks. One is to assume that different tasks share common parameters [18,14,19–23], such as a Bayesian model sharing a common prior [14] or a

Pattern Recognition 48 (2015) 3249–3257. http://dx.doi.org/10.1016/j.patcog.2015.01.014. 0031-3203/© 2015 Elsevier Ltd. All rights reserved.

☆ This work is supported by the NSFC under Contract nos. 61201413 and 61390514, the Fundamental Research Funds for the Central Universities Nos. WK2100060011 and WK2100100021, the Specialized Research Fund for the Doctoral Program of Higher Education No. WJ2100060003, and Australian Research Council Projects DP-140102164, FT-130101457 and LP-140100569.

* Corresponding author. Tel.: +86 183 551 026 90; fax: +86 551 636 013 40. E-mail addresses: muziyiye@mail.ustc.edu.cn (Y. Li), xinmei@ustc.edu.cn (X. Tian), brooksong@ieee.org (M. Song), Dacheng.Tao@uts.edu.au (D. Tao).

1 CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, China.
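The shared-parameter idea mentioned above — each task's model combining a common component with a task-specific one — can be sketched as a regularized least-squares problem in the spirit of Evgeniou and Pontil's regularized multi-task learning. This is an illustrative stand-in, not the paper's model; the names `train_mtl_shared`, `lam` and `mu` are our assumptions.

```python
import numpy as np

def train_mtl_shared(tasks, lam=1.0, mu=1.0):
    """Shared-parameter multi-task least squares, illustrative sketch.

    Each task t learns w_t = w0 + v_t, where w0 is shared across tasks
    and v_t is task-specific. Minimizes
        sum_t ||X_t (w0 + v_t) - y_t||^2 + lam*||w0||^2 + mu*sum_t ||v_t||^2
    by solving one joint linear system in [w0; v_1; ...; v_T].
    """
    T = len(tasks)
    d = tasks[0][0].shape[1]
    dim = d * (T + 1)
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    A[:d, :d] += lam * np.eye(d)                 # regularizer on shared w0
    for t, (X, y) in enumerate(tasks):
        G = X.T @ X
        s = slice(d * (t + 1), d * (t + 2))      # block for v_t
        A[:d, :d] += G                           # w0 normal equations
        A[:d, s] += G
        A[s, :d] += G                            # v_t normal equations
        A[s, s] += G + mu * np.eye(d)
        b[:d] += X.T @ y
        b[s] += X.T @ y
    z = np.linalg.solve(A, b)
    w0 = z[:d]
    return w0, [w0 + z[d * (t + 1):d * (t + 2)] for t in range(T)]
```

The trade-off between `lam` and `mu` controls how strongly the tasks are coupled: a large `mu` forces the task-specific offsets `v_t` toward zero, so all tasks share nearly one model, while a large `lam` lets each task learn almost independently — exactly the tension between shared features and unique characteristics described in the vehicle example.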