Relevance vector machine based infinite decision agent ensemble learning for credit risk analysis

Shukai Li a,⇑, Ivor W. Tsang a, Narendra S. Chaudhari b

a Centre for Computational Intelligence, Nanyang Technological University, Singapore
b Department of Computer Science and Engineering, Indian Institute of Technology Indore, India

Keywords: Credit risk analysis; Boosting; Relevance vector machine; Perceptron kernel

Abstract

In this paper, a relevance vector machine based infinite decision agent ensemble learning (RVM_Ideal) system is proposed for robust credit risk analysis. In the first level of our model, we adopt soft margin boosting to overcome overfitting. In the second level, the RVM algorithm is revised for boosting so that different RVM agents can be generated from the updated instance space of the data. In the third level, the perceptron kernel is employed in the RVM to simulate infinite subagents. RVM_Ideal also has several desirable properties: good generalization performance, immunity to overfitting, and the ability to predict the distance to default. According to the experimental results, the proposed system achieves better performance in terms of sensitivity, specificity and overall accuracy.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Credit risk is the chance that money owed may not be repaid. There is little doubt that awareness of credit risk has continued to grow. This has been accompanied by an increasing recognition across many sectors of the economy that credit risk needs to be actively managed (Servigny & Renault, 2004). Close attention is paid to potential future losses on credit assets arising from changes in credit quality (including downgrades or upgrades in credit ratings), variations in credit spreads, and default events.
The role of credit risk analysis is to assess and evaluate the potential credit risk of any customer or borrower, and to advise on decisions about granting credit or providing loans or borrowing facilities (Graham & Coyle, 2000). In other words, credit risk analysis is the method by which one calculates the creditworthiness of a person, business or organization. For many credit granting institutions, such as commercial banks and credit companies, the ability to discriminate non-default customers from default ones is crucial to business success. Credit risk analysis has attracted much more attention from financial institutions because of the Asian Financial Crisis in 1997, the subprime mortgage crisis between 2007 and 2009, and Basel II (Lang, Mester, & Vermilyea, 2008), published in 2004. Furthermore, as competition for market share and profit intensifies, some financial institutions take on more risk to gain a competitive advantage in the market.

The accessibility of large databases, together with advances in statistical and machine learning methods for building efficient credit risk models, has changed this area fundamentally in recent decades. The prediction of credit risk has also been widely studied because of its practical uses, such as providing early warning signals for defaults by obligors. Such techniques are widely applied in corporate and personal credit risk analysis, where the credit risk of a potential obligor must be predicted before the debt is approved and extended. In addition, financial institutions are driven by obligees to employ powerful credit risk models to assess the credit risk of debt. Hence, more accurate quantitative prediction models are essential for analysing the credit risk of loan portfolios and assessing an obligor's creditworthiness.

1.1. The state of the art

Credit risk analysis is important but also complicated.
Even the most reliable customer may default on his or her debt. In addition, corporate financial statements and personal credit questionnaires contain noisy data. Research aimed at building robust credit risk analysis models began in the 1960s; Beaver (1966) was one of the earliest researchers to study the prediction of credit risk. Beaver's analysis examined one financial ratio at a time and chose a cutoff threshold for each ratio. Subsequently, quantitative models (Galindo & Tamayo, 2000; Thomas, 2000) such as linear discriminant analysis and logistic regression (Ederington, 1985) have been applied to predict the credit level of new clients.

⇑ Corresponding author. Tel.: +65 81786016; fax: +65 67926559. E-mail address: empiremaths@hotmail.com (S. Li).
Expert Systems with Applications 39 (2012) 4947–4953. doi:10.1016/j.eswa.2011.10.022
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.

In addition to