International Journal of Computer and Information Technology (ISSN: 2279-0764), Volume 03, Issue 04, July 2014, www.ijcit.com, p. 805

Investigation of Multilayer Perceptron and Class Imbalance Problems for Credit Rating

Zongyuan Zhao, Shuxiang Xu, Byeong Ho Kang, Mir Md Jahangir Kabir
School of Computing and Information Systems, University of Tasmania, Tasmania, Australia

Yunling Liu
College of Information and Electrical Engineering, China Agricultural University, Beijing, China
Email: lyunling {at} 163.com

Abstract— Multilayer perceptron (MLP) neural networks are widely used in automatic credit scoring systems because of their high accuracy and efficiency. However, class imbalance severely harms prediction accuracy when the number of instances in one class greatly exceeds the number in the other. In credit scoring datasets, class imbalance arises in fault detection models because there are always fewer unqualified cases than approved applications. In this work, we investigate the effect of different MLP structures on prediction ability and develop a novel instance selection method to address class imbalance in the German credit dataset. We train 34 models, each with one hidden layer containing 6 to 39 hidden units, 20 times with different initial weights and training instances. Our test results show that the prediction accuracy of the optimized model with our new instance selection method is 5% higher than the best result reported in the relevant literature of recent years. We also summarize how scoring accuracy tends to change as the number of hidden units in the MLP increases. The results of this work can be applied not only to credit scoring but also to other MLP neural network applications, especially when the distribution of instances in a dataset is imbalanced.

Keywords: credit scoring; neural network; German credit; class imbalance

I.
INTRODUCTION

Credit rating has been shown to decrease credit risk and reduce “bad” loans, and it is widely used in banks and other financial institutions [1]. It is a set of decision models and their underlying techniques that help lenders judge whether a credit application should be approved or rejected [2]. Credit scoring systems can be broadly divided into two kinds: judging new credit applications and predicting bankruptcy after lending. The first kind uses the personal information and financial status of a loan applicant as inputs to calculate a score. If the score is higher than a “safe level”, the applicant is likely to exhibit good credit behavior. Conversely, a low score means high risk for the loan, so the lender needs to consider the application carefully. The other kind of credit scoring focuses on the credit records of existing customers. From a customer’s payment history, a financial institution can predict the customer’s payment ability and adjust his or her credit level. This paper focuses only on application scoring.

In recent years, artificial neural networks (ANNs) have shown advantages in credit scoring over linear probability models, discriminant analysis and other statistical techniques [3]. Compared with traditional credit scoring, which is carried out by professional bank managers, automatic scoring has some obvious advantages: it saves the cost and time of evaluating new credit applications, and it is consistent and objective [4]. As one kind of ANN model, multilayer perceptron (MLP) models have been widely used [2, 5, 6] and show prediction ability that is competitive with other methods [7, 8]. The back-propagation (BP) algorithm was developed in [9] and is now widely used for training MLP feed-forward neural networks. The memetic Pareto artificial neural network (MPANN) optimized the BP algorithm using a multi-objective evolutionary algorithm and a gradient-based local search [10].
This training method could reduce training time while also improving classification accuracy. The same paper also presented a self-adaptive version called SPANN, which was markedly faster than BP and substantially reduced computational complexity.

Many tests showed that RBF, LS-SVM and BP classifiers performed very well on eight credit scoring datasets [11]. At the same time, some linear classifiers such as LDA and LOG also produced good results, which indicated that the performance differences between some models were not pronounced [12]. Another study obtained similar results when testing the accuracy of several automatic scoring models on the German, Australian and Japanese credit datasets [13]. It reported that, compared with BP, the C4.5 decision tree performed slightly better for credit scoring, although both could achieve high accuracy. The Nearest Neighbor and Naïve Bayes classifiers appeared to be the worst in those tests. Improvements to neural networks include altering the ratio of training to testing data, the number of hidden nodes, and the number of training iterations. A nine learning schemes