Review of Business Information Systems – Fourth Quarter 2013, Volume 17, Number 4
2013 The Clute Institute. Copyright by author(s). Creative Commons License CC-BY

An Investigation Of The Effect Of Variable Reduction On Classification Accuracy Rates Of Consumer Loans

Jozef Zurada, University of Louisville, USA

ABSTRACT

The profitability of loan granting institutions depends largely on the institutions’ ability to accurately evaluate credit risk. Their goal is to maximize income by issuing as many good loans to consumers as possible while minimizing losses associated with bad loans. Financial institutions have been using various computational intelligence methods and statistical techniques to improve credit risk prediction accuracy. This paper examines historical data from consumer loans issued by a German bank to individuals. The data consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off and defaulted upon. This paper examines and compares the classification effectiveness of four computational intelligence techniques: 1) logistic regression (LR), 2) neural networks (NNs), 3) support vector machines (SVM), and 4) k-nearest neighbor (kNN) on three data sets to predict whether a consumer defaulted or paid off a loan. The first data set contains a full set of 20 input variables. The second and third data sets contain a reduced set of ten and six variables, respectively. The results from computer simulation show a limited effect of variable reduction on improvement in the classification performance.

Keywords: Classification; Loan-Granting Decisions; Variable Reduction; Logistic Regression; Neural Networks; Support Vector Machines; Decision Trees

INTRODUCTION

Many financial services institutions are developing credit scoring models to support their credit decisions.
The ultimate objective of these models is to increase accuracy in loan-granting decisions so that more creditworthy applicants are granted credit, thereby increasing profits, and non-creditworthy applicants are denied credit, thus decreasing losses. Even a slight improvement in accuracy rates may translate into significant future savings measured in millions of US dollars.

Determining whether a particular consumer should receive a loan is an inherently complex and, to a large extent, unstructured process. A financial institution must examine many independent financial attributes of each loan candidate in an accurate, prompt, and cost-effective manner. The financial institution approximates the risk of default by the candidate and weighs that risk against the benefit of potential earnings on the loan. Any improvement in making a reliable distinction between those who are likely to repay the loan and those who are not would allow the bank to reject the riskiest loans and to adjust the terms of the granted loans according to the risk of default.

The volume and complexity of raw data inherent in credit-risk assessment can be tackled by several traditional statistical techniques and newer computational intelligence methods. This paper examines and compares the classification effectiveness of four computational intelligence techniques (LR, NNs, SVM, and kNN) on three data sets to predict whether a consumer defaulted or paid off a loan. The first data set contains a full set of 20 input variables. The second and third data sets contain a reduced set of ten and six variables, respectively.

This paper contains sections on literature review; an explanation of the fundamentals of logistic regression (LR), neural networks (NNs), support vector machines (SVM), and the k-nearest neighbor