Vol.7 (2017) No. 6
ISSN: 2088-5334

Credit Card Detection System Based on Ridit Approach

Norbaiti Tukiman#, Norhaiza Ahmad*, Suhana Mohamed$, Zarith Sofiah Othman#, CT Munirah Niesha Mohd Shafee#, Zairi Ismael Rizman&

# Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 81750 Pasir Gudang, Johor, Malaysia
E-mail: norbaiti289@johor.uitm.edu.my, zarithsofiah@johor.uitm.edu.my, ctmun518@johor.uitm.edu.my

* Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81200 Skudai, Johor, Malaysia
E-mail: norhaiza@utm.my

$ Department of Finance, Faculty of Business Management, Universiti Teknologi MARA, 81750 Pasir Gudang, Johor, Malaysia
E-mail: suhan291@johor.uitm.edu.my

& Faculty of Electrical Engineering, Universiti Teknologi MARA, Dungun, Terengganu, Malaysia
E-mail: zairi576@tganu.uitm.edu.my

Abstract— Fraud detection is an important agenda for financial and insurance institutions, which must protect themselves from fraudsters and the resulting losses. These losses are huge, so detecting fraud at an early stage is critical. If fraud is not properly managed, its impact may even lead to the closure of an institution. Many predictive analytic systems and models have been proposed to identify and detect fraud. This paper examines the effect of choosing different responses of the credit card history variable as the reference group in an unsupervised scoring method, Relative to an Identified Distribution (RIDIT), evaluated with a statistical significance test. We illustrate the method using the German Credit dataset from the UCI Machine Learning Repository. The results consist of RIDIT scores and chi-square significance values that depend on which responses are taken as the reference group and which as comparison groups, showing that fraud detection is affected to varying degrees by the credit card history response.

Keywords— fraud; RIDIT; score; system; approach

I. INTRODUCTION

White-collar crime occurs in most countries of the world. One of the best-known forms is credit card fraud, through which billions of dollars are stolen from banks. Identifying credit card fraud in financial data is crucial yet challenging, because transaction databases are large and high-dimensional [1]-[4]. In recent years, credit card fraud has caused losses of billions of dollars to cardholders and companies. A report by the Internet Crime Complaint Center (IC3) showed increasing numbers of complaints, with dollar losses following a similar trend, over the five years from 2011 to 2015. In 2015, IC3 received 288,012 complaints with reported losses of $1,070,711,522, an increase of 33.8% ($270,219,449) over 2014. Fig. 1 and Fig. 2 illustrate the complaints and the corresponding dollar losses reported by IC3. Given losses of this scale, it is crucial to prevent and fight fraud.

A fraudster is defined as a dishonest person who intends to obtain a benefit from another source or party without legal right. Today, fraudsters may use sophisticated methods in electronic commerce, together with technological tools and equipment, to gain more money from their fraudulent activities. Previous literature discusses the issues and challenges in fraud detection, which include concept drift, real-time detection, early detection, skewed distributions, big data, misclassification of data, and cost sensitivity [5]. In particular, transaction data are typically imbalanced, with a small number of fraud cases compared with non-fraud cases. Many supervised predictive methods have been used to detect fraud, such as support vector machines [6], neural networks [7], [8], and decision trees [9], [10].
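As a quick consistency check on the 2015 IC3 figures quoted in this section, the short Python sketch below derives the implied 2014 loss by subtracting the reported increase from the 2015 total, then recomputes the percentage growth relative to 2014. The dollar amounts are treated as absolute values (the 2015 total is about $1.07 billion). The variable names are illustrative only.

```python
# Figures stated in the text for IC3 reported dollar losses.
loss_2015 = 1_070_711_522   # total reported loss in 2015 (USD)
increase = 270_219_449      # increase over 2014 (USD)

# Implied 2014 loss, derived only from the two figures above.
loss_2014 = loss_2015 - increase

# Year-on-year growth expressed as a percentage of the 2014 loss.
pct_increase = 100 * increase / loss_2014

print(f"2014 loss: ${loss_2014:,}")
print(f"Increase: {pct_increase:.1f}%")
```

Running it prints an implied 2014 loss of $800,492,073 and an increase of 33.8%, which matches the rounded percentage given in the text.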
Among unsupervised methods, a few techniques have been applied, such as Self-Organizing Maps [11], [12], Principal Component Analysis [13], [14], and Peer Group Analysis [15]. However, most of these methods depend on distributional assumptions that are not appropriate for ordered categorical data. One unsupervised predictive method that counters this issue uses an Identified