Detecting Credit Card Fraud by ANN and Logistic Regression

Y. Sahin
Marmara University, Goztepe Campus, Kadikoy, Istanbul, 34722, Turkey
ysahin@marmara.edu.tr

E. Duman
Dogus University, Acibadem Campus, Kadikoy, Istanbul, 34722, Turkey
eduman@dogus.edu.tr

Abstract- With developments in information technology and improvements in communication channels, fraud is spreading all over the world, resulting in huge financial losses. Although fraud prevention mechanisms such as CHIP&PIN have been developed, they do not prevent the most common fraud types, such as fraudulent credit card use over virtual POS terminals through the Internet or mail orders. As a result, fraud detection is the essential tool and probably the best way to stop such fraud types. In this study, classification models based on Artificial Neural Networks (ANN) and Logistic Regression (LR) are developed and applied to the credit card fraud detection problem. This study is one of the first to compare the performance of ANN and LR methods in credit card fraud detection on a real data set.

Keywords- Credit card fraud detection, ANN, logistic regression, classification

I. INTRODUCTION

Fraud can be defined as wrongful or criminal deception intended to result in financial or personal gain [1], or to damage another individual without necessarily leading to direct legal consequences. The two main mechanisms for avoiding fraud and the losses it causes are fraud prevention and fraud detection systems. Fraud prevention is the proactive mechanism, whose goal is to stop fraud from occurring. Fraud detection systems come into play when fraudsters get past the fraud prevention systems and start a fraudulent transaction. No one can tell whether a fraudulent transaction has passed the prevention mechanisms.
Accordingly, the goal of fraud detection systems is to check every transaction for the possibility of being fraudulent, regardless of the prevention mechanisms, and to identify fraudulent ones as quickly as possible after the fraudster has begun to perpetrate a fraudulent transaction. Reviews of fraud detection systems can be found in [2-5].

With developments in information technology and improvements in communication channels, fraud is spreading all over the world, resulting in huge financial losses. Though fraud can be perpetrated through many types of media, including mail, wire, phone and the Internet, online media such as the Internet are the most popular. Because of the international availability of the web and the ease with which users can hide their location and identity in Internet transactions, fraudulent actions committed over this medium are growing rapidly. Furthermore, with improvements in the bandwidth of internetworking channels, fraudsters have the chance to form fraud networks among themselves through information exchange and collaboration all over the world. As a result, frauds committed over the Internet, such as online credit card fraud, have become the most popular ones.

Credit card fraud can be committed in many ways, such as simple theft, application fraud, counterfeit cards, never received issue (NRI) and online fraud (where the card holder is not present). In online fraud, the transaction is made remotely and only the card's details are needed. Neither a manual signature, a PIN nor a card imprint is required at the time of purchase. Though prevention mechanisms like CHIP&PIN reduce fraudulent activity through simple theft, counterfeit cards and NRI, online frauds (Internet and mail order frauds) are still increasing in both amount and number of transactions.

Financial losses due to credit card fraud have been growing as the use of credit cards becomes more and more common.
Many papers have reported huge losses in different countries [2, 6-7]. According to Visa reports on European countries, about 50% of all credit card fraud losses in 2008 were due to online fraud.

Credit card fraud detection is an extremely difficult but also popular problem to solve. Firstly, only a limited amount of data comes with each transaction, such as the transaction amount, date and time, address, merchant category code (MCC) and acquirer number of the merchant. There are millions of possible places and e-commerce sites where a credit card can be used, which makes it extremely difficult to match a pattern. Also, past transactions made by fraudsters can fit a pattern of normal (legitimate) behavior [8].

Moreover, the problem has many constraints. First of all, the profiles of normal and fraudulent behavior change constantly. Secondly, the development of new fraud detection methods is made more difficult by the fact that the exchange of ideas in fraud detection, especially in credit card fraud detection, is severely limited due to security and privacy concerns. Thirdly, data sets are not made available and the results are often censored, making them difficult to assess. As a result, the models built cannot be benchmarked against one another. Some studies have even been done using synthetically generated data [9-10]. None of the previous studies with real data in the literature gives details about the data and the variables used in the classifier models. Fourthly, credit card fraud data sets are highly skewed sets with a ratio of about 10000