SOAR – Sparse Oracle-based Adaptive Rule Extraction: Knowledge extraction from large-scale datasets to detect credit card fraud

Nick F Ryman-Tubb, Member, IEEE, Artur d'Avila Garcez

Manuscript received May 2, 2010. This work was supported in part by Retail Decisions Europe Ltd. (http://www.redplc.com). Nick F Ryman-Tubb is with City University London, Department of Computing, Northampton Square, London, EC1V 0HB, UK (phone: +44 (0) 20 7040 4053; e-mail: nick.ryman-tubb@soi.city.ac.uk; web: http://www.soi.city.ac.uk/neural). Artur d'Avila Garcez is with City University London, Department of Computing, Northampton Square, London, EC1V 0HB, UK (phone: +44 (0) 20 7040 4053; e-mail: aag@soi.city.ac.uk).

978-1-4244-8126-2/10/$26.00 ©2010 IEEE

Abstract—This paper presents a novel approach to knowledge extraction from large-scale datasets using a neural network, applied to the real-world problem of payment card fraud detection. Fraud is a serious and long-term threat to a peaceful and democratic society. We present SOAR (Sparse Oracle-based Adaptive Rule) extraction, a practical approach that processes large datasets and extracts key, comprehensible, generalizing rules, using a trained neural network as an oracle to locate key decision boundaries. Experimental results indicate that a high level of rule comprehensibility can be achieved with an acceptable level of accuracy. SOAR extraction outperformed the best decision tree induction method and produced over 10 times fewer rules, aiding comprehensibility. Moreover, the extracted rules revealed fraud facts of key interest to industry fraud analysts.

I. INTRODUCTION

Fraud is prevalent in many high-volume areas, such as online shopping, telecommunications, banking and social security claims, where a manual review of all transactions is not possible and decisions must be made quickly to prevent crime. Fraud is increasing with the expansion of computing technology and globalization, with criminals devising new frauds to overcome the strategies already in place to stop them. Automating the detection of fraud, through the use of a Fraud Management System (FMS), is therefore of strategic importance.

One type of fraud is payment card fraud: the criminal act of deception through the use of a physical plastic card or card information without the knowledge of the cardholder. When a transaction takes place, the details of that transaction are processed by the acquiring bank for authorization. It is reported that total card fraud losses cost banks and merchants $8.6 billion per year in the USA [1] and £609.9 million in the UK [2], despite the FMS tools already in place to tackle the problem.

There are three types of fraud: (1) collusion between a merchant and a cardholder using false transactions; (2) fraud committed using the physical payment card, called Cardholder Present (CP), such as the interception of new credit cards in the mail, stolen or lost cards, the copying of card information onto counterfeit physical cards, employee fraud at the issuing bank, etc.; and (3) fraud committed through the use of the internet or telephone, where the Cardholder is Not Present (CNP) at the point of transaction.

A. The Importance of Fraud Detection

Traditionally, public perceptions of fraud are tempered by a belief that it is a “white-collar” crime which targets the wealthy and big business and is of less personal concern, as the effects are cushioned for the victim [3]. However, mafia figures and other violent criminals are increasingly moving into fraud [4], so that payment card fraud now involves the threat of violence, including murder. In the USA, the fear of fraud now exceeds that of terrorism, computer and health viruses and personal safety [5], and in the UK the Attorney General describes fraud as “second only to drug trafficking in causing harm to the economy and society” [6]. Today, the proceeds from fraud are paying for organized crime, drug smuggling and terrorism [7, 8].

Existing FMS approaches are not keeping pace [9], with firms rating payment fraud as the most critical threat to their business: “…as long as criminals believe they can get away with committing fraud, the problem will continue to grow to a point where it may challenge the competitiveness of the online model”. If anti-fraud technologies do not keep pace, businesses lose money through charge-backs and fines, loss of goods, and loss of reputation; they may have their payment card facilities withdrawn and, in some cases, fail altogether.

To detect fraud, organizations use a range of methods. At the most basic level, this is a list of internal procedures such as fixed credit limits, transaction volume limits and so on. However, only a small number now rely on manual methods alone, with the majority employing some form of automated FMS. The FMS is often a rule-based system that stores and uses knowledge in a transparent way and is easy for a fraud expert to modify and interpret. Rules provide a convenient mechanism for explaining decisions. However, the generation of comprehensible rules is an expensive and time-consuming task that requires a high degree of skill from both the developers and the experts concerned. The performance of the FMS depends on the skill of the human expert and on how past data and events are interpreted. Experts are often subjective and can only deal with a limited number of transaction fields. While such systems have been found to be easily understood and to provide an initial level of success in automating fraud decision making, their accuracy has often worsened over time. To try to improve the accuracy, more rules are added by the experts, but the system then becomes increasingly complex, slower to process and