Kamran Vatanabadi, December 2017

CREDIT CARD TRANSACTIONS ANOMALY DETECTION
A PRACTICAL GUIDE ON MODELING CUSTOMER BEHAVIOR

I. Introduction

Most people think deep neural networks¹, or any other modern machine learning technique, can do almost anything, at least within a single, specific domain. Even if we accept this claim, the problem is that we usually do not have access to enough data (especially labeled data) to train a neural network and expect that magic. That is why most of the successful machine learning projects that catch our attention come from giant companies that have enough money to pay for data and to hire hundreds of people to label it. So the question is: what can we do if we do not have that much money or labeled data, but still want to, or have to, use machine learning techniques?

This technical white paper is the result of real experience on several projects in medium- to large-sized companies. These are companies that simply give us access to millions of transactions per hour and ask us to find the anomalies! Solving these kinds of problems is the real magic!

II. Knowledge usage

All we need to extract knowledge from any source of data is clustering. The problem with using clustering alone, however, is that it does not guarantee that the extracted knowledge is what we are looking for, or that it is understandable to our already trained minds at all. Our minds have already been trained and contain millions of patterns; these patterns give meaning to our lives, and the very reason we look for anomalies in credit card transactions is tied to some of them. That is why we usually prefer classification: it pushes the knowledge extraction process (for example, a deep neural network) to find the solution the way our minds like it. Without labeled data, we need to fill that gap with something else, namely our prior knowledge, but not in the form of labeling the data, because labeling is costly.
So, in both cases, we have to use our existing knowledge, either by applying it while labeling the data or by using it during the system design.

III. Credit card transactions

You get your first credit card, and after a year of using it, you decide to go on vacation. You travel to a different country, say from America to Europe, and you worry that if you use your credit card, the transaction might not go through or might be declined. However, nothing happens; it works! Now the question is: does the bank's fraud² detection system work at all? Or is it so intelligent that it knows it is you spending the money? There is no way the bank can be sure it is you using the card in another country unless it tracks every event in your life and knows you are going to Europe. This means that even if someone steals your credit card, goes to Europe, and uses it, the bank's fraud detection system cannot necessarily tell whether the transactions are legitimate, unless the thief changes your usage pattern. In fact, if the fraud detection system tried to catch a fraudulent transaction the moment it happens, it would produce many false positives. That is why these anomaly (or fraud) detection systems usually wait to collect enough data and raise an anomaly alarm only once they have enough evidence.

¹ Or a recurrent neural network when the patterns of interest are temporal.
² "Fraud" is not the same as an "anomaly", but we treat it as one here. Frauds are usually complicated and planned in advance, like money laundering.
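The trade-off described above, alarming immediately versus waiting for evidence, can be illustrated with a small sketch. It is not the method of any particular bank, nor necessarily the approach developed in this paper. It assumes that a customer's usage pattern is summarized only by the mean and standard deviation of past transaction amounts, and it swaps in a standard one-sided CUSUM accumulator as the evidence-collecting step. The functions `deviation_scores`, `instant_alarm`, and `cusum_alarm`, the slack `k`, the thresholds, and the sample amounts are all hypothetical.

```python
from statistics import mean, stdev


def deviation_scores(history, new_amounts):
    """Standardize new transaction amounts against the customer's own history.

    Hypothetical assumption: the 'usage pattern' is just the mean and sample
    standard deviation of past amounts (history needs at least two values).
    """
    mu = mean(history)
    sigma = stdev(history) or 1.0  # avoid division by zero for a flat history
    return [(x - mu) / sigma for x in new_amounts]


def instant_alarm(scores, threshold=3.0):
    """Naive rule: alarm on the first single transaction that looks unusual.

    Returns the index of the triggering transaction, or None.
    """
    for i, z in enumerate(scores):
        if z > threshold:
            return i
    return None


def cusum_alarm(scores, k=1.0, h=5.0):
    """One-sided CUSUM: accumulate evidence and alarm only once it exceeds h.

    k (slack) and h (decision threshold) are hypothetical tuning values.
    Returns the index of the triggering transaction, or None.
    """
    s = 0.0
    for i, z in enumerate(scores):
        # Deviations smaller than k (z < k) drain the accumulator instead of growing it.
        s = max(0.0, s + z - k)
        if s > h:
            return i
    return None


# Made-up example data: past amounts, one unusual purchase, and a sustained shift.
history = [20, 25, 30, 22, 28, 24, 26, 25]
one_off = [26, 24, 40, 25, 23]   # a single large purchase, e.g. on vacation
sustained = [38, 41, 39, 45]     # spending stays high transaction after transaction

print(instant_alarm(deviation_scores(history, one_off)))  # → 2
print(cusum_alarm(deviation_scores(history, one_off)))    # → None
print(cusum_alarm(deviation_scores(history, sustained)))  # → 1
```

With these made-up numbers, the per-transaction rule fires on the single 40-unit purchase (which could just as well be your vacation), while the accumulator ignores it and raises the alarm only on the second transaction of a sustained shift. Raising `h` trades a longer detection delay for fewer false positives.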