Supervised and unsupervised learning approaches to telecommunications fraud detection

DRAFT – PLEASE DO NOT DISTRIBUTE

Constantinos S. Hilas *, Paris As. Mastorocostas
Dept. of Informatics and Communications, Technological Educational Institute of Serres, Serres GR-621 24, Greece
chilas@teiser.gr, mast@teiser.gr

Abstract. This paper investigates the usefulness of applying different learning approaches to a problem of telecommunications fraud detection. Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering. One aim of the study is to determine which user model best identifies fraud cases. The second is to explore different views of the same problem and see what can be learned from the application of each technique. The models are compared in terms of their performance, and each technique's outcome is evaluated with appropriate measures.

1 Introduction

Telecommunications fraud can be simply described as any activity by which telecommunications service is obtained without intention of paying [1]. Telecommunications fraud has certain characteristics that make it particularly attractive to fraudsters. The main one is that the risk of the fraudster being located is small. This is because all actions are performed from a distance, which, in conjunction with the mesh topology and the size of networks, makes the process of localization time-consuming and expensive. In addition, no particularly sophisticated equipment is needed, if any is needed at all. The simple knowledge of an access code, which can be acquired even through social engineering, makes the implementation of fraud feasible. Finally, the product of telecommunications fraud, a phone call, is directly convertible to money. The illegal sale of illegally acquired telecommunications service time is big business [2].
Several categories of telecommunications fraud have been reported. The main ones are technical fraud, contractual fraud, hacking fraud, and procedural fraud [1]. In [3], twelve distinct fraud types are identified, while combinations of them have also been reported [4]. The most common fraud scenario in private networks is superimposed fraud. This is the case of an employee, the fraudster, who uses another employee's authorization code to access outgoing trunks and costly services. Thus, the fraudster's activity is superimposed over that of the legitimate user. Telecommunications fraud has drawn the attention of many researchers in recent years, not only due to the huge economic burden on companies' accounts but