Using Active Learning Methods for Predicting Fraudulent Financial Statements

Stamatis Karlos 1(✉), Georgios Kostopoulos 2, Sotiris Kotsiantis 2, and Vassilis Tampakas 1

1 Department of Computer Engineering Informatics, Technical Educational Institute of Western Greece, Antirrion, Greece
stkarlos@upatras.gr, vtampakas@teimes.gr
2 Educational Software Development Laboratory (ESDLab), Department of Mathematics, University of Patras, Patras, Greece
kostg@sch.gr, sotos@math.upatras.gr

Abstract. Detection of Fraudulent Financial Statements (FFS), or more simply the fraud detection problem, concerns the falsification of financial statements, with the aim either of overstating positive figures, such as assets and profit, or of concealing negative ones, such as expenses and losses. As contemporary markets and multinational trade expand, firms produce and rely on ever larger volumes of data in their operation. Antifraud mechanisms therefore need a corresponding upgrade, which opens the way for Machine Learning tools in this field. However, because trustworthy datasets describing the financial ratios of firms that have committed fraud are difficult to collect, strategies that exploit a few labeled instances to discover useful patterns in a pool of unlabeled data could prove highly efficient. In this work, algorithms that operate under Active Learning theory are compared against their supervised variants, using data extracted from Greek firms. To the best of our knowledge, this is the first study that uses Active Learning for predicting FFS. The obtained results demonstrate the superior performance of the corresponding active learners.
Keywords: Active learning theory · Machine learning · Fraud detection · Financial ratios · Classification accuracy

© Springer International Publishing AG 2017
G. Boracchi et al. (Eds.): EANN 2017, CCIS 744, pp. 351–362, 2017. DOI: 10.1007/978-3-319-65172-9_30

1 Introduction

Nowadays, more and more scientific fields are affected by technological innovation. As a result, some services are improved in more profitable or more efficient directions, while the nature of others is transformed entirely, based on ideas that emerge either from the needs of current society or from the new demands posed by the problem at hand. One of the most interactive fields, offering a great number of services that people come into contact with in their daily lives, is Machine Learning (ML). Although ML and its subfields tackle many learning problems, the best known is classification. The objective of this problem is the assignment of