Using Active Learning Methods for Predicting Fraudulent Financial Statements

Stamatis Karlos1 (✉), Georgios Kostopoulos2, Sotiris Kotsiantis2, and Vassilis Tampakas1

1 Department of Computer Engineering Informatics, Technical Educational Institute of Western Greece, Antirrion, Greece
stkarlos@upatras.gr, vtampakas@teimes.gr
2 Educational Software Development Laboratory (ESDLab), Department of Mathematics, University of Patras, Patras, Greece
kostg@sch.gr, sotos@math.upatras.gr

Abstract. The detection of Fraudulent Financial Statements (FFS), or more simply the fraud detection problem, concerns the falsification of financial statements with the aim of either inflating positive figures, such as assets and profit, or concealing negative factors, such as expenses and losses. As contemporary markets and multinational trade expand, firms inevitably produce large volumes of data in the course of their operations. Anti-fraud mechanisms should therefore be upgraded accordingly, which motivates the introduction of Machine Learning tools in this field. However, because trustworthy datasets describing the financial ratios of firms that have committed fraud are difficult to collect, strategies that exploit a few labeled instances to discover useful patterns in a pool of unlabeled data could prove highly efficient. In this work, algorithms that operate under Active Learning theory are compared against their supervised variants, using data extracted from Greek firms. To the best of our knowledge, this is the first study that uses Active Learning for predicting FFS. The obtained results demonstrate the superior performance of the corresponding active learners.
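The abstract contrasts active learners, which query labels for selected instances from an unlabeled pool, with purely supervised learners trained on a fixed labeled set. As a hedged illustration only, the minimal sketch below shows a generic pool-based active learning loop with uncertainty sampling. The query strategy, the tiny logistic-regression base learner, the hypothetical helper names (`fit_logreg`, `predict_proba`, `uncertainty_sampling_loop`), the simulated oracle and the toy two-cluster data are all assumptions for illustration; they are not the authors' learners, query strategy or Greek-firm dataset.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(model, x):
    # Probability of the positive class (e.g. "fraudulent") under a linear model.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logreg(X, y, epochs=200, lr=0.1):
    """Hypothetical base learner: logistic regression via full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def uncertainty_sampling_loop(X, labeled, pool, oracle, budget):
    """Pool-based active learning: repeatedly query the most uncertain instance.

    labeled: dict {index: label} of the few initially labeled instances
    pool:    indices whose labels are hidden
    oracle:  callable returning the true label of an index (e.g. a human expert)
    """
    labeled, pool = dict(labeled), set(pool)
    queried = []
    for _ in range(min(budget, len(pool))):
        idx = sorted(labeled)
        model = fit_logreg([X[i] for i in idx], [labeled[i] for i in idx])
        # Most uncertain instance = predicted probability closest to 0.5.
        best = min(sorted(pool), key=lambda i: abs(predict_proba(model, X[i]) - 0.5))
        labeled[best] = oracle(best)  # ask the oracle for this single label
        pool.remove(best)
        queried.append(best)
    idx = sorted(labeled)
    model = fit_logreg([X[i] for i in idx], [labeled[i] for i in idx])
    return model, labeled, queried


# Toy data (hypothetical, not the paper's dataset): two Gaussian clusters in 2-D.
rng = random.Random(0)
X, y = [], []
for label, center in ((0, -2.0), (1, 2.0)):
    for _ in range(50):
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)

initial = {0: y[0], 1: y[1], 50: y[50], 51: y[51]}  # two labeled instances per class
pool = set(range(len(X))) - set(initial)
model, labeled, queried = uncertainty_sampling_loop(
    X, initial, pool, oracle=lambda i: y[i], budget=10)
accuracy = sum((predict_proba(model, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
print(f"queried {len(queried)} labels, accuracy on all instances: {accuracy:.2f}")
```

The design choice worth noting is that the learner is retrained from scratch on the growing labeled set after every query; a supervised baseline would instead be trained once on a labeled set of the same final size, chosen without regard to the model's uncertainty.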
Keywords: Active learning theory · Machine learning · Fraud detection · Financial ratios · Classification accuracy

© Springer International Publishing AG 2017
G. Boracchi et al. (Eds.): EANN 2017, CCIS 744, pp. 351–362, 2017.
DOI: 10.1007/978-3-319-65172-9_30

1 Introduction

Nowadays, more and more scientific fields are being affected by technological innovation. As a result, some services are either improved, becoming more profitable or more efficient, or transformed entirely on the basis of new ideas that emerge from the needs of today's society or from the new demands posed by the specific problem at hand. One of the most interactive fields, offering a large number of services that people come into contact with in their daily lives, is Machine Learning (ML). Although ML and its subfields tackle many learning problems, the best known is classification. The objective of this problem is the assignment of