Using Active Learning Methods for Predicting
Fraudulent Financial Statements
Stamatis Karlos¹(✉), Georgios Kostopoulos², Sotiris Kotsiantis², and Vassilis Tampakas¹
¹ Department of Computer Engineering Informatics,
Technical Educational Institute of Western Greece, Antirrion, Greece
stkarlos@upatras.gr, vtampakas@teimes.gr
² Educational Software Development Laboratory (ESDLab),
Department of Mathematics, University of Patras, Patras, Greece
kostg@sch.gr, sotos@math.upatras.gr
Abstract. Detection of Fraudulent Financial Statements (FFS), or simply the
fraud detection problem, concerns the falsification of financial statements with
the aim either of overstating positive figures, such as assets and profit, or of
concealing negative ones, such as expenses and losses. The expansion of
contemporary markets and multinational trade has led firms to produce large
volumes of data in the course of their operation. Antifraud mechanisms
therefore need a corresponding upgrade, which motivates the introduction of
Machine Learning tools in this field. However, because trustworthy datasets
describing the financial ratios of firms that have committed fraud are difficult
to collect, strategies that exploit a few labeled instances to discover useful
patterns in a pool of unlabeled data could prove highly effective. In this work,
algorithms that operate under Active Learning theory are compared against
their supervised variants, using data extracted from Greek firms. To the best of
our knowledge, this is the first study that uses Active Learning for predicting
FFS. The obtained results show the superior performance of the corresponding
active learners.
Keywords: Active learning theory · Machine learning · Fraud detection ·
Financial ratios · Classification accuracy
1 Introduction
Nowadays, more and more scientific fields are affected by technological innovation.
As a result, some services are improved, becoming more profitable or more efficient,
while the nature of others is transformed entirely by new ideas that emerge either
from the needs of contemporary society or from the new demands posed by each
problem encountered. One of the most interactive fields, offering a great number of
services that people come into contact with in their daily lives, is Machine
Learning (ML).
Although ML and its subfields tackle many learning problems, the most
well-known is classification. The objective of this problem is the assignment of
© Springer International Publishing AG 2017
G. Boracchi et al. (Eds.): EANN 2017, CCIS 744, pp. 351–362, 2017.
DOI: 10.1007/978-3-319-65172-9_30