Detecting the financial statement fraud: The analysis of the differences between data mining techniques and experts’ judgments

Chi-Chen Lin a,1, An-An Chiu b,2, Shaio Yan Huang c,3, David C. Yen d,*

a Department of Accounting Information, National Taipei University of Business, No. 321, Sec. 1, Jinan Rd., Zhongzheng, Taipei, Taiwan
b Department of International Trade, Feng Chia University, No. 100, Wenhwa Rd., Seatwen, Taichung 40724, Taiwan
c Department of Accounting and Information Technology, National Chung Cheng University, 168 University Rd., Min-Hsiung, Chia-Yi 62102, Taiwan
d School of Economics and Business, 226 Netzer Administration Bldg., SUNY College at Oneonta, Oneonta, NY 13820, United States

Article info
Article history: Received 4 November 2014; received in revised form 14 August 2015; accepted 18 August 2015; available online 24 August 2015
Keywords: Fraud factor; Fraud triangle; Data mining

Abstract

The first objective of this study is to examine all aspects of the fraud triangle using data mining techniques, employing publicly available information as proxy variables to evaluate attributes such as pressure/incentive, opportunity, and attitude/rationalization, based on the findings of prior studies in this field and on the Statement on Auditing Standards. The second objective is to discuss whether the suggestions of experts agree with the results obtained from these techniques. Specifically, this study uses both expert questionnaires and data mining techniques to sort out the different fraud factors and then rank their importance. The data mining methods employed in this research include Logistic Regression, Decision Trees (CART), and Artificial Neural Networks (ANNs).
Empirically, the ANN and CART approaches achieve correct classification rates of 91.2% (ANNs) and 90.4% (CART) on the training sample and 92.8% (ANNs) and 90.3% (CART) on the testing sample, making them more accurate than the logistic model, which reaches only 83.7% and 88.5%, respectively, in assessing the presence of fraud. In addition, the Type II error rate of ANNs is 23.9%, markedly lower than the 43.3% and 27.8% obtained with the CART and logistic models. Finally, the differences between the data mining tools and expert judgments are compared to provide further insights as a research contribution.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

After the occurrence of several major scandals (e.g., Enron Corp., Tyco, and WorldCom Inc.), the loss of market capitalization resulting from reported financial statement fraud is estimated to be about $460 billion [39]. In 2014, the Association of Certified Fraud Examiners (ACFE) reported that U.S. organizations lose almost 5 percent of their revenue to fraud, and that the Gross Domestic Product (GDP) based annual fraud estimate for the U.S. alone is around $3.7 trillion (ACFE, 2014). Sorkin [41] reported that 343 criminals and 189 civil defendants were involved in fraudulent activities that harmed more than 120,000 victims and involved more than $8 billion in recent years in the United States. Financial fraud is becoming an increasingly serious problem and, as a result, effectively detecting accounting fraud has always been an important but rather complex task for accounting professionals [29,13,37,34]. Examining financial fraud is in fact a pressing issue, given that the economic and social fallout from fraud can be massive [22]. After the AICPA issued SAS No. 82, greater responsibility was imposed on auditors to detect fraud in general and to manage fraud effectively in particular.
However, this standard did not provide specific and objective guidelines. Following the issuance of SAS No. 99 and the Sarbanes–Oxley Act, preventing fraud through more rigorous internal control oversight became a major focus, which has stimulated numerous academic studies [42,33,12,18] in this subject area.

Knowledge-Based Systems 89 (2015) 459–470. Journal homepage: www.elsevier.com/locate/knosys
http://dx.doi.org/10.1016/j.knosys.2015.08.011
0950-7051/© 2015 Elsevier B.V. All rights reserved.
* Corresponding author. Tel.: +1 607 436 3458 (office); fax: +1 607 436 2543.
E-mail addresses: c97ve47@yahoo.com.tw (C.-C. Lin), aachiu@fcuoa.fcu.edu.tw (A.-A. Chiu), actsyh@yahoo.com.tw (S.Y. Huang), David.Yen@oneonta.edu (D.C. Yen).
1 Tel.: +886 2 23226362.
2 Tel.: +886 4 24517250x4076.
3 Tel.: +886 5 2720411x34501; fax: +886 5 2721197.

A prolific area of prior research has focused on using different tools and techniques, such as analytical procedures, ratio analysis, regression analysis, score propagation over an auction network (SPAN), and checklists, to improve fraud detection [16,19,48]. However, the previous studies may result in too many fraud risk factors to identify the importance of each fraud