Please cite this article in press as: S. Goedertier, et al., Process discovery in event logs: An application in the telecom industry, Appl. Soft Comput. J. (2010), doi:10.1016/j.asoc.2010.04.025

Applied Soft Computing xxx (2010) xxx–xxx

Process discovery in event logs: An application in the telecom industry

Stijn Goedertier a, Jochen De Weerdt a,*, David Martens a,b, Jan Vanthienen a, Bart Baesens a,c

a Department of Decision Sciences and Information Management, Katholieke Universiteit Leuven, Naamsestraat 69, B-3000 Leuven, Belgium
b Department of Business Administration and Public Management, Hogeschool Gent, Universiteit Gent, Voskenslaan 270, B-9000 Ghent, Belgium
c School of Management, University of Southampton, Highfield Southampton, SO17 1BJ, United Kingdom

Article info

Article history:
Received 14 December 2009
Received in revised form 4 February 2010
Accepted 30 April 2010
Available online xxx

Keywords: Process discovery; AGNEs; HeuristicsMiner; Event logs; Genetic Miner; Data mining; Workflow management systems (WfMS)

Abstract

The abundant availability of data is typical for information-intensive organizations. Usually, discerning knowledge from vast amounts of data is a challenge. Similarly, discovering business process models from information system event logs is definitely non-trivial. Within the analysis of event logs, process discovery, which can be defined as the automated construction of structured process models from such event logs, is an important learning task. However, the discovery of these processes poses many challenges. First of all, human-centric processes are likely to contain a lot of noise as people deviate from standard procedures.
Other challenges are the discovery of so-called non-local, non-free choice constructs, duplicate activities, incomplete event logs and the inclusion of prior knowledge. In this paper, we present an empirical evaluation of three state-of-the-art process discovery techniques: Genetic Miner, AGNEs and HeuristicsMiner. Although the detailed empirical evaluation is the main contribution of this paper to the literature, an in-depth discussion of a number of different evaluation metrics for process discovery techniques and a thorough discussion of the validity issue are key contributions as well.

© 2010 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +32 16 32 68 87; fax: +32 16 32 66 24.
E-mail addresses: stijn.goedertier@econ.kuleuven.be (S. Goedertier), jochen.deweerdt@econ.kuleuven.be (J. De Weerdt), david.martens@econ.kuleuven.be (D. Martens), jan.vanthienen@econ.kuleuven.be (J. Vanthienen), bart.baesens@econ.kuleuven.be (B. Baesens).

1. Introduction

Organizations currently face an information paradox: the more they automate their processes, the less they are capable of monitoring and understanding them. A good understanding of processes is nonetheless vital for fulfilling business requirements such as verifying and guaranteeing business process compliance [26], setting up a coherent access control policy [14] and optimizing and redesigning business processes [17]. A better understanding will eventually enable organizations to provide better, automated support for their business processes in flexible, process-aware information systems [8,42].

Traditionally, practitioners have obtained insight into processes using interviewing techniques. A new and promising way of acquiring insight into business processes is the analysis of the event logs of information systems [33]. In many organizations, such event logs conceal an untapped reservoir of knowledge about the way employees and customers conduct everyday business transactions. Event logs are already available in many organizations. Popular Enterprise Resource Planning (ERP) systems such as SAP R/3 and Oracle e-Business Suite, and workflow management systems (WfMS) such as ARIS, TIBCO and Microsoft BizTalk, already keep track of such event logs.

The topic of process discovery is relatively new and can be situated at the intersection of the fields of Business Process Management (BPM) and data mining [27]. It is inherently related to data mining and to the more general domain of knowledge discovery in databases (KDD), since its objective is to extract useful information from large data repositories. Likewise, process discovery is strongly associated with BPM because its purpose is to gain insight into business processes. As a result, process mining fits naturally into the BPM life cycle framework [34,41,47].

Because process discovery is a rather novel topic, it is valuable to discuss various state-of-the-art discovery algorithms and to assess them in a real-life setting. To this end, the remainder of this paper is structured as follows. In Section 2, process discovery and its main challenges are discussed and some basic concepts of Petri net theory are briefly introduced. Section 3 outlines a number of state-of-the-art process discovery techniques. Section 4 provides a discussion on evaluation metrics. In Section 5, three of the discussed algorithms are applied to a real-life event log. Finally, the conclusions are formulated in Section 6.

2. Preliminaries

2.1. Process discovery

The basic idea of process discovery or control-flow discovery is straightforward: given an event log, automatically compose a