Please cite this article in press as: S. Goedertier, et al., Process discovery in event logs: An application in the telecom industry, Appl. Soft
Comput. J. (2010), doi:10.1016/j.asoc.2010.04.025
Applied Soft Computing xxx (2010) xxx–xxx
Process discovery in event logs: An application in the telecom industry
Stijn Goedertier (a), Jochen De Weerdt (a,∗), David Martens (a,b), Jan Vanthienen (a), Bart Baesens (a,c)
(a) Department of Decision Sciences and Information Management, Katholieke Universiteit Leuven, Naamsestraat 69, B-3000 Leuven, Belgium
(b) Department of Business Administration and Public Management, Hogeschool Gent, Universiteit Gent, Voskenslaan 270, B-9000 Ghent, Belgium
(c) School of Management, University of Southampton, Highfield Southampton, SO17 1BJ, United Kingdom
article info
Article history:
Received 14 December 2009
Received in revised form 4 February 2010
Accepted 30 April 2010
Available online xxx
Keywords:
Process discovery
AGNEs
HeuristicsMiner
Event logs
Genetic Miner
Data mining
Workflow management systems (WfMS)
abstract
The abundant availability of data is typical for information-intensive organizations. Usually, discerning
knowledge from vast amounts of data is a challenge. Similarly, discovering business process models from
information system event logs is non-trivial. Within the analysis of event logs, process discovery,
which can be defined as the automated construction of structured process models from such event
logs, is an important learning task. However, the discovery of these processes poses many challenges.
First of all, human-centric processes are likely to contain a lot of noise as people deviate from standard
procedures. Other challenges are the discovery of so-called non-local, non-free choice constructs,
duplicate activities, incomplete event logs and the inclusion of prior knowledge. In this paper, we present an
empirical evaluation of three state-of-the-art process discovery techniques: Genetic Miner, AGNEs and
HeuristicsMiner. Although the detailed empirical evaluation is the main contribution of this paper to
the literature, an in-depth discussion of a number of different evaluation metrics for process discovery
techniques and a thorough discussion of the validity issue are key contributions as well.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Organizations currently face an information paradox: the more they
automate their processes, the less they are capable of monitoring and
understanding them. A good understanding of processes is nonetheless
vital for fulfilling business requirements such as verifying and
guaranteeing business process compliance [26], setting up a coherent
access control policy [14] and optimizing and redesigning business
processes [17]. A better understanding will eventually enable
organizations to provide better, automated support for their business
processes in flexible, process-aware information systems [8,42].
∗ Corresponding author. Tel.: +32 16 32 68 87; fax: +32 16 32 66 24.
E-mail addresses: stijn.goedertier@econ.kuleuven.be (S. Goedertier),
jochen.deweerdt@econ.kuleuven.be (J. De Weerdt),
david.martens@econ.kuleuven.be (D. Martens), jan.vanthienen@econ.kuleuven.be
(J. Vanthienen), bart.baesens@econ.kuleuven.be (B. Baesens).
Traditionally, practitioners have obtained insight into processes
through interviewing techniques. A new and promising way of acquiring
insights into business processes is the analysis of the event logs of
information systems [33]. In many organizations, such event logs conceal
an untapped reservoir of knowledge about the way employees and customers
conduct every-day business transactions. Event logs are already
available in many organizations: popular Enterprise Resource Planning
(ERP) systems such as SAP R/3 and Oracle e-Business Suite, and workflow
management systems (WfMS) such as ARIS, TIBCO and Microsoft BizTalk,
already record such event logs.
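To make the notion of an event log more concrete, the following minimal Python sketch represents a log as a list of traces (one ordered list of activities per case) and counts how often one activity is immediately followed by another. The activity names, the toy log and the function name directly_follows are hypothetical and chosen only for illustration; they are not taken from the case study. Counting directly-follows pairs is shown here merely as one simple way of summarizing a log, not as the method of any of the algorithms evaluated in this paper.

```python
from collections import Counter

# Hypothetical toy event log: each trace is the ordered list of activities
# recorded for one case. Activity names are invented for illustration.
event_log = [
    ["register", "check", "approve", "close"],
    ["register", "check", "reject", "close"],
    ["register", "check", "approve", "close"],
]


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        # Pair each event with its successor within the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


df_counts = directly_follows(event_log)
for (a, b), n in sorted(df_counts.items()):
    print(f"{a} -> {b}: {n}")
```

Even in this small example, the counts suggest that "check" is followed by a choice between "approve" and "reject", which is the kind of control-flow structure a discovery technique tries to recover from much larger and noisier logs.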
The topic of process discovery is relatively new and can be situated
at the intersection of the fields of Business Process Management
(BPM) and data mining [27]. It is inherently related to data mining
and to the more general domain of knowledge discovery in databases
(KDD), since its objective is to extract useful information from large
data repositories. Likewise, process discovery is strongly associated
with BPM because its purpose is to gain insight into business
processes. As a result, process mining fits naturally into the BPM
life cycle framework [34,41,47].
Because process discovery is a relatively novel topic, it is valuable
to discuss various state-of-the-art discovery algorithms and to assess
them in a real-life setting. To this end, the remainder of this paper
is structured as follows. In Section 2, process discovery and its main
challenges are discussed and some basic concepts of Petri net theory
are briefly introduced. Section 3 outlines a number of state-of-the-art
process discovery techniques. Section 4 provides a discussion of
evaluation metrics. In Section 5, three of the discussed algorithms
are applied to a real-life event log. Finally, the conclusions are
formulated in Section 6.
2. Preliminaries
2.1. Process discovery
The basic idea of process discovery or control-flow discovery
is straightforward: given an event log, automatically compose a