Interactive Data-driven Process Model Construction

P.M. Dixit¹, H.M.W. Verbeek¹, J.C.A.M. Buijs¹, and W.M.P. van der Aalst²

¹ Eindhoven University of Technology, Eindhoven, The Netherlands
² Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Germany
{p.m.dixit,h.m.w.verbeek,j.c.a.m.buijs}@tue.nl
wvdaalst@pads.rwth-aachen.de

Abstract. Process discovery algorithms address the problem of learning process models from event logs. Typically, in such settings a user's activity is limited to configuring the parameters of the discovery algorithm, so the user's expertise and domain knowledge cannot be incorporated during traditional process discovery. When event logs are noisy, incomplete, and/or contain uninteresting activities, the process models discovered by such algorithms are often inaccurate and/or incomprehensible. Furthermore, many of these automated techniques can produce unsound models and/or cannot discover duplicate activities, silent activities, etc. To overcome these shortcomings, we introduce a new concept for interactively discovering a process model by combining a user's domain knowledge with the information in the event log. The discovered models are always sound and can contain duplicate activities, silent activities, etc. An objective evaluation and a case study show that the proposed approach can outperform traditional discovery techniques.

Keywords: HCI, process discovery, process mining

1 Introduction

Process discovery, a subfield of process mining, aims at discovering process models from event logs. Most discovery algorithms aim to do so automatically by learning patterns from the event log. Automated process discovery algorithms work well when the event log contains all the information the algorithm requires (e.g., the log is noise-free and complete), and when the language of the underlying model is roughly the same as the language of the models the algorithm can discover.
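As an illustration of the kind of pattern automated algorithms learn, the following is a minimal sketch (not the approach proposed in this paper) that counts the directly-follows relation, i.e., how often one activity is immediately followed by another, over a toy event log. The log, its activity names, and the function `directly_follows` are hypothetical and chosen only for illustration; each trace is assumed to be the ordered list of activity names recorded for one case.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy event log: each trace is the ordered sequence of
# activity names recorded for one case.
EventLog = List[List[str]]


def directly_follows(log: EventLog) -> Dict[Tuple[str, str], int]:
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        # Pair each event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return dict(counts)


log = [
    ["register", "check", "decide"],
    ["register", "check", "decide"],
    ["register", "decide"],  # 'check' is skipped in this case
]

if __name__ == "__main__":
    print(directly_follows(log))
```

Patterns like these are only reliable when the log is complete and free of noise: a single noisy trace adds spurious pairs, and a missing behavior leaves no pair at all.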
However, in many real-world scenarios this is not the case. First, the discovered process models might explain the event logs extremely well, yet still be completely incomprehensible to the end user. It is therefore imperative to give the user control over the process model being discovered, which also makes it possible to incorporate domain knowledge during process discovery. Second, the process models discovered by discovery algorithms are constrained by the vocabulary of the language used to represent the model, i.e., the representational bias [1]. That is, some process discovery algorithms may not be able to discover silent activities (i.e., skippable activities), duplicate activities (i.e., activities that occur more than once), etc. Third, many discovery algorithms may