Data-Driven Process Performance Measurement and Prediction: A Process-Tree-Based Approach

Sebastiaan J. van Zelst¹,², Luis F.R. Santos², Wil M.P. van der Aalst²,¹

¹ Fraunhofer Institute for Applied Information Technology (FIT), Germany
sebastiaan.van.zelst@fit.fraunhofer.de
² RWTH Aachen University, Aachen, Germany

Abstract. To achieve operational excellence, a clear understanding of the core processes of a company is vital. Process mining enables companies to achieve this by distilling process knowledge from recorded historical event data. However, few techniques focus on the prediction of process performance after process redesign. This paper proposes a foundational framework for a data-driven business process redesign approach, allowing the user to investigate the impact of changes in the process w.r.t. the overall process performance. The framework supports the prediction of future performance based on anticipated activity-level performance changes and control-flow changes. We have applied our approach to several real event logs, confirming its applicability.

Key words: Process mining, Process improvement, Process redesign

1 Introduction

Information systems, e.g., Enterprise Resource Planning (ERP) systems, support the execution of a company's core processes. These systems capture at what point in time an activity was performed for an instance of the process. Process mining techniques turn such event data into actionable knowledge [1]. For example, various process discovery techniques exist that transform the event data into a process model describing the process behavior as captured in the data [2]. Similarly, conformance checking techniques quantify to what extent the process, as recorded in the event data, conforms to a given reference model [3]. The overarching aim of process mining techniques is to improve the process, e.g., by decreasing the process duration while maintaining the same quality level.
Yet, relatively little work focuses on data-driven techniques that support decision-makers in effectively improving the process. For example, in [4], the authors propose to discover simulation models on the basis of recorded event data, which can be used to simulate the process under different "What if" scenarios. In [5], a similar approach is proposed, explicitly focusing on macro-level aspects of the process, e.g., average case duration. The work presented in this paper lies between the two ends of the spectrum covered by these works. Similar to [4], we measure performance at the activity level. However, we do not learn a complete simulation model. Instead, we explain the historical behavior captured