Local Process Models: Pattern Mining with Process Models

Niek Tax, Natalia Sidorova, Wil M.P. van der Aalst
{n.tax,n.sidorova,w.m.p.v.d.aalst}@tue.nl
Eindhoven University of Technology, The Netherlands

Keywords: pattern mining, process mining, business process modeling, data mining

1. Introduction

Process mining aims to extract novel insights from event data (van der Aalst, 2016). Process discovery plays a prominent role in process mining. The goal is to discover a process model that is representative of the set of event sequences in terms of start-to-end behavior, i.e. from the start of a case until its termination. Many process discovery algorithms have been proposed and applied to a variety of real-life cases. A more conventional perspective on discovering insights from event sequences can be found in the areas of sequential pattern mining (Agrawal & Srikant, 1995) and episode mining (Mannila et al., 1997), which focus on finding frequent patterns and do not aim to describe the full event sequences from start to end.

Sequential pattern mining is limited to the discovery of sequential orderings of events, while process discovery methods aim to discover a larger set of event relations, including sequential orderings, (exclusive) choice relations, concurrency, and loops, represented in process models such as Petri nets (Reisig, 2012), BPMN (Object Management Group, 2011), or UML activity diagrams. Process models distinguish themselves from more traditional sequence mining approaches like Hidden Markov Models (Rabiner, 1989) and Recurrent Neural Networks by their visual representation, which allows them to be used for communication between process stakeholders. However, process discovery is normally limited to the discovery of a complete model that captures the full behavior of process instances, and not local patterns within instances. Local Process Models (LPMs) allow the mining of patterns positioned in-between simple patterns (e.g.
subsequences) and end-to-end models, focusing on a subset of the process activities and describing frequent patterns of behavior.

2. Motivating Example

Imagine a sales department where multiple sales officers perform four types of activities: (A) register a call for bids, (B) investigate a call for bids from the business perspective, (C) investigate a call for bids from the legal perspective, and (D) decide on participation in the call for bids. The event sequences (Figure 1(a)) contain the activities performed by one sales officer throughout the day. The sales officer works on different calls for bids and does not necessarily perform all activities for a particular call himself. Applying discovery algorithms, like the Inductive Miner (Leemans et al., 2013), yields models allowing for any sequence of events (Figure 1(c)). Such "flower-like" models do not give any insight into typical behavioral patterns. When we apply any sequential pattern mining algorithm using a threshold of six occurrences, we obtain the seven length-three sequential patterns depicted in Figure 1(d) (results obtained using the SPMF (Fournier-Viger et al., 2014) implementation of the PrefixSpan algorithm (Pei et al., 2001)).

However, the data contains a frequent non-sequential pattern where a sales officer first performs A, followed by B and C in arbitrary order (Figure 1(b)). This pattern cannot be found with existing process discovery or sequential pattern mining techniques. The two numbers shown in each transition (i.e., rectangle) represent (1) the number of events of this type in the event log that fit this local process model and (2) the total number of events of this type in the event log. For example, 13 out of 19 events of type C in the event log fit transition C; these events are indicated in bold in the log in Figure 1(a). Underlined sequences indicate non-continuous instances, i.e.
instances with non-fitting events in-between the events forming the instance of the local process model.

3. LPM Discovery Approach

A technique for the discovery of Local Process Models (LPMs) is described in detail in (Tax et al., 2016a). LPM discovery uses the process tree (Buijs et al., 2012) process model notation, an example of which is SEQ(A, B), which is a sequential pattern that describes that activity B occurs after activity A. Process tree models are iteratively ex-