Towards Efficient Business Process Clustering and Retrieval: Combining Language Modeling and Structure Matching

Mu Qiao 1, Rama Akkiraju 2, and Aubrey J. Rembert 2

1 Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
muq103@cse.psu.edu
2 IBM T.J. Watson Research Center, Hawthorne, NY 10532, USA
{akkiraju,ajrember}@us.ibm.com

Abstract. Large organizations tend to have hundreds of business processes. Discovering and understanding similarities among business processes can be useful to organizations for a number of reasons, including better overall process management and maintenance. In this paper we present a novel and efficient approach to cluster and retrieve business processes. A given set of business processes is clustered based on underlying topic, structural, and semantic similarities. In addition, given a query business process, the top-k most similar processes are retrieved based on the clustering results. In this work, we bring together two schools of work that are not well connected, statistical language modeling and structure matching, and combine them in a novel way. In finding similarities among business processes, our approach takes into account both high-level topic information, which can be collected from process description documents and keywords, and detailed structural features such as process control flows. This ability to work with processes that may not always have formal control flows is particularly useful in dealing with real-world business processes, which are not always described formally. We developed a system to implement our approach and evaluated it on several collections of industry best practice processes and real-world business processes at a large IT service company that are described at varied levels of formalism.
Our experimental results reveal that retrieval based on combined language modeling and structure matching outperforms structure-matching-only techniques in both mean average precision and running time.

S. Rinderle-Ma, F. Toumani, and K. Wolf (Eds.): BPM 2011, LNCS 6896, pp. 199–214, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Large organizations tend to have hundreds of business processes. Discovering and understanding the similarities among business processes can be useful to organizations for many reasons. First, commonly occurring process activities can be managed more efficiently, resulting in business process maintenance efficiencies. Second, common and similar processes and process activities can be reused