Micro Pattern Evolution

Sunghun Kim
Department of Computer Science
University of California, Santa Cruz
Santa Cruz, CA, USA
hunkim@cs.ucsc.edu

Kai Pan
Department of Computer Science
University of California, Santa Cruz
Santa Cruz, CA, USA
pankai@cs.ucsc.edu

E. James Whitehead, Jr.
Department of Computer Science
University of California, Santa Cruz
Santa Cruz, CA, USA
ejw@cs.ucsc.edu

ABSTRACT
When analyzing the evolution history of a software project, we wish to develop results that generalize across projects. One approach is to analyze design patterns, permitting characteristics of the evolution to be associated with patterns instead of source code. Traditional design patterns are generally not amenable to reliable automatic extraction from source code, yet automation is crucial for scalable evolution analysis. Instead, we analyze the evolution of “micro patterns”: patterns whose abstraction level is closer to source code, and which are designed to be automatically extractable from Java source code or bytecode. We perform micro pattern evolution analysis on three open source projects, ArgoUML, Columba, and jEdit, to identify micro pattern frequencies, common kinds of pattern evolution, and bug-prone patterns. In all analyzed projects, we found that the micro patterns of Java classes do not change often. Common bug-prone pattern evolution kinds are ‘Pool → Pool’, ‘Implementor → NONE’, and ‘Sampler → Sampler’. Among all pattern evolution kinds, the ‘Box’, ‘CompoundBox’, ‘Pool’, ‘CommonState’, and ‘Outline’ micro patterns have high bug rates, but they have low frequencies and a small number of changes. The pattern evolution kinds that are bug-prone are somewhat similar across projects. The bug-prone pattern evolution kinds of two different periods of the same project are almost identical.
Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement – Restructuring, reverse engineering, and reengineering; D.2.8 [Software Engineering]: Metrics – Product metrics; K.6.3 [Management of Computing and Information Systems]: Software Management – Software maintenance

General Terms
Algorithms, Measurement, Experimentation

1. INTRODUCTION
Software evolution research examines the development history of a software project to learn facts about the software and to better understand its qualities. After examining the history of many different software projects, we would ideally like to make claims such as: if we observe evolution pattern X, then the consequences for one or more software qualities are Y and Z.

Most software repository mining research examines software by subdividing it into parts using physical distinctions, such as modules, directories, files, and methods. Researchers examine the evolution of these physical elements, and then correlate various software properties with traits of the observed evolution. For example, researchers have examined revision histories to determine correlations between changes and bugs [13].

Though there has been much success in correlating software properties with the evolution of physical elements within a project, the ability to apply these results to other projects has been limited. This is because relying on a project's existing physical distinctions limits the applicability of results to that single project. Knowing something about the evolution of the methods in a specific Java class does not typically provide any insight into other classes, since different classes have different source code. Making more generalizable observations requires some means of abstracting away from the physical elements into abstract categories.
These categories need to be concrete enough to capture important aspects of the behavior of the software, yet sufficiently general that one can observe the same abstract categories across multiple projects. The classic software design patterns [6] fit this description, and suggest the possibility that we can deeply understand the evolutionary behavior of specific design patterns. To perform such analysis in a scalable way, we need an automated mechanism for extracting software design patterns from source code. Unfortunately, to date there is no accurate mechanism for identifying design patterns in code; existing approaches suffer from large numbers of false positives or false negatives.

Recent work by Gil and Maman has introduced the concept of micro patterns [7], which are “Java class-level traceable patterns.” They express more fine-grained design idioms than the classic patterns, and have been designed to always be automatically extractable from source code (or bytecode). For our purposes, what is important is that we now have a reliable, automatic way to extract a set of general design abstractions from Java projects. This allows us to explore whether evolution characteristics can be correlated with the abstractions inherent in these micro patterns, and to draw generalizable conclusions about specific evolution patterns.

In this paper we analyze the micro pattern evolution of three open source projects, ArgoUML, Columba, and jEdit, shown in Table 1. Our goal is to examine whether there are any correlations between the evolution of micro patterns and the likelihood of having bugs. Ideally, we wish to identify micro pattern evolution kinds that are consistently fault-prone across projects, and hence to draw conclusions about these kinds of evolution that have broad applicability.
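To make the idea of an automatically extractable, class-level pattern concrete, the following Java sketch tests a class for a single micro pattern and labels an evolution kind between two revisions. It is an illustration only, not the extraction tool used in this paper. It assumes the ‘Pool’ pattern means a class that declares only static final fields and no methods, as we understand the catalog in [7]. The names MicroPatternSketch, Limits, Counter, isPool, classify, and evolutionKind are hypothetical, and the check uses reflection on loaded classes rather than source code or bytecode analysis.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class MicroPatternSketch {

    // Hypothetical example class: only static final fields, no methods.
    static class Limits {
        static final int MAX_USERS = 100;
        static final String UNIT = "ms";
    }

    // Hypothetical example class: an instance field and a method.
    static class Counter {
        int count;
        void increment() { count++; }
    }

    // Assumed 'Pool' test: at least one declared field, every declared
    // field is static final, and no declared methods. Compiler-generated
    // (synthetic) members are ignored.
    static boolean isPool(Class<?> c) {
        int fieldCount = 0;
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) continue;
            int mods = f.getModifiers();
            if (!(Modifier.isStatic(mods) && Modifier.isFinal(mods))) return false;
            fieldCount++;
        }
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()) return false;
        }
        return fieldCount > 0;
    }

    // This sketch knows only one pattern; "NONE" here means "not Pool".
    // A real extractor would test the whole micro pattern catalog.
    static String classify(Class<?> c) {
        return isPool(c) ? "Pool" : "NONE";
    }

    // An evolution kind pairs the pattern before a change with the
    // pattern after it, written here with an ASCII arrow.
    static String evolutionKind(String before, String after) {
        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(classify(Limits.class));
        System.out.println(classify(Counter.class));
        System.out.println(evolutionKind("Pool", "Pool"));
    }
}
```

Because the test depends only on a class's declared members, it can be run on every revision of every class without human judgment, which is the property that makes micro patterns suitable for large-scale evolution analysis.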
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MSR ’06, May 22-23, 2006, Shanghai, China. Copyright 2006 ACM 1-59593-085-X/06/0005…$5.00.