Design and Evaluation of an Automated Aspect Mining Tool
David Shepherd, Emily Gibson, and Lori Pollock
Computer and Information Sciences
University of Delaware
Newark, DE 19716
302 831 1953, 302 831 8458
{shepherd, gibson, pollock}@cis.udel.edu
Abstract— Attention to aspect oriented programming (AOP) is rapidly growing as its benefits in large software system development and maintenance are increasingly recognized. However, existing large software systems, which could benefit most from refactoring into AOP, still remain unchanged in practice, due to the high cost of the refactoring. Automatic identification and extraction of aspects would not only enable migration of legacy systems to AOP, but also prevent current systems from accumulating scattered and duplicated code. In this paper, we present the design, implementation, and evaluation of an aspect mining analysis, which automatically identifies desirable candidates for refactoring into AOP, without requiring input from the user or predefined queries. By exploiting the program dependence graph and abstract syntax tree representations of a program, our analysis is able to automatically identify a much larger set of valuable refactoring candidates than current aspect mining techniques, as demonstrated by an empirical evaluation of our automatic mining analysis on two large software systems.
Keywords - Aspect Oriented Programming, analysis
I. INTRODUCTION
Aspect Oriented Programming (AOP) is used to reduce complexity, increase readability, and improve modularity in software systems. In large software systems, complexity, readability, and modularity remain major obstacles [11], [17]. These systems, which could benefit the most from refactoring into AOP, are the last systems that are actually refactored, due to the amount of time and effort that would be required.
Typically, programmers decide to exploit AOP when they are implementing a new task in a program that, because of its nature, is going to require adding code in locations scattered throughout many functions, classes, or files. However, the debugging, testing, and maintenance of large legacy systems also could be eased considerably if already existing code for these kinds of tasks could be identified and refactored into AOP style [11], [17]. Even after refactoring, any project where there are multiple programmers, especially if geographically separated (such as an open source project), could benefit from ongoing support for identification of aspects, segments of related code that should be refactored into AspectJ to provide better program readability and modularity. Programmers are unlikely to be completely familiar with other programmers’ code, causing them to miss opportunities for applying aspects to their problems.
A tool that automates a large part of the refactoring process is necessary if refactoring systems for AOP are ever to become practically viable. Transformation to an aspect oriented program requires two steps: (1) mining, or identification, of code that performs a single task scattered throughout a software system, and (2) refactoring of the system into an aspect oriented program in which the scattered code has been replaced by aspects in an AOP language such as AspectJ [2]. This paper focuses on automating the first step of this process, the identification of refactoring candidates.
Existing implemented techniques and tools for aspect mining can be categorized as being either lexical or exploratory. While these approaches provide a means to mine aspects from legacy software, neither approach is automatic. Both require a seed and depend on the user’s understanding of the software to be refactored.
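To make the notion of a seed concrete, the following is a minimal sketch of a seed-driven lexical search, written in Python purely for illustration. It is not the analysis presented in this paper, nor the implementation of any cited tool. The function name `lexical_mine`, the two-file code base, and the seed `Logger\.log` are all hypothetical.

```python
import re
from collections import defaultdict


def lexical_mine(sources, seed):
    """Return {file name: [(line number, stripped line), ...]} for every
    line that matches the user-supplied seed (a regular expression)."""
    pattern = re.compile(seed)
    hits = defaultdict(list)
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits[name].append((lineno, line.strip()))
    return dict(hits)


# Hypothetical code base: logging calls are scattered across two classes.
sources = {
    "Account.java": (
        "class Account {\n"
        "  void deposit(int n) {\n"
        '    Logger.log("deposit");\n'
        "    balance += n;\n"
        "  }\n"
        "}"
    ),
    "Order.java": (
        "class Order {\n"
        "  void submit() {\n"
        '    Logger.log("submit");\n'
        "    send();\n"
        "  }\n"
        "}"
    ),
}

# The user must already suspect that logging is a crosscutting concern
# and must know how it is spelled in the code in order to pick this seed.
hits = lexical_mine(sources, r"Logger\.log")
for name, matches in sorted(hits.items()):
    for lineno, line in matches:
        print(f"{name}:{lineno}: {line}")
```

The sketch finds the scattered calls only because the seed was chosen with prior knowledge of the code, which illustrates why a seed-based approach cannot be considered automatic.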
The first kind of analysis used to perform aspect mining was a lightweight approach that performs lexical searches (a simple "Find" operation on a given text with a string or regular expression as the seed), which can be combined with type information [9], [8]. This approach is effective in finding some aspects very quickly. However, more useful lexical searches are dependent on the coding practices of the programmer, such as variable or method naming conventions, which are hard to guarantee, especially in a legacy system [9], [8].
Lexical analysis is usually initiated by the user specifying a seed (either a regular expression or a string), and then the tool performs a grep-like function on the code base. Lexical analysis for AOP is often used in conjunction with a visualization tool, which displays the source files with the results of the search highlighted. This is especially helpful to prospective AOP programmers by letting them see the scattered nature of certain aspects. These lexical tools have been demonstrated to be most useful in situations where programmers follow a set of principles when naming their variables and methods [9], [8].
Lexical tools are now used in combination with type information to provide more intelligent mining. Execution of the analysis is still quick, yet the tool is able to find several kinds of aspects that lexical analysis alone cannot discover. Some systems even rank the amount of scatter that different types in a program have in order to