Detection of Redundant Code using R²D²

António Menezes Leitão
INESC-ID/Technical University of Lisbon
Rua Alves Redol n. 9
1000–029 Lisboa, Portugal
antonio.leitao@dei.ist.utl.pt

Abstract

We present the R²D² redundancy detector. R²D² identifies redundant code fragments in large software systems written in Lisp. For each pair of code fragments, R²D² uses a combination of techniques, ranging from syntax-based analysis to semantics-based analysis, that detect positive and negative evidence regarding the redundancy of the analyzed code fragments. This evidence is combined according to a well-defined model, and sufficiently redundant fragments are reported to the user. R²D² explores several techniques and heuristics to operate within reasonable time and space bounds and is designed to be extensible.

1. Introduction

One of the most used editing operations of our time is copy&paste. Whenever programmers use copy&paste, they introduce redundant code into the program: code that would not need to be there had the programmer refactored the original code instead. Redundant code is, thus, code that can be eliminated using an appropriate abstraction or reuse technique. It is well known that redundant code severely impacts program maintenance and should be minimized. We will now look at the causes and consequences of redundancy.

1.1. Causes of Redundancy

There are several causes for the existence of redundant code, including duplication, idioms, design patterns and coincidence.

Duplication

Code duplication occurs when the programmer prefers to copy, paste and modify some code fragment instead of writing a new one. This is a very frequent phenomenon in software development. The duplicates are also known as clones [BYM+98]. Code duplication occurs because:

- It's an extremely simple form of reuse.
- When a single version of the code must satisfy several purposes, it might be preferable to create duplicates so that each can evolve independently.
- When programmer productivity is measured by the amount of code written, duplication increases income.