How Does your Software Grow? Evolution and Architectural Change in Open Source Software

Michael W. Godfrey
Software Architecture Group (SWAG)
Department of Computer Science, University of Waterloo
email: migod@uwaterloo.ca

Abstract

Our recent work has addressed how and why software systems evolve over time, with a particular emphasis on software architecture and open source software systems [2, 3, 6]. In this position paper, we present a short summary of two recent projects. First, we have performed a case study on the evolution of the Linux kernel [3], as well as some other open source software (OSS) systems. We have found that several OSS systems appear not to obey some of "Lehman's laws" of software evolution [5, 7], and that Linux in particular is continuing to grow at a geometric rate. Currently, we are working on a detailed study of the evolution of one of the subsystems of the Linux kernel: the SCSI drivers subsystem. We have found that cloning, which is usually considered to be an indicator of lazy development and poor process, is quite common and is even considered to be a useful practice. Second, we are developing a tool called Beagle to aid software maintainers in understanding how large systems have changed over time. Beagle integrates data from various static analysis and metrics tools and provides a query engine as well as navigable visualizations. Of particular note, Beagle aims to provide help in modelling the long-term evolution of systems that have undergone architectural and structural change.

1 Evolution and Growth in Open Source Software

Large software systems must evolve, or they risk losing market share to competitors [5]. However, it is well known that maintaining such a system is extraordinarily difficult, complicated, and time consuming. The tasks of adding new features, adding support for new hardware devices and platforms, system tuning, and defect fixing all become more difficult as a system ages and grows.

Most studies of software evolution have been performed on systems developed within a single company using traditional management techniques. With the widespread availability of several large software systems that have been developed using an "open source" development approach, we now have a chance to examine these systems in detail, and to see whether their evolutionary narratives differ significantly from those of commercially developed systems.

Lehman's laws of software evolution [5], which are based on case studies of several large software systems, suggest that as a system grows in size, it becomes increasingly difficult to add new code unless explicit steps are taken to maintain the overall design. Turski's statistical analysis of these case studies suggests that system growth (measured in terms of number of source modules and number of modules changed) is usually sub-linear, slowing down as the system gets larger and more complex [7].¹

Our analysis of the evolution of the Linux kernel [3] has led to several surprising observations. First, we noted that the growth of the kernel has continued at a geometric rate, even as it has surpassed two million lines (MLOC) of source code. Our statistical analysis yields a good least-squares model of Linux's growth, relating size in uncommented LOC to days since v1.0 (with goodness of fit assessed by the coefficient of determination). We measured system size in uncommented LOC; we noted that the growth pattern is roughly the same for commented LOC, number of source code files, and even tar file size. Figure 1 shows the graph of growth in terms of number of source files (the approach favoured by Lehman et al.).

¹ More formally, the Turski/Lehman growth model has been given as an inverse-square relation, in which the growth from one release to the next is inversely proportional to the square of the current number of source modules [7]; this equation, when solved directly, can be shown to give approximately cube-root growth over time.
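
The sub-linear growth described in the footnote on the Turski/Lehman model can be illustrated numerically. The following is a minimal sketch, assuming the commonly cited inverse-square form of the model, S_{i+1} = S_i + E / S_i^2, where S_i is the number of source modules at release i and E is a constant. The function names, the starting size, and the value of E are hypothetical and chosen only for illustration; they are not taken from [7] or from the Linux data. The sketch compares the iterated recurrence with the cube-root curve obtained by treating the recurrence as the differential equation dS/di = E / S^2.

```python
def simulate_inverse_square(s0, effort, releases):
    """Iterate S_{i+1} = S_i + E / S_i**2 (assumed form of the model).

    Returns a list of sizes for releases 0..releases inclusive.
    """
    sizes = [float(s0)]
    for _ in range(releases):
        current = sizes[-1]
        sizes.append(current + effort / (current * current))
    return sizes


def cube_root_approximation(s0, effort, release):
    """Continuous approximation of the same model.

    Solving dS/di = E / S**2 with S(0) = s0 gives
    S(i) = (s0**3 + 3 * E * i) ** (1/3).
    """
    return (s0 ** 3 + 3.0 * effort * release) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    START_MODULES = 100.0
    EFFORT = 1.0e5
    RELEASES = 50

    sizes = simulate_inverse_square(START_MODULES, EFFORT, RELEASES)
    for i in (0, 10, 25, 50):
        approx = cube_root_approximation(START_MODULES, EFFORT, i)
        print(f"release {i:2d}: recurrence={sizes[i]:8.2f}  cube-root={approx:8.2f}")
```

Under this assumed model the increment added at each release shrinks as the system grows, which is the sub-linear pattern that the Linux measurements appear not to follow.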