Determinism and Evolution

Israel Herraiz
Universidad Rey Juan Carlos
Madrid, Spain
herraiz@gsyc.es

Jesus M. Gonzalez-Barahona
Universidad Rey Juan Carlos
Madrid, Spain
jgb@gsyc.es

Gregorio Robles
Universidad Rey Juan Carlos
Madrid, Spain
grex@gsyc.es

ABSTRACT
It has been proposed that software evolution follows a Self-Organized Criticality (SOC) dynamics. This proposal is supported by the presence of long range correlations in the time series of the number of changes made to the source code over time. Those long range correlations imply that the current state of the project was determined long ago. In other words, the evolution of the software project is governed by a sort of determinism. But this idea seems to contradict intuition. To explore this apparent contradiction, we have performed an empirical study on a sample of 3,821 libre (free, open source) software projects, finding that their evolution is short range correlated. This suggests that the dynamics of software evolution may not be SOC, and therefore that the past of a project does not determine its future except for relatively short periods of time, at least for libre software.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement - Reverse engineering, Version control; D.2.9 [Software Engineering]: Management - Life cycle, Software configuration management, Time estimation

General Terms
Theory

Keywords
software evolution, time series analysis, self-organized criticality, long term process, short term process

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MSR’08, May 10-11, 2008, Leipzig, Germany.
Copyright 2008 ACM 978-1-60558-024-1/08/05 ...$5.00.

1. INTRODUCTION
Libre¹ software development has traditionally been a source of strange cases of software evolution. The first to report one of those cases were Godfrey and Tu [8, 9]. Their findings suggested that Lehman’s classical laws of software evolution [15] did not hold in the case of Linux, because it was evolving at a growing rate (and in fact was still doing so 5 years later [21]). Those cases raised the question of whether libre software evolves differently from proprietary software, and whether the laws of software evolution are a valid approach to a universal theory of software evolution.

Although these questions have been addressed many times [14], most of the findings and models presented in those works have failed to provide the theoretical background needed for a proper and universal theory of software evolution. One study that addressed the problem was Wu’s PhD thesis [26], which, among other interesting findings, proposed that the evolution of libre software was governed by a Self-Organized Criticality (SOC) dynamics.

This conclusion was supported by the presence of long range correlated time series in a set of 11 projects. Regardless of the suitability of the selected projects for this kind of study, the limited number of case studies, or even the methodology used, we find the idea of long range correlated processes in software evolution contrary to common intuition. Long range correlation would mean that the current state of the project is determined (or at least heavily influenced) by events that took place a long time ago. In other words, the evolution of libre software would be governed by a sort of determinism.

In order to explore whether this kind of dynamics is a property of libre software, we have selected a large sample of projects (3,821) and performed an analysis similar to the one by Wu.
We have studied the daily time series of changes, focusing on deciding whether their profiles were short or long range correlated.

The projects were obtained from the whole population of projects stored in SourceForge.net, a well known hosting service for libre software projects, which provides a web-based integrated development environment. The data was obtained using the CVSAnalY SourceForge dataset², maintained by our research group.

¹ In this paper we will use the term “libre software” to refer both to “free software”, as defined by the Free Software Foundation, and “open source software”, as defined by the Open Source Initiative.
² http://libresoft.es/Results/CVSAnalY SF