How the Evolution of Emerging Collaborations Relates to Code Changes: An Empirical Study

Sebastiano Panichella 1, Gerardo Canfora 1, Massimiliano Di Penta 1, Rocco Oliveto 2
1 Department of Engineering, University of Sannio, Benevento, Italy
2 Department of Bioscience and Territory, University of Molise, Pesche (IS), Italy

ABSTRACT
Developers contributing to open source projects spontaneously group into “emerging” teams, reflected by messages exchanged over mailing lists, issue trackers, and other communication means. Previous studies suggested that such teams somewhat mirror the software modularity. This paper empirically investigates how, when a project evolves, emerging teams re-organize themselves, e.g., by splitting or merging. We relate the evolution of teams to the files they change, to investigate whether teams split to work on cohesive groups of files. Results of this study, conducted on the evolution history of four open source projects (Apache httpd, Eclipse JDT, Netbeans, and Samba), provide indications of what happens in the project when teams reorganize. Specifically, we found that emerging team splits imply working on more cohesive groups of files, and emerging team merges imply working on groups of files that are cohesive from a structural perspective. Such indications serve to better understand the evolution of software projects. More importantly, the observation of how emerging teams change can serve to suggest software remodularization actions.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management - Programming teams.

General Terms
Experimentation, Human Factors

Keywords
Developers’ Communication, Open Source Projects, Mining Software Repositories, Empirical Studies.

1. INTRODUCTION
The organization of developers into teams is crucial for the success of software projects.
In industrial projects, teams are often defined and staffed by project managers, who group people based on the needs of a particular task and on their availability, skills, and attitude to work together. Dynamics are different in open source projects [11], which involve developers spread across the world and working in different time zones, often communicating through electronic means such as mailing lists. In essence, developers participating in open source projects are not staffed into teams by project managers. Moreover, the way they collaborate depends on the structure of the open source project, i.e., whether the project is of the “cathedral” or “bazaar” type [28]. Generally speaking, developers spontaneously group themselves into “emerging teams,” which can be recognized by observing the developers’ communication network and how developers change source code files. In the rest of the paper, we use this definition of team.

Bird et al. [6] analyzed social networks built on mailing list communication and found a high correlation between the level of email activity and the level of activity in source code development. Later on, Bird et al. [7] found a causal relationship between the modularity of a software project and the way developers group into teams. They also found that developers belonging to the same sub-community share a larger proportion of files than developers belonging to different sub-communities.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICPC ’14, June 2-3, 2014, Hyderabad, India. Copyright 2014 ACM 978-1-4503-2879-1/14/06 ...$15.00.
When a software project evolves, the way emerging teams are formed and operate may change. This is because, during its lifetime, a project undergoes different kinds of changes that require the contribution of different, and possibly new, people. As pointed out by Hong et al. [21], the reorganization of emerging teams often coincides with new project releases.

Aim of the paper. Stemming from the above considerations, this paper investigates how emerging teams evolve in open source software projects as people focus on different technical activities, i.e., code changes. By analyzing how people collaborate through mailing lists and issue trackers, and what files they modify, we investigate whether emerging teams evolve with the aim of working on more cohesive groups of files. Figure 1 provides an overview of our analyses. First, we identify emerging teams in different time periods following software releases, and map the teams found in one period to those found in the subsequent release/period. Then, we analyze what files these teams change in the versioning system, and how the cohesiveness of such files changes when teams split or merge.

Study Overview. The study has been conducted on the evolution history—consisting of data from versioning sys-