Assessing Semistructured Merge in Version Control Systems: A Replicated Experiment

Guilherme Cavalcanti, Federal University of Pernambuco, Recife, Brazil, gjcc@cin.ufpe.br
Paola Accioly, Federal University of Pernambuco, Recife, Brazil, prga@cin.ufpe.br
Paulo Borba, Federal University of Pernambuco, Recife, Brazil, phmb@cin.ufpe.br

Abstract: Context: To reduce the integration effort arising from conflicting changes resulting from collaborative software development tasks, unstructured merge tools try to automatically solve part of the conflicts via textual similarity, whereas structured and semistructured merge tools try to go further by exploiting the syntactic structure of the involved artefacts. Objective: In this paper, aiming at increasing the existing body of evidence and assessing results for systems developed under an alternative version control paradigm, we replicate an experiment conducted by Apel et al. [1] to compare the unstructured and semistructured approaches with respect to the occurrence of conflicts reported by each. Method: We used both semistructured and unstructured merge on a sample 2.5 times larger than that of the original study regarding the number of projects and 18 times larger regarding the number of merge scenarios, and we compared the occurrence of conflicts. Results: Similar to the original study, we observed that semistructured merge reduces the number of conflicts in 55% of the scenarios of the new sample. However, the observed average conflict reduction of 62% in these scenarios is far superior to what has been observed before. We also bring new evidence that the use of semistructured merge can reduce the occurrence of conflicting merge scenarios by half. Conclusions: Our findings reinforce the benefits of exploiting the syntactic structure of the artefacts involved in code integration.
Besides, the reductions observed in the number and size of conflicts suggest that the use of semistructured merge, when compared to the unstructured approach, might decrease integration effort without compromising correctness.

Keywords: replication study, collaborative development, software merging, semistructured merge, version control systems.

I. INTRODUCTION

In a collaborative development environment, developers often implement tasks independently, using individual copies of project files. As a result, while merging separate code contributions from each task, one likely has to deal with conflicting changes and dedicate substantial effort to resolve conflicts. These conflicts occur for a number of reasons. For example, different developers may change the same artefact without being aware of each other's changes (the so-called direct or textual conflicts), or there may be concurrent modifications in different artefacts that lead to build or test failures (the indirect conflicts) [2, 3]. Regardless of their nature, conflicts may hamper productivity, since detecting and solving them might be a tiresome and error-prone activity; as a consequence, they delay the project while developers trace their causes and seek a solution.

To learn about the occurrence of conflicts and their consequences, previous empirical studies answer questions concerning when developers detect conflicts and how often conflicts occur. Zimmermann [4], for instance, describes that textual conflicts occurred in 23% to 47% of all file integrations. Brun et al. [2] and Kasi and Sarma [5] found that textual conflicts occurred in an average of 15% of all merge scenarios (a merge scenario being a set consisting of a common base revision and its derived versions), and that 31% of the merge scenarios free of textual conflicts resulted in build or behavioral errors.

Such evidence motivates and guides the design of tools that use different strategies to both decrease integration effort and improve correctness during code integration. For example, to reduce the integration effort, unstructured merge tools are purely text-based and resolve conflicts via textual similarity. On the other hand, a structured merge tool is tailored to a specific programming language and uses knowledge of the language's grammar to resolve conflicts [6, 7, 8, 9, 10]. Finally, semistructured merge [1] attempts to combine the two previous approaches: it exploits structural information about software artefacts to resolve conflicts automatically and, when this information is not sufficient, it applies the usual textual resolution to the conflict.

In a previous empirical study, Apel et al. [1] found that the semistructured approach was promising when compared to the unstructured one. By studying 24 projects using Subversion, a Centralized Version Control System (CVCS), and analysing a total of 180 merge scenarios, they found that, in 60% of their sample merge scenarios, the semistructured approach was able to reduce the number of textual conflicts by, on average, 34%. They also found that, in 82% of their sample merge scenarios, the semistructured approach reduced the number of conflicting lines of code by, on average, 61%; and that, in 72% of their sample merge scenarios, semistructured merge reduced the number of conflicting files by, on average, 28%.

Given the importance of such tools for collaborative software development, here we further investigate the hypothesis of Apel et al. [1] and replicate their study. To possibly expand the external validity of the original study, we analyse different systems stored on Git, a Distributed Version Control System (DVCS), since DVCSs have seen an increase in popularity compared to traditional CVCSs and offer extra information that helps us to better understand software development processes, such as merge tracking [11, 12].
We then compare semistructured merge to the unstructured one in 3266 merge scenarios from 60 projects, a sample 2.5 times bigger than the original study regarding the number of projects and 18 times bigger regarding the number of merge scenarios.
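
To make the contrast between the two approaches concrete, the following is a minimal sketch in Python of the structural step of a semistructured three-way merge. It is an illustration under stated assumptions, not the tool used in [1] or in this replication: each revision is modelled as a mapping from a member signature to its body text, and the names `Members` and `merge_members` are hypothetical. Because members are matched by name rather than by position, two developers who add different methods at the same place in a class do not conflict, whereas a purely line-based merge would typically report a conflict in that situation. When both sides change the same member differently, a real semistructured tool would fall back to textual merge on the member bodies; the sketch simply reports the member as conflicting.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a class: member signature -> body text.
Members = Dict[str, str]


def merge_members(
    base: Members, left: Members, right: Members
) -> Tuple[Members, List[str]]:
    """Three-way merge over named members, ignoring declaration order."""
    merged: Members = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(left) | set(right)):
        b: Optional[str] = base.get(name)   # None means "absent"
        l: Optional[str] = left.get(name)
        r: Optional[str] = right.get(name)
        if l == r:          # both sides agree (same edit, or both deleted)
            result = l
        elif l == b:        # only right changed the member
            result = r
        elif r == b:        # only left changed the member
            result = l
        else:
            # Both sides changed the member differently. A real tool would
            # now attempt textual merge on the bodies; here we just flag it.
            conflicts.append(name)
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"int size()": "return n;"}
# Left and right each add a different method to the same class.
left = {**base, "void push(int x)": "data[n++] = x;"}
right = {**base, "int pop()": "return data[--n];"}

merged, conflicts = merge_members(base, left, right)
print(sorted(merged))   # ['int pop()', 'int size()', 'void push(int x)']
print(conflicts)        # []
```

In this example both additions are kept and no conflict is reported. If instead both revisions rewrote the body of `int size()` in different ways, `merge_members` would list that signature in `conflicts`, which corresponds to the case where structural information is not sufficient and textual resolution takes over.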