On the Impact of Refactoring Operations on Code Naturalness

Bin Lin, Csaba Nagy, Gabriele Bavota, and Michele Lanza
Software Institute — Università della Svizzera italiana (USI), Switzerland

Abstract—Recent studies have demonstrated that software is natural, that is, its source code is highly repetitive and predictable like human languages. Also, previous studies suggested the existence of a relationship between code quality and its naturalness, presenting empirical evidence showing that buggy code is “less natural” than non-buggy code. We conjecture that this quality-naturalness relationship could be exploited to support refactoring activities (e.g., to locate source code areas in need of refactoring). We perform a first step in this direction by analyzing whether refactoring can improve the naturalness of code. We use state-of-the-art tools to mine a large dataset of refactoring operations performed in open source systems. Then, we investigate the impact of different types of refactoring operations on the naturalness of the impacted code. We found that (i) code refactoring does not necessarily increase the naturalness of the refactored code; and (ii) the impact on the code naturalness strongly depends on the type of refactoring operations.

Index Terms—Naturalness, Refactoring, Open Source Software

I. INTRODUCTION

Software is not unique. Researchers have discovered that for sequences of six tokens extracted from the source code, the probability of finding the same sequence in other software projects is higher than 50% [1]. Based on this finding, Hindle et al. [2] introduced the concept of source code “naturalness” to indicate that source code is highly repetitive and predictable, just like text written in a human language. They showed that this characteristic can be captured by statistical language models and can be leveraged for different software engineering tasks, such as code completion [3] and fault localization [4].
The latter application, proposed by Ray et al., was possible thanks to the finding that buggy code is less natural (i.e., less predictable) than correct code [4].

One interesting unanswered question is whether software refactoring (i.e., the activity of improving code quality without modifying the system’s external behavior) can be seen as a process implicitly aiming at improving code naturalness. Intuitively, source code should be easier to maintain if it is more natural, as it presents developers with fewer “surprising” and “unfamiliar” code fragments. Thus, it can be conjectured that developers focus their refactoring attention on code exhibiting low naturalness. If such a conjecture is confirmed, information about the naturalness of code components could be leveraged to support refactoring operations (e.g., by identifying code components in need of refactoring).

We perform a first step in that direction by investigating whether refactoring operations applied by software developers result in an improvement of the code naturalness. We use RMINER [5], a state-of-the-art refactoring miner tool, to mine 1,448 real refactoring operations performed by software developers in 619 open source projects. These operations cover 10 different refactoring types (e.g., move method, extract class). Once these operations are collected, we employ the statistical language model proposed by Tu et al. [3] to measure the naturalness of the code components before and after the refactoring. This allows us to verify whether different types of refactoring operations improve the code naturalness.

Our results show that the impact on the code naturalness strongly depends on the specific type of refactoring operation. For example, “Extract Method” refactoring is more likely to increase the code naturalness, while “Pull Up Method” refactoring often leads to lower naturalness.
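
To make the before/after comparison concrete, the following is a minimal sketch of how naturalness can be quantified as the cross-entropy of a token sequence under an n-gram language model (lower cross-entropy means more natural). It is an illustration only: it assumes a plain n-gram model with add-one smoothing rather than the model of Tu et al. [3], and the names `NGramModel`, `ngrams`, and `naturalness_delta`, as well as the toy token sequences, are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return one n-gram per token, left-padded with start markers."""
    padded = ["<s>"] * (n - 1) + list(tokens)
    return [tuple(padded[i:i + n]) for i in range(len(tokens))]


class NGramModel:
    """Toy n-gram model with add-one (Laplace) smoothing.

    Assumption: this stands in for a real statistical language model;
    it is not the model used in the study.
    """

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def train(self, corpus):
        # corpus: iterable of token lists (e.g., lexed source files)
        for tokens in corpus:
            self.vocab.update(tokens)
            for gram in ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def prob(self, gram):
        # Reserve one extra vocabulary slot for unseen tokens.
        vocab_size = len(self.vocab) + 1
        return (self.ngram_counts[gram] + 1) / (
            self.context_counts[gram[:-1]] + vocab_size
        )

    def cross_entropy(self, tokens):
        # Average negative log2 probability per token, in bits.
        grams = ngrams(tokens, self.n)
        return -sum(math.log2(self.prob(g)) for g in grams) / len(grams)


def naturalness_delta(model, before, after):
    """Cross-entropy change caused by an edit; negative = more natural."""
    return model.cross_entropy(after) - model.cross_entropy(before)


# Hypothetical toy data: a tiny training corpus and one code change.
corpus = [
    "for ( int i = 0 ; i < n ; i ++ )".split(),
    "for ( int j = 0 ; j < m ; j ++ )".split(),
]
model = NGramModel(n=2)
model.train(corpus)

before = list(reversed(corpus[0]))  # unfamiliar token order
after = corpus[0]                   # idiomatic token order
print(naturalness_delta(model, before, after))
```

In this sketch a negative delta means the changed code is more predictable to the model than the original; a comparison in the spirit of the study would apply the same measurement to the code affected by each mined refactoring.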
These results suggest that leveraging code naturalness for identification of refactoring opportunities is far from trivial, and highlight the need for additional investigations in this direction.

II. RELATED WORK

The naturalness of software has received considerable attention in the software engineering research community. After the seminal work by Hindle et al. [2], several studies have investigated code naturalness from different perspectives. Tu et al. [3] found that the distribution of repetitive code is highly skewed in the source code. Lin et al. [6] disclosed that different parts of source code are not equally repetitive.

Researchers have also studied the relation between naturalness and software defects. Campbell et al. [7] found that syntax errors are less natural than other code, and this fact can be used to augment compilers’ ability to locate missing and extra tokens. Ray et al. [4] evaluated the naturalness of buggy code and the corresponding fixes by analyzing over 8,000 fix commits from 10 Java projects. Their results showed that buggy code is less natural, and the naturalness increases once the bug is fixed. They also showed that focusing on unnatural code is cost-effective in finding bugs compared to other state-of-the-art static bug finders.

The most relevant work is the study conducted by Arima et al. [8], which uses code naturalness as a metric to evaluate whether a refactoring operation is effective. With the assumption that appropriate refactoring should raise the code naturalness, the authors constructed a gold set of 28 refactoring operations extracted from JUnit4 (https://github.com/junit-team/junit4) by searching for the keywords “refactor” and “clean” in commit logs and manually filtering out