MetricMiner: Supporting Researchers in Mining Software Repositories
Francisco Zigmund Sokol, Mauricio Finavaro Aniche, Marco Aurélio Gerosa
Department of Computer Science
University of Sao Paulo (USP) - Brazil
E-mail: francisco.sokol@usp.br, {aniche, gerosa}@ime.usp.br
Abstract—Researchers use mining software repositories (MSR) techniques to study software engineering empirically by analyzing artifacts such as source code and version control system metadata. However, to conduct a study with these techniques, researchers usually spend considerable time collecting data and building a complex infrastructure that demands disk space and processing time. In this paper, we present MetricMiner, a web application that supports researchers in several steps of mining software repositories, such as metric calculation, data extraction, and statistical inference. The tool also provides data that is ready to be analyzed, saving time and computational resources.
Index Terms—mining software repositories; supporting tool; code metrics
I. INTRODUCTION
Mining software repositories techniques enable researchers to study software engineering practices empirically. Practitioners can also use these techniques to uncover information that is useful to the development team, such as frequently changed or error-prone classes, or to identify core developers so that their knowledge can be transferred to others. With this information at hand, teams can take action to improve their code and processes.
To conduct a study in this area, researchers need to gather large amounts of data, sometimes from many different projects, and store it on their own workstations or servers. They then have to run code metrics manually and perform statistical calculations. This requires installing several tools and libraries, which makes the process complex and slow. Besides being complex, this kind of research consumes substantial computational resources.
To start with, downloading the repositories consumes considerable bandwidth. After being processed and persisted in a database, the data occupies a large amount of disk space. Calculating metrics over many artifacts requires a large amount of processing time. Only after all these steps is it possible to extract information and evaluate it through statistical analysis. As a result, researchers spend much of their time working on tools rather than analyzing the data and interpreting the results.
Motivated by these difficulties, we developed MetricMiner, a web application that performs all these steps without requiring great effort from the researcher. With it, researchers can write new metrics and extract information from a considerable number of different projects. In this paper, we present the tool, its features, and its architectural decisions. We also present a replication study conducted with the tool.
II. METRICMINER: A WEB APPLICATION TO SUPPORT RESEARCH IN MSR
Understanding the process of software evolution is hard. Large systems tend to have a long development history, with many developers working on different parts of the system, and it is common that no single developer knows all of the project's source code. Because of that, manually analyzing the whole system is impracticable.
Mining Software Repositories (MSR) analyzes software evolution automatically by applying data mining techniques to development history data. Studies in this field reveal information that is useful to the development of a particular project, or even find patterns in software evolution that can be generalized to other software systems.
The term "software repository" covers all artifacts created during the development of a software system, from source code files stored in a source control manager, such as Git or SVN, to messages that developers exchange on mailing lists.
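To make this kind of automated analysis of development history concrete, the following minimal Python sketch computes two history-based measures mentioned in the Introduction: how often each file changes, and which authors account for most of the commits (a rough proxy for core developers). The `Commit` record, the sample history, and the 80% threshold are illustrative assumptions; they are not MetricMiner's data model, API, or metric definitions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical record of one commit's metadata (not MetricMiner's data model)."""
    sha: str
    author: str
    files: tuple[str, ...]


def change_frequency(commits: list[Commit]) -> Counter:
    """Metric 1: number of commits that touched each file."""
    counts: Counter = Counter()
    for commit in commits:
        # dict.fromkeys drops duplicate paths within one commit, keeping order
        for path in dict.fromkeys(commit.files):
            counts[path] += 1
    return counts


def core_developers(commits: list[Commit], share: float = 0.8) -> list[str]:
    """Metric 2 (illustrative heuristic): the most active authors who,
    together, account for at least `share` of all commits."""
    per_author = Counter(c.author for c in commits)
    total = sum(per_author.values())
    selected: list[str] = []
    covered = 0
    for author, n in per_author.most_common():
        if covered >= share * total:
            break  # the selected authors already cover enough of the history
        selected.append(author)
        covered += n
    return selected


# Hand-written sample history, for illustration only
history = [
    Commit("a1", "alice", ("Parser.java", "Lexer.java")),
    Commit("b2", "alice", ("Parser.java",)),
    Commit("c3", "bob", ("Parser.java", "README.md")),
    Commit("d4", "alice", ("Lexer.java",)),
    Commit("e5", "carol", ("README.md",)),
    Commit("f6", "alice", ("Main.java",)),
]

print(change_frequency(history)["Parser.java"])  # → 3
print(core_developers(history))  # → ['alice', 'bob']
```

In a real study, the commit records would be extracted from a version control system such as Git rather than written by hand.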
Software repositories contain useful information that can be explored to understand software evolution and to contribute to the software's development.
MetricMiner is a web application that aims to support researchers who mine software repositories. As mentioned before, a researcher conducting such a study needs to install many different tools and libraries and to spend substantial computational resources. As a web application, MetricMiner relies on cloud computing to scale, so researchers do not have to worry about resources. MetricMiner currently runs on a cloud infrastructure and is available at http://metricminer.org.br/.
MetricMiner is based on rEvolution¹, a command-line tool that extracts data from a local repository and persists it in a database. rEvolution could collect data from only one project at a time, requiring researchers to run it manually for each repository. In addition, configuring the tool was complex: the database, the source control tools, and all external applications the tool uses had to be configured in a single XML file.
¹ http://github.com/mauricioaniche/rEvolution
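The extract-and-persist workflow described for rEvolution, and the one-repository-per-run limitation, can be illustrated with the following Python sketch, which mines several local Git clones in a single run. It is a minimal illustration under stated assumptions, not rEvolution's or MetricMiner's code: it assumes `git` is installed and on the PATH, uses SQLite as a stand-in database, and the schema and the functions `parse_log`, `persist`, and `mine_all` are hypothetical.

```python
import sqlite3
import subprocess

RS, US = "\x1e", "\x1f"  # record/unit separators emitted by git via %x1e/%x1f
# Each commit starts with RS, then "<sha>US<author name>", then changed paths
LOG_FORMAT = "--pretty=format:%x1e%H%x1f%an"


def parse_log(raw: str) -> list[tuple[str, str, list[str]]]:
    """Turn `git log --name-only` output into (sha, author, files) tuples."""
    commits = []
    for record in raw.split(RS):
        lines = [ln for ln in record.split("\n") if ln.strip()]
        if not lines:
            continue  # empty text before the first separator
        sha, author = lines[0].split(US, 1)
        commits.append((sha, author, lines[1:]))
    return commits


def persist(conn: sqlite3.Connection, project: str,
            commits: list[tuple[str, str, list[str]]]) -> None:
    """Store commits in a hypothetical two-table schema; re-runs are idempotent."""
    with conn:  # commits the transaction on success, rolls back on error
        conn.execute("CREATE TABLE IF NOT EXISTS commits ("
                     "project TEXT, sha TEXT, author TEXT, "
                     "PRIMARY KEY (project, sha))")
        conn.execute("CREATE TABLE IF NOT EXISTS changes ("
                     "project TEXT, sha TEXT, path TEXT, "
                     "PRIMARY KEY (project, sha, path))")
        for sha, author, files in commits:
            conn.execute("INSERT OR IGNORE INTO commits VALUES (?, ?, ?)",
                         (project, sha, author))
            conn.executemany("INSERT OR IGNORE INTO changes VALUES (?, ?, ?)",
                             [(project, sha, path) for path in files])


def mine_all(repos: dict[str, str], conn: sqlite3.Connection) -> None:
    """Extract and persist the history of every local clone in one run."""
    for project, path in repos.items():
        raw = subprocess.run(
            ["git", "-C", path, "log", "--name-only", LOG_FORMAT],
            capture_output=True, text=True, check=True,
        ).stdout
        persist(conn, project, parse_log(raw))
```

For example, `mine_all({"project-a": "/path/to/project-a"}, sqlite3.connect("history.db"))` would mine one local clone (the project name and path are placeholders); adding entries to the dictionary mines more projects in the same run. In this sketch, configuration is a plain dictionary rather than a single XML file.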