Extracting Corrective Actions from Code Repositories

Yegor Bugayenko
Huawei, Russia
yegor256@huawei.com

Kirill Daniakin, Mirko Farina, Firas Jolha, Artem Kruglov, Giancarlo Succi*
Innopolis University, Russia
g.succi@innopolis.ru

Witold Pedrycz
University of Alberta, Canada
wpedrycz@ualberta.ca

* Corresponding author.

ABSTRACT
Simply detecting bugs, defects, or anomalies during software development is not enough; it is necessary to apply corrective actions to eliminate them. To find out whether an anomaly exists in a piece of software, we can measure its quality attributes using software metrics. The main goal of this paper was to find out and explain how to meaningfully attribute metrics to useful corrective actions.

CCS CONCEPTS
• Information systems → Data management systems; • Software and its engineering → Software development process management.

KEYWORDS
software metrics, repository, information retrieval, anomaly, corrective action, project management

ACM Reference Format:
Yegor Bugayenko, Kirill Daniakin, Mirko Farina, Firas Jolha, Artem Kruglov, Giancarlo Succi, and Witold Pedrycz. 2022. Extracting Corrective Actions from Code Repositories. In 19th International Conference on Mining Software Repositories (MSR ’22), May 23–24, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3524842.3528517

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MSR ’22, May 23–24, 2022, Pittsburgh, PA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9303-4/22/05...$15.00
https://doi.org/10.1145/3524842.3528517

1 INTRODUCTION
Since the establishment of TPS [3], the problem of measuring and improving the quality and performance of various production environments has been the subject of intense research. In software engineering, it gave rise to a number of frameworks for managing the development process, such as COBIT, ITIL, and CMMI [1, 2]. Such frameworks provide grounds for achieving specific business goals within a project; however, none of them describes practices of situational management. In other words, none of them explains how to relate the anomalies and metrics of a project to specific preventive and/or corrective actions.

2 ANALYSIS OF PREVIOUS FINDINGS
We conducted an exploratory search and found seven papers on this topic, which we subsequently reviewed (for details, please see https://bit.ly/3Bn148x).

The papers we reviewed specified corrective actions for quantified anomalies detected in software systems. Across these papers, we identified 90 anomalies and 140 possible corrective actions. It is worth noting that only a few of the papers we analysed outlined connections between metrics and anomalies.

To overcome this limitation, we manually clustered (by similarity) the anomalies and the corrective actions we collected. Subsequently, we related them to key process areas of CMMI and organized the metrics into three main categories (process, product, and resources). Finally, we connected the metrics to specific anomalies when such a connection was specified.

The results¹ demonstrated that several important areas within CMMI have been previously overlooked and, as a consequence, that the connections between metrics and anomalies have not been fully understood.
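As an illustration of the organization described in Section 2, the sketch below shows one possible way to represent metrics (in the three categories), anomalies (tied to a CMMI key process area), and corrective actions, and to look up actions from a metric. The class names, the helper actions_for, and every example entry are hypothetical and were invented for this sketch; they are not taken from the reviewed papers or from the published results.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MetricCategory(Enum):
    """The three metric categories named in the text."""
    PROCESS = "process"
    PRODUCT = "product"
    RESOURCES = "resources"


@dataclass(frozen=True)
class Metric:
    name: str
    category: MetricCategory


@dataclass
class Anomaly:
    name: str
    process_area: str  # CMMI key process area the anomaly was related to
    metrics: List[Metric] = field(default_factory=list)  # may be empty
    actions: List[str] = field(default_factory=list)     # corrective actions


def actions_for(metric_name: str, anomalies: List[Anomaly]) -> List[str]:
    """Return the corrective actions of every anomaly linked to a metric."""
    return sorted({
        action
        for anomaly in anomalies
        if any(m.name == metric_name for m in anomaly.metrics)
        for action in anomaly.actions
    })


# Hypothetical entries, invented purely for illustration.
ANOMALIES = [
    Anomaly(
        name="high defect density",
        process_area="Verification",
        metrics=[Metric("defects_per_kloc", MetricCategory.PRODUCT)],
        actions=["increase test coverage", "add code review"],
    ),
    Anomaly(
        name="schedule slippage",
        process_area="Project Monitoring and Control",
        metrics=[Metric("planned_vs_actual_effort", MetricCategory.PROCESS)],
        actions=["re-plan iteration"],
    ),
    # An anomaly for which no metric connection was specified.
    Anomaly(
        name="knowledge silo",
        process_area="Organizational Training",
        actions=["pair programming"],
    ),
]

print(actions_for("defects_per_kloc", ANOMALIES))
```

The third entry has an empty metric list on purpose: it mirrors the case where a paper names an anomaly and an action but does not specify which metric reveals it.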
3 INDUSTRIAL PROBLEM
This instructive failure prompted us to speculate that it might be possible to relate all other anomalies and their associated preventive and corrective actions by using Machine Learning (ML) techniques [4, 5], with the intent of producing an advisory system, deployable during all phases of software development, capable of enriching and improving project management processes.

4 OUR RESULTS
To achieve this goal, we have been conducting targeted interviews with senior Huawei managers. We also tried to identify anomalies in an unsupervised way, by applying multiple clustering techniques to several software metrics collected from several repositories. We succeeded in identifying anomalies as clusters of abnormal metric values and used statistical tests (such as the t-test and F-test) to ascertain the contribution of different metrics in predicting the emergence of specific anomalies.

If our preliminary findings are confirmed, we may be able to build an ML model capable of identifying anomalies and then automatically recommending certain corrective and preventive actions.

¹ https://github.com/firas-jolha/CAPA
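The unsupervised step described in Section 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used in the study: the text does not say which clustering techniques or metrics were applied, so the sketch swaps in plain k-means (with farthest-first seeding) on standardized values, treats the smallest cluster as the anomalous one, and computes only Welch's t-statistic per metric (p-values and the F-test are left out). The metric names and all data values are invented.

```python
import math
import statistics

# Hypothetical metrics and snapshot values, invented for illustration only.
METRICS = ["cyclomatic_complexity", "lines_changed", "comment_ratio"]
SNAPSHOTS = [
    (5, 40, 0.30), (6, 55, 0.28), (4, 35, 0.33), (5, 50, 0.31),
    (7, 60, 0.27), (6, 45, 0.32), (5, 42, 0.29),
    (25, 400, 0.30), (28, 450, 0.29), (30, 380, 0.31),
]


def standardize(rows):
    """Scale every metric column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        seed = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(seed))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


labels = kmeans(standardize(SNAPSHOTS), k=2)

# Assumption: the smallest cluster is the one holding abnormal metric values.
counts = {lab: labels.count(lab) for lab in set(labels)}
anomalous = min(counts, key=counts.get)

# Which metrics separate the anomalous cluster from the rest?
t_stats = {}
for i, name in enumerate(METRICS):
    odd = [row[i] for row, lab in zip(SNAPSHOTS, labels) if lab == anomalous]
    usual = [row[i] for row, lab in zip(SNAPSHOTS, labels) if lab != anomalous]
    t_stats[name] = welch_t(odd, usual)

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

On this toy data the two size-related metrics yield large t-statistics while comment_ratio stays near zero, which is the kind of signal one would use to decide which metrics contribute to an anomaly. A real analysis would need proper significance testing and a principled way to choose the number of clusters.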