Cluster Comput (2017) 20:2267–2281
DOI 10.1007/s10586-017-0892-6

A parallel framework for software defect detection and metric selection on cloud computing

Md Mohsin Ali 1 · Shamsul Huda 2 · Jemal Abawajy 2 · Sultan Alyahya 3 · Hmood Al-Dossari 3 · John Yearwood 2

Received: 23 October 2016 / Revised: 13 March 2017 / Accepted: 27 April 2017 / Published online: 24 May 2017
© Springer Science+Business Media New York 2017

Shamsul Huda (corresponding author) shamusl.huda@deakin.edu.au
Md Mohsin Ali mohsin.ali@anu.edu.au
Jemal Abawajy jemal.abawajy@deakin.edu.au
Sultan Alyahya sualyahya@ksu.edu.sa
Hmood Al-Dossari hzaldossari@ksu.edu.sa
John Yearwood john.yearwood@deakin.edu.au

1 The Australian National University, Canberra, Australia
2 Deakin University, Melbourne, Australia
3 King Saud University, Riyadh, Saudi Arabia

Abstract With the continued growth of the Internet of Things (IoT) and its convergence with the cloud, numerous interoperable software products are being developed for the cloud. There is therefore a growing demand to maintain better software quality in the cloud for improved service. This is all the more crucial as the cloud environment is moving rapidly towards a hybrid model, a combination of public and private cloud models. Given the high volume of software available as a service (SaaS) in the cloud, identifying non-standard software and measuring its quality is an urgent issue. Manual testing and quality assessment of software are very expensive and, to some extent, impossible to accomplish. An automated software defect detection model that can measure the relative quality of software and identify its faulty components can significantly reduce software development effort and improve the cloud service. In this paper, we propose a software defect detection model that can be used to identify faulty components in big software metric data.
The novelty of our proposed approach is that it can identify significant metrics using a combination of different filter and wrapper techniques. One of the important contributions of the proposed approach is that we designed and evaluated a parallel framework for a hybrid software defect predictor in order to deal with big software metric data in a computationally efficient way in a cloud environment. Two different hybrids have been developed using Fisher and Maximum Relevance (MR) filters with an Artificial Neural Network (ANN) based wrapper in the parallel framework. The evaluations are performed with real defect-prone software datasets for all parallel versions. Experimental results show that the proposed parallel hybrid framework achieves a significant computational speedup on a computer cluster, with higher defect prediction accuracy and a smaller number of software metrics than the independent filter or wrapper approaches.

1 Introduction

Due to the rapid development of cloud computing, the size and complexity of cloud-based software products are continually increasing. With the advent of the Internet of Things (IoT) and its convergence with the cloud, the functionalities and requirements of cloud-based software products are also increasing. This poses more challenges for cloud-based business organizations seeking to develop high-quality software products. Thus, determining and maintaining the quality of software products is both very important and challenging, due to the exponential growth of overall complexity. Considering the importance of tackling this challenge, software industries are spending around one quarter of their budget on quality assurance and testing [4].