International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-9 Issue-5, March 2020
343
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: D1858029420/2020©BEIESP
DOI: 10.35940/ijitee.D1858.039520
Software Defect Prediction Via Deep Learning
Rehan Ullah Khan, Saleh Albahli, Waleed Albattah, Mohammad Nazrul Islam Khan
Abstract: Existing defect prediction models are trained on
limited historical data, as studied by a variety of pioneering
researchers. Cross-project defect prediction, which often reuses
data from other projects, works well when the training data is
sufficient to meet the project's demands. However, current
studies on software defect prediction require some degree of
heterogeneity of metric values, which does not always lead to
accurate predictions. Inspired by current research, this paper
takes advantage of state-of-the-art deep learning and random
forest to perform various experiments using five different
datasets. Our model predicts defects with 90% accuracy using
10-fold cross-validation. The results show that Random Forest
and deep learning give more accurate predictions than Bayes
network and SVM on all five datasets. We also found that deep
learning can be a competitive classifier and provides more
robust defect prediction.
Keywords: Defect prediction; Deep Learning; Software
repository mining; Cross-Project; Class imbalance.
I. INTRODUCTION
Reducing defects and the number of failures in software
products is an important goal for software engineers, in order
to achieve maximum performance, build users' trust, and
enhance the overall quality of the product. During its life
cycle, a software product goes through several feature
changes, quality iterations, and reassemblies. Ideally, all
these changes merge perfectly, cause no defects, and are free
of errors. In practice, however, these changes sometimes
introduce defects into an already working product; such
changes are known as defect-inducing changes. A
“defect-inducing change” can thus be described as a software
change (a single commit or multiple iterations over a specific
period of time) that may cause one or more faults or defects
in the software's source code.
Just-In-Time (JIT) defect prediction is of more practical
value than traditional defect prediction at the module level.
The term JIT was coined by Kamei et al. [35], who proposed a
method of detecting errors based on raw metrics that not only
predicts errors in the lines of code under inspection, but also
highlights latent defects that can be detected at check-in
time, unlike other effort-aware detection methods. This method
also removes the tedious task of finding the author of the
code, since many people are involved in a module. In addition,
inspecting at check-in time, while the change details are still
fresh in the developer's mind, makes debugging much easier.
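To make check-in-time prediction concrete, the following minimal sketch computes a few change-level metrics for a single commit before it is merged. The metric names (`la`, `ld`, `nf`, `entropy`), the hypothetical `FileDiff` record, and the `change_metrics` helper are our assumptions; this section does not list the raw metrics that Kamei et al. [35] used. Entropy here is the Shannon entropy (base 2) of modified lines across the touched files, so a change spread evenly over many files scores higher than one concentrated in a single file.

```python
import math
from dataclasses import dataclass


@dataclass
class FileDiff:
    """Hypothetical per-file summary of one commit's diff (one record per file)."""
    path: str
    lines_added: int
    lines_deleted: int


def change_metrics(diffs):
    """Compute illustrative change-level (JIT) metrics for a single commit.

    la / ld: total lines added / deleted across the commit
    nf: number of files touched (assumes one FileDiff per file)
    entropy: base-2 Shannon entropy of modified lines across files
    """
    la = sum(d.lines_added for d in diffs)
    ld = sum(d.lines_deleted for d in diffs)
    total = la + ld
    entropy = 0.0
    if total > 0:
        for d in diffs:
            p = (d.lines_added + d.lines_deleted) / total
            if p > 0:
                entropy -= p * math.log2(p)
    return {"la": la, "ld": ld, "nf": len(diffs), "entropy": entropy}


# Usage on a made-up commit touching two files with 10 modified lines each.
commit = [FileDiff("a.py", 8, 2), FileDiff("b.py", 5, 5)]
print(change_metrics(commit))  # → {'la': 13, 'ld': 7, 'nf': 2, 'entropy': 1.0}
```

Because these values are available as soon as the commit exists, a model trained on them can score the change while its author is still at hand, which is the practical advantage of JIT prediction described above.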
Revised Manuscript Received on February 06, 2020.
Rehan Ullah Khan, Department of Information Technology, College
of Computer, Qassim University, Saudi Arabia
Saleh Albahli, Department of Information Technology, College of
Computer, Qassim University, Saudi Arabia
Waleed Albattah, Department of Information Technology, College of
Computer, Qassim University, Saudi Arabia
Mohammad Nazrul Islam Khan, Department of Computer
Engineering, College of Computer, Qassim University, Saudi Arabia
Therefore, in JIT defect prediction it is easy to find the
right developer to inspect a predicted defect-prone change,
because each change is associated with a particular developer.
Kim et al. [36] used numerous features extracted from various
sources, such as change metadata, source code, and change log
messages, to build models that predict defect-inducing
changes. Their results showed that defect-inducing changes
can be predicted with 60% recall and 61% precision on
average.
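Kim et al.'s figures refer to the defect-inducing class: precision is the share of changes flagged as defect-inducing that really are, and recall is the share of truly defect-inducing changes that get flagged. A minimal sketch of both measures follows; the function name `precision_recall` and the toy labels are illustrative assumptions, not code or data from [36].

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = defect-inducing change).

    precision = TP / (TP + FP): share of flagged changes that really induce defects
    recall    = TP / (TP + FN): share of defect-inducing changes that get flagged
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels (illustrative only): 3 true positives, 2 false positives, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))  # → (0.6, 0.75)
```

Reporting both numbers matters for JIT prediction: a model can raise recall simply by flagging more changes, at the cost of precision and therefore of wasted inspection effort.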
Much work is available on effort-aware JIT systems, on
defect prediction at the traditional file, package, or method
level [22-24], and on supervised and unsupervised machine
learning methods. Still, there is a large gap in accuracy, and
false predictions remain common. It is therefore necessary to
have state-of-the-art supervised, unsupervised, or deep
learning methods that can close the accuracy gap and provide
predictions that are precise and timely. Hence, the basic
objective of this work is to cope with the challenges of JIT
prediction and to propose a technique that is efficient and
precise. In this paper, we performed various experiments
using five different datasets for defect prediction, using a
state-of-the-art fusion of a deep learning method with the
Random Forest algorithm, which predicts defects with 90%
accuracy using 10-fold cross-validation. Thus, our model can
reduce the error rate and avoid false predictions as the data
grows.
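The 10-fold cross-validation protocol behind the reported accuracy can be sketched as follows. This is a minimal, dependency-free illustration under our own assumptions: the helpers `k_fold_indices`, `majority_class_fit`, and `cross_val_accuracy` are hypothetical, the folds are plain shuffled (not stratified) splits, and a trivial majority-class predictor stands in for the paper's Random Forest and deep learning models, which this section does not describe in detail.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    if n < k:
        raise ValueError("need at least k samples for k folds")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_class_fit(X_train, y_train):
    """Hypothetical stand-in model: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def cross_val_accuracy(X, y, fit, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once for testing."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        # Train on the other k-1 folds, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(1 for i in test_idx if model(X[i]) == y[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Usage with toy data (not from the paper): 80 clean (0) and 20 defect-inducing (1) changes.
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20
print(cross_val_accuracy(X, y, majority_class_fit))
```

On this toy set the baseline reaches a mean accuracy of 0.8 without learning anything, so it is a useful reference point when reading accuracy figures on imbalanced defect data; any real classifier can be passed as `fit` as long as it returns a callable predictor.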
The rest of the paper is organized as follows. Section 2
reviews up-to-date literature on just-in-time software defect
prediction, as well as the benefits of using deep learning in
software engineering. Section 3 reports the approach used in
our experiments. Section 4 presents our proposed model with
its experimental analysis, evaluation results, and
discussion. Finally, conclusions and perspectives are given
in Section 5.
II. BACKGROUND AND RELATED WORK
Just-in-time software defect prediction (JIT-SDP) is
valuable in software defect prediction because it identifies
defect-inducing changes. Yang et al. [1] compared the
performance of local and global models through a large-scale
empirical study based on six open-source projects with
227,417 changes in the context of JIT-SDP. They found that
local models have significantly better effort-aware
prediction performance than global models in the
cross-validation and cross-project-validation scenarios.
Therefore, local models are promising for effort-aware
JIT-SDP. Xiang et al. proposed MULTI, a supervised method
based on multi-objective optimization, to build JIT-SDP
models [4].
MULTI can perform significantly better than the state-of-