Vijaya Bharathi Manjeti, Sireesha Rodda, International Journal of Innovations & Advancement in Computer Science (IJIACS), ISSN 2347-8616, Volume 4, Issue 1, January 2015

Associative Classifier for Software Fault Tolerance in presence of Class Imbalance

Vijaya Bharathi Manjeti
GMRIT Institute Technology, Rajam, Srikakulam

Sireesha Rodda
GITAM Institute of Technology, GITAM University, Visakhapatnam

ABSTRACT

Software fault prediction is crucial both for reducing the overall cost of developing a software product and for assuring the quality of the finished product. Various software quality models based on data mining techniques exist for identifying fault-prone software modules. However, the class imbalance problem degrades these models and, in turn, the overall quality of the developed software product. This paper addresses the effects of class imbalance on the classification algorithms intended to perform software fault prediction. An ensemble-based classifier is proposed to mitigate the effects of class imbalance. This classifier learns defect prediction efficiently, as demonstrated in the results.

Keywords- Software defect prediction, class imbalance learning, ensemble classifiers.

1. INTRODUCTION

Software faults can prove expensive during software development in terms of both quality and cost [1]. The conventional process of manual software reviews and testing activities can detect only 60% of the faults [2]. Menzies et al. [3] found that defect predictors can increase the probability of detection to 71%. Various machine learning and statistical approaches have been investigated for software defect prediction. Classification is a popular option for this task: a classifier is built from existing data culled from previous development projects and is then used to categorize which modules are more prone to defects.
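The probability of detection quoted above is conventionally computed as the recall on the defective class, that is, the fraction of truly defective modules that the predictor flags. The following minimal Python sketch illustrates the calculation alongside plain accuracy. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from this paper or from the cited studies.

```python
def probability_of_detection(tp: int, fn: int) -> float:
    """Recall on the defective class: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no defective modules in the data")
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all modules that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


# Hypothetical project: 100 modules, of which 10 are defective.
# The predictor flags 7 of the 10 defective modules (tp=7, fn=3)
# and raises 5 false alarms among the 90 clean modules (fp=5, tn=85).
tp, fn, fp, tn = 7, 3, 5, 85

pd_value = probability_of_detection(tp, fn)
acc_value = accuracy(tp, tn, fp, fn)

print(f"probability of detection = {pd_value:.2f}")
print(f"accuracy                 = {acc_value:.2f}")
```

With these assumed counts the two measures differ noticeably, because accuracy is dominated by the much larger group of non-defective modules while probability of detection looks only at the defective ones.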
Association Mining (AM) [4] refers to the task of finding the complete set of frequent itemsets, from which class association rules are generated based on their association with the pertinent class labels. Associative Classification [5] treats the set of features as itemsets and applies Association Mining techniques to discover the frequent itemsets that occur in the training dataset, subject to a user-specified minimum support threshold. An associative classifier uses the Class Association Rules (CARs) generated by Association Mining to predict the class label of an unseen instance. Once the classification model is built using CARs, it is evaluated on the test data. Associative classifiers have been shown to perform better than other classifiers, and the rules they generate are understandable to a human user.

Software defect prediction datasets exhibit an imbalance between the defect and non-defect class labels. Generally, the number of non-defective samples (majority class) is much larger than the number of defective ones (minority class). This imbalanced distribution of data contributes to poor classifier performance, negatively affecting the classification of defective samples. Arunasalem et al. [6] show that accuracy is not a suitable metric for evaluating the efficiency of a classifier, particularly on imbalanced data. They also show that the support and confidence framework is biased towards the majority class. The presence of class imbalance in software defect prediction demands that more importance be given to identifying minority class elements, even at the cost of accuracy. Therefore, specialized