International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2014): 5.611
Volume 4 Issue 12, December 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY

A Survey Paper on Efficient Approach of Data Reduction Techniques for Bug Triaging System

Smita Boharpi 1, Sonal Fatangare 2

1 M. E. Student, Department of Computer Engineering, RMD Sinhgad School of Engineering, Pune, India
2 Assistant Professor, Department of Computer Engineering, RMD Sinhgad School of Engineering, Pune, India

Abstract: Bug triaging is an important part of the testing process in software development organizations. It is the process of assigning the correct developer to fix a bug. Software companies spend a large part of their cost on dealing with software bugs. Traditionally in software development, new bugs are manually triaged by an expert developer, i.e., a human triager. Manual bug triage is expensive in time and low in accuracy because of the large number of daily bugs and the lack of expertise on all bugs. To avoid the expense of manual bug triage, an automatic bug triage approach is used to predict developers for bug reports. For bug triage, data reduction techniques are used to build a small-scale and high-quality set of bug data by removing bug reports and words that are redundant or non-informative. This is done by applying instance selection and feature selection simultaneously to historical bug data sets.

Keywords: Bug Triage, Prediction for Reduction Order, Bug Repositories, Data Reduction, Feature Selection, Instance Selection, Word Dimension, Bug Dimension

1. Introduction

Data mining technology used in the software development process not only enhances the accuracy and comprehensiveness of software development but also enhances the credibility of the software.
A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. Most bugs arise from mistakes and errors made by people in either a program's source code or its design, or in frameworks and operating systems used by such programs, and a few are caused by compilers producing incorrect code. Reports detailing bugs in a program are commonly called bug reports, defect reports, fault reports, problem reports, or trouble reports [3].

Mining software repositories is a domain that aims to employ data mining to deal with software engineering problems. Software repositories are large-scale databases for storing the output of software development, e.g., source code, bugs, emails, and specifications. Traditional software analysis is not completely suitable for the large-scale and complex data in software repositories. By applying data mining techniques, mining software repositories can uncover interesting information in software repositories and solve real-world software problems.

A bug repository is a software repository used for storing details of bugs. Bug triage is an inevitable step in fixing a bug; its aim is to correctly assign a developer to a new bug. For large-scale open source software projects, the number of daily bugs is so large that the triaging process becomes very difficult and challenging. Software companies spend over 45 percent of cost in fixing bugs. In a bug repository, a bug is maintained as a bug report, which records the textual description of reproducing the bug and is updated according to the status of bug fixing. The bug repository also provides a data platform to support many types of tasks on bugs, e.g., fault prediction, bug localization, and reopened-bug analysis. The bug reports in a bug repository are called bug data.
Two challenges related to bug data may affect the use of bug repositories in software development tasks: the large scale and the low quality. On the one hand, because bugs are reported daily, a large number of new bugs are stored in bug repositories. On the other hand, low-quality bug data exhibit noise and redundancy. Noisy bugs may mislead related developers, while redundant bugs waste the limited time of bug handling [1].

The main aim of data reduction for bug triage is to build a small-scale and high-quality set of bug data by removing bug reports and words that are redundant or non-informative. Instance selection and feature selection techniques are therefore used simultaneously to reduce the bug dimension and the word dimension. The reduced bug data contain fewer bug reports and fewer words than the original bug data while providing similar information. Instance selection obtains a subset of relevant instances (i.e., bug reports in bug data), and feature selection obtains a subset of relevant features (i.e., words in bug data).

2. Existing System

C. Sun, D. Lo, S. C. Khoo, and J. Jiang [3] observe that, in a bug tracking system, different testers or users may submit multiple reports on the same bug, referred to as duplicates, which may cost extra maintenance effort in triaging and fixing bugs. To identify such duplicates accurately, they propose a retrieval function (REP) to measure the similarity between two bug reports. It fully utilizes the information available in a bug report, including not only the similarity of textual content in the summary and description fields but also the similarity of non-textual fields such as product, component, and version. The drawback of that system is that there is no indexing structure for the bug report repository to speed up the retrieval process.

T. M. Khoshgoftaar, K. Gao, and N. Seliya [7] address attribute selection and imbalanced data as problems in software defect prediction, with the aim of handling imbalanced defect data.
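To make the data reduction idea from Section 1 concrete, the following Python fragment is a minimal sketch of reducing both the word dimension and the bug dimension of a toy bug data set. It is not the method of any cited work. The techniques are stand-ins chosen for brevity: a document-frequency filter plays the role of feature selection, and removal of near-duplicate reports (Jaccard similarity over word sets, within the same developer) plays the role of instance selection. All function names, thresholds, and sample reports are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a report and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def select_features(reports, min_df=2, max_df_ratio=0.8):
    """Feature selection stand-in: keep words that occur in at least
    `min_df` reports and in at most `max_df_ratio` of all reports."""
    doc_freq = Counter()
    for text, _developer in reports:
        doc_freq.update(set(tokenize(text)))
    total = len(reports)
    return {
        word
        for word, count in doc_freq.items()
        if count >= min_df and count / total <= max_df_ratio
    }


def jaccard(a, b):
    """Jaccard similarity of two non-empty word sets."""
    return len(a & b) / len(a | b)


def select_instances(reports, vocabulary, max_similarity=0.8):
    """Instance selection stand-in: drop reports with no informative words
    and reports that nearly duplicate an already kept report assigned to
    the same developer."""
    kept = []
    for text, developer in reports:
        words = set(tokenize(text)) & vocabulary
        if not words:
            continue  # non-informative after feature selection
        redundant = any(
            developer == kept_dev and jaccard(words, kept_words) >= max_similarity
            for kept_words, kept_dev in kept
        )
        if not redundant:
            kept.append((words, developer))
    return kept


# Hypothetical historical bug data: (report text, developer who fixed it).
reports = [
    ("crash when opening editor window", "alice"),
    ("editor window crash when opening", "alice"),
    ("memory leak in network module", "bob"),
    ("network module timeout", "bob"),
    ("timeout during memory allocation", "bob"),
    ("please fix soon", "carol"),
]

vocabulary = select_features(reports)             # reduced word dimension
reduced = select_instances(reports, vocabulary)   # reduced bug dimension

print(len(vocabulary), "words kept;", len(reduced), "of", len(reports), "reports kept")
```

In this toy run the second report is dropped as a duplicate of the first, and the last report is dropped because none of its words survive feature selection. A real triage pipeline would train a classifier on the reduced data to predict a developer for each new report.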
Paper ID: NOV152369