Survey on Clustering Techniques in Data Mining for Software Engineering

Maninderjit Kaur #1, Sushil Kumar Garg *2
# Research Scholar (Department of Computer Science and Engineering), RIMT-Institute of Engineering & Technology, Mandi Gobindgarh, Fatehgarh Sahib, Punjab, India
1 maninderjit91@yahoo.com
* Principal, RIMT-Maharaja Aggrasen Engineering College, Mandi Gobindgarh, Fatehgarh Sahib, Punjab, India
2 sushilgarg70@yahoo.com

Abstract — The quality and reliability of computer software are very important. Software development involves a huge amount of software engineering data, such as execution traces, code bases, graphs and bug reports. This data is very useful for understanding how a product or software system is developed and how it works. Software is of high quality and highly reliable if it is error-free, that is, if no bugs are present in it; however, bugs are very hard to find. Software engineering tasks include programming, testing, bug detection, debugging and maintenance. Data mining techniques are applied to these tasks to mine software engineering data and extract meaningful and useful information. Techniques used for mining software engineering data include matching, clustering and classification.

Keywords— Software Engineering, Data Mining, Software Engineering Data, Software Engineering Task, Clustering, Clustering Techniques

I. INTRODUCTION
Technology is advancing day by day, and this advancement affects the working of different products and software systems. To keep up with it, changes and modifications to the software become necessary, so software maintenance is becoming a very difficult and challenging task. As reported in [21], 60% of total life cycle effort is spent on maintenance activities alone.
As noted in [12], software is highly productive and reliable if the programming, bug detection, testing, debugging and maintenance tasks are performed well. Software engineering researchers are not experts in developing data mining tools or algorithms; likewise, data mining researchers do not understand the mining requirements of the software engineering domain. Close collaboration between the two domains is therefore needed to improve software engineering tasks such as programming, bug detection, testing, debugging and maintenance. Software engineering data is available in the form of documentation, source code, bug databases, mailing histories, bug reports, execution traces and graphs [2].

Data mining is the process of finding a small set of valuable information and patterns in large sets of raw data. Humans are better at storing data than at extracting knowledge from large datasets: it is not easy to understand a large dataset and find the valuable, accurate information needed to create good software. The steps of the data mining process are data integration, data cleaning, data selection, data transformation, data mining, pattern evaluation and knowledge presentation. Software engineering data exists in vast amounts, and different types of users require different types of data; not all of the data is meaningful to every user of the software, and each user needs the data that is meaningful to them. Such meaningful data can be extracted using the data mining process described in [28], which is shown in Fig. 1.

Fig. 1 Data Mining Process [28]

According to [27], mining algorithms fall into four main categories:
• Frequent Pattern Mining: commonly occurring patterns are found.
• Pattern Matching: data instances matching given patterns are found.
• Clustering: data is grouped into different clusters.
• Classification: labels of data are predicted based on already labelled data.
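To make the process steps and the clustering category above more concrete, the Python sketch below chains the steps from data integration to knowledge presentation on a handful of bug reports, using k-means as the clustering algorithm in the data mining step. Everything here is an illustrative assumption rather than a method taken from [27] or [28]: the sample records are hypothetical, the function names (`integrate`, `clean`, `select`, `transform`, `kmeans`, `evaluate`, `mine_bug_reports`) are made up for this example, and the bag-of-words representation and the choice of k-means with k = 2 are just one possible setup.

```python
import math
from collections import Counter

# Hypothetical sample data: bug reports from two trackers (illustration only).
TRACKER_A = [
    {"id": 1, "kind": "bug", "text": "Crash on startup null pointer"},
    {"id": 2, "kind": "bug", "text": "UI button misaligned on login page"},
    {"id": 3, "kind": "bug", "text": "null pointer crash when saving file"},
]
TRACKER_B = [
    {"id": 4, "kind": "bug", "text": "login page button text misaligned"},
    {"id": 5, "kind": "feature", "text": "add dark mode"},
    {"id": 6, "kind": "bug", "text": "crash on startup NULL pointer"},  # duplicate of id 1
    {"id": 7, "kind": "bug", "text": "   "},  # empty report
]


def integrate(*sources):
    """Data integration: merge records from several sources into one list."""
    return [record for source in sources for record in source]


def clean(records):
    """Data cleaning: normalise case, drop empty texts and duplicate texts."""
    seen, cleaned = set(), []
    for record in records:
        text = record["text"].strip().lower()
        if text and text not in seen:
            seen.add(text)
            cleaned.append({**record, "text": text})
    return cleaned


def select(records, kind):
    """Data selection: keep only the records relevant to the task."""
    return [record for record in records if record["kind"] == kind]


def transform(records):
    """Data transformation: represent each text as a bag-of-words count vector."""
    vocabulary = sorted({word for record in records for word in record["text"].split()})
    vectors = []
    for record in records:
        counts = Counter(record["text"].split())
        vectors.append([counts[word] for word in vocabulary])
    return vectors


def kmeans(vectors, k, max_iterations=10):
    """Data mining (clustering): plain k-means seeded with the first k vectors."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assign every vector to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Move each centroid to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels


def evaluate(records, labels):
    """Pattern evaluation: group report ids by the cluster they were assigned to."""
    clusters = {}
    for record, label in zip(records, labels):
        clusters.setdefault(label, []).append(record["id"])
    return clusters


def mine_bug_reports(*sources, k=2):
    """Chain the process steps; returns {cluster label: [report ids]}."""
    records = select(clean(integrate(*sources)), "bug")
    return evaluate(records, kmeans(transform(records), k))


if __name__ == "__main__":
    # Knowledge presentation: print which reports were grouped together.
    for label, ids in sorted(mine_bug_reports(TRACKER_A, TRACKER_B).items()):
        print(f"cluster {label}: reports {ids}")
    # cluster 0: reports [1, 3]
    # cluster 1: reports [2, 4]
```

Run as a script (Python 3.8 or later, for `math.dist`), the sketch groups the two crash reports (ids 1 and 3) into one cluster and the two login-page layout reports (ids 2 and 4) into the other, after cleaning has removed the duplicate and empty reports and selection has removed the feature request. Seeding k-means with the first k vectors keeps the example deterministic; practical tools typically use randomised or k-means++ seeding and a weighting such as TF-IDF instead of raw word counts.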
According to [2], software engineering data falls into three categories:
• Sequences: static execution traces extracted from source code and dynamic execution traces extracted at run time.
• Text: documentation, source code, e-mails, bug reports, code comments and bug databases.
International Journal of Advanced and Innovative Research (2278-7844) / # 238 / Volume 3 Issue 4 © 2014 IJAIR. ALL RIGHTS RESERVED 238