Survey on Clustering Techniques in Data Mining for
Software Engineering
Maninderjit Kaur #1, Sushil Kumar Garg *2
# Research Scholar (Department of Computer Science and Engineering), RIMT-Institute of Engineering & Technology, Mandi Gobindgarh, Fatehgarh Sahib, Punjab, India
1 maninderjit91@yahoo.com
* Principal, RIMT-Maharaja Aggrasen Engineering College, Mandi Gobindgarh, Fatehgarh Sahib, Punjab, India
2 sushilgarg70@yahoo.com
Abstract — Quality and reliability of the computer software is
very important. Software development uses a huge amount of
software engineering data, such as execution traces, code
bases, graphs and bug reports. This data is very useful for
understanding how a product or piece of software is developed
and how it works. Software is of high quality and highly
reliable if it is free from bugs, but bugs are very hard to
find. The main software engineering tasks are programming,
testing, bug detection, debugging and maintenance. Data mining
techniques are applied to these tasks to mine software
engineering data and extract meaningful and useful
information. Techniques used for mining software engineering
data include matching, clustering and classification.
Keywords— Software Engineering, Data Mining, Software
Engineering Data, Software Engineering Task, Clustering,
Clustering Techniques
I. INTRODUCTION
Technology advances day by day, and this advancement affects
the working of different products and software. To keep up,
changes or modifications become a necessity. Maintenance of
software is therefore becoming a very difficult and
challenging task: about 60% of total life cycle effort is
spent on maintenance activities alone [21]. Software is highly
productive and reliable if the programming, bug detection,
testing, debugging and maintenance tasks are performed well
[12].
Software engineering researchers are usually not experts in
developing tools or algorithms for data mining. In the same
way, data mining researchers do not understand the mining
requirements of the software engineering domain. Close
collaboration between both domains is needed so that software
engineering tasks like programming, bug detection, testing,
debugging and maintenance can be improved. Software
engineering data is available in the form of documentation,
source code, bug databases, mailing history, bug reports,
execution traces and graphs [2].
Data mining is the process of finding a small set of
precious information and patterns in large sets of raw data.
Humans are good at storing data, but they are not good at
extracting knowledge from large datasets. It is not easy to
understand large datasets and to find the valuable and
accurate information needed to create good software. The
steps of the data mining process are data integration, data
cleaning, data selection, data transformation, data mining,
pattern evaluation and knowledge presentation. Software
engineering data is present in vast amounts. Different types
of users require different types of data, and not all of the
data is meaningful to every user of the software. Meaningful
data can be extracted by using different data mining
processes [28]. The data mining process is shown in Fig. 1.
Fig. 1 Data Mining Process [28]
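The seven steps named above can be illustrated with a minimal sketch. It assumes two small, hypothetical bug trackers and uses a simple word count as a stand-in for the mining step; the record fields, function names and the `min_support` parameter are illustrative assumptions and are not taken from [28].

```python
from collections import Counter

# Hypothetical raw sources: two bug trackers with inconsistent formatting.
tracker_a = [
    {"id": 1, "component": "Parser", "text": "Crash on empty input "},
    {"id": 2, "component": "parser", "text": "crash on  empty input"},
    {"id": 3, "component": "UI", "text": None},
]
tracker_b = [
    {"id": 4, "component": "Parser", "text": "Wrong token after comment"},
    {"id": 5, "component": "UI", "text": "Button label truncated"},
]

def integrate(*sources):
    """Data integration: merge records from several sources into one list."""
    return [record for source in sources for record in source]

def clean(records):
    """Data cleaning: drop records without text, normalise case and spacing."""
    cleaned = []
    for r in records:
        if not r["text"]:
            continue
        cleaned.append({
            "id": r["id"],
            "component": r["component"].lower(),
            "text": " ".join(r["text"].lower().split()),
        })
    return cleaned

def select(records, component):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r["component"] == component]

def transform(records):
    """Data transformation: represent each report as a bag of words."""
    return [Counter(r["text"].split()) for r in records]

def mine(bags):
    """Data mining (stand-in): count in how many reports each word occurs."""
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    return doc_freq

def evaluate(doc_freq, min_support):
    """Pattern evaluation: keep words found in at least min_support reports."""
    return {w: n for w, n in doc_freq.items() if n >= min_support}

def present(patterns):
    """Knowledge presentation: readable lines, most frequent first."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}: {count}" for word, count in ordered]

def run_pipeline(component="parser", min_support=2):
    records = integrate(tracker_a, tracker_b)
    records = select(clean(records), component)
    patterns = evaluate(mine(transform(records)), min_support)
    return present(patterns)

if __name__ == "__main__":
    print("\n".join(run_pipeline()))
```

With this toy data, the pipeline reports the words `crash`, `empty`, `input` and `on`, each occurring in two parser reports. Each stage is a separate function so that a real mining algorithm could replace `mine` without touching the other steps.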
According to [27], mining algorithms fall into four main
categories:
Frequent Pattern Mining: finds commonly occurring patterns.
Pattern Matching: finds data instances that match given
patterns.
Clustering: groups data into different clusters.
Classification: predicts labels of data based on already
labelled data.
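To make the four categories concrete, the following sketch applies one deliberately simple stand-in for each of them to a few hypothetical execution traces. The traces, the Jaccard similarity measure, the greedy grouping rule and all function names are assumptions chosen for illustration; they are not the algorithms surveyed in [27].

```python
from collections import Counter

# Hypothetical execution traces: each trace is a sequence of called functions.
traces = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
    ["lock", "write", "unlock"],
]

def frequent_pairs(traces, min_support):
    """Frequent pattern mining: adjacent call pairs that occur in at
    least min_support traces."""
    support = Counter()
    for trace in traces:
        support.update(set(zip(trace, trace[1:])))
    return {pair: n for pair, n in support.items() if n >= min_support}

def match_pattern(traces, pattern):
    """Pattern matching: indices of traces that contain the given
    pattern as a contiguous run of calls."""
    k = len(pattern)
    return [
        i for i, t in enumerate(traces)
        if any(t[j:j + k] == pattern for j in range(len(t) - k + 1))
    ]

def jaccard(a, b):
    """Similarity of two traces, measured on their sets of distinct calls."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cluster(traces, threshold):
    """Clustering: greedy grouping in which a trace joins the first cluster
    whose first member is at least `threshold` similar to it."""
    clusters = []
    for i, t in enumerate(traces):
        for c in clusters:
            if jaccard(t, traces[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def classify(trace, labelled):
    """Classification: 1-nearest-neighbour over already labelled traces."""
    nearest = max(labelled, key=lambda item: jaccard(trace, item[0]))
    return nearest[1]

if __name__ == "__main__":
    labelled = [(traces[0], "file-io"), (traces[3], "locking")]
    print(frequent_pairs(traces, min_support=2))
    print(match_pattern(traces, ["read", "close"]))
    print(cluster(traces, threshold=0.5))
    print(classify(["lock", "read", "unlock"], labelled))
```

On this toy input, the pairs `("open", "read")` and `("read", "close")` are frequent, the pattern `read, close` matches traces 0 and 1, the three file-handling traces form one cluster while the locking trace forms another, and the unseen trace is labelled `locking`. Note that clustering needs no labels, whereas classification relies on the labelled examples passed to it.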
According to [2], software engineering data falls under three
categories:
Sequences: static execution traces extracted from source
code and dynamic execution traces extracted at run time.
Text: documentation, source code, e-mails, bug reports, code
comments and bug databases.
[Fig. 1 labels (diagram not reproduced). Stages: Data Integration, Data Cleaning, Data Selection, Data Transformation, Data Mining, Pattern Evaluation. Intermediate products: Data, Selected data, Transformed data, Patterns, Information, Knowledge.]
International Journal of Advanced and Innovative Research (2278-7844) / # 238 / Volume 3 Issue 4
© 2014 IJAIR. ALL RIGHTS RESERVED 238