Munch: An efficient modularisation strategy to assess the degree of refactoring on
sequential source code checkings
Mahir Arzoky, Stephen Swift and Allan Tucker
Department of Information Systems and Computing
Brunel University
Uxbridge, UK
{mahir.arzoky, stephen.swift,
allan.tucker}@brunel.ac.uk
James Cain
Quantel Limited
Newbury, UK
james.cain@quantel.com
Abstract—Software module clustering is the process of
automatically partitioning the structure of the system using
low-level dependencies in the source code, to improve the
system’s structure. There have been a large number of studies
using the search-based software engineering approach to solve
the software module clustering problem. This paper introduces
the concept of seeding to modularise sequential versions of
source code, in order to measure the degree of
refactoring. We have developed a software clustering tool
called Munch. We evaluated the efficiency of the
modularisation by performing a set of experiments on the
dataset. We initially experimented with a few fitness functions
and, as a result, chose what we believe to be the most suitable
function, EVMD, to test on our unique dataset. The results of
the experiments provide evidence to support the seeding
strategy.
Keywords-clustering; modularisation; refactoring; seeding;
time series; fitness functions; EVM.
I. INTRODUCTION
As developers are increasingly creating more
sophisticated applications, software systems are growing in
both their complexity and size. Systems are composed of
entities such as variables and classes, which in turn rely on and
interact with each other in complex ways. Systems naturally
continue to evolve and as they evolve, their structure
becomes more complex and harder to track.
Thus, software systems need to be regularly maintained
in order to cope with the constantly evolving requirements.
Maintenance and evolution of systems can be frustrating, as
it is difficult for developers to keep a fixed understanding of
the system’s structure when that structure changes during
maintenance. This problem is compounded by a lack of up-to-date
documentation, which is at times non-existent. To add to the
difficulty of undocumented code, many of the original
developers are no longer available to assist with the
development. Maintaining such systems is a challenging
task.
Refactoring is one of the most common techniques used
to transform software in order to improve its internal quality
attributes [16] [18]. Refactoring is defined as a change
made to a software system that improves the internal
structure of the code while maintaining its external
behaviour [6]. If applied correctly, refactoring can improve
maintainability, enhance performance and simplify the
structure of the code. Nonetheless, both managers and
developers can be hesitant when it comes to using
refactoring due to the amount of effort needed to make even
a slight change in the code and also the risk of introducing
new bugs. Hence, within the development of large software
systems, there is significant value in being able to predict
when refactoring occurs.
Available information in software engineering problems
can be incomplete, vague and susceptible to change. As the
modular structure of a software system tends to decay over
time, it is important to modularise. Modularisation is the
process of partitioning the structure of the software system
into subsystems. Subsystems are clusters of source code
resources with similar properties combined together to
create a high-level attribute of the system. Modularisation
also makes the problem at hand easier to understand as it
reduces the amount of data needed by developers.
According to Constantine and Yourdon [5], good
modularisation of software systems leads to easier design,
development, testing and maintenance.
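To make the notion of partitioning concrete, the following minimal sketch represents a modularisation as a list of disjoint clusters and checks that every source code entity belongs to exactly one subsystem. The function name and the entity names are hypothetical and chosen for illustration only; this is not the representation used by Munch.

```python
from typing import List, Set


def is_valid_modularisation(entities: Set[str], clusters: List[Set[str]]) -> bool:
    """Return True if `clusters` forms a partition of `entities`."""
    seen: Set[str] = set()
    for cluster in clusters:
        if not cluster:        # an empty subsystem carries no information
            return False
        if seen & cluster:     # an entity appears in more than one subsystem
            return False
        seen |= cluster
    return seen == entities    # every entity is assigned, and nothing extra


# Hypothetical class names, for illustration only.
entities = {"Parser", "Lexer", "Renderer", "Canvas"}
modularisation = [{"Parser", "Lexer"}, {"Renderer", "Canvas"}]
print(is_valid_modularisation(entities, modularisation))
```

Under this assumed representation, a search-based tool would explore the space of valid partitions and score each candidate with a fitness function.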
Consequently, given the considerable interest in automated
re-modularisation through search-based software
engineering, fast and effective tools for automated software
module clustering have been developed. Automated tools are used
to generate useful information on system structure. These
tools analyse the low-level dependencies in the source code
and cluster them into a set of meaningful subsystems. It is
important to choose a suitable level of granularity when
clustering the system at hand.
A range of software modularisation techniques [7] [8]
[11] [13] has been studied. Search-based software
engineering has been shown to be highly robust across
various search algorithms.
The input information for modularisation is dependence
information obtained from the source code of the systems to be
modularised. Mancoridis et al. [10] were the first to use a
Module Dependency Graph (MDG) as a representation of
the software module clustering problem. An MDG represents
the structure of a software system by
expressing the modules of the system as nodes and
the dependence relationships between the modules as edges.
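As an illustration, an MDG can be sketched as an adjacency map built from a list of dependence pairs. The sketch below assumes directed, unweighted edges; the helper names and the module names are hypothetical and are not taken from Munch or from [10].

```python
from typing import Dict, Iterable, Set, Tuple


def build_mdg(dependencies: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build a directed MDG as a map: module -> set of modules it depends on."""
    graph: Dict[str, Set[str]] = {}
    for source, target in dependencies:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # register the target as a node as well
    return graph


def edge_count(graph: Dict[str, Set[str]]) -> int:
    """Count the dependence relationships (edges) in the MDG."""
    return sum(len(targets) for targets in graph.values())


# Hypothetical dependence pairs (source depends on target), for illustration only.
dependencies = [
    ("Parser", "Lexer"),
    ("Renderer", "Canvas"),
    ("Renderer", "Parser"),
]
mdg = build_mdg(dependencies)
print(sorted(mdg))       # the modules (nodes)
print(edge_count(mdg))   # the number of dependence relationships (edges)
```

A weighted variant could store a count per edge instead of a set, should the strength of a dependence matter for the chosen fitness function.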
The primary purpose of the paper is to perform efficient
modularisation on a time series of source code relationships
2011 Fourth International Conference on Software Testing, Verification and Validation Workshops
978-0-7695-4345-1/11 $26.00 © 2011 IEEE
DOI 10.1109/ICSTW.2011.87