Understanding the Effects of Code Clones on Modularity in Software Systems
Liguo Yu
Computer Science and Informatics
Indiana University South Bend
South Bend, IN, USA
ligyu@iusb.edu
S. Ramaswamy, A. Vaidyanathan
Industrial Software Systems
ABB Corporate Research Center
Bangalore, India
srini@ieee.org
Abstract—Modularity is an important software design principle.
One key point in the design of high quality software products is
avoiding code clones, i.e., portions of source code that are identical
or similar to others. During the software evolution process, new
code segments are frequently added. It is common to see that
code clones accrue with the release of new versions of the
product. Such accruement could be more serious in systems
software, where new code segments are associated with similar
functions or similar drivers. In this paper, we study code clones
of two versions of the Linux kernel from the viewpoint of
modularity. Our investigation finds that although considerable
effort has been spent in Linux to remove some code clones, more
code clones are typically added with the release of new versions.
This has become a major issue that is potentially causing
degradation of the modularity design principle within the Linux
kernel.
Keywords-code clone, modularity, system software, Linux
I. INTRODUCTION
Modularity is a measure of the extent to which software
systems are composed of disparate, substitutable modules,
each of which accomplishes a single function. In
principle, modularity can be classified into functional
modularity and architectural modularity. Functional modularity
separates different functions from each other in different parts
of the source code. Functional modularity enables high
cohesion and low coupling by breaking the source code into
independent non-blocking modules. The principle of
architectural modularity recommends dividing the software
system into separate layers with separate concerns for each of
the layers. In software systems design, modularity is a widely
accepted design principle [1] [2]. It has been considered
together with hierarchical structure and interaction locality as
the three basic rules governing the evolution of complex
systems [3] [4]. In software systems, to achieve high
understandability, maintainability, and reusability, the entire
system should be decomposable into manageable components,
such as libraries, functions, classes, and aspects. One objective
of these different abstractions is to reduce source code
redundancy. For example, a library function or an abstract class
is defined once and then called or implemented by client components.
When a software system is initially designed and
developed, the underlying architectural concepts might
advocate the principle of modularity. However, as it evolves to
accommodate new and emergent functional and nonfunctional
requirements, the modularity principle might be compromised,
i.e., the existing modular design might be altered or new
non-modular code might be generated. For example, to evolve an
improved protocol handler, or to add a new feature to an
existing driver, a programmer might inadvertently add new
function code that is the same as or similar to other functions,
thereby creating redundant functions for this particular code segment.
While adopting, say, the strategy design pattern could
make the source code extensible without the need to add
redundant functionality, such practices are often adopted only
by seasoned software developers. Ensuring a maintainable
codebase through modular design is often a casualty of the
pressures of getting to the market. Although the code may be
thoroughly tested and correct from the implementation
viewpoint, this is not a good practice from the design viewpoint
because of such duplication and redundancy of functions or
classes.
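As a minimal sketch of the strategy pattern mentioned above, the following Python fragment keeps the shared framing logic of a protocol handler in one place and passes in only the part that varies. The names `ProtocolHandler`, `sum_checksum`, and `xor_checksum`, as well as the frame layout, are hypothetical and chosen purely for illustration; they are not taken from the Linux kernel or from this study.

```python
from typing import Callable


# Hypothetical checksum strategies: the only part that differs
# between the two protocol variants.
def sum_checksum(payload: bytes) -> int:
    return sum(payload) % 256


def xor_checksum(payload: bytes) -> int:
    result = 0
    for byte in payload:
        result ^= byte
    return result


class ProtocolHandler:
    """Framing logic written once; the checksum is a pluggable strategy."""

    def __init__(self, checksum: Callable[[bytes], int]) -> None:
        self._checksum = checksum

    def frame(self, payload: bytes) -> bytes:
        # Assumed layout: length byte + payload + checksum byte.
        # Payloads must be shorter than 256 bytes in this sketch.
        return bytes([len(payload)]) + payload + bytes([self._checksum(payload)])


# A new variant is added by supplying a new strategy,
# not by copying and editing frame().
legacy = ProtocolHandler(sum_checksum)
improved = ProtocolHandler(xor_checksum)
```

Without the strategy parameter, supporting the second checksum would most likely mean copying `frame` and changing one line, which is exactly how a clone is introduced.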
Such duplication or redundancy of code segments in a
program is generally called a code clone. Formally, a code
clone is defined as a portion of code in a source file that is
identical or similar to another [5]. As described before, code
clones are often introduced through a software evolution
process, where an original portion of code is copied and pasted
within the same file, or to different files [6] [7]. Due to the
pressures of reaching the market, such development practices
are still rampant in software development organizations across
the world.
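To make the definition concrete, the sketch below reports runs of consecutive lines that occur more than once in a source text after whitespace is normalized. It is an illustrative assumption, not the detection method used in this study: the function names and the window size of three lines are hypothetical, and the approach only finds identical copies. Finding clones that are merely similar (for example, with renamed variables) generally requires token-based or syntax-tree-based comparison.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(line: str) -> str:
    # Collapse whitespace so formatting differences do not hide a copy.
    return " ".join(line.split())


def find_clones(source: str, window: int = 3) -> Dict[Tuple[str, ...], List[int]]:
    """Map each repeated run of `window` non-blank lines to its 1-based start lines."""
    numbered = [(i + 1, normalize(text)) for i, text in enumerate(source.splitlines())]
    numbered = [(num, text) for num, text in numbered if text]  # skip blank lines
    seen = defaultdict(list)
    for start in range(len(numbered) - window + 1):
        chunk = tuple(text for _, text in numbered[start:start + window])
        seen[chunk].append(numbered[start][0])
    # Keep only the runs that appear at more than one location.
    return {chunk: starts for chunk, starts in seen.items() if len(starts) > 1}
```

Called on a file in which a three-line block was copied and pasted, `find_clones` returns that block together with the line numbers where each copy begins.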
Code clones defy the design principle of modularity and
make it difficult to maintain and test programs effectively.
If a function has a bug, such practices could
spread this bug widely throughout the system.
The issue could become more serious if this function has
mutated through the evolution process: even if we later detect
the bug in the original function, it is hard to locate and fix all
the mutated copies. The reverse case is equally hard: when a bug is found
in a mutated version and traced back to the original code base,
it can still be a difficult task to track down all the other mutated
copies. Furthermore, code clones can result in the
unnecessary ‘bulging’ of source code, which would require
additional time to build, more memory to run, more effort to
maintain, and significant costs to upgrade the system.
Therefore, as software programs evolve over time, they might
suffer from a degradation of modularity due to the accruement
of code clones [8].
Nevertheless, code clones are common in evolving software
systems, especially in systems software. Because functions
provided by system software generally follow similar
algorithms and have similar solutions, copy and paste might be
considered as a normal development routine in implementing
2012 19th Asia-Pacific Software Engineering Conference
1530-1362/12 $26.00 © 2012 IEEE
DOI 10.1109/APSEC.2012.49