Understanding Class Evolution in Object-Oriented Software
Zhenchang Xing and Eleni Stroulia
Computing Science Department
University of Alberta
Edmonton AB, T6G 2H1, Canada
{xing, stroulia}@cs.ualberta.ca
Abstract
In the context of object-oriented design, software
systems model real-world entities, which are represented
abstractly by the system classes. As the system evolves through its
lifecycle, its class design also evolves. Thus,
understanding class evolution is essential in
understanding the current design of the system and the
rationale behind its evolution. In this paper, we describe
a taxonomy of class-evolution profiles, a method for
automatically categorizing a system's classes into one (or
more) of the eight types in the taxonomy, and a data-mining
method for eliciting co-evolution relations among them.
These methods rely on our UMLDiff algorithm that,
given a sequence of UML class models of a system,
surfaces the design-level changes over its lifecycle. The
recovered knowledge about class evolution facilitates the
overall understanding of the system class-design
evolution and the identification of the specific classes
that should be investigated in more detail towards
improving the system-design qualities. We report on two
case studies evaluating our approach.
1 Motivation and Background
The objective of reverse engineering is most often to
enable software understanding in support of maintenance,
feature enhancement and adaptation activities [5]. In
object-oriented systems, classes model abstractions of
real-world entities around which these systems are
designed. Therefore, understanding the system classes,
i.e., their internal structure and their role in the context of
the system functionality and behavior, constitutes a
crucial step towards understanding the overall system
design for both maintenance and new development.
There have been several research efforts to date
aiming at understanding systems at the class level. For
example, Lanza et al. [17] introduced the “class
blueprint”, a visualization of the internal structure of
system classes at a particular point in their lifecycle. The
class blueprint distinguishes among different types of
classes, such as classes with wide interfaces that offer
many entry points to their functionalities, definers that
reside at the top of a hierarchy or specializers that are
leaves of a hierarchy. However, such visualizations
require a substantial interpretation effort on behalf of
their users and become fairly “unreadable” for large
systems with numerous classes.
Furthermore, given that most software development
nowadays adopts an evolutionary lifecycle model,
analyzing a single snapshot of a system’s classes offers
only limited insight; a comparative analysis of a
sequence of snapshots should be more valuable in
understanding the system’s design rationale. For
example, consider a software maintainer who wants to
identify “hotspots”, i.e., areas of substantial evolutionary
activity, over the lifespan of a software system. By
comparing a set of successive versions, he may find out
that a few classes have been substantially changed in
every new version, irrespective of what features were
modified in that version. This evidence of a highly coupled
design may focus his examination on the source code of
these classes to determine the cause of the problem and to
propose modifications to remedy it.
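As an illustration only, the hotspot scenario above can be sketched in Python. The sketch assumes a hypothetical snapshot representation (a dictionary from class name to the set of its member signatures) and treats a class as "changed" whenever that set differs between two consecutive snapshots. It is a naive set comparison, not the UMLDiff algorithm, and the names Snapshot, changed_classes, hotspots and min_ratio are introduced here purely for the example.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: class name -> set of member signatures.
Snapshot = Dict[str, FrozenSet[str]]


def changed_classes(old: Snapshot, new: Snapshot) -> Set[str]:
    """Classes present in both snapshots whose member sets differ."""
    return {name for name in old.keys() & new.keys() if old[name] != new[name]}


def hotspots(snapshots: List[Snapshot], min_ratio: float = 1.0) -> List[str]:
    """Classes changed in at least min_ratio of the version transitions.

    With min_ratio=1.0 this returns the classes changed in every new version.
    """
    transitions = list(zip(snapshots, snapshots[1:]))
    if not transitions:
        return []
    counts: Dict[str, int] = {}
    for old, new in transitions:
        for name in changed_classes(old, new):
            counts[name] = counts.get(name, 0) + 1
    total = len(transitions)
    return sorted(name for name, c in counts.items() if c / total >= min_ratio)


# Made-up example data: three snapshots of a toy system.
v1 = {"Order": frozenset({"a", "b"}),
      "Customer": frozenset({"n"}),
      "Invoice": frozenset({"x"})}
v2 = {"Order": frozenset({"a", "b", "c"}),
      "Customer": frozenset({"n"}),
      "Invoice": frozenset({"x"})}
v3 = {"Order": frozenset({"a", "c"}),
      "Customer": frozenset({"n"}),
      "Invoice": frozenset({"x", "y"})}

print(hotspots([v1, v2, v3]))  # classes changed in every transition
```

Lowering min_ratio relaxes the criterion from "changed in every version" to "changed in most versions"; which threshold is meaningful depends on the system under study.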
Such evolutionary analysis was the objective of
Demeyer et al. [7], who investigated the use of
comparative analysis of software metrics for drawing
inferences regarding the evolution of a system. However,
the result of their analysis refers to the system as a whole
and does not provide any insight regarding the evolution
of individual classes or groups of classes.
Another, potentially more precise, source of
evolutionary information could be documentation, either
at the source-code level or at the change-log level of the
version-management system used for the development of
the software system. Unfortunately, more often than
not, such documentation is sparse and inconsistent [5,13].
In our work on understanding class evolution in
object-oriented systems, we have adopted class models
of successive system snapshots (which may be released
versions or simply snapshots checked out at regular time
intervals) as the primary input of our method. These
class models are easily obtainable, given the source code
that resides in a version-management system and any of
a variety of existing round-trip software-development
tools [29,30], and they are, by their very nature, fairly
accurate representations of the source [19]. The
fundamental intuition underlying our method is that by
Proceedings of the 12th IEEE International Workshop on Program Comprehension (IWPC’04)
1092-8138/04 $ 20.00 © 2004 IEEE