CodeCompass: An Open Software Comprehension Framework
for Industrial Usage
Zoltán Porkoláb, Tibor Brunner
Eötvös Loránd University
Budapest, Hungary
[gsd,bruntib]@caesar.elte.hu
Dániel Krupp, Márton Csordás
Ericsson Hungary Ltd.
Budapest, Hungary
[daniel.krupp,marton.csordas]@ericsson.com
ABSTRACT
CodeCompass is an open source LLVM/Clang-based tool developed
by Ericsson Ltd. and Eötvös Loránd University, Budapest, to aid
the understanding of large legacy software systems. Based on the
LLVM/Clang compiler infrastructure, CodeCompass gives exact
information on complex C/C++ language elements like overload-
ing, inheritance, the usage of variables and types, possible uses
of function pointers and virtual functions - features that various
existing tools support only partially. Steensgaard’s and Andersen’s
pointer analysis algorithms are used to compute and visualize the
use of pointers/references. The wide range of interactive visual-
izations extends further than the usual class and function call dia-
grams; architectural, component and interface diagrams are a few
of the implemented graphs. To make comprehension more exten-
sive, CodeCompass also utilizes build information to explore the
system architecture as well as version control information.
CodeCompass is regularly used by hundreds of designers and
developers. Having a web-based, pluggable, extensible architecture,
the CodeCompass framework can be an open platform to further
code comprehension, static analysis and software metrics efforts.
The source code and a tutorial are publicly available on GitHub, and
a live demo is also available online.
KEYWORDS
code comprehension, C/C++ programming language, software vi-
sualization
ACM Reference Format:
Zoltán Porkoláb, Tibor Brunner, Dániel Krupp, and Márton Csordás. 2018.
CodeCompass: An Open Software Comprehension Framework for Industrial
Usage. In ICPC ’18: 26th IEEE/ACM International Conference on Program
Comprehension, May 27–28, 2018, Gothenburg, Sweden. ACM, New York, NY,
USA, Article 4, 9 pages. https://doi.org/10.1145/3196321.3197546
1 INTRODUCTION
The maintenance of large, long-existing legacy systems is trouble-
some. During the extended lifetime of a system the code quality
is continuously eroding, the original intentions are lost due to
the fluctuation among the developers, and the documentation is
becoming unreliable. Especially in the telecom industry, high-reliability
software products, such as IMS signaling servers [1], have
typically been in use for 20–30 years [2, 3]. This development landscape
has the following peculiar characteristics: i) the software
needs to comply with large, complex and evolving standards; ii) it has a
multiple-decade-long development and maintenance life cycle; iii) it
is developed in a large (100+ people) development organization; iv)
the organization is distributed across multiple countries; and v) transfers
of development responsibility occasionally occur from one site to another.
However, this software development landscape is not unique to
the telecom industry, and our observations can be applied in other
industries, such as finance, IT platforms, or large-scale internet
applications; all areas where complex software is developed and
maintained for a long time.
It is well known that in such a design environment, development
and maintenance become more and more expensive. Prior to any
maintenance activity – new feature development, bug fixing, etc.
– programmers first have to locate the place where the change
applies, have to understand the actual code to see what should be
extended or modified, and have to explore the connections to other
parts of the software to decide how to interact in order to avoid
regression. All these activities require an adequate understanding of
the code in question and its environment. Although ideally
the person carrying out the activity has full knowledge of the system,
in practice this is rarely the case. In fact, programmers often
have only a vague understanding of the program they are going to
modify. A major cost factor of legacy systems is the extra effort
of comprehension. Fixing new bugs introduced due to incomplete
knowledge about the system is also very expensive, both in terms
of development cost and time.
As the documentation is unreliable, and the original design
intentions are lost over the years due to the fluctuation among
the developers, the only reliable source for comprehension is
the existing code base.
Development tools do not perform well in the code comprehension
process, as they are optimized for writing new code, not
for effectively browsing existing code. When creating new code, the
programmer spends a longer time working on the same abstraction
level: e.g. defining class interfaces, and later implementing these
classes with relationships to other classes. When one is trying to
understand existing code, it is necessary to jump between abstraction
levels frequently: e.g. starting from a method call into a different
class, we have to understand the role of that class with its complete
interface, where and how that class is used, and then we must drill
down into the implementation details of another specific method.
Accordingly, when writing new code a few files are open in parallel