Executable Analysis using Abstract Interpretation with
Circular Linear Progressions
Rathijit Sen, Y. N. Srikant
Department of Computer Science and Automation
Indian Institute of Science, Bangalore
E-mail: {rathi,srikant}@csa.iisc.ernet.in
Abstract
We propose a new abstract domain for static analysis
of executable code. Concrete states are abstracted using
Circular Linear Progressions (CLPs). CLPs model computations
that use a finite word length, as in any real-life
processor. The finite abstraction handles overflow
scenarios in a natural and straightforward manner.
Abstract transfer functions have been defined for a wide
range of operations, which makes this domain easily applicable
to analyzing code for a wide range of ISAs. CLPs
combine the scalability of interval domains with the dis-
creteness of linear congruence domains. We also present
a novel, lightweight method to track linear equality rela-
tions between static objects that is used by the analysis to
improve precision. The analysis is efficient, the total space
and time overhead being quadratic in the number of static
objects being tracked.
1 Introduction
A wide range of problems requires statically analyzing
programs to deduce or verify properties that hold
at the time of execution. These include detection of mem-
ory aliases with direct impact on compiler optimizations,
timing verification for real-time systems, detecting possi-
bilities for stack overflow, checking program assertions for
correctness and many more. Often, source code may not be
available. Analysis of arbitrary executable code in the ab-
sence of source level information has important uses such as
detecting malicious content and vulnerabilities, performing
code comparisons, and others. In all of these, we are inter-
ested in knowing about properties that hold over all possible
program inputs and execution sequences.
A landmark paper by Cousot and Cousot [4] introduced
the concept of Abstract Interpretation. It provides a for-
mal framework for dealing with abstract representations of
program states and is a well established technique for static
analysis of programs. The idea is to define an abstract do-
main and operations on elements of that domain consistent
with concrete execution semantics. At any program point,
the set of abstract values holding at that point is an over-
approximation of the possible set of concrete values that
can hold at that point over all possible execution sequences.
In order to analyze arbitrary code, abstract elements must
be composable for a wide range of operations. Computa-
tions that involve a finite word length introduce additional
challenges in guaranteeing safety for abstract transfer func-
tions. For example, the result of adding two positive values
may not be positive. The analysis should also not be
computationally intensive.
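
The overflow hazard mentioned above can be illustrated with a minimal sketch. It assumes two's-complement arithmetic; the helper names `wrap_signed` and `add_signed` are hypothetical and do not come from the paper.

```python
def wrap_signed(value: int, bits: int = 32) -> int:
    """Reduce an unbounded integer to its two's-complement value in `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    # A set sign bit means the bit pattern denotes a negative number.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def add_signed(a: int, b: int, bits: int = 32) -> int:
    """Add two signed values as a `bits`-wide machine would (assumed wraparound)."""
    return wrap_signed(a + b, bits)


# Two positive 8-bit values whose sum is no longer positive.
overflowed = add_signed(127, 1, bits=8)
# The same operation without overflow.
in_range = add_signed(100, 27, bits=8)
```

An abstract transfer function for addition has to account for both outcomes to remain safe.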
We introduce a new abstract domain particularly suited
for statically analyzing arbitrary executable code, with ab-
stract elements represented by Circular Linear Progressions
(CLPs). CLPs model computations in the concrete domain
that use a finite word length. CLPs ensure safety even if
computations incur overflow. CLPs track discrete sets of
values and are easily composable for a wide range of operations.
Space overhead for our analysis is O(N^2) per basic
block, where N is the total number of static objects being
tracked. Each static object represents a register or a stati-
cally identifiable memory partition. The abstract function
results are refined for better precision using linear equality
relations between static objects. Such relations are derived
using an O(N^2) algorithm, resulting in a computational
complexity of O(N^2) for analyzing each instruction
and join point.
As an example, consider the source representation of a
computation in Figure 1. The computation for y results in
an overflow for x = 3 but not for x = 7. The CLPs for y
and z as computed by our analysis are shown alongside as
3-tuples (lower bound, upper bound, step) as described in
§3. They indicate that y is either -4 or 0 and z is either -1
or 3 during actual execution. For this example, the analyzer
has been able to compute a safe and tight approximation of
the actual runtime values.
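
To make the 3-tuple notation concrete, here is a minimal sketch of how such a tuple might be expanded into the set of values it denotes. It is only one plausible reading: the class `CLP`, its `values` method, the tuples (-4, 0, 4) and (-1, 3, 4), and the rule "start at the lower bound and add the step modulo 2^bits until the upper bound is reached" are assumptions made for illustration. The precise definition is the one given in §3.

```python
from dataclasses import dataclass
from typing import List


def _wrap(value: int, bits: int) -> int:
    """Two's-complement reduction of `value` to `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


@dataclass(frozen=True)
class CLP:
    """Hypothetical (lower bound, upper bound, step) tuple over `bits`-wide words."""
    lower: int
    upper: int
    step: int
    bits: int = 32

    def values(self) -> List[int]:
        """Enumerate lower, lower+step, ... (mod 2**bits) until upper is reached."""
        out = []
        cur = _wrap(self.lower, self.bits)
        last = _wrap(self.upper, self.bits)
        while True:
            out.append(cur)
            if cur == last or self.step == 0:
                break
            cur = _wrap(cur + self.step, self.bits)
            if cur == out[0]:  # cycled back without meeting upper; stop
                break
        return out


# Assumed tuples matching the values quoted in the text for y and z.
y_values = CLP(-4, 0, 4).values()
z_values = CLP(-1, 3, 4).values()
# A progression that wraps past the largest 8-bit value.
wrapped = CLP(126, -128, 1, bits=8).values()
```

Under this reading, a lower bound that is numerically greater than the upper bound is not an error: the progression simply continues around the end of the word range.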
39 1-4244-1050-9/07/$25.00 ©2007 IEEE