Executable Analysis using Abstract Interpretation with Circular Linear Progressions

Rathijit Sen
Y. N. Srikant
Department of Computer Science and Automation
Indian Institute of Science, Bangalore
E-mail: {rathi,srikant}@csa.iisc.ernet.in

Abstract

We propose a new abstract domain for static analysis of executable code. Concrete states are abstracted using Circular Linear Progressions (CLPs). CLPs model computations using a finite word length as is seen in any real life processor. The finite abstraction allows handling overflow scenarios in a natural and straight-forward manner. Abstract transfer functions have been defined for a wide range of operations which makes this domain easily applicable for analyzing code for a wide range of ISAs. CLPs combine the scalability of interval domains with the discreteness of linear congruence domains. We also present a novel, lightweight method to track linear equality relations between static objects that is used by the analysis to improve precision. The analysis is efficient, the total space and time overhead being quadratic in the number of static objects being tracked.

1 Introduction

A wide selection of problems require statically analyzing programs for deducing or verifying properties that hold at the time of execution. These include detection of memory aliases with direct impact on compiler optimizations, timing verification for real-time systems, detecting possibilities for stack overflow, checking program assertions for correctness and many more. Often, source code may not be available. Analysis of arbitrary executable code in the absence of source level information has important uses such as detecting malicious content and vulnerabilities, performing code comparisons, and others. In all of these, we are interested in knowing about properties that hold over all possible program inputs and execution sequences.

A landmark paper by Cousot and Cousot [4] introduced the concept of Abstract Interpretation.
It provides a formal framework for dealing with abstract representations of program states and is a well established technique for static analysis of programs. The idea is to define an abstract domain and operations on elements of that domain consistent with concrete execution semantics. At any program point, the set of abstract values holding at that point is an over-approximation of the possible set of concrete values that can hold at that point over all possible execution sequences.

In order to analyze arbitrary code, abstract elements must be composable for a wide range of operations. Computations that involve a finite word length introduce additional challenges in guaranteeing safety for abstract transfer functions. For example, the result of adding two positive values may not be positive. It is also desired that the analysis is not computationally intensive.

We introduce a new abstract domain particularly suited for statically analyzing arbitrary executable code, with abstract elements represented by Circular Linear Progressions (CLPs). CLPs model computations in the concrete domain that use a finite word length. CLPs ensure safety even if computations incur overflow. CLPs track discrete sets of values and are easily composable for a wide range of operations. Space overhead for our analysis is O(N^2) per basic block, where N is the total number of static objects being tracked. Each static object represents a register or a statically identifiable memory partition. The abstract function results are refined for better precision using linear equality relations between static objects. Such relations are derived using an O(N^2) algorithm, resulting in a computational complexity of O(N^2) for analyzing each instruction and join point.

As an example, consider the source representation of a computation in Figure 1. The computation for y results in an overflow for x = 3 but not for x = 7.
The CLPs for y and z as computed by our analysis are shown alongside as 3-tuples (lower bound, upper bound, step) as described in §3. They indicate that y is either -4 or 0 and z is either -1 or 3 during actual execution. For this example, the analyzer has been able to compute a safe and tight approximation of the actual runtime values.

1-4244-1050-9/07/$25.00 ©2007 IEEE
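The finite word length behaviour and the 3-tuple notation used above can be illustrated with a minimal Python sketch. It is not the paper's definition (that is given in §3) and rests on assumptions made here: values are signed two's-complement integers of a given width, the step is non-negative, and a tuple (lower bound, upper bound, step) denotes the values reached by starting at the lower bound and adding the step, modulo the word size, until the upper bound is reached. The helper names `wrap` and `clp_values` are hypothetical, and the tuples (-4, 0, 4) and (-1, 3, 4) are chosen only because they are consistent with the value sets quoted for y and z; Figure 1 itself is not reproduced.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce value to a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, the pattern denotes a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


def clp_values(lower: int, upper: int, step: int, bits: int) -> list[int]:
    """Enumerate lower, lower+step, ... circularly (mod 2**bits) up to upper.

    Assumed reading of a (lower bound, upper bound, step) tuple, with step >= 0.
    """
    lower, upper = wrap(lower, bits), wrap(upper, bits)
    if step == 0:
        return [lower]
    # Circular distance from lower to upper; also covers progressions
    # that cross the signed maximum and wrap around to negative values.
    span = (upper - lower) % (1 << bits)
    return [wrap(lower + i * step, bits) for i in range(span // step + 1)]


# Adding two positive 8-bit values can yield a negative result.
print(wrap(100 + 100, 8))            # -56

# Hypothetical tuples matching the value sets stated for y and z.
print(clp_values(-4, 0, 4, 32))      # [-4, 0]
print(clp_values(-1, 3, 4, 32))      # [-1, 3]

# A progression that wraps past the 8-bit signed maximum.
print(clp_values(126, -126, 2, 8))   # [126, -128, -126]
```

The last call shows why a circular representation is convenient: a plain interval with lower bound 126 and upper bound -126 would be empty, whereas the circular reading describes three values on either side of the overflow boundary.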