Analysis of Invariants for Efficient Bounded Verification

Juan P. Galeotti, Nicolás Rosner, Carlos G. López Pombo
Department of Computer Science, FCEyN, Universidad de Buenos Aires, Argentina
{jgaleotti,nrosner,clpombo}@dc.uba.ar

Marcelo F. Frias
Department of Software Engineering, Buenos Aires Institute of Technology (ITBA), Argentina
mfrias@itba.edu.ar

ABSTRACT

SAT-based bounded verification of annotated code consists of translating the code together with its annotations to a propositional formula, and analyzing the formula for specification violations using a SAT-solver. If a violation is found, an execution trace exposing the error is exhibited. Code involving linked data structures with intricate invariants is particularly hard to analyze using these techniques.

In this article we present TACO, a prototype tool which implements a novel, general and fully automated technique for the SAT-based analysis of JML-annotated sequential Java programs dealing with complex linked data structures. We instrument code analysis with a symmetry-breaking predicate that allows for the parallel, automated computation of tight bounds for Java fields. Experiments show that the resulting translations to propositional formulas require significantly fewer propositional variables. In the experiments we have carried out, this improves the efficiency of the analysis by orders of magnitude compared to the non-instrumented SAT-based analysis. We show that, in some cases, our tool can uncover bugs that cannot be detected by state-of-the-art tools based on SAT-solving, model checking or SMT-solving.

Categories and Subject Descriptors

D.1.5 [Programming Techniques]: Object-Oriented Programming; D.2.1 [Software Engineering]: Specifications; D.2.4 [Software Engineering]: Program verification - Class invariants, programming by contract, formal methods.

General Terms

Verification, Languages

Keywords

Static analysis, SAT-based code analysis, Alloy, KodKod, DynAlloy.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSTA’10, July 12–16, 2010, Trento, Italy. Copyright 2010 ACM 978-1-60558-823-0/10/07 ...$10.00.

1. INTRODUCTION

SAT-based analysis of code allows one to statically find failures in software. This requires appropriately translating the original piece of software, as well as some assertion to be verified, to a propositional formula. The use of a SAT-solver then allows one to find a valuation for the propositional variables that encodes a failure: a valid execution trace of the system that violates the given assertion. With variations, this is the approach followed by CBMC [6], Saturn [32] and F-Soft [17] for the analysis of C code, and by Miniatur [11] and JForge [9] for the analysis of Java code.

In the presence of contracts for invoked methods, modular SAT-based analysis can be done by first replacing the calls in a method by the corresponding contracts and then analyzing the resulting code. This is the approach followed for instance in [9]. One important limitation remains at the intraprocedural level, where the code for a single method (already including the contracts or the inlined code for called methods) has to be analyzed. Code involving linked data structures with rich invariants (such as circular lists, red-black trees, AVL trees or binomial heaps) is hard to analyze using these techniques.

In this article we present TACO (Translation of Annotated COde), our prototype tool implementing a novel, general and fully automated technique for SAT-based analysis of sequential annotated code involving complex linked data structures.
This technique relies on a novel and effective way of removing variables in the translation to a propositional formula. In Section 4 we will present experimental results showing that the technique we present yields significant improvements in SAT-based intraprocedural program analysis and allows us to uncover bugs that could not be detected using state-of-the-art tools based on model checking or SMT-solving.

To describe the technique at a high level of abstraction let us consider the following classes for singly-linked structures:

    public class List {
        LNode head;
    }

    public class LNode {
        LNode next;
        int key;
    }

Assuming that we use this structure for representing a singly linked list, we require lists to be acyclic. Let us also assume that nodes have identifiers N0, N1, N2, ..., and that nodes are kept in the list in order according to their identifiers (N0 < N1 < ···). Thus, a list instance will have the shape

[Figure: a list L whose head field points to node N0, with next fields linking N0 to N1 and N1 to N2.]
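
To make the acyclicity requirement on lists concrete, the following is a minimal sketch of how it could be checked at run time. It is only an illustration under stated assumptions: the method name `isAcyclic` is hypothetical, the check uses the standard tortoise-and-hare cycle detection, and it is not the encoding TACO uses in its translation to a propositional formula.

```java
// Illustrative sketch only: the two classes from the text, plus a
// hypothetical runtime check of the acyclicity requirement.
class LNode {
    LNode next;
    int key;
}

class List {
    LNode head;

    // Hypothetical helper (not part of the original classes): returns true
    // iff following `next` from `head` eventually reaches null.
    // Uses Floyd's tortoise-and-hare cycle detection: `fast` advances two
    // nodes per step, `slow` one; they can only meet again on a cycle.
    boolean isAcyclic() {
        LNode slow = head;
        LNode fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return false; // the pointers met: some node is reachable from itself
            }
        }
        return true; // `fast` ran off the end of the list, so there is no cycle
    }
}
```

Under this check, a list such as L with head N0, N0.next = N1, N1.next = N2 and N2.next = null is accepted, whereas setting N2.next = N0 makes it rejected.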