Dynamic Shape Analysis using Spectral Graph Properties

Muhammad Zubair Malik and Sarfraz Khurshid
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712
Email: {zubair@mail, khurshid@ece}.utexas.edu

Abstract—Dynamically allocated data structures pervade imperative and object-oriented programs. Automated analysis and testing of such programs requires reasoning about their data structures. The structures often have complex structural properties, such as acyclicity of the object graph rooted at a given pointer. Such properties pose a challenge for automated reasoning. Shape analysis is a class of techniques that address reasoning about such programs. Traditionally, shape analysis is performed using static analysis of the program code. More recently, dynamic techniques for shape analysis have been developed, which inspect program states to identify properties of data structures. This paper presents a novel dynamic technique, which adapts well-studied results from graph theory to determine the shape of the program's key data structures. Specifically, the foundational ideas come from spectral graph theory, a field that studies the properties of a graph in relation to the properties of matrices based on the graph, e.g., the eigenvalues of its adjacency matrix. Experimental results using a suite of data structures demonstrate the technique's potential for identifying data structure properties and detecting likely erroneous program states.

Keywords: Structural invariant generation; Shape analysis; Graph spectra; Deryaft

I. INTRODUCTION

Automated analysis and testing of programs written in commonly used imperative and object-oriented languages remains a challenging problem. Part of the challenge lies in reasoning automatically about dynamic data structures that reside on the program heap and often have complex structural properties, such as acyclicity of the object graph rooted at a given pointer.
Shape analysis is a class of techniques that address such properties. Traditionally, shape analysis techniques use static analysis of the program code to determine properties of its data structures [13], [20], [25], [28]. A key motivation for using static analysis is to determine the properties at desired control points for all program executions, e.g., for program verification. More recently, dynamic techniques [10], [15] have been developed that inspect program states at desired control points to characterize data structure properties. While these techniques do not enable verification for all executions, they enable detecting likely erroneous executions at runtime and promise to scale better than static analysis for finding bugs.

This paper presents a novel dynamic technique, which adapts well-studied results from graph theory to determine the shape of the program's key data structures. We view the object graph that represents a program heap as a mathematical object, namely an edge-labeled graph whose vertices correspond to objects allocated on the heap and whose edges correspond to fields of these objects [8], [9], [16]. Our technique is inspired by spectral graph theory [4], a field that studies the properties of a graph in relation to the properties of matrices based on it, such as its adjacency matrix or its Laplacian matrix. Specifically, we define properties of recursive data structures using properties of the eigenvalues of the associated matrices as well as other graph properties, such as the in-degree of a vertex.

Our technique builds on the Deryaft framework [12], [15], which we developed in previous work for generating likely representation invariants. Deryaft takes its inspiration from the Daikon invariant detector [6]. In contrast to Daikon, which is a general-purpose invariant detection engine, Deryaft focuses on structural properties and as such generates more accurate structural invariants.
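To make the spectral view of the heap concrete, the following Java sketch builds the adjacency matrix A of a small object graph (ignoring field labels) and checks two of the properties mentioned above: the in-degree of a vertex and acyclicity. It is our illustration under stated assumptions, not the paper's implementation. The acyclicity test relies on a standard fact: trace(A^k) equals both the sum of the k-th powers of the eigenvalues of A and the number of closed walks of length k. Hence a directed graph on n vertices is acyclic exactly when A is nilpotent (all eigenvalues are zero), which holds iff trace(A^k) = 0 for k = 1, ..., n. The class name SpectralShapeSketch, its methods, and the two example heaps are hypothetical.

```java
/**
 * Illustrative sketch only (hypothetical names): adjacency-matrix view of a
 * small object graph, with an in-degree query and a spectral acyclicity test.
 */
public class SpectralShapeSketch {

    /** Builds A from edges {source, target}; field labels are ignored. */
    static long[][] adjacency(int n, int[][] edges) {
        long[][] a = new long[n][n];
        for (int[] e : edges) {
            a[e[0]][e[1]] += 1; // two fields pointing to the same object add up
        }
        return a;
    }

    /** In-degree of vertex v: the sum of column v of A. */
    static long inDegree(long[][] a, int v) {
        long d = 0;
        for (long[] row : a) {
            d += row[v];
        }
        return d;
    }

    /** Nonzero pattern of x * y: r[i][j] = 1 iff some k has x[i][k] > 0 and y[k][j] > 0. */
    static long[][] patternProduct(long[][] x, long[][] y) {
        int n = x.length;
        long[][] r = new long[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                if (x[i][k] == 0) continue;
                for (int j = 0; j < n; j++) {
                    if (y[k][j] > 0) r[i][j] = 1;
                }
            }
        }
        return r;
    }

    /**
     * trace(A^k) = sum of the k-th powers of A's eigenvalues = number of closed
     * walks of length k. All eigenvalues are zero (A is nilpotent) iff
     * trace(A^k) == 0 for k = 1..n, i.e., iff the graph has no cycle.
     * Entries are nonnegative, so only the nonzero pattern of A^k matters;
     * tracking the pattern instead of exact counts avoids long overflow.
     */
    static boolean isAcyclic(long[][] a) {
        int n = a.length;
        long[][] pattern = new long[n][n]; // nonzero pattern of A^1
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                pattern[i][j] = a[i][j] > 0 ? 1 : 0;
            }
        }
        for (int k = 1; k <= n; k++) {
            for (int i = 0; i < n; i++) {
                if (pattern[i][i] > 0) return false; // trace(A^k) > 0: a cycle exists
            }
            pattern = patternProduct(pattern, a); // pattern of A^(k+1)
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical heap 1: binary tree, root 0 with left -> 1 and right -> 2.
        long[][] tree = adjacency(3, new int[][] {{0, 1}, {0, 2}});
        // Hypothetical heap 2: circular singly linked list 0 -> 1 -> 2 -> 0.
        long[][] ring = adjacency(3, new int[][] {{0, 1}, {1, 2}, {2, 0}});
        System.out.println(isAcyclic(tree) + " " + isAcyclic(ring)); // true false
        System.out.println(inDegree(tree, 0) + " " + inDegree(tree, 1)); // 0 1
    }
}
```

Checking traces up to k = n suffices because any cycle in an n-vertex graph contains a simple cycle of length at most n. A full spectral analysis would compute the eigenvalues themselves (and could use the Laplacian as well); the trace test is simply the cheapest exact check for this one property.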
We follow the general approach that Deryaft introduced for structural invariants: first, identify the core and derived fields of a data structure; then, check which properties from a pre-defined collection hold for the field values in a given set of program states. The properties that hold for a given set of states are used in two ways: (1) to directly check whether a new program state satisfies them; and (2) to generate a representation of the properties as an executable Java predicate, which can be used in a number of ways, e.g., as a runtime assertion or to perform data structure repair [5]. A key advantage of using graph spectra over Deryaft's approach is that, in principle, they allow checking for (violations of) properties that are not pre-defined but are instead computed solely from the program states as these states are encountered. Thus, graph spectra not only introduce a novel abstraction for properties of program state, but also enhance our ability to dynamically detect a larger class of errors without requiring the user to provide detailed specifications. As a first step toward detecting properties that are not directly characterized in spectral graph theory, we conjecture that an invariant learning mechanism using support vector machines [19] may provide a viable solution.

2012 IEEE Fifth International Conference on Software Testing, Verification and Validation 978-0-7695-4670-4/12 $26.00 © 2012 IEEE DOI 10.1109/ICST.2012.33
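The Deryaft-style loop of retaining the pre-defined properties that hold in every observed state, checking a new state against them, and packaging them as one executable Java predicate can be sketched as follows. This is a minimal illustration under assumptions, not Deryaft's implementation. A program state is reduced to its object graph given as successor lists, and field labels and the core/derived field split are omitted. The "pre-defined collection" consists of three hypothetical properties: every object has in-degree at most one, exactly one object has in-degree zero, and the graph is acyclic. Acyclicity is tested here with Kahn's topological-sort algorithm, not with graph spectra. The names LearnThenCheckSketch, Graph, learn, and violations are hypothetical, and the record syntax requires Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch only: learn properties from observed states, then check new states. */
public class LearnThenCheckSketch {

    /** succ[v] lists the objects that v's fields point to (field labels omitted). */
    record Graph(int[][] succ) {
        int size() { return succ.length; }
        int[] inDegrees() {
            int[] d = new int[size()];
            for (int[] out : succ) for (int w : out) d[w]++;
            return d;
        }
    }

    /** Hypothetical pre-defined property collection, kept in insertion order. */
    static Map<String, Predicate<Graph>> candidates() {
        Map<String, Predicate<Graph>> m = new LinkedHashMap<>();
        m.put("inDegreeAtMostOne", g -> Arrays.stream(g.inDegrees()).allMatch(d -> d <= 1));
        m.put("singleRoot", g -> Arrays.stream(g.inDegrees()).filter(d -> d == 0).count() == 1);
        m.put("acyclic", LearnThenCheckSketch::isAcyclic);
        return m;
    }

    /** Kahn's algorithm: acyclic iff every vertex is removed in topological order. */
    static boolean isAcyclic(Graph g) {
        int[] in = g.inDegrees();
        ArrayDeque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < g.size(); v++) if (in[v] == 0) ready.add(v);
        int removed = 0;
        while (!ready.isEmpty()) {
            int v = ready.poll();
            removed++;
            for (int w : g.succ()[v]) if (--in[w] == 0) ready.add(w);
        }
        return removed == g.size();
    }

    /** Keeps only the candidate properties that hold in every observed state. */
    static Map<String, Predicate<Graph>> learn(List<Graph> observed) {
        Map<String, Predicate<Graph>> kept = new LinkedHashMap<>();
        candidates().forEach((name, p) -> {
            if (observed.stream().allMatch(p)) kept.put(name, p);
        });
        return kept;
    }

    /** Names of learned properties that a new state violates (empty list = consistent). */
    static List<String> violations(Map<String, Predicate<Graph>> learned, Graph g) {
        List<String> out = new ArrayList<>();
        learned.forEach((name, p) -> { if (!p.test(g)) out.add(name); });
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical observed states: two small trees.
        List<Graph> observed = List.of(
            new Graph(new int[][] {{1, 2}, {}, {}}),
            new Graph(new int[][] {{1}, {2}, {}}));
        Map<String, Predicate<Graph>> learned = learn(observed);
        // Use (2): the conjunction of learned properties as one executable predicate.
        Predicate<Graph> invariant = learned.values().stream().reduce(g -> true, Predicate::and);
        Graph corrupted = new Graph(new int[][] {{1, 2}, {2}, {}}); // object 2 has two parents
        System.out.println(learned.keySet());               // [inDegreeAtMostOne, singleRoot, acyclic]
        System.out.println(violations(learned, corrupted)); // [inDegreeAtMostOne]
        System.out.println(invariant.test(corrupted));      // false
    }
}
```

In this run, both observed trees satisfy all three candidates, so all are retained. The corrupted state, in which object 2 is reachable from two parents, violates only inDegreeAtMostOne, which corresponds to use (1). The conjoined predicate rejects the state, which corresponds to use (2).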