Program Abstractions for Behaviour Validation

Guido de Caso, Víctor Braberman, Diego Garbervetsky, Sebastián Uchitel ∗†
Departamento de Computación, FCEyN, UBA, Buenos Aires, Argentina
{gdecaso, vbraber, diegog}@dc.uba.ar
Department of Computing, Imperial College London, UK
s.uchitel@doc.ic.ac.uk

ABSTRACT
Code artefacts that have non-trivial requirements with respect to the ordering in which their methods or procedures ought to be called are common and appear, for instance, in the form of API implementations and objects. This work addresses the problem of validating whether API implementations provide their intended behaviour when descriptions of this behaviour are informal, partial or non-existent. The proposed approach addresses this problem by generating abstract behaviour models which resemble typestates. These models are statically computed and encode all admissible sequences of method calls. The level of abstraction at which such models are constructed has proven useful for validating code artefacts and identifying findings which led to the discovery of bugs, adjustment of the requirements expected by the engineer to the requirements implicit in the code, and the improvement of available documentation.

Categories and Subject Descriptors
D.2.4 [Software Engineering]: Software/Program Verification—validation; D.2.5 [Software Engineering]: Testing and Debugging—debugging aids

General Terms
Algorithms, Design, Verification

Keywords
Behaviour model synthesis, automated abstraction, source code validation

1. INTRODUCTION
Code artefacts that have non-trivial requirements with respect to the order in which their methods or procedures ought to be called are commonplace. Such is the case for many API implementations and objects.
In practice, descriptions of intended behaviour are incomplete and informal, if documented at all, hindering verification and validation of the code artefacts themselves and of the client code that uses them. Hence, researchers have not relied on these descriptions and have instead developed techniques to support the mining or synthesis of typestates [17] from API implementations, which are then used to verify whether client code conforms to the implemented protocol [1, 3]. Such approaches, however, address only part of the problem: they assume that the code from which the typestate is extracted is correct, that is, that it conforms to the ordering of methods or procedures intended when the API was designed or its requirements were developed.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE ’11, May 21–28, 2011, Honolulu, Hawaii, USA. Copyright 2011 ACM 978-1-4503-0445-0/11/05 ...$10.00.

This work addresses the complementary problem of validating whether API implementations provide their intended behaviour when descriptions of this behaviour are informal, partial or non-existent. Validation of API implementation behaviour can result in the identification of bugs in the code which induce undesired requirements, adjustment of the requirements expected by the engineer to the requirements implicit in the code, and the improvement of available documentation for that code.
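As a rough illustration of the validation problem described above, the following Python sketch compares the call orderings that an implementation actually admits with the orderings an engineer expects. The `Connection` class, its `requires_*` predicates and the `EXPECTED` table are hypothetical and do not come from this paper. The sketch also finds discrepancies by executing bounded call sequences, which is only a stand-in for a statically computed behaviour model and should not be read as the technique proposed here.

```python
from itertools import product


class Connection:
    """Hypothetical API whose methods carry explicit requires clauses."""

    def __init__(self):
        self.is_open = False
        self.pending = 0

    # Each requires_* predicate says when the matching method may be called.
    def requires_open(self):
        return not self.is_open

    def requires_send(self):
        return self.is_open

    def requires_flush(self):
        return self.is_open and self.pending > 0

    def requires_close(self):
        return self.is_open and self.pending == 0

    def open(self):
        self.is_open = True

    def send(self):
        self.pending += 1

    def flush(self):
        self.pending = 0

    def close(self):
        self.is_open = False


METHODS = ("open", "send", "flush", "close")

# The engineer's mental model (an assumption made for this example):
# once opened, send, flush and close are always allowed.
EXPECTED = {
    ("closed", "open"): "opened",
    ("opened", "send"): "opened",
    ("opened", "flush"): "opened",
    ("opened", "close"): "closed",
}


def implementation_admits(seq):
    """Run seq on a fresh object, checking each requires clause first."""
    obj = Connection()
    for name in seq:
        if not getattr(obj, "requires_" + name)():
            return False
        getattr(obj, name)()
    return True


def expected_admits(seq, start="closed"):
    """Check seq against the engineer's expected protocol."""
    state = start
    for name in seq:
        state = EXPECTED.get((state, name))
        if state is None:
            return False
    return True


def discrepancies(max_len):
    """Minimal sequences (up to max_len) on which code and expectation differ."""
    found = []
    for n in range(1, max_len + 1):
        for seq in product(METHODS, repeat=n):
            prefix = seq[:-1]
            # Report only the first point of disagreement along a sequence.
            if not (implementation_admits(prefix) and expected_admits(prefix)):
                continue
            impl, exp = implementation_admits(seq), expected_admits(seq)
            if impl != exp:
                found.append((seq, impl, exp))
    return found


if __name__ == "__main__":
    for seq, impl, exp in discrepancies(3):
        print(seq, "implementation:", impl, "expected:", exp)
```

Each reported sequence is a finding for the engineer to judge: it may point to a bug in a requires clause, or to an expectation or piece of documentation that needs to be adjusted to what the code actually enforces.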
In this work, we argue that an automatically constructed abstraction of an API implementation can be useful for validation against poorly documented requirements or the engineer’s mental model and can lead to the identification of problems in the code, in the requirements or in the engineer’s understanding of both. Given that validation is an activity that requires human intervention, the level at which an API implementation is abstracted is key and has different requirements from those of abstractions used for verification [18].

In this paper we present a novel technique for automatically constructing abstractions in the form of behaviour models from code artefacts equipped with requires clauses for methods. These models, similarly to typestates, encode all admissible sequences of method calls. The level of abstraction at which such models are constructed aims at preserving enabledness of sets of operations, resulting in a finite model with intuitive and formal traceability links to the code. This level of abstraction and the traceability links have proven useful for validating code artefacts and identifying findings that relate to bugs in code and problems in expected or documented requirements.

Literature on typestate synthesis refers to safety and permissiveness as a way to characterize abstraction properties: a typestate is safe [1] if no call sequence violates the library’s internal invariants; it is permissive if it contains every such sequence. Previous approaches (e.g., [12]) have aimed at modular program analysis using typestates which are both safe and permissive for cases in which the library’s internal