Extensible Code Verification

Bor-Yuh Evan Chang 1  bec@cs.berkeley.edu
George C. Necula 1  necula@cs.berkeley.edu
Robert R. Schneck 2  schneck@math.berkeley.edu

1 Department of Electrical Engineering and Computer Science
2 Group in Logic and the Methodology of Science
University of California, Berkeley

ABSTRACT

Popular language-based security mechanisms for software systems are based on verifiers that enforce a fixed and trusted type system. We live in a multi-lingual world, and no system is written entirely in a single strongly-typed language. Rather than seek the most general possible type system, we propose a sound framework for customizing the mechanism (e.g., a type system or an explicit safety proof) used to enforce a particular safety policy, enabling a producer of untrusted code to choose the most appropriate verification mechanism. In this framework, called the Open Verifier, code producers can provide untrusted verifiers for checking, for example, the well-typedness of the code. This gives a code producer maximum flexibility in the choice of code generation schemes and type system. To ensure soundness, the untrusted verifier runs under the supervision of a trusted module that queries it about the safety of individual instructions. Each answer must be accompanied by a proof that allows the trusted module to check the correctness of the answer. We demonstrate this framework in the context of two untrusted cooperating verifiers. One handles code compiled from Cool, a strongly-typed, object-oriented language (roughly, a subset of Java). The other is used for runtime support functions written in C. Furthermore, we demonstrate that, through careful layering of the proof-generation effort, the cost of building such an untrusted verifier, beyond that of constructing a conventional trusted verifier, is manageable.

1. INTRODUCTION

Language-based security mechanisms have gained acceptance for enforcing basic but essential security properties, such as memory safety. Without memory safety, untrusted code can interfere with the enforcement of higher-level security properties. But the state of the practice in today's language-based enforcement strategies requires the whole untrusted program to be expressed in a single "trusted" typed intermediate language, such as the Java Virtual Machine Language (JVML) [19] or the Microsoft Intermediate Language (MSIL) [12, 13]. Each of these intermediate languages is a good target for one or more corresponding source-level languages. Programs written in other source languages can be compiled into the trusted intermediate language, but often in unnatural ways, with a loss of expressiveness and performance [5, 14, 30, 6].

Beyond these inconveniences, the fixed type-system approach has fundamentally limited applicability as a security enforcement mechanism because it applies only to the part of a system that conforms to the given type system. All software systems have components written in more than one language, whether for convenience or out of necessity. The same strong typing that guarantees memory safety often proves to be an obstacle when writing low-level or high-performance runtime support routines. The design of MSIL better reflects the multi-lingual composition of today's software systems: MSIL contains support for multiple languages and also permits the straightforward compilation of low-level languages, such as C and C++, because it incorporates a low-level sublanguage. The results of such compilations are, however, not directly verifiable, and are thus less privileged and more limited in how they can interact with verified code. The cost of this flexibility is the complexity of the intermediate language.
For example, MSIL includes eight distinct forms of function calls, including direct, virtual, interface virtual, and indirect calls, along with tail versions of these. Moreover, not surprisingly, MSIL is still not perfect for every imaginable source language. The ILX project [29] argues that MSIL ought to be extended with, among other features, two additional forms of function call to better support the compilation of higher-order functional languages. Similar extensions have been proposed for supporting parametric polymorphism [7, 16]. There is a strong temptation to make an intermediate-language type system as expressive as possible, because it is hard to change the type system after many copies of the virtual machine are deployed.

The fundamental limitation in the design of today's virtual machines is that they specify not only the low-level safety policy of interest (i.e., memory safety), but they also fix one particular mechanism that is sufficient for enforcing it (i.e., a particular strong type system). We advocate in this paper that the producer of the untrusted code should be allowed to choose the mechanism by which the low-level safety policy is enforced. After all, who is in a better position to know what mechanism works best for a particular program? We do not think that simply selecting among several built-in mechanisms is sufficient or even desirable. We want to explore what can be achieved if the code producer is allowed to specify its own type system or the particular low-level code generation strategies that it wants to use. An ideal intermediate language for such a virtual machine is untyped and at a low enough level so as not to place too many constraints on the code producer. This also allows the code producer to perform optimizations that must traditionally be done by JITs. In essence, we want to allow the code producer to provide a verifier for the code as well.
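As a rough illustration of the supervision protocol described above, consider the following sketch. All names and the toy bounds-checking safety policy are our own illustrative assumptions, not details of the actual system: the point is only that the trusted module queries the untrusted verifier per instruction and accepts each answer only after independently checking the accompanying evidence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instruction:
    op: str    # e.g., "add", "load", "store"
    addr: int  # address operand, if any

# Toy trusted safety policy: memory accesses must fall in [0, HEAP_SIZE).
HEAP_SIZE = 1024

def trusted_proof_check(instr, claim, proof):
    """Trusted module: accept the extension's claim only if the
    accompanying evidence (here, claimed address bounds) checks out."""
    if instr.op in ("load", "store"):
        lo, hi = proof
        return claim == "safe" and lo <= instr.addr < hi and hi <= HEAP_SIZE
    return claim == "safe"  # non-memory ops are trivially safe in this toy policy

class UntrustedVerifier:
    """Untrusted extension: answers safety queries about individual
    instructions, supplying evidence with each answer."""
    def query(self, instr):
        if instr.op in ("load", "store"):
            return "safe", (0, HEAP_SIZE)  # claim safety, with believed bounds
        return "safe", None

def verify(program, extension):
    """Trusted supervision loop: query the extension about each
    instruction and check each answer's proof before accepting it."""
    return all(
        trusted_proof_check(instr, *extension.query(instr))
        for instr in program
    )

# A well-behaved program is accepted.
assert verify([Instruction("add", 0), Instruction("load", 8)], UntrustedVerifier())
# An out-of-bounds access is rejected even though the extension claims safety.
assert not verify([Instruction("store", 4096)], UntrustedVerifier())
```

Note that soundness rests only on `trusted_proof_check` and the supervision loop; a buggy or malicious extension can cause valid code to be rejected, but cannot cause unsafe code to be accepted.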
The challenge then is to ensure the soundness of the verification process