A Formal Semantics for Weak References

J. J. Hallett
Boston University
jhallett@cs.bu.edu

A. J. Kfoury
Boston University
kfoury@cs.bu.edu

May 22, 2005
Modified: August 8, 2005

Abstract

A weak reference is a reference to an object that is not followed by the pointer tracer when garbage collection is called. That is, a weak reference cannot prevent the object it references from being garbage collected. Weak references remain a troublesome programming feature largely because there is not an accepted, precise semantics that describes their behavior (in fact, we are not aware of any formalization of their semantics). The trouble is that weak references allow reachable objects to be garbage collected, therefore allowing garbage collection to influence the result of a program. Despite this difficulty, weak references continue to be used in practice for reasons related to efficient storage management, and are included in many popular programming languages (Standard ML, Haskell, OCaml, and Java). We give a formal semantics for a calculus called λweak that includes weak references and is derived from Morrisett, Felleisen, and Harper’s λgc. λgc formalizes the notion of garbage collection by means of a rewrite rule. Such a formalization is required to precisely characterize the semantics of weak references. However, the inclusion of a garbage-collection rewrite-rule in a language with weak references introduces non-deterministic evaluation, even if the parameter-passing mechanism is deterministic (call-by-value in our case). This raises the question of confluence for our rewrite system. We discuss natural restrictions under which our rewrite system is confluent, thus guaranteeing uniqueness of program result. We define conditions that allow other garbage collection algorithms to co-exist with our semantics of weak references.
We also introduce a polymorphic type system to prove the absence of erroneous program behavior (i.e., the absence of “stuck evaluation”) and a corresponding type inference algorithm. We prove the type system sound and the inference algorithm sound and complete.

1 Introduction

Motivation Behind Weak References

Weak references are references to objects that are not followed by the pointer tracer when garbage collection is called. That is, a weak reference cannot prevent the object it references from being garbage collected. Most language implementations that support weak references (SMLofNJ, Hugs-GHC, OCaml, Java) allow a weak reference to be dereferenced, determining whether the object pointed to has been garbage collected and, if not, what the object is [SML, Mosa, Mosb, Hug98, OCa, Sun].

Weak references have been shown to be particularly useful when we want to store numerous objects without allowing them to permanently occupy space. The classic examples of data structures that benefit from weak references are caches, implementations of hash-consing, and memo tables [CMP00]. In each data structure we may wish to keep a reference to an object but also prevent that object from consuming unnecessary space. That is, we would like the object to be garbage collected once it is no longer reachable from outside the