Type-Safe Modular Hash-Consing

Jean-Christophe Filliâtre
LRI, Université Paris Sud
91405 Orsay, France
filliatr@lri.fr

Sylvain Conchon
LRI, Université Paris Sud
91405 Orsay, France
conchon@lri.fr

Abstract

Hash-consing is a technique to share values that are structurally equal. Beyond the obvious advantage of saving memory blocks, hash-consing may also be used to speed up fundamental operations and data structures by several orders of magnitude when sharing is maximal. This paper introduces an OCAML hash-consing library that encapsulates hash-consed terms in an abstract datatype, thus safely ensuring maximal sharing. This library is also parameterized by an equality that allows the user to identify terms according to an arbitrary equivalence relation.

Categories and Subject Descriptors  D.2.3 [Software engineering]: Coding Tools and Techniques

General Terms  Design, Performance

Keywords  Hash-consing, sharing, data structures

1. Introduction

Hash-consing is a technique to share purely functional data that are structurally equal [8, 9]. The name hash-consing comes from Lisp: the only allocating function is cons and sharing is traditionally realized using a hash table [2]. One obvious use of hash-consing is to save memory space.

Hash-consing is part of the programming folklore but, in most programming languages, it is more a design pattern than a library. The standard way of doing hash-consing is to use a global hash table to store already allocated values and to look for an existing equal value in this table every time we want to create a new value. For instance, in the Objective Caml programming language¹ it reduces to the following four lines of code using hash tables from the OCAML standard library:

    let table = Hashtbl.create 251
    let hashcons x =
      try Hashtbl.find table x
      with Not_found -> Hashtbl.add table x x; x

The Hashtbl module uses the polymorphic structural equality and a generic hash function.
The initial size of the hash table is clearly context dependent and we will not discuss its choice in this paper; anyway, choosing a prime number is always a good idea.

¹ We use the syntax of Objective Caml (OCAML for short) [1] throughout this paper, but this could be easily translated to any ML implementation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ML'06 September 16, 2006, Portland, Oregon, USA.
Copyright © 2006 ACM 1-59593-483-9/06/0009...$5.00.

As a running example of a datatype on which to perform hash-consing, we choose the following type term for λ-terms with de Bruijn indices:

    type term =
      | Var of int
      | Lam of term
      | App of term * term

Instantiated on this type, the hashcons function has the following signature:

    val hashcons : term -> term

If we want to get maximal sharing, that is, the property that two values are indeed shared as soon as they are structurally equal, we need to systematically apply hashcons each time we build a new term. Therefore it is a good idea to introduce smart constructors performing hash-consing:

    let var n = hashcons (Var n)
    let lam u = hashcons (Lam u)
    let app (u,v) = hashcons (App (u,v))

By applying var, lam and app instead of Var, Lam and App directly, we ensure that all the values of type term are always hash-consed. Thus maximal sharing is achieved and physical equality (==) can be substituted for structural equality (=) since we now have

    x = y ⟺ x == y.

In particular, the equality used in the hash-consing itself can now be improved by using physical equality on sub-terms, since they are already hash-consed by assumption.
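The smart constructors above can be exercised in a small self-contained sketch. It assumes the naive global table from the introduction, here given an explicit type annotation; the names delta1, delta2, same_structure, same_block and entries are illustrative and do not come from the paper. The term λ.(0 0) is built twice from scratch, and the sketch checks that both constructions yield the same memory block.

```ocaml
type term =
  | Var of int
  | Lam of term
  | App of term * term

(* Global table mapping each term to its unique shared representative.
   The size 251 is the one used in the introduction. *)
let table : (term, term) Hashtbl.t = Hashtbl.create 251

(* Return the stored representative if an equal term exists,
   otherwise register x as the representative. *)
let hashcons (x : term) : term =
  try Hashtbl.find table x
  with Not_found -> Hashtbl.add table x x; x

(* Smart constructors: every term goes through hashcons. *)
let var n = hashcons (Var n)
let lam u = hashcons (Lam u)
let app (u, v) = hashcons (App (u, v))

(* Illustrative names: the term λ.(0 0), built twice from scratch. *)
let delta1 = lam (app (var 0, var 0))
let delta2 = lam (app (var 0, var 0))

let same_structure = (delta1 = delta2)   (* structural equality *)
let same_block = (delta1 == delta2)      (* physical equality *)

(* Number of distinct terms stored: Var 0, App (Var 0, Var 0)
   and Lam (App (Var 0, Var 0)). *)
let entries = Hashtbl.length table
```

Under these assumptions the second construction allocates nothing that survives: each intermediate value is found in the table and the stored representative is returned. Terms built with Var, Lam and App directly bypass the table, so the equivalence between = and == holds only for values obtained through the smart constructors.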
To perform this bootstrapping, we need custom hash tables based on the new equality. Fortunately, the OCAML standard library provides generic hash tables parameterized by an arbitrary equality and hash function. To get custom hash tables, we simply need to define a module that packs together the type term, an equality and a hash function:

    module Term = struct
      type t = term
      (* sub-terms are compared physically: they are already hash-consed *)
      let equal x y = match x, y with
        | Var n, Var m -> n == m
        | Lam u, Lam v -> u == v
        | App (u1,u2), App (v1,v2) -> u1 == v1 && u2 == v2
        | _ -> false
      let hash = Hashtbl.hash
    end

and then to apply the Hashtbl.Make functor:

    module H = Hashtbl.Make(Term)
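As a minimal sketch of how the pieces fit together, the custom table H can replace the polymorphic one in hashcons. The way hashcons and the smart constructors are rewired on top of H below is an assumption made for illustration, as are the names t1, t2, shared and size; the Term and H modules are the ones defined above.

```ocaml
type term =
  | Var of int
  | Lam of term
  | App of term * term

module Term = struct
  type t = term
  (* Shallow equality: sub-terms are assumed to be hash-consed already,
     so comparing them physically is enough. *)
  let equal x y = match x, y with
    | Var n, Var m -> n == m
    | Lam u, Lam v -> u == v
    | App (u1, u2), App (v1, v2) -> u1 == v1 && u2 == v2
    | _ -> false
  let hash = Hashtbl.hash
end

module H = Hashtbl.Make(Term)

(* Assumed wiring: the same lookup-or-insert scheme as in the
   introduction, but on the custom table H. *)
let table : term H.t = H.create 251

let hashcons x =
  try H.find table x
  with Not_found -> H.add table x x; x

let var n = hashcons (Var n)
let lam u = hashcons (Lam u)
let app (u, v) = hashcons (App (u, v))

(* Illustrative check: the term (λ.0) 1 built twice is shared. *)
let t1 = app (lam (var 0), var 1)
let t2 = app (lam (var 0), var 1)
let shared = (t1 == t2)

(* Four distinct terms are stored:
   Var 0, Var 1, Lam (Var 0) and App (Lam (Var 0), Var 1). *)
let size = H.length table
```

With this equality, each lookup compares at most one constructor and its immediate arguments instead of traversing whole terms. The scheme is correct only if every sub-term passed to Lam or App was itself produced by a smart constructor. Hashtbl.hash remains compatible with Term.equal because physically equal sub-terms are also structurally equal and therefore hash to the same value.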