High-Speed Formal Verification of Heterogeneous Coherence Hierarchies

Jesse G. Beu, Jason A. Poovey, Eric R. Hein, Thomas M. Conte
Georgia Institute of Technology, Atlanta GA
jesse.beu@gmail.com, japoovey@gmail.com, ehein6@gatech.edu, tom@conte.us

Abstract

As more heterogeneous architecture solutions continue to emerge, coherence solutions tailored for these architectures will become mandatory. Coherence hierarchies will likely continue to be prevalent in future large-scale shared memory architectures. However, past experience has shown that hierarchical coherence protocol design is a non-trivial problem, especially when considering the verification effort required to guarantee correctness. While some strategies do exist for verification of homogeneous coherence hierarchies, support for reasonable verification of heterogeneous coherence hierarchies is currently unavailable. Ideally, hierarchical coherence protocols composed of 'building block' protocols should be able to take advantage of incremental verification to sidestep the state-space explosion problem, which hampers any large-scale verification effort. In this work, we prove this can be accomplished through the use of the Manager-Client Pairing (MCP) framework, which provides encapsulation and permission checking support that enables a form of state-space symmetry. When combined with an inductive proof, this ensures that the validation properties of proper permission distribution and livelock/deadlock freedom are enforced by any hierarchical composition of MCP-compliant protocols. Demonstration of this methodology through the MurPhi formal verifier shows several orders of magnitude improvement in verification cost compared to full hierarchy verification.

1. Introduction

It is well established that power constraints have caused a major paradigm shift in computer architecture towards parallel processing for performance scaling.
With it have come new opportunities and design spaces for architects to explore. Among these are heterogeneous architectures, where on-chip network and processor diversity can be exploited for performance benefit or power/energy savings [1-5]. Such systems benefit from the design of diverse interacting coherence protocols, where each protocol is optimized to take advantage of properties of a homogeneous region within the overall heterogeneous architecture. This comes at a cost, however: the design and verification complexity of such systems is substantially higher than that of their homogeneous coherence counterparts.

Despite this cost, the benefits of heterogeneous coherence have led to real-world applications of coherence heterogeneity. The Wildfire architecture, for example, was built using the existing first-level protocol of the Sun E6500 in a larger hierarchy that enabled Coherent Memory Replication for improved node locality [3]. The Piranha architecture [4] had an intra-chip coherence management mechanism that was integrated with an independent inter-chip coherence protocol engine. This allowed for efficient use of on-chip caches and fast intra-chip data transfers, while another DRAM directory-based protocol could be leveraged to enable scalability and performance at the inter-chip granularity. The HP Superdome [5] employed a strategy similar to Wildfire's, but with a different goal in mind. An inter-chip communication layer interfaced the native intra-chip protocol to a higher-level directory protocol. The resulting system was able to restrict message broadcast scope to the local protocol in many cases, enabling the use of commodity parts (i.e., those with "glueless" multiprocessor buses) in a large-scale system while maintaining performance. These examples suggest that heterogeneous coherence hierarchies will become more attractive in the present era as current technology trends continue.
Another factor motivating heterogeneous coherence support is the emergence of Partitioned Global Address Space (PGAS) languages, such as X10 [7], which explicitly express physical locality of memory through places and processor/thread affinity. Depending on the relationship between the size of the address spaces assigned to a place, the number of active threads operating within a place, and the available architectural resources, localized coherence protocols can be beneficial. Localized protocols can be optimized for a particular place's partition of the address space and architectural real estate, while still maintaining global ad-