Performance Modeling for Early Analysis of Multi-Core Systems

Reinaldo Bergamaschi¹, Indira Nair¹, Gero Dittmann¹, Hiren Patel³, Geert Janssen¹, Nagu Dhanwada⁴, Alper Buyuktosunoglu¹, Emrah Acar⁵, Gi-Joon Nam⁵, Guoling Han², Dorothy Kucar¹, Pradip Bose¹, John Darringer¹

¹ IBM T. J. Watson Research Center, Yorktown Heights, NY 10598; ² Univ. of California, Los Angeles, CA 90095; ³ Virginia Tech, Blacksburg, VA 24060; ⁴ IBM EDA, East Fishkill, NY 12533; ⁵ IBM Austin Research, Austin, TX 78758

berga@us.ibm.com

ABSTRACT
Performance analysis of microprocessors is a critical step in defining the microarchitecture, prior to register-transfer-level (RTL) design. In complex chip multiprocessor systems comprising multiple cores, caches and buses, this problem is compounded by complex performance interactions between cores, caches and interconnections, as well as by tight interdependencies between the performance, power and physical characteristics of the design (such as the floorplan). Although there are many point tools for analyzing the performance, power, or floorplan of complex systems-on-chip (SoCs), surprisingly little work exists on integrated tools that can analyze these system characteristics simultaneously and let the user explore different design configurations and their effect on performance, power, size and thermal aspects. This paper describes an integrated tool for early analysis of the performance, power, physical and thermal characteristics of multi-core systems. It includes cycle-accurate, transaction-level, SystemC-based performance models of POWER processors and system components (e.g., caches and buses). Power models for power computation, physical models for floorplanning, and packaging models for thermal analysis are also included. The tool allows the user to build different systems by selecting components from a library and connecting them together in a visual environment.
Using these models, users can simulate and dynamically analyze the performance, power and thermal aspects of multi-core systems.

Categories and Subject Descriptors
C.0 [General]: Modeling of computer architecture, system architectures, systems specification methodology.

General Terms
Algorithms, Performance, Design, Experimentation.

Keywords
Performance, power and physical analysis, transaction-level modeling, multi-core systems modeling, early analysis.

1. INTRODUCTION
Advanced microprocessor design methodologies rely heavily on early performance and power analysis for microarchitecture trade-offs and tuning. Simulation-based methods using execution-driven or trace-driven models are commonly used. To obtain the degree of accuracy that detailed trade-off analysis requires, cycle-accurate models of the internal pipelines of the processors, as well as of the communication delays between components, are needed.

The communication delays between components include a functional part and a physical part. The functional delay depends on the specific communication protocols used. The physical delay is the number of cycles needed to transfer data across the length of the interconnections, which depends on the relative positioning of the components (i.e., the floorplan), the technology and the buffering capabilities. As components get larger, the physical delays increase and must be taken into account in the models.

Several microprocessor performance analysis tools have been developed over the years for various purposes. These fall into two main types: trace-driven timing simulators [1][2] and execution-driven simulators [3][4]. Each type has advantages and disadvantages regarding simulation speed, the ability to model certain architectural details, such as branches and speculative execution, and the ability to execute actual software rather than instruction traces.
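As a rough illustration of the split between functional and physical delay described earlier in this section, the following C++ sketch adds a protocol-dependent cycle count to a wire-traversal cycle count derived from a floorplan distance. It is not taken from SLATE: the names `LinkParams`, `reach_mm_per_cycle`, `physical_delay_cycles` and `total_delay_cycles`, and the simple ceiling formula, are assumptions made only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical link description; field names and units are illustrative
// assumptions, not parameters defined by the paper.
struct LinkParams {
    double wire_length_mm;          // distance between components in the floorplan
    double reach_mm_per_cycle;      // distance a buffered signal covers per clock cycle
    std::uint32_t protocol_cycles;  // functional delay imposed by the protocol
};

// Physical delay: whole clock cycles needed to traverse the wire.
// Assumes a simple linear model (length divided by per-cycle reach).
std::uint32_t physical_delay_cycles(const LinkParams& p) {
    assert(p.reach_mm_per_cycle > 0.0);
    assert(p.wire_length_mm >= 0.0);
    return static_cast<std::uint32_t>(
        std::ceil(p.wire_length_mm / p.reach_mm_per_cycle));
}

// Total communication delay: functional part plus physical part.
std::uint32_t total_delay_cycles(const LinkParams& p) {
    return p.protocol_cycles + physical_delay_cycles(p);
}
```

Under these assumptions, moving two components from 2 mm to 10 mm apart, with a reach of 4 mm per cycle and a 3-cycle protocol, raises the total delay from 4 to 6 cycles. This is the kind of floorplan-dependent effect the text says must be reflected in the models as components grow.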
Power models and tools, which use statistics generated by performance simulators, have also been developed [5][6]. While these tools have been successfully applied to a variety of processors and systems, they lack the modularity and componentization required for quick design exploration. Moreover, they do not offer an integrated environment for analyzing performance, power, floorplan and thermal aspects.

This paper presents the models and tools supporting an integrated approach to early design analysis for multi-core systems, implemented in a tool called SLATE (System-Level Analysis Tool for Early Exploration). It gives an overview of the system and a detailed description of the performance models.

The remainder of this paper is organized as follows. Section 2 presents an overview of SLATE and the early design methodology it supports. Section 3 describes the SystemC-based performance modeling approach applied to the SLATE components. Section 4 presents the experimental results and Section 5 offers conclusions.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CODES+ISSS’07, September 30 – October 3, 2007, Salzburg, Austria.
Copyright 2007 ACM 978-1-59593-824-4/07/0009…$5.00.