A case study on modeling shared memory access effects during performance analysis of HW/SW systems

Marcello Lajolo*, Politecnico di Torino, Torino, Italy, lajolo@polito.it
Anand Raghunathan, NEC C&C Research Labs., Princeton, NJ, USA, anand@ccrl.nj.nec.com
Sujit Dey*, UC San Diego, La Jolla, CA, USA, dey@ece.ucsd.edu
Luciano Lavagno, Politecnico di Torino, Torino, Italy, lavagno@polito.it
Alberto Sangiovanni Vincentelli, University of California at Berkeley, Berkeley, CA, USA, alberto@eecs.berkeley.edu

*This work was started when the authors were at NEC C&C Research Labs, Princeton, NJ.
1092-6100/98 $10.00 © 1998 IEEE

Abstract

Behavioral simulation with timing annotations derived from performance modeling and analysis is a promising alternative for use in evaluating system-level design tradeoffs [1, 2]. The accuracy of such approaches is determined by how well the effects of various HW and SW architectural features, like the Real Time Operating System (RTOS), shared memories and buses, HW/SW communication mechanisms, etc., are modeled at this level. We present a study of the effects of shared memory buses during system-level performance analysis in the POLIS co-design environment, using the example of a TCP/IP Network Interface System. We demonstrate how the effects of the memory arbiter and shared memory bus can be modeled efficiently at the behavioral level, and used to evaluate various design tradeoffs. Experimental results demonstrate that modeling these effects can significantly increase the accuracy of system-level performance estimates.

1 Introduction

Efficient exploration of system-level design tradeoffs depends heavily on the availability of fast and accurate estimation and modeling techniques, for metrics such as performance, power, and cost, to guide various design decisions. Various techniques have been proposed for performance analysis of hardware [3, 4, 5] and software [6, 7]. In this paper, we focus on performance modeling for mixed HW/SW embedded systems. Hardware-software co-simulation [8] remains the most popular approach to performance estimation for such systems. There are several flavors of hardware-software simulation, with varying degrees of efficiency and accuracy. The techniques that involve simulating (RTL) hardware models of the embedded processor(s) along with the models of the hardware components tend to be the most accurate, but are also the slowest. Moreover, detailed hardware models for embedded processors are often not available to system designers. A popular alternative is to use instruction set simulators (ISS) to simulate the software components of the system, and HDL simulators to simulate the hardware components. Instruction set simulators may be cycle- and bit-accurate, or may abstract out some architectural details of the target embedded processor, such as pipelines and superscalar ordering, for efficiency. The efficiency of this approach may still be limited by the (assembly or binary instruction) level of detail in software simulation, and by the communication overhead required to synchronize the execution of the ISS and the hardware simulator. While there has been some work on reducing the synchronization overhead [9, 10], such approaches are still not efficient enough for exploring tradeoffs during HW/SW co-design. Bus functional models of the embedded processors may be used to exercise the hardware components without needing to run an ISS concurrently; however, only the hardware functionality is simulated in this approach, making it more suitable for validation of the hardware and the HW/SW interface. Using an interface-based design methodology [11] helps separate the behavior of the components from their interface protocols, and allows the use of time and space abstractions for efficient validation and analysis.
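To make the synchronization cost mentioned above concrete, the following toy count compares how often an ISS and a hardware simulator would have to exchange control under two policies. It is only an illustrative sketch, not a description of the techniques in [9, 10]: the trace format, the function names, and the assumption that one in every 100 instructions touches the HW/SW interface are all hypothetical.

```python
# Toy model (illustrative only): count ISS <-> hardware-simulator
# synchronizations for a software instruction trace.
# The trace format and all names below are assumptions, not from the paper.

def make_trace(n_instructions, hw_access_period):
    """Hypothetical trace: every `hw_access_period`-th instruction
    accesses the HW/SW interface; all others are purely local."""
    return [{"hw_access": (i % hw_access_period == 0)}
            for i in range(n_instructions)]

def lockstep_syncs(trace):
    """Lock-step policy: synchronize after every simulated instruction,
    whether or not it touches the hardware."""
    return len(trace)

def on_access_syncs(trace):
    """Relaxed policy: synchronize only on instructions that
    access the HW/SW interface."""
    return sum(1 for instr in trace if instr["hw_access"])

trace = make_trace(n_instructions=10_000, hw_access_period=100)
print(lockstep_syncs(trace), on_access_syncs(trace))
```

Under these assumed numbers the lock-step policy synchronizes once per instruction, while the relaxed policy synchronizes only on the interface accesses. Even then, each remaining instruction is still simulated at the assembly or binary level, which is the second source of inefficiency noted above.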
Behavioral simulation coupled with timing annotations based on performance modeling techniques offers a promising alternative for use in evaluating system-level design tradeoffs [12, 2]. In such approaches, behavioral models of the software components are simulated, and performance estimates for blocks of code are used to annotate timing information. In the POLIS co-design environment [12], a ho-