Instruction-driven Timing CPU Model for Efficient Embedded Software Development using OVP

Felipe Rosa 1,2, Luciano Ost 1, Ricardo Reis 2, Gilles Sassatelli 1
1 LIRMM (CNRS-University of Montpellier II), 161 rue Ada, Cedex 05 - 34095 Montpellier - France, {ost, sassatelli}@lirmm.fr
2 UFRGS - Instituto de Informática - PGMicro/PPGC, Av. Bento Gonçalves 9500, Porto Alegre, RS - Brazil, {frdarosa,reis}@inf.ufrgs.br

Abstract - The software complexity of MPSoCs is increasing dramatically, resulting in new design challenges, such as improving the system’s performance and programmability by porting parallel programming APIs. Such challenges impose more time and cost on the system’s software development. This leads to the adoption of virtual platform frameworks aimed at functional verification, like OVP, which are capable of simulating embedded systems running real application code at the speed of hundreds of MIPS. This work focuses on enhancing OVP by including a quasi-cycle accurate timing CPU model, making it suitable for performance analysis. This paper also evaluates the accuracy of the proposed timing CPU model when compared to a real system. Results show that the error of our model varies from 0.06% to 10.56% depending on the benchmark profile.

Keywords: OVP simulation, modeling, design space exploration of MPSoCs, software validation.

I. INTRODUCTION

Software development is an important issue in today’s MPSoC design. The increasing software complexity makes functional verification more difficult, resulting in increased development cost [1][2]. In this context, software engineers are investigating alternatives to scale up system performance, while dealing with new challenges in MPSoC software development, such as defining inter-CPU communication protocol stacks, as well as porting APIs and operating systems (OSs) [3]. To handle such scenarios, virtual platforms are being employed.
Virtual platforms emulate hardware behavior at the instruction level, making target software believe that it is running on real physical hardware. While accelerating software development, such simulators usually offer a set of CPU models and memory system models, allowing the analysis of different applications/OSs executing on multiprocessor architectures without modifications. Event-driven and quasi-cycle accurate virtual platform frameworks like GEM5 target microarchitecture exploration, since specific modeling details are provided (e.g. instruction pipeline details, cache coherence protocols, etc.) [4]. Such simulators do not scale to a large number of CPUs, specifically when it comes to usability, ease of modeling and simulation time (around 200 KIPS [5]). In contrast, simulators such as the Open Virtual Platforms (OVP) OVPsim, which rely on just-in-time (JIT) dynamic binary translation, can achieve simulation speeds of up to 100 MIPS [5]. This simulation performance comes at the expense of accuracy: OVPsim provides instruction accuracy only, which results in inaccurate software performance estimation (e.g. application execution time).

This paper contributes by including a quasi-cycle accurate timing CPU model in the OVP framework. The proposed approach broadens the OVP design space exploration spectrum, since software engineers can choose between faster simulation (original OVP) and more accurate simulation (proposed OVP model) within the same simulator. In this direction, we claim that software engineers can easily implement/port C applications and execute them in the original OVP until functionality is validated. Applications can then be executed in a still fast but quasi-cycle accurate OVP model, which allows estimating the execution time of a program running on a given CPU architecture.
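To make the idea of instruction-driven timing concrete, the following minimal Python sketch accumulates a cycle cost for each executed instruction and converts the total into an execution time estimate. It is an illustration only and not the model proposed in this paper: the instruction classes, the cycle costs in `CYCLES_PER_CLASS`, the `TimingModel` class, the 100 MHz clock and the example trace are all hypothetical assumptions, and no OVP API is used.

```python
from dataclasses import dataclass

# Hypothetical cycle cost per instruction class (assumed values, not taken
# from the paper or from any real CPU datasheet).
CYCLES_PER_CLASS = {"alu": 1, "mul": 3, "load": 2, "store": 2, "branch": 3}


@dataclass
class TimingModel:
    """Accumulates estimated cycles as instructions are executed."""
    clock_hz: float
    cycles: int = 0
    instructions: int = 0

    def on_instruction(self, instr_class: str) -> None:
        # Called once per simulated instruction; adds its assumed cost.
        self.cycles += CYCLES_PER_CLASS[instr_class]
        self.instructions += 1

    def elapsed_seconds(self) -> float:
        # Estimated execution time = accumulated cycles / clock frequency.
        return self.cycles / self.clock_hz


def relative_error_percent(estimated: float, measured: float) -> float:
    """Deviation of an estimate from a reference measurement, in percent."""
    return abs(estimated - measured) / measured * 100.0


# Hypothetical instruction trace and clock frequency.
trace = ["load", "alu", "alu", "mul", "store", "branch"]
model = TimingModel(clock_hz=100e6)
for instr_class in trace:
    model.on_instruction(instr_class)

print(model.instructions, model.cycles, model.elapsed_seconds())
```

In this sketch an instruction-accurate simulator would only report the instruction count, whereas the added cycle accounting yields a time estimate that can be compared against a measurement on real hardware with `relative_error_percent`.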
Summarizing, this paper contributes in the following aspects: (i) the implementation and integration of a quasi-cycle accurate timing model into a JIT-based simulator; (ii) an extensive evaluation of the model using several benchmarks, comparing it to a real hardware platform.

II. STATE OF THE ART

Due to the limited simulation speed of event-driven cycle-accurate frameworks, simulators based on binary translation have become decisive to deal with today’s application challenges, as well as to enable the evaluation of large scenarios. Simics [7], QEMU [8] and the adopted OVPsim are examples of virtual platform frameworks that rely on dynamic binary translation, i.e. dynamic translation and optimization of target machine code to host machine code. Such simulators/emulators vary in modeling flexibility, simulation speed and accuracy. The lack of accuracy inherent to JIT-based simulators is motivating research into alternative performance/accuracy trade-offs. In this direction, Chiang et al. [9] propose the integration of QEMU and SystemC, allowing faster clock-accurate evaluation when compared to RTL-based ones, at the cost of inadequate simulation speed for today’s software complexity, since the simulation is performed in the SystemC environment. A pipeline model was included into QEMU in [10], where the authors propose a two-phase approach (offline and online phases) to estimate application performance. In the offline phase a cycle pre-estimation of the application