From Plasma to BeeFarm: Design Experience of an FPGA-based Multicore Prototype

Nehir Sonmez 1,2, Oriol Arcas 1,2, Gokhan Sayilar 3, Osman S. Unsal 1, Adrián Cristal 1,4, Ibrahim Hur 1, Satnam Singh 5, and Mateo Valero 1,2

1 Barcelona Supercomputing Center, Spain
2 Computer Architecture Department, Universitat Politècnica de Catalunya
3 Faculty of Engineering and Natural Sciences, Sabanci University, Turkey
4 IIIA - Artificial Intelligence Research Institute, CSIC - Spanish National Research Council
5 Microsoft Research Cambridge, United Kingdom

Abstract. In this paper, we take Plasma, an open-source MIPS-based uniprocessor soft core, and extend it into the BeeFarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years in both the FPGA and the computer architecture communities. We discuss various design tradeoffs, and through experimental results we demonstrate superior scalability compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system with run-time and compiler support, and on actual runs of Software Transactional Memory (STM) benchmarks on it, we comment on the pros, cons and future trends of using hardware-based emulation for research.

1 Introduction

This paper reports on our experience of designing and building BeeFarm, an eight-core cache-coherent shared-memory multiprocessor system on FPGA, to help investigate support for Transactional Memory [11, 17, 23]. The primary reason for using an FPGA-based simulator is to achieve significantly faster simulation for multicore architecture research than software instruction set simulators can provide.
A secondary reason is that a system that uses only the FPGA fabric to model a processor may offer a higher degree of fidelity, since no functionality is delegated to a "magical" software routine. Another way to use FPGA-based emulation is to offload infrequent or slow-running instructions and I/O operations to a software simulator while retaining the core functionality in FPGA hardware [7]. In our work we model the entire multiprocessor system on reconfigurable logic, although commercial simulation accelerators such as Palladium and automated simulator parallelization efforts also take advantage of reconfigurable technology [19].

Progress in multicore computer architecture research is being hindered by the inadequate performance of software-based instruction set simulators, which has led many researchers to consider FPGA-based emulation.