A Characterization of the Rodinia Benchmark Suite with Comparison to Contemporary CMP Workloads

ShuaiChe (sc5nf@virginia.edu), Jeremy W. Sheaffer (jws9c@cs.virginia.edu), Michael Boyer (mwb7w@cs.virginia.edu), Lukasz G. Szafaryn (lgs9a@virginia.edu), Liang Wang (lw2aw@virginia.edu), Kevin Skadron (skadron@cs.virginia.edu)

The University of Virginia
Department of Computer Science

Abstract: The recently released Rodinia benchmark suite enables users to evaluate heterogeneous systems including both accelerators, such as GPUs, and multicore CPUs. As Rodinia sees higher levels of acceptance, it becomes important that researchers understand this new set of benchmarks, especially in how they differ from previous work. In this paper, we present recent extensions to Rodinia and conduct a detailed characterization of the Rodinia benchmarks (including performance results on an NVIDIA GeForce GTX480, the first product released based on the Fermi architecture). We also compare and contrast Rodinia with Parsec to gain insights into the similarities and differences of the two benchmark collections; we apply principal component analysis to analyze the application space coverage of the two suites. Our analysis shows that many of the workloads in Rodinia and Parsec are complementary, capturing different aspects of certain performance metrics.

I. INTRODUCTION

Computer systems are increasingly exposing a heterogeneous computing model consisting of accelerators, such as graphics processors (GPUs), media processors, and even reconfigurable hardware like FPGAs, combined with one or more conventional CPUs. GPUs, for instance, offer parallelism at scales unachievable with other processors and afford about an order of magnitude greater peak throughput than general-purpose, multicore CPUs, while the CPUs offer high single-thread performance and programmability.

A vision of heterogeneous computer systems that incorporate diverse accelerators and automatically select the best computational unit for a particular task is widely shared among researchers and many industry analysts; however, there are no agreed-upon benchmarks to support the research needed in the development of such a platform. There are many benchmark suites for parallel computing on general-purpose CPU architectures, but accelerators fall into a gap that is not covered by current benchmark suites or benchmark development. There is a dearth of publicly available code for heterogeneous platforms.

The Rodinia benchmark suite [8], a set of free and open benchmarks and associated methodologies, was developed to address these concerns. The Rodinia applications are designed for heterogeneous computing infrastructures, and, using OpenMP and CUDA, target both GPUs and multicore CPUs. The implementations for each distinct platform can also serve as independent suites to evaluate multicore and manycore architectures separately. The Rodinia suite is structured to span a range of parallelism and compute patterns, providing researchers with various feature options to identify architectural bottlenecks and to fine tune hardware designs.

Several multithreaded benchmark suites for multicore CPUs, including SPLASH-2 [35], Parsec [5], and SPEC OMP [29], are available. Rodinia was developed to address the issues of benchmarking heterogeneous systems, particularly those including a GPU. There is growing support for use of the Rodinia workloads [6], [8]–[10], [24], but there are some important questions yet to be answered:

• How much do those Rodinia workloads which are designed for heterogeneous platforms (those with GPU accelerators) differ from those of other suites designed for multicore CPUs?
• Do the workload designs of other suites demonstrate overlapping or orthogonal features?
• How well do the chosen applications span the workload space?
• How well can traditional, multithreaded CPU workloads map onto GPU platforms?

A better understanding of these issues will not only expand the knowledge of parallel benchmark construction, but could also inform decisions on workload scheduling and partitioning on different architectures and guide researchers to choose appropriate benchmarks for their research as well.

In this paper we make the following contributions:

• We present important extensions to the Rodinia benchmark suite that have been added since its initial publication at IISWC 2009 [8].
• We conduct a more detailed characterization of the Rodinia GPU workloads to aid researchers in understanding