Cluster Comput (2009) 12: 123–140
DOI 10.1007/s10586-009-0076-0

On GPU's viability as a middleware accelerator

Samer Al-Kiswany · Abdullah Gharaibeh · Elizeu Santos-Neto · Matei Ripeanu

Received: 1 January 2009 / Accepted: 5 January 2009 / Published online: 17 January 2009
© Springer Science+Business Media, LLC 2009

Abstract Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis by designing StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: online similarity detection between large data files.

Keywords Middleware · Storage system · Graphics Processing Unit · GPU hashing · StoreGPU

S. Al-Kiswany () · A. Gharaibeh · E. Santos-Neto · M. Ripeanu
Electrical and Computer Engineering Department, The University of British Columbia, Vancouver, BC, Canada V6T 1Z4
e-mail: samera@ece.ubc.ca

A. Gharaibeh
e-mail: abdullah@ece.ubc.ca

E. Santos-Neto
e-mail: elizeus@ece.ubc.ca

M. Ripeanu
e-mail: matei@ece.ubc.ca

1 Introduction

Recent advances in processor technology [1] have resulted in the wide availability of massively parallel Graphics Processing Units (GPUs).
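To make concrete the hashing-based primitives the abstract refers to, the following is a minimal sketch of content addressing and hash-based similarity detection. It is written in Python purely for illustration (StoreGPU itself targets NVIDIA's CUDA platform), and the chunk size is an arbitrary illustrative value rather than a parameter from the paper: each fixed-size chunk is named by its hash digest, and two files can be compared by the overlap of their chunk-hash sets.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative block size; real systems tune this parameter


def chunk_hashes(data: bytes) -> list:
    """Hash each fixed-size chunk; the digest serves as the chunk's address
    (the core idea behind content-addressable storage)."""
    return [hashlib.md5(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of a's chunks whose hashes also occur in b: a simple
    hash-based estimate of how much data the two inputs share."""
    ha, hb = chunk_hashes(a), set(chunk_hashes(b))
    if not ha:
        return 0.0
    return sum(h in hb for h in ha) / len(ha)
```

Because every chunk must be hashed, the cost of these primitives grows linearly with data size, which is exactly the computational overhead the paper proposes to offload to the GPU.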
Low-end GPUs like NVIDIA's GeForce 8600, priced at about $100, have 32 processors and 256 MB of memory, while high-end GPUs, like the NVIDIA 8800 GTX priced at about $300, have up to 128 processors running at 575 MHz and 768 MB of memory. With these characteristics, GPUs are often underutilized in desktop deployments (as these are generally provisioned for graphics-intensive workloads such as high-definition video) and may be a cost-effective enhancement to high-end server systems.

However, the constraints introduced by the GPU programming model, which until recently supported only graphics processing, have led past efforts aimed at harnessing this resource to focus exclusively on computationally intensive scientific applications [2]. Although these efforts confirmed that significant speedups are achievable, the development cost for this specialized platform was often prohibitive. Recently, however, the introduction of general-purpose programming models (e.g., NVIDIA's CUDA [3]) has lowered the development cost, making GPUs attractive to a broader spectrum of applications. Additionally, improvements in GPU architecture have created the opportunity for data-intensive applications to benefit from GPUs.

This study starts from the observation that a number of techniques that enhance the reliability and/or performance of distributed storage systems (e.g., content addressability in data storage [4, 5], erasure codes [6], on-the-fly data similarity detection [7]) incur computational overheads that often preclude their effective use with today's commodity