Abstract – In the era of modern data communications, as the need for data security grows, the need to reduce the execution time and computational overhead of cryptographic algorithms increases correspondingly. Parallelizing cryptographic computations on many-core computing platforms is a promising approach to reducing the execution time, and ultimately the energy consumption, of such algorithms. In this paper, we build a pipelined model to analyze and compare the execution time and energy consumption of the Blowfish cryptographic algorithm on the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs. In this model, the Blowfish algorithm is divided into smaller chunks, and each chunk is executed by a single core. Using the Message Passing Interface (MPI), the input data passes through all participating cores in turn. Because this model incurs communication overhead and latency, we experimentally identified the optimal message size to pass between cores without saturating the on-chip communication network. Our results show that our parallel approach is 27X faster than the sequential approach and consumes close to 16X less energy on the SCC platform.

Keywords: Cryptography, Blowfish, Energy Efficiency, MPI, Pipeline, SCC

I. INTRODUCTION

In the new era of data-intensive computing, global data communications, and inexpensive internet connections, there is a growing demand for data security, energy efficiency, and computational speedup. Cryptography plays an important role in protecting data from destructive forces and from unwanted actions by unauthorized users. In response to the ever-increasing need for data security, cryptographic algorithms have become mathematically more complex over time.
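The pipelined model in the abstract can be sketched in Python under several assumptions of our own: the 16 Feistel rounds of Blowfish are split evenly across `n_stages` stages, each stage runs in its own thread standing in for one SCC core, and `queue.Queue` objects stand in for the MPI messages passed between cores, with `batch` (blocks per message) playing the role of the message size. The function and parameter names (`make_blowfish`, `encrypt_block`, `pipelined_encrypt`, `n_stages`, `batch`) are hypothetical, and the paper's actual partitioning into chunks may differ. The cipher follows the published Blowfish description; its initial P-array and S-boxes are the hexadecimal digits of π, which the sketch computes with Machin's formula instead of hard-coding about 4 KB of constants. Because CPython threads share a global interpreter lock, this illustrates the data flow and checks that pipelining leaves the ciphertext unchanged; it does not model SCC performance.

```python
import queue
import threading

MASK = 0xFFFFFFFF  # Blowfish works on 32-bit halves


def pi_fraction_words(n_words):
    """First n_words 32-bit words of pi's hex fraction (0x243F6A88, 0x85A308D3, ...)."""
    guard = 64  # extra bits absorb truncation error in the series
    bits = 32 * n_words + guard
    one = 1 << bits

    def arctan_inv(x):
        # Fixed-point arctan(1/x) = sum over k of (-1)^k / ((2k+1) * x^(2k+1))
        total, term, k, sign = 0, one // x, 1, 1
        while term:
            total += sign * (term // k)
            term, k, sign = term // (x * x), k + 2, -sign
        return total

    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)  # Machin's formula
    frac = (pi - (3 << bits)) >> guard  # drop the integer part and the guard bits
    return [(frac >> (32 * (n_words - 1 - i))) & MASK for i in range(n_words)]


def make_blowfish(key):
    """Key the cipher; return (rounds, finish) closures over its P-array and S-boxes."""
    words = pi_fraction_words(18 + 4 * 256)
    P = words[:18]
    S = [words[18 + 256 * i:18 + 256 * (i + 1)] for i in range(4)]

    def f(x):
        h = (S[0][x >> 24] + S[1][(x >> 16) & 0xFF]) & MASK
        return ((h ^ S[2][(x >> 8) & 0xFF]) + S[3][x & 0xFF]) & MASK

    def rounds(L, R, lo, hi):
        # Feistel rounds lo..hi-1; the halves are swapped after every round.
        for i in range(lo, hi):
            L ^= P[i]
            R ^= f(L)
            L, R = R, L
        return L, R

    def finish(L, R):
        # Undo the last swap, then whiten the output with P[16] and P[17].
        return R ^ P[17], L ^ P[16]

    # Key schedule, step 1: XOR the cyclically repeated key into P.
    for i in range(18):
        chunk = bytes(key[(4 * i + n) % len(key)] for n in range(4))
        P[i] ^= int.from_bytes(chunk, "big")
    # Step 2: encrypt repeatedly, replacing P and then S-box entries two at a time.
    L = R = 0
    for table in [P] + S:
        for i in range(0, len(table), 2):
            L, R = finish(*rounds(L, R, 0, 16))
            table[i], table[i + 1] = L, R
    return rounds, finish


def encrypt_block(block, rounds, finish):
    L, R = finish(*rounds(block >> 32, block & MASK, 0, 16))
    return (L << 32) | R


def pipelined_encrypt(blocks, rounds, finish, n_stages=4, batch=8):
    """Stage k (one thread, standing in for one core) runs rounds k*per..(k+1)*per-1;
    queues stand in for MPI messages, each carrying `batch` blocks."""
    assert 16 % n_stages == 0, "the 16 rounds must split evenly across stages"
    per = 16 // n_stages
    links = [queue.Queue() for _ in range(n_stages + 1)]

    def stage(k):
        while (msg := links[k].get()) is not None:
            out = [rounds(L, R, k * per, (k + 1) * per) for L, R in msg]
            if k == n_stages - 1:
                out = [finish(L, R) for L, R in out]
            links[k + 1].put(out)
        links[k + 1].put(None)  # forward the end-of-stream marker

    workers = [threading.Thread(target=stage, args=(k,)) for k in range(n_stages)]
    for w in workers:
        w.start()
    halves = [(b >> 32, b & MASK) for b in blocks]
    for i in range(0, len(halves), batch):
        links[0].put(halves[i:i + batch])
    links[0].put(None)
    result = []
    while (msg := links[-1].get()) is not None:
        result.extend((L << 32) | R for L, R in msg)
    for w in workers:
        w.join()
    return result


rounds, finish = make_blowfish(bytes(8))
blocks = [(i * 0x9E3779B97F4A7C15) & (2**64 - 1) for i in range(64)]
assert pipelined_encrypt(blocks, rounds, finish) == [encrypt_block(b, rounds, finish) for b in blocks]
```

Because every block still passes through rounds 0 to 15 in order with the same subkeys, the pipelined output must match block-by-block encryption; only the scheduling changes. The `batch` parameter is where a message-size trade-off appears in this sketch: larger messages mean fewer sends per block, while smaller ones let downstream stages start working sooner.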
However, the increased complexity of such algorithms incurs more computational overhead, which leads to longer execution times and therefore higher energy consumption of the computing system. In recent years, several studies have successfully used hardware acceleration techniques to speed up the execution of cryptographic algorithms. A general hardware-assisted approach was presented by Tang et al. [8, 19-25]. However, their design focused mainly on accelerating the garbage collection function by exploiting prefetching techniques in the middleware layer, which is a different application domain from ours. Ro et al. [26, 27] presented a decoupled architecture design that remedies the limits of traditional data prefetching methods, improving overall memory access latency. Liu et al. [1] explored implementation techniques for energy-efficient hardware acceleration of RSA and Blowfish cryptography, reducing energy consumption by 9.6% for RSA and 36.0% for Blowfish, respectively. However, their approach is based on a co-processor design on an FPGA platform, which can incur extra hardware and power consumption overhead, whereas in this work we focus on future many-core architectures. Increasing the number of cores on a single chip can increase computational speed and reduce the energy consumption of the system. Techniques that increase energy efficiency in many-core systems reduce energy consumption and excess heat, lower operational costs, and improve system reliability [2]. The emergence of many-core systems provides the opportunity to revisit the implementation of high-impact computing problems on more capable hardware [16]. Parallel computing on many-core platforms offers major benefits, chief among them the ability to handle large and extremely complex computations.
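A short worked example helps show how using more cores can save energy even when the chip draws more power. Assuming energy is average power times execution time (E = P t, an assumption of this illustration rather than a description of the paper's measurement method), the abstract's figures of a 27X speedup and close to 16X lower energy imply the following ratio of average power:

$$
\frac{P_{\mathrm{par}}}{P_{\mathrm{seq}}}
= \frac{E_{\mathrm{par}} / t_{\mathrm{par}}}{E_{\mathrm{seq}} / t_{\mathrm{seq}}}
= \frac{E_{\mathrm{par}}}{E_{\mathrm{seq}}} \cdot \frac{t_{\mathrm{seq}}}{t_{\mathrm{par}}}
\approx \frac{1}{16} \cdot 27 \approx 1.7
$$

Under that assumption, the parallel run draws roughly 1.7 times the average power of the sequential run, but because it finishes 27 times sooner, its total energy is still much lower.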
High-Performance Implementation and Evaluation of Blowfish Cryptographic Algorithm on Single-Chip Cloud Computer: A Pipelined Approach

Kamak Ebadi, Victor Pena
Department of Electrical and Computer Engineering
Florida International University
Miami, Florida, USA
E-mail: {kebad001, vpena005}@fiu.edu

Chen Liu
Department of Electrical and Computer Engineering
Clarkson University
Potsdam, New York, USA
E-mail: cliu@clarkson.edu

However, the big challenge lies not only in developing powerful many-core hardware architectures, but also in developing applications that can run effectively in parallel and take advantage of the capabilities offered by many-core architectures. Our motivation for this paper is to examine whether complex cryptographic algorithms can split their tasks to run successfully in parallel, so that they achieve faster execution and lower energy consumption. In this paper, using the message-passing data type (MPDP) on a many-core platform, we