Journal of High Speed Networks 24 (2018) 89–106
DOI 10.3233/JHS-180583
IOS Press

High-performance multi/many-core architectures with shared and private queues: Network processing approaches

Reza Falamarzi (a), Bahram Bahrambeigy (b), Mahmood Ahmadi (a, *) and Amir Rajabzadeh (a)

(a) Computer Engineering and Information Technology Department, Razi University, Kermanshah, Iran
E-mails: rezafalamarzi@gmail.com, m.ahmadi@razi.ac.ir, rajabzadeh@razi.ac.ir
(b) Information Technology Department, Islamic Azad University of Kermanshah, Kermanshah, Iran
E-mail: bahramwhh@gmail.com

Abstract. Software solutions are not effective for network applications because of their low throughput. Implementing these functions in hardware on an FPGA not only provides sufficient flexibility but also increases throughput considerably. In this paper, two multi-core architectures are proposed for the Bloom filter and CRC, two core network processing functions. These architectures are called the multi-core architecture with shared queue and the multi-core architecture with private queue. The proposed architectures are implemented for 1, 2, 4, 8 and 16 cores. Experimental results show that the architecture with private queue achieves higher throughput than the architecture with shared queue. Compared with the Bloom filter, the CRC application imposes a lower computational load and consequently achieves higher throughput. Moreover, the Bloom filter is implemented on a GPU and a CPU and the results are compared. With 16384 packets in GPU memory, the CUDA-based GPU implementation achieves a speedup of about 274 over the CPU implementation. However, the FPGA implementations outperform the GPU: with 16 cores, the throughput of the first architecture (shared queue) and the second architecture (private queue) is almost 5.5 and 7.1 times the GPU throughput, respectively.
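The abstract names the Bloom filter and CRC as the two core functions that the proposed architectures accelerate. The following is a minimal software sketch of both, assuming the standard CRC-32 (IEEE 802.3 polynomial, reflected form 0xEDB88320) and a Bloom filter whose k bit positions are derived by double hashing from CRC-32 and Adler-32. The hash choice is our illustrative assumption; this section does not state which hash functions the paper's cores use. The names `crc32_bitwise` and `BloomFilter` are hypothetical.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).

    Produces the same value as zlib.crc32; the inner shift-and-XOR loop
    mirrors the structure of a serial hardware CRC circuit.
    """
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                # process one bit per iteration
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # standard final XOR


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed at k hashed positions."""

    def __init__(self, m_bits: int, k: int):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)   # all bits start at 0

    def _positions(self, item: bytes):
        # Double hashing g_i = h1 + i*h2 (mod m), with h1 = CRC-32 and
        # h2 = Adler-32 forced odd. Illustrative assumption, not the paper's.
        h1 = crc32_bitwise(item)
        h2 = zlib.adler32(item) | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m_bits=1024, k=4)
    bf.add(b"192.168.1.10")
    print(bf.might_contain(b"192.168.1.10"))   # True: no false negatives
    print(hex(crc32_bitwise(b"123456789")))    # 0xcbf43926 (CRC-32 check value)
```

A query only reads k bits and a CRC only shifts and XORs, so both map naturally onto simple hardware cores, which is consistent with the abstract's observation that CRC carries a lighter computational load than the Bloom filter (which, in this sketch, computes a CRC and more per packet).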
Keywords: Bloom filter, Cyclic Redundancy Check (CRC), Field-Programmable Gate Arrays (FPGA), Graphics Processing Unit (GPU), Multi-core/many-core processors

* Corresponding author. E-mail: m.ahmadi@razi.ac.ir.
0926-6801/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved

1. Introduction

In recent years, parallel architectures have become increasingly important for achieving higher performance and speed-up in a wide range of programs. According to Moore’s law, the number of transistors on a chip doubles every 18 to 24 months. This growth in transistor count has improved the efficiency of on-chip hardware. However, this law concerns only hardware, and software development lags far behind hardware development [20]. Advances in processor design are no longer driven mainly by optimizing the serial performance of general-purpose processors; instead, the main goals are parallelism and integrating more cores per processor. This shift arises because raising the clock frequency of a single-core CPU increases power consumption, which is the main reason for the move toward multi-core technology [6,29].

Multi-core processors are used in many areas of computing, such as signal processing, image processing, embedded systems and desktop computers. These processors can provide the required processing power at a reasonable level of power consumption. Moreover, the Internet and other computer networks are growing rapidly, driven by an increasing number of services such as firewalls, Quality of Service (QoS), Virtual Private Networks (VPN) and network security. Therefore, the larger the number of users and services becomes, the