Journal of High Speed Networks 24 (2018) 89–106
DOI 10.3233/JHS-180583
IOS Press
High-performance multi/many-core
architectures with shared and private queues:
Network processing approaches
Reza Falamarzi a, Bahram Bahrambeigy b, Mahmood Ahmadi a,∗ and Amir Rajabzadeh a
a Computer Engineering and Information Technology Department, Razi University, Kermanshah, Iran
E-mails: rezafalamarzi@gmail.com, m.ahmadi@razi.ac.ir, rajabzadeh@razi.ac.ir
b Information Technology Department, Islamic Azad University of Kermanshah, Kermanshah, Iran
E-mail: bahramwhh@gmail.com
Abstract. Software solutions are not effective for network applications because of their low throughput. A hardware implementation on
an FPGA not only retains sufficient flexibility but also increases throughput considerably. In this paper, two multi-core architectures
are proposed for Bloom filter and CRC, two core network processing functions. These architectures are called the multi-core
architecture with shared queue and the multi-core architecture with private queue. The proposed architectures are implemented for 1, 2,
4, 8 and 16 cores. Experimental results show that the multi-core architecture with private queue achieves higher throughput than the
one with shared queue. Compared with the Bloom filter, the CRC application imposes less computational load and consequently achieves
higher throughput. Moreover, the Bloom filter is implemented on GPU and CPU and the results are compared. When the number of packets
in GPU memory is 16384, the GPU implementation using CUDA achieves a speedup of about 274 times over the CPU implementation. However,
the FPGA results outperform the GPU: the throughput of the first architecture (shared queue) and the second architecture (private
queue) with 16 cores is almost 5.5 and 7.1 times higher than the GPU throughput, respectively.
Keywords: Bloom filter, Cyclic Redundancy Check (CRC), Field-Programmable Gate Arrays (FPGA), Graphics Processing Unit (GPU),
Multi-core/many-core processors
1. Introduction
In recent years, parallel architectures have become increasingly important for achieving higher performance and speed-up
in various programs. According to Moore’s law, the number of transistors on a chip doubles every 18 to 24 months.
This growth in the number of transistors has increased the efficiency of on-chip hardware. However, this law concerns
only hardware, while software development lags far behind hardware development [20]. Advances in processor development
no longer aim mainly at optimizing the serial performance of general-purpose processors. Instead, parallelism and
employing more cores in processors are the main goals. This is because increasing the frequency of a single-core CPU
leads to higher power consumption, which is the main reason for moving toward multi-core technology [6,29].
Multi-core processors are used in various areas of computing such as signal processing, image processing,
embedded systems and desktops. These processors can provide the required processing power at a reasonable
power consumption level. Moreover, the Internet and other computer networks are growing rapidly. This
growth is due to the increasing number of different services such as firewalls, Quality of Service (QoS), Virtual
Private Networks (VPN) and network security. Therefore, the larger the number of users and services becomes, the
∗ Corresponding author. E-mail: m.ahmadi@razi.ac.ir.
0926-6801/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved