GPU-based String Matching Method using Warp Shuffle Instructions for Service-oriented Routers

Satoshi Koibuchi, Kazumasa Ikeuchi, Shinichi Ishida, Hiroaki Nishi
Graduate School of Science and Technology, Keio University, Japan
{satoshi, ikeuchi, sin}@west.sd.keio.ac.jp, west@sd.keio.ac.jp

Abstract: A service-oriented router (SoR) is a new router architecture that provides useful Internet services that a traditional router cannot offer. Network intrusion prevention is expected to become one of the most significant SoR services. To provide this service, we proposed the SoR-Network Intrusion Detection System (SoR-NIDS), which uses deep packet inspection (DPI) to protect against malicious streams on the router. Applications such as SoR-NIDS require an efficient mechanism for analyzing traffic information; in particular, string matching is an essential function. Moreover, router architectures are becoming increasingly commoditized, and it will be possible in the future to use GPUs to accelerate processing on routers. We propose a new GPU-based string matching design and an efficient multi-string matching function for multiple streams on a service-oriented router, using warp shuffle (shfl) instructions to accelerate data stream analysis. The proposed method was evaluated, and its effectiveness was confirmed.

Keywords: Service-oriented router; GPU; string matching; warp shuffle instructions; application layer analysis

I. Introduction

The Internet has become an indispensable communication tool, and the amount of Internet traffic is continually increasing. Accordingly, the threat of attacks on the Internet is also increasing. In particular, the number of attacks that exploit software vulnerabilities on client PCs is increasing. In general, vendors provide software patches for known vulnerabilities; however, the user is responsible for installing such patches, and patches that are not installed leave the system insecure.
In the client-server network model, the administrator of each end-host has discretion over all security; thus, the security level depends on that administrator. Therefore, a new security system that does not depend on the security level of the end-host is required.

In the future, many sensors will be distributed around the world, and these sensors will not have sufficient battery power, processing power, or memory. In addition, it may be difficult to install antivirus software on such sensors. If these sensors are compromised, a new threat will be introduced. This will strengthen the need for antivirus functionality within the Internet itself.

A router relays communication between end-hosts in a network. A typical router forwards a packet to the appropriate destination based on a routing table and the destination IP address contained in the packet. We propose the service-oriented router (SoR) as a new router architecture. An SoR reconstructs TCP streams inside the router from the packets that carry their fragments, and does so in a memory-efficient manner. Moreover, an SoR can decode, extract, and analyze application layer information and, based on the results of this analysis, provide new services.

A network intrusion detection system (NIDS) is one of the new services that an SoR can provide; we refer to it as SoR-NIDS. Matching traffic against a blacklist can increase the security of an end-host network. In addition, a NIDS can prevent potential threats and provide warnings to users. In general, a NIDS searches for a signature, represented as a string or a regular expression, to determine whether the target data can be permitted. SoR-NIDS can be provided to all Internet users. As with a general antivirus system, robust security against new attack methods can be provided by updating the blacklist on the SoR.

However, two problems must be solved before SoR-NIDS can be realized. First, wire-rate processing throughput must be achieved in the router.
Second, intrusion detection processing must be realized for multiple streams. A large number of streams flow through a router; thus, it is necessary to achieve high-throughput processing for multiple streams.

As an existing approach, dedicated hardware, such as network processors or FPGAs, has been studied to achieve high throughput [1][2]. However, given recent trends in backbone routers, using conventional cost-effective devices is a more practical way to realize SoR-NIDS. In addition, it is preferable to retain programmability and to follow the architectural trend of Internet backbone routers toward commodity devices.

To achieve high-throughput string matching, it is indispensable to parallelize the process. Graphics processing units (GPUs) [3] are used as co-processors to realize high throughput. A GPU has hundreds or thousands of processing cores on one semiconductor die and achieves high throughput by using these cores in parallel. In addition, a GPU has dedicated memory whose bandwidth is several times higher than that of main memory. This means that a GPU can obtain higher processor-memory bandwidth than common processors if conditions are appropriate. In a general system, a GPU is connected through a peripheral component interconnect (PCI) interface that transfers data and instructions. Recently, router architecture