A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification

Weirong Jiang
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089, USA
Email: weirongj@usc.edu

Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089, USA
Email: prasanna@usc.edu

Abstract—Multi-field packet classification is a critical function that enables network routers to support a variety of applications such as firewall processing, Quality of Service differentiation, traffic billing, and other value-added services. Explosive growth of Internet traffic requires future packet classifiers to be implemented in hardware. However, most existing packet classification algorithms need a large amount of memory, which inhibits efficient hardware implementation. This paper exploits modern FPGA technology and presents a partitioning-based parallel architecture for scalable and high-speed packet classification. We propose a coarse-grained independent sets algorithm and combine it seamlessly with the cross-producting scheme. After partitioning the original rule set into several coarse-grained independent sets and applying the cross-producting scheme to the remaining rules, the memory requirement is dramatically reduced. Our FPGA implementation results show that our architecture can store 10K real-life rules in a single state-of-the-art FPGA while consuming a small amount of on-chip resources. Post place-and-route results show that the design sustains 90 Gbps throughput for minimum-size (40-byte) packets, which is more than twice the current backbone network link rate.

Keywords-FPGA; packet classification; partitioning;

I.
INTRODUCTION

Evolution of the Internet demands that next-generation routers support a variety of network applications, such as firewall processing, Quality of Service (QoS) differentiation, virtual private networks, policy routing, traffic billing, and other value-added services. To provide these services, the router needs to classify packets into different categories based on a set of predefined rules, which specify the value ranges of multiple fields in the packet header. Such a function is called multi-field packet classification. Due to the rapid growth of the network link rate, as well as the rule set size, multi-field packet classification has become one of the fundamental challenges in designing high-speed routers. For example, the current link rate has been pushed beyond the OC-768 rate, i.e. 40 Gbps, which requires processing a packet every 8 ns in the worst case (where the packets are of minimum size, i.e. 40 bytes). Such throughput is impossible to achieve using existing software-based solutions [1].

This work is supported by the United States National Science Foundation under grant No. CCF-0702784. Equipment grant from Xilinx Inc. is gratefully acknowledged.

To meet the throughput requirement, recent research in this area seeks to combine algorithmic and architectural approaches, most of which are based on ternary content addressable memories (TCAMs) [2]–[4] or various hashing schemes such as Bloom Filters [5]–[7]. However, as shown in [8]–[10], TCAMs are not scalable with respect to clock rate, power consumption, or circuit area, compared to static random access memories (SRAMs). Most TCAM-based solutions also suffer from range expansion when converting ranges into prefixes [3], [4]. Bloom Filters have become popular due to their O(1) time performance and low memory requirement. However, a secondary module is needed to resolve the false positives inherent in Bloom Filters; this module may be slow and can limit the overall performance [11].
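Two quantities mentioned above can be checked with a short sketch: the per-packet time budget at a given link rate, and the range expansion that occurs when a range is converted into prefixes for a TCAM. The function names `packet_time_budget_ns` and `range_to_prefixes` are hypothetical and chosen only for illustration; the splitting routine is the standard greedy decomposition of an integer range into aligned power-of-two blocks, not a procedure taken from the cited works.

```python
from typing import List


def packet_time_budget_ns(link_rate_gbps: float, packet_bytes: int = 40) -> float:
    """Time available to process one packet, in nanoseconds."""
    bits = packet_bytes * 8
    # 1 Gbps equals 1 bit per nanosecond, so bits / Gbps is already in ns.
    return bits / link_rate_gbps


def range_to_prefixes(lo: int, hi: int, width: int) -> List[str]:
    """Split the inclusive range [lo, hi] of `width`-bit values into prefixes.

    Each prefix is a string of `width` characters over {'0', '1', '*'},
    i.e. one ternary (TCAM-style) entry.
    """
    prefixes = []
    while lo <= hi:
        # Largest aligned block that can start at lo (set by its trailing zeros).
        size = lo & -lo if lo > 0 else 1 << width
        # Shrink the block until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1   # number of wildcard bits
        fixed = width - wild           # number of fixed leading bits
        bits = format(lo >> wild, f"0{fixed}b") if fixed > 0 else ""
        prefixes.append(bits + "*" * wild)
        lo += size
    return prefixes


if __name__ == "__main__":
    print(packet_time_budget_ns(40.0))   # minimum-size packets at 40 Gbps
    print(range_to_prefixes(1, 14, 4))   # a 4-bit range needing several entries
```

With 40-byte packets at 40 Gbps the budget is 320 bits / 40 Gbps = 8 ns, matching the figure quoted above. In the second call, the single 4-bit range [1, 14] expands into six prefixes (0001, 001*, 01**, 10**, 110*, 1110), which illustrates why rules with range fields can occupy many TCAM entries.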
On the other hand, FPGA technology has become an attractive option for implementing real-time network processing engines [4], [7], [12], due to its reconfigurability and massive parallelism. State-of-the-art SRAM-based FPGA devices such as Xilinx Virtex-5 [13] provide high clock rates and large amounts of on-chip dual-port memory with configurable word width. Some researchers have explored implementing existing packet classification algorithms on FPGAs to achieve high throughput [4], [7], [12]. However, few of them can support large rule sets (e.g. more than 10K rules), due to their excessive memory requirements. Some recently proposed partitioning-based packet classification schemes [14]–[16] can achieve much lower memory consumption, but none of them has been exploited for hardware implementation. The major challenge in mapping those schemes onto hardware is to bound the number of partitions, which is nondeterministic in the original schemes (e.g. varying from 34 to 61 for different rule sets [14]). To address this challenge, this paper proposes an FPGA-based parallel architecture for scalable and high-throughput packet classification. The paper makes the following contributions.

• Based on the idea of the Independent Sets [14], we propose a coarse-grained independent sets algorithm to reduce the number of partitions at the cost of increasing the number of linear searches. This extra cost is alleviated by pipelining the search process in hardware.
• We combine the coarse-grained independent sets al-