A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification

Weirong Jiang
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089, USA
Email: weirongj@usc.edu

Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089, USA
Email: prasanna@usc.edu

Abstract—Multi-field packet classification is a critical function that enables network routers to support a variety of applications such as firewall processing, Quality of Service differentiation, traffic billing, and other value-added services. The explosive growth of Internet traffic requires that future packet classifiers be implemented in hardware. However, most existing packet classification algorithms need a large amount of memory, which inhibits efficient hardware implementation. This paper exploits modern FPGA technology and presents a partitioning-based parallel architecture for scalable and high-speed packet classification. We propose a coarse-grained independent sets algorithm and then combine it seamlessly with the cross-producting scheme. After partitioning the original rule set into several coarse-grained independent sets and applying the cross-producting scheme to the remaining rules, the memory requirement is dramatically reduced. Our FPGA implementation results show that our architecture can store 10K real-life rules in a single state-of-the-art FPGA while consuming a small amount of on-chip resources. Post place and route results show that the design sustains 90 Gbps throughput for minimum-size (40-byte) packets, which is more than twice the current backbone network link rate.

Keywords-FPGA; packet classification; partitioning

This work is supported by the United States National Science Foundation under grant No. CCF-0702784. Equipment grant from Xilinx Inc. is gratefully acknowledged.

I. INTRODUCTION

The evolution of the Internet demands that next-generation routers support a variety of network applications, such as firewall processing, Quality of Service (QoS) differentiation, virtual private networks, policy routing, traffic billing, and other value-added services. To provide these services, the router needs to classify packets into different categories based on a set of predefined rules, which specify the value ranges of multiple fields in the packet header. Such a function is called multi-field packet classification.

Due to the rapid growth of the network link rate, as well as of the rule set size, multi-field packet classification has become one of the fundamental challenges in designing high-speed routers. For example, the current link rate has been pushed beyond the OC-768 rate, i.e., 40 Gbps, which requires processing a packet every 8 ns in the worst case, where all packets are of the minimum size of 40 bytes (a 40-byte packet is 320 bits, and 320 bits / 40 Gbps = 8 ns). Such throughput is impossible to achieve using existing software-based solutions [1].

To meet the throughput requirement, recent research in this area seeks to combine algorithmic and architectural approaches, most of which are based on ternary content addressable memories (TCAMs) [2]–[4] or various hashing schemes such as Bloom Filters [5]–[7]. However, as shown in [8]–[10], TCAMs are not scalable with respect to clock rate, power consumption, or circuit area, compared to static random access memories (SRAMs). Most TCAM-based solutions also suffer from range expansion when converting ranges into prefixes [3], [4]. Bloom Filters have become popular due to their O(1) time performance and low memory requirement. However, a secondary module is needed to resolve the false positives inherent in Bloom Filters, and this module may be slow and can limit the overall performance [11].
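To make the range expansion problem noted above for TCAM-based solutions concrete, the following minimal sketch splits an integer range into prefixes. It is an illustration only and is not taken from this paper or from [3], [4]: the function name `range_to_prefixes`, the 16-bit field width (typical of a port field), and the greedy splitting strategy are all assumptions made here.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive range [lo, hi] of a `width`-bit field into a list
    of (value, prefix_length) pairs. Hypothetical helper for illustration."""
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range must satisfy 0 <= lo <= hi < 2**width")
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at `lo`.
        size = (lo & -lo) if lo else (1 << width)
        # Shrink the block until it no longer overshoots `hi`.
        while size > hi - lo + 1:
            size >>= 1
        # A block of 2**k values corresponds to a prefix of length width - k.
        prefixes.append((lo, width - (size.bit_length() - 1)))
        lo += size
    return prefixes


if __name__ == "__main__":
    for lo, hi in [(0, 65535), (1024, 65535), (1, 65534)]:
        print(f"[{lo}, {hi}] -> {len(range_to_prefixes(lo, hi))} prefixes")
```

Under these assumptions, the wildcard range [0, 65535] needs a single prefix, the range [1024, 65535] needs 6, and the range [1, 65534] needs 30. A rule that carries such ranges in two 16-bit fields could therefore occupy up to 30 × 30 = 900 TCAM entries, which is why range expansion inflates memory consumption.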
On the other hand, FPGA technology has become an attractive option for implementing real-time network processing engines [4], [7], [12], due to its reconfigurability and massive parallelism. State-of-the-art SRAM-based FPGA devices such as Xilinx Virtex-5 [13] provide a high clock rate and large amounts of on-chip dual-port memory with configurable word width. Some researchers have explored implementing existing packet classification algorithms on FPGAs to achieve high throughput [4], [7], [12]. However, few of these implementations can support large rule sets (e.g., more than 10K rules), due to their excessive memory requirement. Some recently proposed partitioning-based packet classification schemes [14]–[16] achieve much lower memory consumption, but none of them has been exploited for hardware implementation. The major challenge in mapping those schemes onto hardware is to bound the number of partitions, which is nondeterministic in the original schemes (e.g., varying from 34 to 61 for different rule sets [14]).

To address this challenge, this paper proposes an FPGA-based parallel architecture for scalable and high-throughput packet classification. The paper makes the following contributions.
• Based on the idea of the Independent Sets [14], we propose a coarse-grained independent sets algorithm to reduce the number of partitions at the cost of increasing the number of linear searches. This extra cost is alleviated by pipelining the search process in hardware.
• We combine the coarse-grained independent sets al-