Scalable High Throughput and Power Efficient IP-Lookup on FPGA

Hoang Le and Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, USA
{hoangle, prasanna}@usc.edu

Supported by the U.S. National Science Foundation under grant No. CCF-0702784. Equipment grant from Xilinx is gratefully acknowledged.

Abstract

Most high-speed Internet Protocol (IP) lookup implementations use tree traversal and pipelining. Due to the limited on-chip memory and the number of I/O pins of Field Programmable Gate Arrays (FPGAs), state-of-the-art designs cannot support the current largest routing table (consisting of 257K prefixes in backbone routers). We propose a novel scalable, high-throughput, low-power SRAM-based linear pipeline architecture for IP lookup. Using a single FPGA, the proposed architecture can support the current largest routing table, or even larger tables of up to 400K prefixes. Our architecture can also easily be partitioned, so as to use external SRAM to handle even larger routing tables (up to 1.7M prefixes). Our implementation achieves a high throughput (340 million lookups per second, or 109 Gbps), even when external SRAM is used. The use of SRAM (instead of TCAM) leads to an order-of-magnitude reduction in power dissipation. Additionally, the architecture supports power saving by allowing only a portion of the memory to be active on each memory access. Our design also maintains packet input order and supports in-place, non-blocking route updates.

1 Introduction

1.1 Internet Protocol Packet Forwarding

With the rapid growth of the Internet, IP packet forwarding, or simply IP lookup, has become the bottleneck in network traffic management. Therefore, the design of high-speed IP routers has been a major area of research. Advances in optical networking technology are pushing link rates in high-speed IP routers beyond OC-768 (40 Gbps). Such high rates demand that packet forwarding in IP routers be performed in hardware. For instance, a 40 Gbps link requires a throughput of 125 million packets per second (MPPS) for minimum-size (40-byte) packets, since 40 Gbps / (40 bytes x 8 bits/byte) = 125 MPPS. Such throughput is impossible to achieve using existing software-based solutions [1].

Table 1: Comparison of TCAM and SRAM

                                      TCAM (18 Mb)    SRAM (18 Mb)
  Maximum clock rate (MHz)            266             400
  Cell size (# of transistors/bit)    16              6
  Power consumption (Watts)           12~15           ~0.1

IP lookup is a classic problem. Most hardware-based solutions in network routers fall into two main categories: TCAM-based and dynamic/static random access memory (DRAM/SRAM)-based solutions. Although TCAM-based engines can retrieve results in just one clock cycle, their throughput is limited by the relatively low speed of TCAMs. They are expensive, power-hungry, and offer little adaptability to new addressing and routing protocols [7]. As shown in Table 1, SRAMs outperform TCAMs with respect to speed, density, and power consumption [2, 3, 4, 5, 6].

SRAM-based solutions, on the other hand, require multiple cycles to process a packet. Therefore, pipelining techniques are commonly used to improve the throughput. These SRAM-based approaches, however, result in inefficient memory utilization, which limits the size of the supported routing tables. In addition, it is not feasible to use external SRAM in these architectures, due to the constraint on the number of I/O pins. This constraint restricts the number of external stages, while the amount of on-chip memory confines the size of memory for each pipeline stage. Due to these two constraints, state-of-the-art SRAM-based solutions do not scale to support larger routing tables. This lack of scalability has been a dominant issue for implementations on FPGAs. Furthermore, pipelined architectures increase the total number of memory accesses per clock cycle, and thus increase the dynamic power consumption. The power dissipation in the memory dominates that in the logic [8, 9, 10].
Therefore, reducing memory power dissipation contributes to a large reduction in the total power dissipation.

2009 17th IEEE Symposium on Field Programmable Custom Computing Machines, 978-0-7695-3716-0/09 $25.00 © 2009 IEEE, DOI 10.1109/FCCM.2009.42
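The tree traversal mentioned in the abstract is conventionally realized as a trie built over the routing prefixes: a lookup walks the trie one address bit at a time, remembering the last prefix matched, and a pipelined engine maps each trie level to its own stage and memory so that one lookup completes per clock cycle. As a rough illustration of the traversal itself (a minimal software sketch, not the authors' pipelined architecture; the prefixes, next-hop labels, and addresses below are hypothetical examples), a binary-trie longest-prefix match can be written as:

```python
# Minimal binary-trie longest-prefix match (LPM). Illustrates the tree
# traversal that SRAM-based pipelined IP-lookup engines implement in
# hardware; all prefixes and addresses here are made-up examples.

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # 0-branch and 1-branch
        self.next_hop = None          # set iff a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    """Insert a prefix given as a string of '0'/'1' bits."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Walk the trie bit by bit; the last next-hop seen is the longest match."""
    node, best = root, None
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")            # prefix 10*  -> next hop A
insert(root, "1011", "B")          # prefix 1011* -> next hop B
print(lookup(root, "10110000"))    # 1011* is the longest match -> B
print(lookup(root, "10000000"))    # only 10* matches -> A
```

In a linear-pipeline realization such as the one this paper proposes, each level of this tree lives in a separate stage memory (on-chip or external SRAM), so successive packets occupy successive stages and the multi-cycle traversal still sustains one lookup per cycle.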