Scalable High Throughput and Power Efficient IP-Lookup on FPGA∗
Hoang Le and Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, USA
{hoangle, prasanna}@usc.edu
Abstract
Most high-speed Internet Protocol (IP) lookup imple-
mentations use tree traversal and pipelining. Due to the
available on-chip memory and the number of I/O pins of
Field Programmable Gate Arrays (FPGAs), state-of-the-
art designs cannot support the current largest routing table
(consisting of 257K prefixes in backbone routers). We pro-
pose a novel scalable high-throughput, low-power SRAM-
based linear pipeline architecture for IP lookup. Using a
single FPGA, the proposed architecture can support the
current largest routing table, or even larger tables of up
to 400K prefixes. Our architecture can also be easily par-
titioned, so as to use external SRAM to handle even larger
routing tables (up to 1.7M prefixes). Our implementation
shows a high throughput (340 mega lookups per second or
109 Gbps), even when external SRAM is used. The use of
SRAM (instead of TCAM) leads to an order of magnitude re-
duction in power dissipation. Additionally, the architecture
supports power saving by allowing only a portion of the
memory to be active on each memory access. Our design
also maintains packet input order and supports in-place
non-blocking route updates.
1 Introduction
1.1 Internet Protocol Packet Forwarding
With the rapid growth of the Internet, IP packet forward-
ing, or simply IP lookup, has become a bottleneck in net-
work traffic management. Therefore, the design of high-
speed IP routers has been a major area of research. Ad-
vances in optical networking technology are pushing link
rates in high speed IP routers beyond OC-768 (40 Gbps).
Such high rates demand that packet forwarding in IP routers
must be performed in hardware. For instance, a 40 Gbps
link requires a throughput of 125 million packets per sec-
ond (MPPS) for a minimum-size (40-byte) packet. Such
throughput is impossible to achieve using existing software-
based solutions [1].

∗ Supported by the U.S. National Science Foundation under grant No.
CCF-0702784. An equipment grant from Xilinx is gratefully acknowledged.

Table 1: Comparison of TCAM and SRAM

                                  TCAM (18 Mb)   SRAM (18 Mb)
Maximum clock rate (MHz)              266             400
Cell size (# of transistors/bit)       16               6
Power consumption (Watts)           12 ∼ 15           ≈ 1
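The 125 MPPS figure follows directly from dividing the link rate by the size of a minimum packet; a quick check:

```python
# Packet rate required to saturate an OC-768 link with minimum-size packets.
LINK_RATE_BPS = 40e9          # 40 Gbps link rate
MIN_PACKET_BITS = 40 * 8      # minimum-size IP packet: 40 bytes

packets_per_sec = LINK_RATE_BPS / MIN_PACKET_BITS
print(packets_per_sec / 1e6)  # -> 125.0 (million packets per second)
```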
IP lookup is a classic problem. Most hardware-based
solutions in network routers fall into two main categories:
TCAM-based and dynamic/static random access memory
(DRAM/SRAM)-based solutions. Although TCAM-based
engines can retrieve results in just one clock cycle, their
throughput is limited by the relatively low speed of TCAMs.
They are expensive, power-hungry, and offer little adapt-
ability to new addressing and routing protocols [7]. As
shown in Table 1, SRAMs outperform TCAMs with respect
to speed, density, and power consumption [2, 3, 4, 5, 6].
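SRAM-based lookup engines of the kind discussed here typically perform longest-prefix matching by walking a trie one (or a few) address bits per memory access, which is what makes pipelining natural: each trie level can occupy one pipeline stage. A minimal software sketch of longest-prefix matching on a binary trie (illustrative only; the `TrieNode` structure and one-bit stride are our simplifying assumptions, not the paper's exact data structure):

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # bit ('0' or '1') -> child TrieNode
        self.next_hop = None   # set if a routing prefix ends at this node

def insert(root, prefix_bits, next_hop):
    """Insert a prefix (string of '0'/'1' bits) with its next hop."""
    node = root
    for b in prefix_bits:
        node = node.children.setdefault(b, TrieNode())
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Longest-prefix match: walk the trie, remembering the last next hop seen."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children.get(b)
        if node is None:
            break                          # no deeper match possible
    else:
        if node.next_hop is not None:      # address consumed; check final node
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")              # prefix 10* -> port A
insert(root, "1011", "B")            # more specific prefix 1011* -> port B
print(lookup(root, "10110000"))      # both prefixes match; longest wins -> B
print(lookup(root, "10000000"))      # only 10* matches -> A
```

In a hardware pipeline, each `for` iteration corresponds to one stage with its own SRAM bank, so one lookup can complete every clock cycle.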
SRAM-based solutions, on the other hand, require mul-
tiple cycles to process a packet. Therefore, pipelining
techniques are commonly used to improve the throughput.
These SRAM-based approaches, however, result in ineffi-
cient memory utilization. This inefficiency limits the size
of the supported routing tables. In addition, it is not fea-
sible to use external SRAM in these architectures, due to
the constraint on the number of I/O pins. This constraint
restricts the number of external stages, while the amount
of on-chip memory confines the size of memory for each
pipeline stage. Due to these two constraints, state-of-the-art
SRAM-based solutions do not scale to support larger rout-
ing tables. This lack of scalability has been a dominant is-
sue for FPGA implementations. Furthermore, pipelined
architectures increase the total number of memory accesses
per clock cycle, and thus, increase the dynamic power con-
sumption. The power dissipation in the memory dominates
that in the logic [8, 9, 10]. Therefore, reducing memory
power dissipation contributes to a large reduction in the to-
2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
978-0-7695-3716-0/09 $25.00 © 2009 IEEE
DOI 10.1109/FCCM.2009.42
167