IEEE COMMUNICATIONS LETTERS, VOL. 8, NO. 3, MARCH 2004 183
Packet Processing Acceleration With a 3-Stage
Programmable Pipeline Engine
I. Papaefstathiou, K. Vlachos, N. Nikolaou, N. Zervos, and V. B. Lawrence, Member, IEEE
Abstract—In this letter, we present the architecture and imple-
mentation of a novel, 3-stage processing engine, suitable for deep
packet processing in high-speed networks. The engine, which has
been fabricated as part of a network processor, comprises a typical
RISC core and programmable hardware. To assess the performance of
the engine, experiments with packets of various lengths were run and
the results compared against the IXP1200 network processor. For the
case study shown in this letter, the proposed packet-processing engine
is up to three times faster. Moreover, the engine is simple to fabricate,
less expensive than the corresponding hardware cores of the IXP1200,
and easily programmed for different networking applications.
Index Terms—ASIC, Network Processor, Special Purpose
Processor.
I. INTRODUCTION
WITH the rapid growth of Internet traffic and the
increasing line rates, the execution of the various
networking tasks is increasingly considered to be the main
bottleneck for communications. To meet the stringent pro-
cessing demands, designers are faced with two alternatives:
either create a custom hardware solution (ASIC) or use a
special purpose processor, called a network processor (NP).
The ASIC approach can achieve the desired speeds, but it is
inflexible, since changes in the functionality are very limited
or not permitted at all. However, since protocols continue to
evolve, accommodating new features that comply with the
latest standards is of significant importance. In this respect,
NPs can provide the required flexibility and programmability.
In this letter, we present a flexible and programmable engine
that can sustain wire speed protocol processing, even for
complex and highly demanding networking tasks. The design
can be easily embedded in any networking environment (i.e.,
both ASICs and NPs). It combines a typical RISC core [1]
with custom-made, fully programmable hardware in a 3-stage
pipeline module. In this way, the efficiency of a typical CPU
is enhanced by providing the means to tailor its circuits for
special tasks and, conversely, the application diversity of highly
optimized hardware is significantly broadened. Using this
engine, which incorporates a low cost and simple general purpose
RISC, at 200 MHz, we were able to sustain stateful inspection
firewall processing and Network Address Translation (NAT)
for 2.5 Gb/s TCP/IP traffic.
Manuscript received June 11, 2003. The associate editor coordinating the review
of this letter and approving it for publication was Prof. K. Park.
I. Papaefstathiou, N. Nikolaou, and N. Zervos are with Ellemedia Technologies,
Athens GR17121, Greece (e-mail: yanni@ellemedia.com; nikolaou@ellemedia.com;
nzervos@ellemedia.com).
K. Vlachos was with Bell Laboratories Advance Technology EMEA, Lucent
Technologies, 1200BD Hilversum, The Netherlands.
V. B. Lawrence is with Bell Labs, Lucent Technologies, Holmdel, NJ 07733
USA (e-mail: vbl@lucent.com).
Digital Object Identifier 10.1109/LCOMM.2004.823427
Fig. 1. Programmable processing functional model and block diagram.
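As a rough check on the 200 MHz and 2.5 Gb/s figures quoted in the introduction, the sketch below computes how many core clock cycles elapse per packet at full line rate. Only the clock frequency and line rate come from the text; the packet sizes, the function name `cycles_per_packet`, and the decision to ignore link-layer framing overhead are assumptions made for illustration.

```python
def cycles_per_packet(clock_hz: float, line_rate_bps: float, packet_bytes: int) -> float:
    """Clock cycles that elapse between back-to-back packets at line rate."""
    packets_per_second = line_rate_bps / (packet_bytes * 8)
    return clock_hz / packets_per_second


# 200 MHz and 2.5 Gb/s are taken from the text; the packet sizes are assumed.
CLOCK_HZ = 200e6
LINE_RATE_BPS = 2.5e9

for size in (40, 64, 1500):
    budget = cycles_per_packet(CLOCK_HZ, LINE_RATE_BPS, size)
    print(f"{size:5d}-byte packets: {budget:7.2f} cycles per packet")
```

Under these assumptions the budget works out to 0.64 cycles per byte, so short packets leave only a few tens of cycles each, which suggests why offloading field extraction and modification from the RISC core matters.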
II. THE PROGRAMMABLE PROCESSING ENGINE—PPE
The Programmable Processing Engine (PPE) (see Fig. 1) is a
3-stage pipeline module, consisting of three logical sub-units:
a Field Extractor (FEX) unit, a typical RISC core and a Field
Modification unit (FMO). More specifically, programmable
hardware extracts fields from incoming packets and feeds them
to the processing core. After the RISC core has processed the
fields, the FMO updates specific fields of the packet in a
programmable manner. Additionally, an I/O data controller
relieves the processing core of I/O duties, freeing its
resources for real processing.
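The division of labor among the three stages can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names `fex`, `risc`, `fmo`, and `ppe`, the toy NAT table, and the fixed 20-byte IPv4 header without options are all hypothetical, and the model runs the stages one after another, whereas the hardware pipeline would overlap them across successive packets. Checksum updates are omitted.

```python
import struct

SRC_OFFSET = 12  # byte offset of the IPv4 source address (header without options)


def fex(packet: bytes) -> dict:
    """Stage 1 (Field Extractor): pull out the fields the core needs."""
    src, dst = struct.unpack_from("!II", packet, SRC_OFFSET)
    return {"src": src, "dst": dst}


def risc(fields: dict, nat_table: dict) -> dict:
    """Stage 2 (RISC core): decide what to change; here, a toy NAT lookup."""
    new_src = nat_table.get(fields["src"])
    return {} if new_src is None else {"src": new_src}


def fmo(packet: bytes, updates: dict) -> bytes:
    """Stage 3 (Field Modification): write updated fields back into the packet."""
    out = bytearray(packet)
    if "src" in updates:
        struct.pack_into("!I", out, SRC_OFFSET, updates["src"])
    return bytes(out)


def ppe(packet: bytes, nat_table: dict) -> bytes:
    """Run one packet through FEX, the core, and FMO in order."""
    return fmo(packet, risc(fex(packet), nat_table))


# Hypothetical example: translate source 10.0.0.1 to 198.51.100.1.
header = bytes(12) + bytes([10, 0, 0, 1]) + bytes([192, 0, 2, 7])
nat = {0x0A000001: 0xC6336401}
translated = ppe(header, nat)
```

In this model the core only ever sees the small `fields` dictionary, never the packet bytes, which mirrors the idea that the surrounding hardware handles data movement.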
The Field Extraction operation is controlled by microcode,
stored in an internal SRAM. The instruction set comprises
simple and generic instructions that operate over data stored
in a FIFO of 32-bit words. The FEX instruction set supports
the following operations: 1) variable length (1 to 32 bits) field
extraction; 2) backward/forward movement in the data FIFO;
3) conditional jumps; and 4) addition. FEX instructions are flexible
enough to allow conditional branches based on the content
of an extracted field (e.g., the protocol field of the IP header), as well as
parsing of protocol headers based on header and packet length
information (e.g., FEX can be easily programmed to recognize
and extract or skip IP and TCP options). The execution time of
a single field extraction is constant and does not depend on
the number of bits extracted, so the total extraction time depends
only on the number of fields extracted.
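The four kinds of FEX operations listed above can be sketched as a tiny interpreter over a buffer of 32-bit words. The actual microcode encoding is not given in the text, so the opcode names (`EXTRACT`, `MOVE`, `ADD`, `JEQ`, `HALT`), the accumulator, and the sample program below are all hypothetical. The one-shift-and-mask extraction over a two-word window is one plausible way to obtain a per-field cost that is independent of field width.

```python
def extract_bits(words, bit_pos, width):
    """Return `width` bits (1..32) starting at bit offset `bit_pos`, MSB first."""
    if not 1 <= width <= 32:
        raise ValueError("field width must be between 1 and 32 bits")
    if bit_pos < 0 or bit_pos + width > 32 * len(words):
        raise IndexError("field lies outside the buffered data")
    word, offset = divmod(bit_pos, 32)
    # Join two adjacent words so a field may straddle a word boundary.
    lo = words[word + 1] if word + 1 < len(words) else 0
    window = (words[word] << 32) | lo
    return (window >> (64 - offset - width)) & ((1 << width) - 1)


def run_fex(program, words):
    """Interpret a hypothetical FEX program; return the list of extracted fields."""
    pc, pos, acc, fields = 0, 0, 0, []
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "EXTRACT":      # variable-length field extraction
            acc = extract_bits(words, pos, args[0])
            fields.append(acc)
            pos += args[0]
        elif op == "MOVE":       # forward (positive) or backward (negative), in bits
            pos += args[0]
        elif op == "ADD":        # add a constant to the accumulator
            acc += args[0]
        elif op == "JEQ":        # conditional jump: if acc == value, go to target
            if acc == args[0]:
                pc = args[1]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return fields


# Sample IPv4 header (no options), protocol = 6 (TCP), as five 32-bit words.
HEADER = [0x45000028, 0x00004000, 0x40060000, 0x0A000001, 0xC0000207]

PROGRAM = [
    ("EXTRACT", 4),    # 0: version
    ("EXTRACT", 4),    # 1: header length
    ("MOVE", 64),      # 2: skip ahead to the protocol field
    ("EXTRACT", 8),    # 3: protocol
    ("JEQ", 6, 6),     # 4: if TCP, continue at instruction 6
    ("HALT",),         # 5: otherwise stop
    ("MOVE", 16),      # 6: skip the header checksum
    ("EXTRACT", 32),   # 7: source address
    ("EXTRACT", 32),   # 8: destination address
]

fields = run_fex(PROGRAM, HEADER)
```

For the sample header the program yields the version, header length, protocol, and both addresses; for a non-TCP packet it stops after the protocol field, illustrating a branch on the content of an extracted field.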
Packet processing is initiated by a packet arrival at the FEX
input interface. After the field extraction, the I/O Data Controller
places the extracted fields directly into the register file of
the RISC core. In this way the RISC performance is significantly
enhanced, as I/O operations are performed in parallel with the