Towards High-performance Flow-level Packet Processing on Multi-core Network Processors

Yaxuan Qi 1, 3, Bo Xu 1, 2, Fei He 1, 2, Baohua Yang 1, 2, Jianming Yu 1, 2 and Jun Li 1, 3
1 Research Institute of Information Technology, Tsinghua University
2 Department of Automation, Tsinghua University
3 Tsinghua National Lab for Information Science and Technology
{yaxuan, junl}@tsinghua.edu.cn

ABSTRACT
There is growing interest in designing high-performance network devices that perform packet processing at flow level. Applications such as stateful access control, deep inspection and flow-based load balancing all require efficient flow-level packet processing. In this paper, we present the design of a high-performance flow-level packet processing system based on multi-core network processors. The main contributions of this paper include: a) a high-performance flow classification algorithm optimized for network processors; b) an efficient flow state management scheme that leverages the memory hierarchy to support a large number of concurrent flows; c) two hardware-optimized order-preserving strategies that preserve internal and external per-flow packet order. Experimental results show that: a) the proposed flow classification algorithm, AggreCuts, outperforms the well-known HiCuts algorithm in terms of classification rate and memory usage; b) the presented SigHash scheme can manage over 10M concurrent flow states on the Intel IXP2850 NP with an extremely low collision rate; c) the performance of the internal packet order-preserving scheme using an SRAM queue-array is about 70% of that of the external packet order-preserving scheme realized by ordered-thread execution.

Categories and Subject Descriptors
C.2.0 [Computer Communication Networks]: General - Security and protection (e.g., firewalls); C.4 [Performance of Systems]: Design studies.

General Terms
Algorithms, Experimentation, Performance

Keywords
Classification, Hashing, Packet Order, Network Processor

1.
INTRODUCTION

The continual growth of network communication bandwidth and the increasing sophistication of network traffic processing have driven the need for high-performance network devices that perform packet processing at flow level. Applications such as stateful access control in firewalls, deep inspection in IDS/IPS, and flow-based scheduling in load balancers all require flow-level packet processing. Basic operations inherent to such applications include:

Flow classification: Flow-level packet processing devices are required to classify packets into flows according to a classifier and to process them differently. As new demand arises for supporting multiple services (voice, video, and data), the flow classification workload on such devices becomes much heavier. Therefore, it is challenging to perform flow classification at line speed on these network devices.

Flow state management: Per-flow states are maintained in order to correctly perform packet processing at a semantic level higher than the network layer. Such stateful analysis brings with it the core problem of state management: what hardware resources to allocate for holding states and how to access them efficiently. This is particularly the case for in-line devices, where flow state management can significantly affect the overall performance.

Per-flow packet order-preserving: Another important requirement for networking devices is to preserve packet order. Typically, order-preserving is only required between packets in the same flow. Although some order-preserving techniques exist for traditional switch architectures, flow-level packet order-preserving for parallel switches remains an open issue and motivates ongoing research.

Traditionally, flow-level packet processing devices rely on ASICs/FPGAs to perform IP forwarding at line-rate speed (10 Gbps) [12] [23] [24].
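The three basic operations listed above can be illustrated with a minimal Python sketch. It is not the AggreCuts or SigHash scheme proposed in this paper: it assumes that a flow is identified by a 5-tuple, stores states in an ordinary dictionary, and uses a per-flow packet counter as a sequence number. The names `FlowKey`, `FlowTable` and `process` are hypothetical and chosen for illustration only.

```python
from collections import namedtuple

# Hypothetical 5-tuple flow key; the field names are illustrative only.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto")


class FlowTable:
    """Toy per-flow state table backed by a Python dict (an assumption,
    not the memory-hierarchy-aware scheme described in the paper)."""

    def __init__(self):
        self.states = {}  # FlowKey -> per-flow state dict

    def process(self, key, payload_len):
        # Flow classification + state lookup: find the flow's state,
        # creating it when the first packet of the flow arrives.
        state = self.states.setdefault(key, {"packets": 0, "bytes": 0})
        state["packets"] += 1
        state["bytes"] += payload_len
        # The per-flow sequence number lets a later stage restore the
        # order of packets within this flow only.
        return state["packets"]


table = FlowTable()
a = FlowKey("10.0.0.1", "10.0.0.2", 1234, 80, 6)
b = FlowKey("10.0.0.3", "10.0.0.2", 5678, 443, 6)

# Packets of flows a and b arrive interleaved.
seqs = [table.process(k, n) for k, n in [(a, 100), (b, 40), (a, 60)]]
print(seqs)
print(table.states[a])
```

In this sketch, sequence numbers are assigned independently per flow, which reflects the observation that order only needs to be preserved between packets of the same flow.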
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ANCS’07, December 3–4, 2007, Orlando, Florida, USA.
Copyright 2007 ACM 978-1-59593-945-6/07/0012...$5.00.

As the network processor (NP) emerges as a promising candidate for a networking building block, NP is expected to retain the same high performance as that of ASIC and to gain the time-to-market