A PRACTICAL APPROACH FOR BUILDING A PARALLEL FIREWALL FOR TEN GIGABIT ETHERNET BACKBONE

Kasom Koht-arsa
Faculty of Engineering, Kasetsart University
Bangkok, 10900 Thailand
Kasom.K@ku.ac.th

Surasak Sanguanpong
Faculty of Engineering and Office of Computer Services, Kasetsart University
Bangkok, 10900 Thailand
Surasak.S@ku.ac.th

Abstract- In very high-speed network environments such as gigabit Ethernet networks, firewalls that must inspect and filter every packet are reaching their limits. A firewall running on a single machine is a potential bottleneck and cannot scale beyond a certain threshold, even with specialized built-in hardware. A parallel system is therefore an attractive alternative. This paper describes the design and implementation of a parallel firewall architecture that can handle packets on high-speed networks. The implementation uses an array of Linux-based firewalls in a data-parallel scheme, operating in conjunction with a specific ASIC switch. The load balancing mechanism, based on hashing into disjoint subsets, distributes the traffic among a configurable number of parallel machines, providing high performance with reliability, flexibility, and scalability. Implementation and measurements in a real network show that the proposed system scales to handle a data rate of 10 gigabits per second.

Index Terms - Gigabit Ethernet, 10 Gbps, Load Distributor, Parallel, Firewalls, Security

I. INTRODUCTION

A network firewall is considered a mandatory device for filtering network traffic according to a predefined set of rules. To guarantee precise filtering, the firewall must inspect packets at wire speed. To handle the rapid growth in IP traffic, today's high-speed networks increasingly require firewalls that can operate at interface speeds as high as 10 gigabits per second (Gbps).
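To give a sense of what wire speed means at this rate, the following sketch computes the frame rate and the per-frame time budget on a 10 Gbps Ethernet link. It assumes standard Ethernet framing (a 64-byte minimum frame plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap). The function names are illustrative and are not taken from the system described in this paper.

```python
# Bytes occupying the wire for every frame in addition to the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frame rate on a link when every frame has the given size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def time_budget_ns(link_bps: float, frame_bytes: int) -> float:
    """Time available to process one frame at wire speed, in nanoseconds."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    link = 10e9  # 10 Gbps
    for size in (64, 512, 1518):
        pps = packets_per_second(link, size)
        budget = time_budget_ns(link, size)
        print(f"{size:5d}-byte frames: {pps:13.0f} pps, {budget:8.1f} ns per frame")
```

Under these assumptions, minimum-size frames arrive at roughly 14.88 million per second, which leaves about 67 ns to handle each one.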
Currently, Linux-based firewalls running on general-purpose processors are probably the most widely deployed such systems, and they have continued to improve, becoming robust and highly flexible. Linux-based firewalls provide a low-cost, off-the-shelf, software-based firewall solution up to the gigabit range. However, the situation is very different for a 10 Gbps link, because the primary bottleneck in a firewall is typically the CPU. Network I/O can consume a substantial amount of CPU time because it may interrupt the CPU every time a packet is transmitted or received. The overhead of handling numerous interrupts and context switches leaves the CPU unable to process packets at the 10 Gbps rate [1]. Moreover, modern firewall implementations also incorporate Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), improving security at the cost of increased computation. A conventional firewall can no longer meet the growing performance and security demands, resulting in lost packets and lower network throughput.

One approach to building a high-performance firewall is to offload the CPU using specialized hardware components such as Field Programmable Gate Arrays (FPGAs) and Network Processors (NPs). Liberouter [2] is a well-known project that uses an FPGA card to achieve firewall performance at 10 Gbps wire speed, but it is not yet mature and lacks many functions. Accardi et al. [3] demonstrated a network-processor-based Linux firewall that achieved nearly 2 Gbps, but it still cannot handle the full 10 Gbps traffic.

As an alternative to building a single high-speed firewall, there have been efforts to improve firewall performance using parallel or distributed approaches. Benecke [4] proposed a parallel load-sharing packet processing architecture based on a broadcast-enabled Ethernet switch. Fulp and Farley [6] instead suggested a function-based parallel architecture.
Fulp [5] also proposed a parallel architecture, supported by simulation results. However, these architectures require every machine in the system to receive the full 10 Gbps traffic, which exposes each machine to the same interrupt and context-switching overhead and therefore negates the benefit of parallelism.

For network operation and management purposes, several campus backbones have upgraded their links to 10 Gbps and require a practical solution for this situation. Although several approaches have been presented in the literature, to the best of our knowledge none is available as a best practice for running a firewall on a 10 Gbps link in a real network environment.

This paper describes a practical approach for building a scalable parallel firewall to handle network traffic in 10 Gbps Ethernet environments. The design shows how to split a 10 Gbps traffic load over multiple 1 Gbps links feeding an array of firewalls. A proprietary ASIC monitoring switch is adapted to distribute packets, using a disjoint-subset technique, across Linux firewalls running on general-purpose processors. The specific interconnections and configurations between the firewalls and the ASIC switch, as well as an additional Ethernet switch, are explained for both half-duplex and full-duplex modes.

The remainder of this paper is organized as follows: Section II describes the design methodology for load splitting; Section III gives a detailed implementation of the overall system; Section IV discusses the performance and load-balance characteristics of the system. Finally, Section V gives the conclusion.

978-1-4244-1817-6/08/$25.00 ©2008 IEEE ICCST 2008
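The idea of splitting traffic into disjoint subsets, one per firewall, can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the hashing performed by the ASIC switch, which this introduction does not specify. The sketch assumes IPv4 traffic and a hash over the XOR of the source and destination addresses, so that both directions of a conversation reach the same firewall. The names `firewall_index` and `split_traffic` are hypothetical.

```python
import ipaddress
import zlib
from collections import defaultdict


def firewall_index(src_ip: str, dst_ip: str, num_firewalls: int) -> int:
    """Map a packet to exactly one firewall (assumed scheme, IPv4 only).

    XOR makes the key symmetric in (src, dst), so both directions of a
    conversation are sent to the same firewall.
    """
    if num_firewalls < 1:
        raise ValueError("num_firewalls must be at least 1")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    key = (src ^ dst).to_bytes(4, "big")
    # CRC32 is used here only to mix the bits before taking the modulus.
    return zlib.crc32(key) % num_firewalls


def split_traffic(packets, num_firewalls):
    """Partition (src, dst) pairs into disjoint per-firewall subsets."""
    subsets = defaultdict(list)
    for src, dst in packets:
        subsets[firewall_index(src, dst, num_firewalls)].append((src, dst))
    return dict(subsets)


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "192.168.1.7"),
        ("192.168.1.7", "10.0.0.1"),  # reverse direction of the first pair
        ("10.0.0.2", "172.16.0.9"),
    ]
    for index, subset in sorted(split_traffic(sample, 10).items()):
        print(index, subset)
```

Because every packet maps to exactly one index, the subsets are disjoint and together cover all traffic, and no firewall has to receive the full 10 Gbps stream. How evenly the load spreads depends on the traffic mix, and a single conversation that exceeds the capacity of one 1 Gbps link cannot be split by a scheme of this kind.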