Tuan D. A. Nguyen DRAFT

BBFex: A Flexible and Efficient FPGA-based Pattern Matching Engine for Large Database

Tuan D. A. Nguyen, Bui Trung Hieu, and Tran Ngoc Thinh
Vietnam National University - Ho Chi Minh City University of Technology
268 Ly Thuong Kiet Street, Ho Chi Minh City, Vietnam

Abstract. Many Network Intrusion Detection Systems (NIDS) have been developed on FPGA-based platforms, using various pattern matching algorithms to inspect packets faster and more accurately than software-based systems. Nevertheless, those systems support only a small number of short patterns and are not suited to a large number of long patterns, such as those in the Clam Antivirus database. In this paper, we propose the Bloom-Bloomier Filter extension (BBFex), a practical pattern matching engine designed specifically for large databases of variable-length patterns. A Pattern Analyzer examines each long pattern and splits it into two overlapping fragments. The Bloom-Bloomier Filter, a combination of a Bloom Filter and a Bloomier Filter, is then applied to these fragments, reducing the off-chip memory access rate to less than one fifth of that of the previous similar work. This paper also proposes an efficient technique that merges fragments back into the original pattern in at most 2 phases when fragments are shared between patterns. BBFex can recognize nearly 84,000 static patterns in the Clam Antivirus database while keeping on-chip memory utilization low (approximately 0.4 bits per character), performing exact comparison to eliminate false positives at the hardware level, and sustaining a processing throughput of 1.36 Gbps. Furthermore, BBFex is not limited to the Clam Antivirus database, since its architecture is designed for general character-based databases. Because it is based on hashing, BBFex can also update or change its database without reconfiguration.
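The abstract only outlines the fragment-and-filter idea. As a rough software model, and not the authors' FPGA design, the sketch below assumes every pattern's length lies between a fixed fragment length F and 2F. It splits each pattern into a prefix and a suffix of F bytes, which overlap and together cover the whole pattern, screens every text window with a plain Bloom filter, and uses a Python dict as a stand-in for the Bloomier filter and off-chip pattern memory. A match is reported only when a pattern's head and tail fragments line up and the exact comparison succeeds. The names `BloomFilter`, `split_pattern`, `build` and `scan`, the salted SHA-256 hashing, and all parameter values are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: an m-bit array with k salted SHA-256 positions per key."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions by prefixing the key with a one-byte salt.
        for salt in range(self.k):
            digest = hashlib.sha256(bytes([salt]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def split_pattern(pattern, frag_len):
    """Hypothetical split into a prefix and a suffix of frag_len bytes each.

    For frag_len <= len(pattern) <= 2 * frag_len the two fragments cover the
    whole pattern and overlap by 2 * frag_len - len(pattern) bytes.
    """
    if not frag_len <= len(pattern) <= 2 * frag_len:
        raise ValueError("pattern length must lie in [frag_len, 2 * frag_len]")
    return pattern[:frag_len], pattern[-frag_len:]


def build(patterns, frag_len, m_bits=1 << 16, k_hashes=4):
    bloom = BloomFilter(m_bits, k_hashes)
    # fragment -> [(pattern_id, is_head)]; a dict standing in for the
    # Bloomier filter plus off-chip memory (an assumption, not the paper's design).
    table = {}
    for pid, pat in enumerate(patterns):
        head, tail = split_pattern(pat, frag_len)
        for frag, is_head in ((head, True), (tail, False)):
            bloom.add(frag)
            table.setdefault(frag, []).append((pid, is_head))
    return bloom, table


def scan(text, patterns, frag_len, bloom, table):
    """Return (start offset, pattern_id) for every exact pattern occurrence."""
    heads_seen = set()  # (pattern_id, start offset) of matched head fragments
    matches = []
    for i in range(len(text) - frag_len + 1):
        window = text[i:i + frag_len]
        if not bloom.might_contain(window):
            continue  # rejected by the filter, so no table lookup is made
        for pid, is_head in table.get(window, []):
            pat = patterns[pid]
            if is_head:
                heads_seen.add((pid, i))
                continue
            # Tail fragment: merge with the head that must start len - F bytes earlier.
            start = i - (len(pat) - frag_len)
            if (pid, start) in heads_seen and text[start:start + len(pat)] == pat:
                matches.append((start, pid))  # final exact comparison passed
    return matches


patterns = [b"EICARTEST", b"VIRUSSIG01"]
bloom, table = build(patterns, frag_len=6)
print(scan(b"xxEICARTESTyyVIRUSSIG01zz", patterns, 6, bloom, table))  # → [(2, 0), (13, 1)]
```

In this model the Bloom filter only decides whether the table lookup is needed at all, loosely mirroring how on-chip filtering can cut off-chip memory accesses. Patterns longer than 2F, the two-phase merging of shared fragments, and the Bloomier filter's own indexing are not modeled.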
Keywords: Bloom Filter, Bloomier Filter, Clam Antivirus, Hashing, Long Pattern, Large Database, Pattern Matching

1 Introduction

The more popular the Internet becomes, the more insecure it is. Despite many advantages and improvements, software-based antivirus programs still cannot keep pace with today's high-speed gigabit networks. Furthermore, the pattern matching process, which checks file content for occurrences of known virus patterns, consumes a noticeable amount of time and system resources because the number of viruses keeps increasing. This limitation creates a demand for improving pattern matching performance with hardware-based technology. FPGA is the most commonly chosen platform in this field, compared with alternatives such as Application Specific Integrated Circuit (ASIC), Content Addressable Memory (CAM) and