A Comparative Study of Low-Power Techniques for Ternary CAMs

Nitin Mohan and Manoj Sachdev
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada – N2L 3G1
nitinm@ieee.org

Abstract—Ternary content addressable memories (TCAMs) are attractive for applications such as packet forwarding and classification in network routers. However, their high cost and power consumption limit their popularity and versatility. In this paper, we present a comparative study of the design techniques for low-power TCAMs.

I. INTRODUCTION

Content addressable memory (CAM) is an outgrowth of random access memory (RAM) technology. Unlike RAMs, which access a word based on its address, CAMs access a word based on its contents. A CAM compares an incoming key with all the words in parallel, and returns the address of the “best” match. CAMs have been attractive for artificial intelligence (AI) applications and translation look-aside buffers (TLBs) in microprocessors. CAMs are also used for tag comparison in cache memory, data compression, and radar signal tracking. Recent applications include real-time pattern matching in virus-detection and intrusion-detection systems, gene pattern searching in bioinformatics, and image processing. CAMs can perform fast and deterministic pattern searches on large databases.

A binary CAM stores and searches only ‘0’s and ‘1’s. Hence, its utility is limited to exact-match SEARCH operations. A ternary CAM (TCAM) can store and search an additional state, called “mask” or “don’t care”. Therefore, a TCAM can also perform partial matching. This partial-match feature makes TCAMs attractive for applications such as packet forwarding and classification in network routers. Increasing line rates, quality of service (QoS), and network security requirements demand routing tables with high-speed lookups.
Moreover, the growing number of Internet users and the migration of the Internet Protocol (IP) from IPv4 to IPv6 are further increasing the word size and storage capacity of routing tables. Hence, current network routers require large-capacity TCAMs with high search speeds.

Fig. 1 illustrates the conventional 16T static TCAM cell. It consists of two SRAM cells to store ternary states (‘0’, ‘1’, and “don’t care”). Transistors N1 through N4 form the bit-level comparison logic, which compares the stored value with the corresponding bit of the search key. A TCAM word is implemented by connecting several TCAM cells in parallel (in a row). Similarly, a TCAM array is formed by connecting several TCAM words in parallel (in a column).

Fig. 1: Conventional 16T static TCAM cell

Fig. 2: Simplified block diagram of a 512 x 144 TCAM [figure: four 128 x 144 TCAM array blocks (Block 0 to Block 3), each with MLSAs, WL decoder and drivers, BLSAs and drivers, SL drivers/mask, and a priority encoder (PE); an inter-block PE and test control; a decoder from Address In <8:7> to Block Enable <3:0>; Address In <6:0>; Data In <35:0>; Address Out <8:0>; the legend marks test structures]

A typical TCAM chip consists of three major parts: (i) TCAM arrays for ternary data storage, (ii) peripheral circuitry for READ, WRITE, and SEARCH operations, and (iii) test and repair circuitry for functional verification and yield improvement. The peripheral circuits include decoders, bit line sense amplifiers (BLSAs), search line (SL) drivers, match line sense amplifiers (MLSAs), and priority encoders (PEs). The test and repair circuitry includes on-chip test structures and redundancy. Fig. 2 shows a simplified block diagram of a 512 x 144 TCAM.
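
The cell, word, and array organization described above can be illustrated with a small behavioral model. The following Python sketch is an illustration under stated assumptions, not the circuit-level design: each stored word is assumed to be represented as a (value, care) pair, where a care bit of 0 stands for “don’t care”; the priority encoder is assumed to return the lowest matching address; and the address split (bits <8:7> select one of four blocks, bits <6:0> select a row) is inferred from the signal names in Fig. 2. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

ROWS_PER_BLOCK = 128   # each of the four blocks in Fig. 2 is 128 x 144
NUM_BLOCKS = 4         # 4 x 128 = 512 words in total

# Assumed software encoding of a ternary word: (value, care).
# A care bit of 0 marks that bit position as "don't care".
TernaryWord = Tuple[int, int]


def parse_ternary(pattern: str) -> TernaryWord:
    """Convert a string of '0', '1', and 'X' (MSB first) into (value, care)."""
    value = 0
    care = 0
    for ch in pattern:
        value <<= 1
        care <<= 1
        if ch == "1":
            value |= 1
            care |= 1
        elif ch == "0":
            care |= 1
        elif ch != "X":
            raise ValueError(f"invalid ternary symbol: {ch!r}")
    return value, care


def word_matches(word: TernaryWord, key: int) -> bool:
    """A word matches when every bit that is not "don't care" equals the key bit."""
    value, care = word
    return ((value ^ key) & care) == 0


def search(table: List[Optional[TernaryWord]], key: int) -> Optional[int]:
    """Return the address of the first matching word, or None if nothing matches.

    Hardware compares all words in parallel; this loop only models the result.
    Choosing the lowest address is an assumed priority-encoder convention.
    """
    for address, word in enumerate(table):
        if word is not None and word_matches(word, key):
            return address
    return None


def split_address(address: int) -> Tuple[int, int]:
    """Split a 9-bit word address into (block index, row within the block)."""
    if not 0 <= address < NUM_BLOCKS * ROWS_PER_BLOCK:
        raise ValueError("address out of range")
    return address >> 7, address & 0x7F
```

For brevity, the patterns in a usage example can be much shorter than 144 bits, for instance storing `parse_ternary("10XX")` at one address and `parse_ternary("1011")` at a higher one, and then calling `search` with the key `0b1011`. Both words match this key, so the result depends entirely on the assumed priority convention.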
The TCAM is implemented as four smaller TCAM arrays. Each row in a TCAM array stores a word. Within a word, a bit is located by its column number. All the TCAM cells in a row share a word line (WL) and a match line