Dual low-power and crosstalk immune encoding scheme for on-chip data buses

Z. Khan, A.T. Erdogan and T. Arslan

A new encoding scheme is introduced for low-power and crosstalk immune communication of generic data on long parallel on-chip buses. The scheme uses a limited weight codebook with one-to-one data-to-code mapping. The new scheme and its implementation using a generic system-on-chip platform are described and results are provided that indicate a 30% saving in total switching activity with an 8-bit communication example.

Introduction: The scaling of CMOS technology to ultra-deep sub-micrometre causes the coupled switched capacitance to dominate the wire-to-substrate capacitance. This coupled capacitance in turn will lead to crosstalk noise, which is a potential source for delay faults, logical malfunctions and energy consumption in on-chip communication. Techniques based on bus invert [1] and the coupling driven bus invert (CBI) [2] are among those commonly used in the literature for reducing self and coupled switched capacitance for generic data of unknown probabilistic information. However, none of the techniques in the literature is capable of both eliminating worst crosstalk coupling and reducing power for such data.

Worst crosstalk coupling is classified into three types, namely type-4, type-3 and type-2 [4]. In type-4, three adjacent wires undergo opposite state transition (e.g. 101-to-010). In type-3, two adjacent wires undergo opposite state transition while the third maintains its previous state (e.g. 101-to-110) and in type-2, the centre wire is in opposite state transition with one adjacent wire while the other adjacent wire undergoes the same transition as the centre wire (e.g. 001-to-110). The coupled switched capacitance becomes $4C_I$, $3C_I$ and $2C_I$ in type-4, type-3 and type-2 crosstalk, respectively, with $C_I$ being the coupled capacitance when only one wire changes state (type-1 crosstalk).
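The classification above can be checked with a small script. This is a minimal sketch, not the authors' tool: it assumes a pairwise lumped model in which each neighbour of the centre wire contributes 0, 1 or 2 units of $C_I$, and the function name `coupling_type` is hypothetical.

```python
def coupling_type(before: str, after: str) -> int:
    """Return the coupling type (0 to 4) seen by the centre wire of a 3-wire group.

    Assumption: pairwise lumped model, not taken from the letter itself.
    """
    if len(before) != 3 or len(after) != 3:
        raise ValueError("expected two 3-bit patterns such as '101'")
    # Per-wire transition: +1 rising, -1 falling, 0 unchanged.
    d = [int(a) - int(b) for b, a in zip(before, after)]
    # Each neighbour contributes 0 (same transition as the centre, or both
    # static), 1 (exactly one wire of the pair switches) or 2 (opposite
    # transitions).
    return abs(d[1] - d[0]) + abs(d[1] - d[2])


# The three examples quoted in the text.
for b, a in [("101", "010"), ("101", "110"), ("001", "110")]:
    print(f"{b}-to-{a}: type-{coupling_type(b, a)}")
```

The three quoted examples evaluate to type-4, type-3 and type-2. Under this pairwise assumption a lone transition on the centre wire scores 2 rather than 1, so the letter's own counting convention may differ at that boundary.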
The method proposed here eliminates worst crosstalk coupling and reduces self transitions for generic data in which probabilistic information cannot be known a priori.

Method description and implementation: A spatially redundant limited weight code (LWC) has a lower Hamming weight than the data it represents and is used in off-chip communication for energy efficiency [3]. The concept of the limited weight code can be exploited for on-chip communication to achieve two performance goals: elimination of worst crosstalk coupling and energy efficiency. The flow graph of the codec scheme is shown in Fig. 1.

The encoder performs the encoding in three stages. The first stage performs modulo-2 addition of the present and previous raw m-bit data, $y_m(n) = x_m(n) \oplus x_m(n-1)$. The second stage maps the m-bit pattern to a p-bit LWC codeword, i.e. $L_p(n) = m/2\text{-LWCCodebook}(y_m(n))$. This stage consists of a semi-perfect m/2-limited weight codebook of size $2^m \times p$ bits. The codebook has been developed with emphasis on eliminating the worst crosstalk coupling. The p-bit limited weight codewords are organised in such a way that if two codewords of Hamming weight greater than zero are summed using modulo-2 addition, the resulting pattern eliminates the type-4, type-3 and type-2 crosstalk coupling discussed in [4], at the cost of a modest increase in type-1 crosstalk, which is the least severe crosstalk coupling. In the third stage, modulo-2 addition is performed on the pattern already present on the bus, $D_p(n-1)$, and the codeword $L_p(n)$ to be transmitted, i.e. $D_p(n) = L_p(n) \oplus D_p(n-1)$. The resulting pattern $D_p(n)$ is not only immune to worst crosstalk coupling (type-4, type-3 and type-2) but also has less self switching activity. The price is an m-to-p-bit expansion of the on-chip data bus.

The decoder is exactly a reverse replica of its encoder.
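The data flow through the three encoder stages, and through a decoder that mirrors them, can be sketched as follows for one 4-bit lane. This is an illustration under stated assumptions, not the authors' implementation: the letter does not list its codebook, so `build_placeholder_codebook` (a hypothetical helper) simply picks 16 distinct 6-bit words of Hamming weight at most m/2 = 2 and makes no attempt at the crosstalk-aware ordering the letter describes. The class names and the all-zero initial state are also assumptions.

```python
from itertools import combinations

M, P = 4, 6  # data width and codeword width of one lane


def build_placeholder_codebook(m=M, p=P):
    """Map each m-bit value to a distinct p-bit word of Hamming weight <= m/2.

    Placeholder only: the ordering is arbitrary and is NOT arranged for
    crosstalk elimination as the codebook in the letter is.
    """
    words = [0]
    for weight in range(1, m // 2 + 1):
        for bits in combinations(range(p), weight):
            words.append(sum(1 << b for b in bits))
    if len(words) < 2 ** m:
        raise ValueError("not enough limited-weight codewords")
    return words[: 2 ** m]


CODEBOOK = build_placeholder_codebook()
DECODEBOOK = {code: data for data, code in enumerate(CODEBOOK)}


class Encoder:
    def __init__(self):
        self.prev_x = 0  # x_m(n-1), assumed to start at zero
        self.prev_d = 0  # D_p(n-1), assumed to start at zero

    def step(self, x):
        y = x ^ self.prev_x      # stage 1: y_m(n) = x_m(n) XOR x_m(n-1)
        code = CODEBOOK[y]       # stage 2: L_p(n) = codebook lookup of y_m(n)
        d = code ^ self.prev_d   # stage 3: D_p(n) = L_p(n) XOR D_p(n-1)
        self.prev_x, self.prev_d = x, d
        return d


class Decoder:
    def __init__(self):
        self.prev_x = 0
        self.prev_d = 0

    def step(self, d):
        code = d ^ self.prev_d   # L_p(n) = D_p(n) XOR D_p(n-1)
        y = DECODEBOOK[code]     # y_m(n) = decodebook lookup of L_p(n)
        x = y ^ self.prev_x      # x_m(n) = y_m(n) XOR x_m(n-1)
        self.prev_x, self.prev_d = x, d
        return x


enc, dec = Encoder(), Decoder()
data = [0x3, 0x3, 0xA, 0xF, 0x0]
recovered = [dec.step(enc.step(x)) for x in data]
print(recovered == data)
```

Because the bus word changes only in the bit positions set in $L_p(n)$, the number of self transitions per transfer equals the Hamming weight of the codeword, which the placeholder codebook caps at 2. In this sketch a repeated data word maps to the all-zero codeword and leaves the bus unchanged.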
In the first decoder stage, the present and previous bus states are modulo-2 summed to recover the p-bit codeword $L_p(n)$, i.e. $L_p(n) = D_p(n) \oplus D_p(n-1)$. The codeword is then applied to the m-bit LWC decodebook and the m-bit data pattern $y_m(n)$ is recovered, i.e. $y_m(n) = m\text{-bitLWCDecodebook}(L_p(n))$. This m-bit pattern is the result of the modulo-2 addition of the present and previous original m-bit data. The third stage then recovers the original m-bit raw data by performing modulo-2 addition of the present m-bit pattern $y_m(n)$ and the previous m-bit raw data $x_m(n-1)$, i.e. $x_m(n) = y_m(n) \oplus x_m(n-1)$.

The data width m can be 4 or any integer multiple of 4. For 8-bit transfer, the 8-bit bus is partitioned into two 4-bit lanes. Each lane is encoded and decoded according to the proposed method using a 4-to-6-bit codebook and a 6-to-4-bit decodebook. Inter-lane worst crosstalk coupling is eliminated by a shield wire between the lanes, so the 8-bit bus expands to 13 bits (two 6-bit lanes plus the shield wire). The codec architecture has been tested on an AMBA-AHB based generic SoC platform. The target is the AHB data bus.

Fig. 1 Flow graph of the codec scheme

Simulations and results: The codec architecture has been synthesised down to 0.18 μm CMOS technology. The resulting netlist is then simulated for an 8-bit biomedical data set. The biomedical data serve only as an example; any data type and transfer size could be used to demonstrate the dual purpose of the encoding scheme. A test bench is developed to simulate the code of the AMBA-AHB bus and the proposed codec architecture using a Verilog-XL simulator. The number of self and coupled switched transitions is monitored at the output of the encoder.
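The kind of monitoring described here can be mimicked in software. The sketch below counts self transitions and coupling events over a sequence of bus words and combines them with the weighted sum the letter uses for its total switching estimate, expression (1). Several choices are assumptions rather than the authors' test bench: each wire is treated as the centre of a three-wire window, edge wires have a single neighbour, the pairwise factor is 0, 1 or 2, and the names `monitor` and `total_switching` are hypothetical. The default of 3.2 is the capacitance ratio the letter quotes for its 0.18 μm process.

```python
from collections import Counter


def wire_deltas(prev, curr, width):
    """Per-wire transition: +1 rising, -1 falling, 0 static (bit i = wire i)."""
    return [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(width)]


def monitor(words, width):
    """Count self transitions and type-k coupling events over a bus trace."""
    n_self = 0
    n_type = Counter()  # n_type[k] = number of type-k events, k = 1..4
    for prev, curr in zip(words, words[1:]):
        d = wire_deltas(prev, curr, width)
        n_self += sum(abs(x) for x in d)
        for i in range(width):
            # Sum the pairwise factors towards the neighbours that exist.
            k = sum(abs(d[i] - d[j]) for j in (i - 1, i + 1) if 0 <= j < width)
            if k:
                n_type[k] += 1
    return n_self, n_type


def total_switching(n_self, n_type, lam=3.2):
    """Weighted total: N_t = N_x + lam * (4*N_4 + 3*N_3 + 2*N_2 + 1*N_1)."""
    return n_self + lam * sum(k * n_type[k] for k in (4, 3, 2, 1))


# One worst-case transition on a 3-wire bus: 101 to 010.
n_self, n_type = monitor([0b101, 0b010], width=3)
print(n_self, dict(n_type), total_switching(n_self, n_type))
```

Counting per wire rather than per wire pair means an interior pair is seen from both of its wires; a different convention would change the absolute counts, so only comparisons made under the same convention are meaningful.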
The total switching activity has been estimated using the approximate expression (1) given below, which is derived using a lumped model of the on-chip bus:

$$N_t = N_x + \lambda \, (4 N_4 + 3 N_3 + 2 N_2 + 1 N_1) \qquad (1)$$

where $N_t$, $N_x$, $N_4$, $N_3$, $N_2$ and $N_1$ are, respectively, the total, self, type-4, type-3, type-2 and type-1 switching activity, and $\lambda = C_t / C_L$ is the ratio of coupling to wire-to-substrate capacitance, which is 3.2 for 0.18 μm CMOS technology and a minimum distance between wires. The results are shown in Table 1. The total switching as given by (1) has also been
ELECTRONICS LETTERS 2nd October 2003 Vol. 39 No. 20