Wavelet Transforms Dedicated to Compress Recorded ENGs from Multichannel Implants: Comparative Architectural Study

C. Dumortier, B. Gosselin and M. Sawan
Polystim Neurotechnologies Laboratory, Electrical Engineering Department, École Polytechnique de Montréal, cyprien.dumortier@polymtl.ca

Abstract: The bandwidth of wireless multichannel neural recording systems is one of the most significant limitations on increasing the number of monitored channels. Data compression is being used efficiently to process multichannel recordings. This paper explores Discrete Wavelet Transform (DWT) processor architectures suited to compressing ENGs and thereby increasing the number of channels. Low power consumption, small silicon area and the specific constraints of multichannel neural recording systems are considered in this investigation. Six architectures were implemented and compared. All of them implement a three-level Daubechies-4 wavelet decomposition. This comparative study shows that an excellent trade-off between power consumption and silicon area is obtained with a polyphase DWT structure using a careful balance of parallelism and folding. It also shows that multiplexing several channels onto a shared DWT processor provides the best savings in both power and area.

I. INTRODUCTION

Recording electroneurogram (ENG) signals from many sites in the cortex is becoming a necessity for neuroscience research. Implantable wireless multichannel recording devices have recently been proposed to replace the complex apparatus needed in this type of neurophysiological experiment. The main purpose of these devices is to record from many sites and send the recorded signals outside the body for off-chip processing. However, the main bottleneck faced by this application is the limited bandwidth of inductively coupled telemetry links. For instance, a 100-channel device would presently be able to transmit only 10% of the recorded data.
As mentioned in [1], spike detection and compression performed using the Wavelet Transform (WT) is an interesting approach to overcoming this limitation.

The design of a WT processor for implantable devices presents two main challenges: power consumption and integration area. Both have to be minimized. Several architectures can implement a Discrete Wavelet Transform (DWT) [2-6]. However, most published results focus on improving the operating frequency of processors dedicated to image compression. Moreover, the power consumption and silicon area of a digital processor depend not only on the implemented algorithm (i.e. the arrangement of computations) but also on the routing complexity, the sensitivity to quantization and the internal word length. As a result, the usual metrics, including the number of multipliers and adders and the operating frequency, give only a rough approximation of the real power consumption and area of the architectures [7]. Consequently, an implementation-based comparative study is essential to find the architecture best suited to implantable multichannel recording systems. In particular, multiplexing several channels over one DWT processor seems to be an interesting approach.

The remainder of this paper is organized as follows: Section II introduces the one-dimensional DWT and reviews the main existing architectures. Sections III and IV address the implementation and power evaluation of the most relevant architectures. Finally, results are reported in Section V and conclusions are summarized in Section VI.

II. DWT ALGORITHMS AND ARCHITECTURES

A one-level DWT is defined by the following equations:

$$d^1_n = \sum_k g[k]\, x[2n - k] \qquad \text{(1a)}$$

$$a^1_n = \sum_k h[k]\, x[2n - k] \qquad \text{(1b)}$$

where $x$ represents the recorded data, $g$ and $h$ are the high-pass and low-pass filters respectively, and $a^1_n$ and $d^1_n$ are the DWT coefficients.
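As an illustration only, equations (1a) and (1b) can be sketched in Python as a filter-and-decimate loop. The sketch assumes that "Daubechies-4" means the 4-tap Daubechies filter pair and that samples before the start of the record are zero; neither choice is stated in the paper, and the names `H`, `G` and `dwt_one_level` are hypothetical.

```python
import math

# Assumption: "Daubechies-4" is taken here as the 4-tap Daubechies filter pair.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM,
     (3 - _S3) / _NORM, (1 - _S3) / _NORM]      # low-pass filter h
G = [H[3], -H[2], H[1], -H[0]]                   # high-pass filter g (mirror of h)


def dwt_one_level(x, h=H, g=G):
    """Return (a, d), each computed as out[n] = sum_k f[k] * x[2n - k].

    Samples with a negative index are treated as zero (assumed boundary rule).
    """
    def filter_and_decimate(f):
        out = []
        for n in range(len(x) // 2):             # one output per two inputs
            acc = 0.0
            for k, fk in enumerate(f):
                i = 2 * n - k
                if i >= 0:                       # skip samples before the record
                    acc += fk * x[i]
            out.append(acc)
        return out

    return filter_and_decimate(h), filter_and_decimate(g)
```

Calling `a, d = dwt_one_level(samples)` returns two sequences, each half the length of the input. For a constant input, the detail coefficients are zero once all four filter taps overlap the record, which is the property that makes the detail band cheap to encode for slowly varying signals.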
The convolution between $x$ and $g$ gives the detail coefficients, $d^1_n$, and the convolution between $x$ and $h$ gives the approximation coefficients, $a^1_n$. Mallat showed that the approximation coefficients, $a^j_n$, have to be fed recursively into the same two-channel filter bank in order to perform a multilevel wavelet decomposition, as shown in Fig. 1 [8]. However, a straightforward circuit implementation of this scheme is inefficient (low hardware utilization). If decomposition level 1 is clocked at frequency $f_0$, decomposition level $j$ is clocked at frequency $f_0/2^{j-1}$. Careful consideration of this decimation is necessary to design efficient architectures.

Figure 1. Three-level decomposition DWT

Two different types of architecture can compute the DWT: convolution-based and lifting-based. The first uses the two-channel filter bank defined by equation (1) directly (or its polyphase form). Weeks et al. [2] report that most convolution-based architectures try to minimize the number of processing elements (PE), composed of multipli-

2129 ISCAS 2006 0-7803-9390-2/06/$20.00 ©2006 IEEE