Wavelet Transforms Dedicated to Compress Recorded ENGs from Multichannel
Implants: Comparative Architectural Study
C. Dumortier, B. Gosselin and M. Sawan
Polystim Neurotechnologies Laboratory
Electrical Engineering Department, École Polytechnique de Montréal,
cyprien.dumortier@polymtl.ca
Abstract—The bandwidth of wireless multichannel neural recording
systems is one of the most significant limitations on increasing the
number of monitored channels. Data compression is being used
efficiently to process multichannel recordings. This paper explores
Discrete Wavelet Transform (DWT) processor architectures suited to
compressing ENGs and thereby increasing the number of channels. Low
power consumption, low silicon area and the specific requirements of
multichannel neural recording systems are considered in this
investigation. Six architectures were implemented and compared. All of
them implement a three-level Daubechies-4 wavelet decomposition. This
comparative study shows that an excellent trade-off between power
consumption and silicon area is obtained with a DWT polyphase structure
using a careful balance of parallelism and folding. It also shows that
multiplexing several channels onto a shared DWT processor provides the
best savings in both power and area.
I. INTRODUCTION
Recording Electroneurogram (ENG) signals from many sites in the cortex
is becoming a necessity for neuroscience research. Implantable wireless
multichannel recording devices have recently been proposed to replace
the complex apparatus needed in this type of neurophysiological
experiment. The main purpose of these devices is to record from many
sites and send the recorded signals outside the body for off-chip
processing. However, the main bottleneck faced by this application is
the limited bandwidth of inductively coupled telemetry links. For
instance, a 100-channel device would presently be able to transmit only
10% of the recorded data. As mentioned in [1], spike detection and
compression performed using the Wavelet Transform (WT) is an
interesting approach to overcoming this limitation. The design of a WT
processor for implantable devices presents two main challenges, power
consumption and integration area, both of which have to be minimized.
Several architectures can implement a Discrete Wavelet Transform (DWT)
[2-6]. However, most published results focus on increasing the
operating frequency of processors dedicated to image compression.
Moreover, the power consumption and silicon area of a digital processor
depend not only on the implemented algorithm (i.e. the arrangement of
computations) but also on the routing complexity, the sensitivity to
quantization and the internal word length. As a result, the usual
metrics, including the number of multipliers and adders and the
operating frequency, give only a rough approximation of the real power
consumption and area of an architecture [7]. Consequently, an
implementation-based comparative study is essential to find the
architecture best suited to implantable multichannel recording systems.
In particular, multiplexing several channels over one DWT processor
seems to be an interesting approach.
The remainder of this paper is organised as follows: Section 2
introduces the one-dimensional DWT and reviews the main existing
architectures. Sections 3 and 4 address the implementation and power
evaluation of the most relevant architectures. Finally, results are
reported in Section 5 and conclusions are summarized in Section 6.
II. DWT ALGORITHMS AND ARCHITECTURES
A one level DWT is defined by the following equations:
$$d^1_n = \sum_k g[k]\, x[2n - k] \qquad \text{(1a)}$$

$$a^1_n = \sum_k h[k]\, x[2n - k] \qquad \text{(1b)}$$
where $x$ represents the recorded data, $g$ and $h$ are the high-pass
and low-pass filters respectively, and $a^1_n$ and $d^1_n$ are the DWT
coefficients. The convolution between $x$ and $g$ gives the detail
coefficients, $d^1_n$, and the convolution between $x$ and $h$ gives
the approximation coefficients, $a^1_n$. Mallat shows that the
approximation coefficients, $a^j_n$, have to be fed recursively into
the same two-channel filter bank in order to perform a multilevel
wavelet decomposition, as shown in Fig. 1 [8].
However, a straightforward circuit implementation of this scheme is
inefficient (low hardware utilization). If decomposition level 1 is
clocked at frequency $f_0$, decomposition level $j$ is clocked at
frequency $f_0/2^{j-1}$. Careful consideration of this decimation is
necessary to design efficient architectures.
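As a rough illustration of the low hardware utilization, suppose (an
assumption made here for illustration, not a configuration taken from
the paper) that each of the $J$ levels is given its own filter bank
able to run at $f_0$. Summing the clock rates of the levels gives the
total work actually required:

$$\sum_{j=1}^{J} \frac{f_0}{2^{j-1}} = 2 f_0 \left(1 - 2^{-J}\right) < 2 f_0, \qquad J = 3:\quad f_0 + \frac{f_0}{2} + \frac{f_0}{4} = 1.75\, f_0$$

Under this assumption, three filter banks offer a capacity of $3 f_0$
while only $1.75 f_0$ is needed, i.e. a utilization of about 58%. The
bound $2 f_0$ also suggests why a single filter bank shared between the
levels could in principle serve the whole decomposition if it runs at
twice the level-1 rate.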
Figure 1. Three-level decomposition DWT
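The recursion of Fig. 1 can be sketched in software as follows. This is
a minimal behavioural sketch of equations (1a) and (1b), not a model of
any of the hardware architectures compared in this paper. Two
assumptions are made here that the text does not state: "Daubechies-4"
is taken to mean the 4-tap Daubechies filter, and the signal is extended
periodically at its boundaries. The names `dwt_level` and `dwt` are
hypothetical.

```python
import math

# Assumption: "Daubechies-4" is read as the 4-tap Daubechies low-pass filter h.
SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
     (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]

# High-pass filter g derived from h by the alternating flip:
# g[k] = (-1)^k * h[L - 1 - k], with L the filter length.
G = [((-1) ** k) * H[len(H) - 1 - k] for k in range(len(H))]


def dwt_level(x):
    """One decomposition level: a_n = sum_k h[k] x[2n-k], d_n = sum_k g[k] x[2n-k].

    Assumption: indices outside the signal wrap around (periodic extension).
    The input length must be even.
    """
    size = len(x)
    half = size // 2
    approx = [sum(H[k] * x[(2 * n - k) % size] for k in range(len(H)))
              for n in range(half)]
    detail = [sum(G[k] * x[(2 * n - k) % size] for k in range(len(G)))
              for n in range(half)]
    return approx, detail


def dwt(x, levels=3):
    """Multilevel decomposition: feed the approximation back into the same filter bank.

    Returns the final approximation and the detail coefficients of
    levels 1..levels, in that order.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    samples = [float(i % 5) for i in range(16)]  # hypothetical test signal
    a3, d = dwt(samples, levels=3)
    print([len(level) for level in d], len(a3))
```

For a 16-sample input, the detail vectors have lengths 8, 4 and 2 and
the final approximation has length 2, which mirrors the halving of the
clock rate at each level described above.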
Two different types of architecture can compute the DWT:
convolution-based and lifting-based. The first one directly uses the
two-channel filter bank defined by equation (1) (or its polyphase
form). Weeks et al. [2] report that most convolution-based
architectures try to minimize the number of processing elements (PE),
composed of multipli-
2129 ISCAS 2006 0-7803-9390-2/06/$20.00 ©2006 IEEE