IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO. 11, NOVEMBER 1996 2865

Correspondence

Analysis and Efficient Implementation of Partitioned Block LMS Adaptive Filters

B. Farhang-Boroujeny

Abstract-The frequency domain block LMS (FBLMS) algorithm, used for efficient implementation of adaptive filters whose length may exceed a few thousand taps, is explored further by Asharif et al. To reduce the latency of such filters, it has been proposed that the adaptive filter convolution sum be partitioned into a few smaller sums and that the FBLMS then be applied. In this correspondence, we show that the scheme proposed by Asharif et al. suffers from a serious eigenvalue spread problem. We identify the source of this problem and propose a solution to it.

I. INTRODUCTION

Despite the great advancements in digital signal processing (DSP) hardware, engineers dealing with adaptive filters whose length exceeds a few hundred taps may still face difficulties in implementing them on general-purpose DSP processors. For example, an acoustic echo canceler that works at a sampling frequency of 8 kHz needs 3200 taps to cover 400 ms of echoed speech that may appear in a normal size room [2]. Using the least mean square (LMS) algorithm [1] to implement such an echo canceler requires at least 102.4 million (= 4 × 3200 × 8000) instructions per second. Current state-of-the-art DSP processors at a price acceptable to the industry are at least a few times slower than this figure.

An effective solution that can significantly reduce the computational complexity of such filters is block processing of the signal samples in the frequency domain. The result, which is well understood, is the frequency domain block LMS (FBLMS) algorithm [3]. A major drawback of the FBLMS algorithm, in its original form as proposed by Ferrara [3], is that it introduces a significant delay in producing the filter output samples.
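The instruction-rate estimate for the echo canceler example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical, and reading the factor 4 as the number of instructions spent per tap per input sample is an assumption about how the figure was obtained.

```python
def lms_instruction_rate(echo_span_ms, sample_rate_hz, instr_per_tap=4):
    """Return (taps, instructions per second) for a time domain LMS filter.

    instr_per_tap is an assumed per-tap, per-sample instruction count;
    the text only gives the product 4 x 3200 x 8000.
    """
    # Number of taps needed to span the echo path at the given sampling rate.
    taps = echo_span_ms * sample_rate_hz // 1000
    # Every input sample touches every tap once.
    rate = instr_per_tap * taps * sample_rate_hz
    return taps, rate


taps, rate = lms_instruction_rate(echo_span_ms=400, sample_rate_hz=8000)
print(f"{taps} taps, {rate / 1e6:.1f} million instructions per second")
```

Changing `echo_span_ms` or `sample_rate_hz` shows how quickly the load grows: the rate scales linearly with the echo span and quadratically with the sampling frequency.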
Two solutions have been proposed in the literature for reducing the FBLMS delay.

i) A trivial solution is to use a block length smaller than the filter length. An algorithm for efficient implementation of this scheme has been proposed in [5].

ii) Asharif et al. [6] (see also [7]-[9] and further references in [7]) have proposed an FBLMS algorithm that partitions the time domain linear convolution of the adaptive filter into a few separate convolution sums and implements each sum separately in the frequency domain. The resulting algorithm is called the multidelay block frequency domain adaptive filter in [8] and [9] and frequency bin adaptive filtering by Asharif et al. [6], [7]. We use the name partitioned FBLMS (PFBLMS), as we believe this name exactly reflects the principles on which this algorithm is based.

Our emphasis is on the PFBLMS algorithm. We present an analysis of this algorithm and find that it suffers from a serious eigenvalue spread problem because of overlapping of successive blocks. This leads us to a simple solution for a better implementation of the PFBLMS algorithm, which converges much faster than the implementation proposed in [6]-[9].

Manuscript received May 25, 1995; revised April 3, 1996. The associate editor coordinating the review of this paper and approving it for publication was Dr. Akihiko Sugiyama.
The author is with the Department of Electrical Engineering, National University of Singapore, Singapore 0511.
Publisher Item Identifier S 1053-587X(96)07116-4.

II. PARTITIONED FBLMS AND ITS ANALYSIS

We consider the implementation of an adaptive transversal filter with $N = P \cdot M$ taps, where $P$ and $M$ are two integers.
The filter output $y(n)$ is related to its input $x(n)$ according to the equation

$$y(n) = \sum_{i=0}^{PM-1} w_i\, x(n-i). \tag{1}$$

The convolution sum of (1) may be partitioned into $P$ smaller convolution sums as

$$y(n) = \sum_{l=0}^{P-1} y_l(n) \tag{2}$$

where

$$y_l(n) = \sum_{i=0}^{M-1} w_{i+lM}\, x(n - lM - i).$$

Application of this concept to the implementation of the FBLMS algorithm gives the structure shown in Fig. 1 [6]. The thick lines in Fig. 1 represent vectors whose lengths are indicated beside them.

Lee and Un [4] have developed a thorough analysis of the conventional FBLMS algorithm when no partitioning is applied. The main result there is that, under the condition that $x(n)$ is a stationary and $m$-dependent process, and $m \ll N$, the various modes of convergence (the learning curve time constants) of the FBLMS algorithm, when step normalization is not applied, are determined by the eigenvalues of the diagonal matrix $\mathbf{R}$, whose $i$th diagonal element is set by the mean square of the $i$th element of $\mathbf{X}(k)$. Here, $L$ is the block length and $\mathbf{X}(k)$ is the FFT of the input data block.

Moreover, $\mathbf{R}$ will be replaced by $(\mathrm{diag}[\mathbf{R}])^{-1}\mathbf{R}$ once step normalization is applied. Noting that for a diagonal $\mathbf{R}$, $(\mathrm{diag}[\mathbf{R}])^{-1}\mathbf{R}$ is the identity matrix, we find the great advantage of the step-normalized FBLMS algorithm: unlike the conventional LMS algorithm, it does not suffer from any serious eigenvalue spread problem. However, this feature may not remain once partitioning is applied to reduce the latency of the algorithm, as we shall see next.

The analysis proposed in [4] may also be applied to the PFBLMS algorithm. If this is done,¹ one may find that the convergence of the step-normalized PFBLMS algorithm is controlled by the eigenvalues of a block diagonal matrix whose $i$th diagonal block is $(\mathrm{diag}[\mathbf{R}_i])^{-1}\mathbf{R}_i$, where $\mathbf{R}_i = E\{\mathbf{x}_i(k)\,\mathbf{x}_i^{H}(k)\}$; the superscript $H$ denotes Hermitian, $k$ is the block index, and $\mathbf{x}_i(k)$ is the column vector whose elements are the successive samples of the $i$th frequency bin of the input data, i.e.,

$$\mathbf{x}_i(k) = [X_{0,i}(k)\;\; X_{1,i}(k)\;\; \cdots\;\; X_{P-1,i}(k)]^{T}.$$

¹ This requires some effort.
However, the procedure is very similar to the one given in [4] and is therefore omitted here.

1053-587X/96$05.00 © 1996 IEEE
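The partitioning of the filter convolution sum into $P$ sums of $M$ taps each, as described in Section II, can be checked numerically. The following is a minimal time domain sketch, not the frequency domain structure of Fig. 1. The function names, the tap-weight array `w`, the zero initial condition ($x(n) = 0$ for $n < 0$), and the indexing of the $l$th partial sum as taps $lM$ to $lM + M - 1$ are assumptions made for illustration.

```python
import numpy as np


def direct_output(w, x, n):
    """y(n) as a single convolution sum over all P*M taps (x(n) = 0 for n < 0)."""
    return sum(w[i] * x[n - i] for i in range(len(w)) if n - i >= 0)


def partitioned_output(w, x, n, P, M):
    """y(n) as the sum of P partial sums, each covering M consecutive taps."""
    total = 0.0
    for l in range(P):
        # Partial sum l uses taps l*M .. l*M + M - 1 and input delayed by l*M.
        y_l = sum(
            w[i + l * M] * x[n - l * M - i]
            for i in range(M)
            if n - l * M - i >= 0
        )
        total += y_l
    return total


rng = np.random.default_rng(0)
P, M = 4, 8                       # N = P * M = 32 taps
w = rng.standard_normal(P * M)    # arbitrary tap weights
x = rng.standard_normal(100)      # arbitrary input samples

y_direct = np.array([direct_output(w, x, n) for n in range(len(x))])
y_part = np.array([partitioned_output(w, x, n, P, M) for n in range(len(x))])
max_diff = np.max(np.abs(y_direct - y_part))
print("largest difference between the two outputs:", max_diff)
```

The two outputs agree to within floating-point rounding, which reflects that the partitioning only regroups the terms of the original sum. The latency reduction and the convergence behavior discussed in the text arise from how each partial sum is then realized in the frequency domain, which this sketch does not model.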