IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 1, JANUARY 1999

A New Fast Block Adaptive Algorithm

Kostas Berberidis and Sergios Theodoridis

Abstract— This paper describes a novel efficient algorithm appropriate for adapting filters of long order. The scheme is an exact block processing counterpart of the recently introduced fast Newton transversal filtering algorithm. The block lengths required by the algorithm are much smaller than the filter length, and the obtained estimates are mathematically equivalent to those of the sample-by-sample version. This leads to a substantial saving in computational complexity without sacrificing performance and without resorting to long processing delays, which would limit the performance of the adaptive system.

I. INTRODUCTION

ADAPTIVE filtering has been a major research discipline for many years owing to its relevance to a number of application areas, such as spectral analysis, control and system identification, channel equalization, echo cancellation, and data communications [1]. Among the various issues concerning the performance of an adaptive filtering algorithm, computational complexity is of paramount importance for real-time applications. Complexity becomes critical in those applications where long filters with a few hundred or even thousands of taps are involved. Acoustic echo cancellation is a typical application of this kind. Most of the existing recursive schemes (including the fast recursive least squares schemes [1, ch. 5] and, in some cases, even the least mean square (LMS) algorithm) are thus disqualified from being used in such applications with today's digital signal processors (DSPs). To remedy this problem, research has evolved mainly along three lines. The first is to use infinite impulse response (IIR) filtering. However, IIR filters pose potential instability problems as well as the possibility of local-minimum solutions [3].
The second line is to develop schemes in which updates are performed not at the input sampling rate but at a fraction of it. Block adaptive algorithms, such as block least mean square (BLMS) [6]; frequency domain adaptive filters, either of the gradient type [3] or of the quasi-Newton type [8]; and subband implementation schemes [4], [5] belong to this category. In these schemes, the reduction in the computation required per input sample is achieved through the use of fast convolution techniques and/or sampling rate reduction. However, their computational efficiency is usually optimized for a block length equal to the filter length, which implies that for large filters, a significant processing delay is introduced. Moreover, in some cases, as, for instance, in BLMS, tracking ability as well as convergence speed is often sacrificed with respect to the standard LMS. An alternative to the block processing approach was introduced in [9]: the so-called fast exact LMS (FELMS). The filter taps are estimated in a manner that is mathematically equivalent to that of the conventional LMS. Thus, through clever internal fast computations, the LMS solution is obtained at a considerably reduced computational complexity.

Manuscript received July 1, 1994; revised October 8, 1997. This work was supported by the Computer Technology Institute, Patras, Greece. The associate editor coordinating the review of this paper and approving it for publication was Dr. Pierre Duhamel. K. Berberidis is with the Department of Computer Engineering and Informatics, School of Engineering, University of Patras, Patras, Greece (e-mail: berberid@cti.gr). S. Theodoridis is with the Department of Informatics, University of Athens, Athens, Greece (e-mail: stheodor@di.uoa.gr). Publisher Item Identifier S 1053-587X(99)00170-1.
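The sampling-rate reduction described above can be illustrated with a minimal block LMS (BLMS) sketch. This is a toy Python illustration with hypothetical names, not the FELMS scheme of [9] (which additionally recovers the exact sample-by-sample LMS estimates): the gradient is accumulated over a block and the taps are updated once per block, i.e., at a fraction 1/L of the input sampling rate.

```python
import numpy as np

def blms_identify(x, d, num_taps, block_len, mu):
    """Toy block LMS: accumulate the gradient over `block_len` samples
    and update the taps once per block (hypothetical helper, for
    illustration only)."""
    w = np.zeros(num_taps)
    x_pad = np.concatenate([np.zeros(num_taps - 1), x])
    for start in range(0, len(x) - block_len + 1, block_len):
        grad = np.zeros(num_taps)
        for n in range(start, start + block_len):
            u = x_pad[n:n + num_taps][::-1]   # regressor, most recent sample first
            e = d[n] - w @ u                  # a priori error with w held fixed
            grad += e * u                     # accumulate gradient over the block
        w += mu * grad                        # single update per block
    return w

# Usage: identify a short FIR system from noiseless data.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2, 0.1])           # unknown system (assumed for the demo)
x = rng.standard_normal(4000)
d = np.convolve(x, h)[:len(x)]
w = blms_identify(x, d, num_taps=4, block_len=8, mu=0.01)
```

In a practical BLMS implementation the inner convolution and the gradient accumulation would be carried out with fast (e.g., FFT-based) convolution, which is where the per-sample savings come from; the loop above only shows the update schedule.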
The block size, which is used to achieve the computational reduction, can be much smaller than the filter order; thus, the delay introduced is small for most practical applications. Exact block adaptive versions of the recursive least squares (RLS) and fast transversal filtering (FTF) algorithms have recently been developed in [10]–[12]. The key point in deriving the so-called fast subsampled updating (FSU) FTF algorithm in [10] is the interpretation of the FTF algorithm as a rotation applied to a set of filters. The resulting algorithm offers considerable computational savings, and for relatively long filters, its complexity is even lower than that of the step-by-step LMS. A third line followed to tackle the difficulties imposed by long filters was via schemes whose performance ranges between that of LMS and that of RLS, with a corresponding tradeoff in complexity [1, ch. 5]. The fast Newton transversal filtering (FNTF) algorithm [13]–[15] belongs to this category of algorithms. Specifically, the FNTF algorithm exploits the fact that the most computationally thirsty part of the fast RLS versions is the updating of the forward and backward predictors of the input time series, whose orders are taken equal to the filter order N, an assumption that is far from necessary in practice. Reversing the argument, the FNTF algorithm starts from a low-order prediction problem of order p and extrapolates the gain vector (equivalently, the autocorrelation matrix) from this low-order problem up to the filtering order N using a saddle-point (min-max) approach. The complexity of the FNTF is 2N + O(p) if the stabilized FTF (SFTF) algorithm [7], [1, ch. 5] is used in the prediction part. By varying p from 0 to N, a class of algorithms results, having the normalized LMS at one end and fast RLS at the other. When the input time series can be adequately modeled as an autoregressive process of order p, then, taking the prediction order equal to p, a significant reduction in complexity is achieved with practically no loss in performance with respect to RLS.
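The low-complexity endpoint of the FNTF family mentioned above, obtained for prediction order p = 0, is the normalized LMS. A minimal sample-by-sample NLMS sketch (Python, hypothetical function name; this is the endpoint algorithm, not FNTF itself) may help fix ideas:

```python
import numpy as np

def nlms_identify(x, d, num_taps, mu=1.0, eps=1e-8):
    """Sample-by-sample normalized LMS: the step is scaled by the
    instantaneous regressor energy (hypothetical helper, for
    illustration only)."""
    w = np.zeros(num_taps)
    x_pad = np.concatenate([np.zeros(num_taps - 1), x])
    for n in range(len(x)):
        u = x_pad[n:n + num_taps][::-1]    # regressor, most recent sample first
        e = d[n] - w @ u                   # a priori error
        w += mu * e * u / (eps + u @ u)    # energy-normalized update
    return w

# Usage: identify a short FIR system from noiseless data.
rng = np.random.default_rng(1)
h = np.array([1.0, -0.5, 0.25])            # unknown system (assumed for the demo)
x = rng.standard_normal(2000)
d = np.convolve(x, h)[:len(x)]
w = nlms_identify(x, d, num_taps=3)
```

Moving from p = 0 toward p = N, FNTF replaces this simple energy normalization with a gain vector extrapolated from an order-p prediction problem, trading extra computation for convergence behavior closer to that of RLS.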
The algorithm has been implemented on a DSP and used successfully for mobile radio echo cancellation [14]. However, for long filter lengths, the linear dependence of the complexity on the filter order N disqualifies the algorithm from being implemented in today's range of DSPs.