IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 1, JANUARY 1999

A New Fast Block Adaptive Algorithm

Kostas Berberidis and Sergios Theodoridis

Abstract— This paper describes a novel efficient algorithm appropriate for adapting filters of long order. The scheme is an exact block processing counterpart of the recently introduced fast Newton transversal filtering algorithm. The block lengths required by the algorithm are much smaller than the filter length, and the obtained estimates are mathematically equivalent to those of the sample-by-sample version. This leads to a substantial saving in computational complexity without sacrificing performance and without resorting to long processing delays, which would limit the performance of the adaptive system.

I. INTRODUCTION

ADAPTIVE filtering has been a major research discipline for many years owing to its relevance to a number of application areas, such as spectral analysis, control and system identification, channel equalization, echo cancellation, and data communications [1]. Among the various issues concerning the performance of an adaptive filtering algorithm, computational complexity is of paramount importance for real-time applications. Complexity becomes critical in those applications where long filters with a few hundred or even thousands of taps are involved. Acoustic echo cancellation is a typical application of this kind. Most of the existing recursive schemes (including the fast recursive least squares schemes [1, ch. 5] and, in some cases, even the least mean square (LMS) algorithm) are thus disqualified from being used in such applications with today's digital signal processors (DSPs). To remedy this problem, research has evolved mainly along three lines. The first is to use infinite impulse response (IIR) filtering. However, IIR filters pose potential instability problems as well as the possibility of local-minimum solutions [3].
The second line is to develop schemes in which updates are performed not at the input sampling rate but at a fraction of it. Block adaptive algorithms, such as block least mean square (BLMS) [6]; frequency domain adaptive filters, either of the gradient type [3] or of the quasi-Newton type [8]; and subband implementation schemes [4], [5] belong to this category. In these schemes, the reduction in the computation required per input sample is achieved through the use of fast convolution techniques and/or sampling rate reduction. However, their computational efficiency is usually optimized for a block length equal to the filter length, which implies that for large filters, a significant processing delay is introduced. Moreover, in some cases, as, for instance, in BLMS, tracking ability as well as convergence speed is often sacrificed with respect to the standard LMS. An alternative to the block processing approach was introduced in [9]: the so-called fast exact LMS (FELMS). The filter taps are estimated in a manner that is mathematically equivalent to that of the conventional LMS. Thus, through clever internal fast computations, the LMS solution is obtained at a considerably reduced computational complexity.

Manuscript received July 1, 1994; revised October 8, 1997. This work was supported by the Computer Technology Institute, Patras, Greece. The associate editor coordinating the review of this paper and approving it for publication was Dr. Pierre Duhamel. K. Berberidis is with the Department of Computer Engineering and Informatics, School of Engineering, University of Patras, Patras, Greece (e-mail: berberid@cti.gr). S. Theodoridis is with the Department of Informatics, University of Athens, Athens, Greece (e-mail: stheodor@di.uoa.gr). Publisher Item Identifier S 1053-587X(99)00170-1.
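The sampling-rate reduction described above can be illustrated with a minimal block LMS (BLMS) sketch. This is a toy Python illustration with hypothetical names, not the FELMS scheme of [9] (which additionally recovers the exact sample-by-sample LMS estimates): the gradient is accumulated over a block and the taps are updated once per block, i.e., at a fraction 1/L of the input sampling rate.

```python
import numpy as np

def blms_identify(x, d, num_taps, block_len, mu):
    """Toy block LMS: accumulate the gradient over `block_len` samples
    and update the taps once per block (hypothetical helper, for
    illustration only)."""
    w = np.zeros(num_taps)
    x_pad = np.concatenate([np.zeros(num_taps - 1), x])
    for start in range(0, len(x) - block_len + 1, block_len):
        grad = np.zeros(num_taps)
        for n in range(start, start + block_len):
            u = x_pad[n:n + num_taps][::-1]   # regressor, most recent sample first
            e = d[n] - w @ u                  # a priori error with w held fixed
            grad += e * u                     # accumulate gradient over the block
        w += mu * grad                        # single update per block
    return w

# Usage: identify a short FIR system from noiseless data.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2, 0.1])           # unknown system (assumed for the demo)
x = rng.standard_normal(4000)
d = np.convolve(x, h)[:len(x)]
w = blms_identify(x, d, num_taps=4, block_len=8, mu=0.01)
```

In a practical BLMS implementation the inner convolution and the gradient accumulation would be carried out with fast (e.g., FFT-based) convolution, which is where the per-sample savings come from; the loop above only shows the update schedule.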
The block size, which is used to achieve the computational reduction, can be much smaller than the filter order; thus, the delay introduced is small for most practical applications. Exact block adaptive versions of the recursive least squares (RLS) and fast transversal filtering (FTF) algorithms have recently been developed in [10]–[12]. The key point in deriving the so-called fast subsampled updating (FSU) FTF algorithm in [10] is the interpretation of the FTF algorithm as a rotation applied to a set of filters. The resulting algorithm offers considerable computational savings, and for relatively long filters, its complexity is even lower than that of the step-by-step LMS. A third line followed to tackle the difficulties imposed by long filters was via schemes whose performance ranges between that of LMS and that of RLS, with a corresponding tradeoff in complexity [1, ch. 5]. The fast Newton transversal filtering (FNTF) algorithm [13]–[15] belongs to this category of algorithms. Specifically, the FNTF algorithm exploits the fact that the most computationally thirsty part of the fast RLS versions is the updating of the forward and backward predictors of the input time series, whose orders are taken equal to the filter order N, an assumption that is far from necessary in practice. Reversing the argument, the FNTF algorithm starts from a low-order prediction problem of order p and extrapolates the gain vector (equivalently, the autocorrelation matrix) from this low-order problem up to the filtering order N using a saddle-point (min-max) approach. The complexity of the FNTF is 2N + O(p) if the stabilized FTF (SFTF) algorithm [7], [1, ch. 5] is used in the prediction part. By varying p from 0 to N, a class of algorithms results, having the normalized LMS at one end and fast RLS at the other. When the input time series can be adequately modeled as an autoregressive process of order p, then, taking the prediction order equal to p, a significant reduction in complexity is achieved with practically no loss in performance with respect to RLS.
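The low-complexity endpoint of the FNTF family mentioned above, obtained for prediction order p = 0, is the normalized LMS. A minimal sample-by-sample NLMS sketch (Python, hypothetical function name; this is the endpoint algorithm, not FNTF itself) may help fix ideas:

```python
import numpy as np

def nlms_identify(x, d, num_taps, mu=1.0, eps=1e-8):
    """Sample-by-sample normalized LMS: the step is scaled by the
    instantaneous regressor energy (hypothetical helper, for
    illustration only)."""
    w = np.zeros(num_taps)
    x_pad = np.concatenate([np.zeros(num_taps - 1), x])
    for n in range(len(x)):
        u = x_pad[n:n + num_taps][::-1]    # regressor, most recent sample first
        e = d[n] - w @ u                   # a priori error
        w += mu * e * u / (eps + u @ u)    # energy-normalized update
    return w

# Usage: identify a short FIR system from noiseless data.
rng = np.random.default_rng(1)
h = np.array([1.0, -0.5, 0.25])            # unknown system (assumed for the demo)
x = rng.standard_normal(2000)
d = np.convolve(x, h)[:len(x)]
w = nlms_identify(x, d, num_taps=3)
```

Moving from p = 0 toward p = N, FNTF replaces this simple energy normalization with a gain vector extrapolated from an order-p prediction problem, trading extra computation for convergence behavior closer to that of RLS.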
The algorithm has been implemented on a DSP and used successfully for mobile radio echo cancellation [14]. However, for long filter lengths, the linear dependence of the complexity on the filter order N disqualifies the algorithm from being implemented in today's range of DSPs.