Efficient VLSI Architecture for Variable Length Block LMS Adaptive Filter

Basant Kumar Mohanty, Senior Member, IEEE, and Sujit Kumar Patel

Abstract: In this paper, we analyze the computational complexity of the block least mean square (BLMS) finite impulse response (FIR) filter and decompose the filter computation into M sub-filters, where M = N/L, N is the filter length, and L is the block size. Each sub-filter acts like a short-length BLMS FIR filter of size L. The proposed decomposition scheme favors time-multiplexing of the filtering computation and the weight-increment computation of each short-length filter. Using this scheme, we derive an efficient architecture for the BLMS FIR filter. The proposed structure can be reconfigured for different filter lengths with negligible overhead complexity, and it supports a variable convergence factor μ. In addition, it has 100% hardware utilization efficiency (HUE), and its register complexity is independent of the block size. Compared with a recently proposed LMS-based FIR structure, the proposed structure involves L times more multipliers, proportionately fewer adders, and the same number of registers, and it offers L times higher throughput. Owing to these register and adder savings, the proposed structure has significantly lower area-delay product (ADP) and energy per sample (EPS) than the existing structure. ASIC synthesis results show that, for block size 4 and filter length 64, the proposed structure involves 21.4% less ADP and 26.6% less EPS than the existing structure and offers 3.8 times higher throughput.

Index Terms: Adaptive filters, block least mean square (BLMS), VLSI, architecture.

I. INTRODUCTION

Adaptive filters (ADFs) are used in various digital signal processing (DSP) applications such as noise cancellation, echo cancellation, channel equalization, and system identification [1].
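To make the decomposition described in the abstract concrete, the following Python sketch computes the same BLMS recursion in two ways: directly with an N-tap weight vector, and split into M = N/L sub-filters of L taps each. In the sketch, sub-filter m holds the weights w[mL], ..., w[mL+L-1], filters the input delayed by mL samples, and forms its own weight-increment term from the L errors of the current block. For N = 64 and L = 4, the configuration quoted in the abstract, this gives M = 16 sub-filters. Several choices here are assumptions made for illustration and are not taken from the paper: the contiguous partition of the weights, the update w(k+1) = w(k) + μ Σ_{i=0}^{L-1} e(kL+i) x(kL+i) with no 1/L normalization, zero input before time 0, and the hypothetical function names x_at, blms_direct, and blms_subfilters. It is a behavioral model of the arithmetic only, not the proposed architecture or its time-multiplexing schedule.

```python
from typing import List, Tuple


def x_at(x: List[float], n: int) -> float:
    """Input sample x(n); samples outside the recorded signal are taken as zero."""
    return x[n] if 0 <= n < len(x) else 0.0


def blms_direct(x: List[float], d: List[float], N: int, L: int,
                mu: float) -> Tuple[List[float], List[float]]:
    """Reference BLMS: N-tap filter, one weight update per block of L samples."""
    w = [0.0] * N
    y = [0.0] * len(x)
    for k in range(len(x) // L):
        inc = [0.0] * N  # weight-increment term accumulated over the block
        for i in range(L):
            n = k * L + i
            y[n] = sum(w[t] * x_at(x, n - t) for t in range(N))
            e = d[n] - y[n]
            for t in range(N):
                inc[t] += e * x_at(x, n - t)
        w = [w[t] + mu * inc[t] for t in range(N)]  # update once per block
    return y, w


def blms_subfilters(x: List[float], d: List[float], N: int, L: int,
                    mu: float) -> Tuple[List[float], List[float]]:
    """Same BLMS recursion split into M = N // L sub-filters of L taps each."""
    if N % L != 0:
        raise ValueError("this sketch assumes L divides N")
    M = N // L
    # Hypothetical partition: w_sub[m][j] plays the role of w[m*L + j].
    w_sub = [[0.0] * L for _ in range(M)]
    y = [0.0] * len(x)
    for k in range(len(x) // L):
        errs = []
        for i in range(L):
            n = k * L + i
            # Sub-filter m works on the input delayed by m*L samples;
            # the filter output is the sum of the M partial outputs.
            y[n] = sum(w_sub[m][j] * x_at(x, n - m * L - j)
                       for m in range(M) for j in range(L))
            errs.append(d[n] - y[n])
        # Each sub-filter forms its own L-element weight-increment term
        # from the block's L errors (weights stay fixed within the block).
        for m in range(M):
            for j in range(L):
                inc = sum(errs[i] * x_at(x, k * L + i - m * L - j)
                          for i in range(L))
                w_sub[m][j] += mu * inc
    w = [c for seg in w_sub for c in seg]  # reassemble the N-tap weight vector
    return y, w
```

Because each sub-filter touches only its own L taps plus the shared block errors, the M sub-filter loops do not depend on one another within a block, which is what makes it plausible to time-multiplex their filtering and weight-increment computations on shared hardware. Running both functions on the same input and desired signal should give matching outputs and final weights.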
Basant Kumar Mohanty and Sujit Kumar Patel are with the Dept. of Electronics and Communication Engineering, Jaypee University of Engineering and Technology, Raghogarh, Guna, Madhya Pradesh, India-473226, Email: {bk.mohanti,sujit.patel}@juet.ac.in

The FIR filter based on the least mean square (LMS) algorithm is the most popular ADF because of its simplicity of implementation and satisfactory convergence behavior [2]. An LMS-based ADF uses a direct-form structure, which has a long critical path, and the conventional LMS algorithm does not favor pipelined implementation because of its recursive behavior. For pipelined implementation of LMS-based ADFs, the LMS algorithm is modified into the delayed LMS (DLMS) algorithm [3]. The DLMS algorithm is similar to LMS except that the weight-increment term used to update the weight vector in the current iteration is computed from a past error value. This delay in the error value, known as the adaptation delay, is used mainly to pipeline the filtering section. The adaptation delay, however, affects the error performance of the LMS ADF: in general, an Nth-order DLMS ADF involves an adaptation delay of N cycles, which degrades the error performance significantly. Poltmann suggested a modified DLMS algorithm in which a correction term is added to the weight-increment term to reduce this performance degradation [4]. Several multiplier-based architectures have been suggested in the literature over the last fifteen years [5]-[14] for high-throughput implementation of DLMS-based ADFs with less adaptation delay. Distributed arithmetic (DA)-based designs have been suggested in recent years for area-delay efficient implementation