Efficient VLSI Architecture for Variable Length Block LMS Adaptive Filter

Basant Kumar Mohanty, Senior Member, IEEE, and Sujit Kumar Patel

Abstract: In this paper, we analyze the computational complexity of the block least mean square (BLMS) finite impulse response (FIR) filter and decompose the filter computation into M sub-filters, where M = N/L, N is the filter length, and L is the block size. Each sub-filter acts like a short-length BLMS FIR filter of size L. The proposed decomposition scheme favors time-multiplexing the filtering computation and the weight-increment term computation of each short-length filter. Using the proposed scheme, we derive an efficient architecture for the BLMS FIR filter. The proposed structure can be reconfigured for different filter lengths with negligible overhead complexity, and it supports a variable convergence factor μ. In addition, the proposed structure has 100% hardware utilization efficiency (HUE), and its register complexity is independent of the block size. Compared with a recently proposed LMS-based FIR structure, the proposed structure involves L times more multipliers, proportionately fewer adders, and the same number of registers, and it offers L times higher throughput. Owing to the register and adder savings, the proposed structure has a significantly lower area-delay product (ADP) and energy per sample (EPS) than the existing structure. ASIC synthesis results show that the proposed structure for block size 4 and filter length 64 involves 21.4% less ADP and 26.6% less EPS than the existing structure and offers 3.8 times higher throughput.

Index Terms: Adaptive filters, block least mean square (BLMS), VLSI, architecture.

I. INTRODUCTION

Adaptive filters (ADFs) are used in various digital signal processing (DSP) applications such as noise cancellation, echo cancellation, channel equalization, and system identification [1].
The least mean square (LMS) algorithm based FIR filter is the most popular ADF owing to its simple implementation and satisfactory convergence behavior [2]. The LMS-based ADF uses a direct-form structure, which has a long critical path. The conventional LMS algorithm does not favor pipelined implementation because of its recursive behavior. For pipelined implementation of the LMS-based ADF, the LMS algorithm is expressed in a modified form known as delayed LMS (DLMS) [3]. The DLMS algorithm is similar to LMS except that the weight-increment term is computed using a past error value to update the weight vector during the current iteration. The delay introduced in the error value is known as the adaptation delay, and it is used for pipelining the filtering section. The adaptation delay affects the error performance of the LMS ADF. In general, an Nth-order DLMS ADF involves an adaptation delay of N cycles, which degrades the error performance significantly. Poltmann suggested a modified DLMS algorithm in which a correction term is added to the weight-increment term to reduce the performance degradation [4]. Several multiplier-based architectures have been suggested in the literature over the last fifteen years [5]-[14] for high-throughput implementation of DLMS-based ADFs with less adaptation delay. Distributed arithmetic (DA) based designs have been suggested in recent years for area-delay efficient implementation

Basant Kumar Mohanty and Sujit Kumar Patel are with the Dept. of Electronics and Communication Engineering, Jaypee University of Engineering and Technology, Raghogarh, Guna, Madhya Pradesh, India-473226. Email: {bk.mohanti,sujit.patel}@juet.ac.in
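The difference between the LMS and DLMS weight updates described in the introduction can be sketched in a few lines of Python. This is only an illustrative software model, not the proposed architecture: the function name `dlms_identify`, its parameters, and the noise-free system-identification setup with a uniformly distributed input are all assumptions made for this example. Setting `delay=0` gives the conventional LMS update, while `delay=m` updates the weights with the error and input vector from m iterations earlier.

```python
import random


def dlms_identify(unknown, mu=0.05, delay=0, n_samples=5000, seed=0):
    """Identify an FIR system with the (delayed) LMS algorithm.

    Hypothetical helper for illustration; delay=0 reduces to plain LMS.
    Returns the final weight vector and the list of squared errors.
    """
    rng = random.Random(seed)
    n_taps = len(unknown)
    w = [0.0] * n_taps        # adaptive weight vector
    x_vec = [0.0] * n_taps    # input vector, newest sample first
    pending = []              # (error, input vector) pairs awaiting use
    sq_errors = []
    for _ in range(n_samples):
        # Shift a new input sample into the tapped delay line.
        x_vec = [rng.uniform(-1.0, 1.0)] + x_vec[:-1]
        d = sum(h * x for h, x in zip(unknown, x_vec))   # desired response
        y = sum(wi * x for wi, x in zip(w, x_vec))       # filter output
        e = d - y                                        # current error
        sq_errors.append(e * e)
        pending.append((e, x_vec))
        # Weight-increment term uses the error delayed by `delay` iterations.
        if len(pending) > delay:
            e_old, x_old = pending.pop(0)
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w, sq_errors


if __name__ == "__main__":
    system = [0.5, -0.3, 0.2, 0.1]   # assumed 4-tap unknown system
    for m in (0, 4):
        w, _ = dlms_identify(system, delay=m)
        print(m, [round(wi, 4) for wi in w])
```

Comparing the squared-error sequences returned for different values of `delay` gives a rough feel for how the adaptation delay influences convergence under these assumed conditions.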