VLSI implementations of efficient isotropic flexible 2D convolvers

S. Perri and P. Corsonello

Abstract: A new 2D isotropic convolver designed to operate on 256 grey-level images is presented. When realised in 90 nm 1 V CMOS technology, the core of the proposed circuit exhibits a 1.25 GHz running frequency with an average power dissipation of only about 1 mW/MHz. The new convolver can also be efficiently realised using FPGAs.

1 Introduction

In image and video processing algorithms and applications, such as image filtering, image restoration, feature recognition, object tracking, template matching and many others, 2D convolution is one of the most frequently required operations [1]. 2D convolution mainly consists of computing the weighted sum of neighbouring pixels. Thus, given an input image, the convolution of the generic pixel P(x,y) is computed by summing the k² products obtained by multiplying a k × k neighbourhood of pixels centred at P(x,y) by a k × k convolution kernel.

As is well known, 2D convolution is a computationally demanding task, since it requires a large number of operations. As an example, it can easily be verified that, for k = 3, more than nine million multiplications and eight million additions are required to compute the 2D convolution of a 1024 × 1024 image. As a consequence, software solutions are generally time consuming. Commercial DSPs, such as the TMS320C40 [2], are often inefficient owing to the high number of instruction cycles required for such a complex operation. As discussed in [3], when k = 3, the TMS320C40 DSP requires about 20 instruction cycles per pixel. For these reasons, in recent years, the design of efficient circuits for convolution has received a great deal of attention, and many approaches have been proposed for optimising area requirement and speed performance [3–11].
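The operation count quoted above can be checked with a short sketch. The Python below is illustrative only and is not the circuit proposed in the paper: the function names `convolve2d_direct` and `operation_count` are hypothetical, the weighted sum is taken over interior pixels only, and the kernel is applied without flipping (which gives the same result for symmetric kernels).

```python
def convolve2d_direct(image, kernel):
    """Direct k x k weighted sum over interior pixels (no border handling).

    Returns the output rows plus the number of multiplications and
    two-operand additions actually performed.
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    mults = adds = 0
    for y in range(rows - k + 1):
        out_row = []
        for x in range(cols - k + 1):
            acc = image[y][x] * kernel[0][0]  # first product, no addition yet
            mults += 1
            for i in range(k):
                for j in range(k):
                    if i == 0 and j == 0:
                        continue
                    acc += image[y + i][x + j] * kernel[i][j]
                    mults += 1
                    adds += 1
            out_row.append(acc)
        out.append(out_row)
    return out, mults, adds


def operation_count(width, height, k):
    """Operation count assuming one convolved output per input pixel:
    k*k multiplications and k*k - 1 additions per pixel."""
    pixels = width * height
    return pixels * k * k, pixels * (k * k - 1)


if __name__ == "__main__":
    print(operation_count(1024, 1024, 3))  # (9437184, 8388608)
```

For a 1024 × 1024 image and k = 3 this gives 9,437,184 multiplications and 8,388,608 additions, in line with the "more than nine million" and "eight million" figures in the text.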
In image and video processing, 2D isotropic kernels, such as Gaussian, Laplacian, Laplacian of Gaussian (LoG), mean, median, sharpening, smoothing and many others, are most frequently used. An isotropic kernel has the property of being equally well applied in all directions in an image, with no special sensitivity or bias towards one particular set of directions. This feature could be exploited to reduce the hardware complexity of 2D convolvers. However, to the best of our knowledge, examples of circuits designed for this purpose do not yet exist in the literature.

Circuits known for convolution can be classified into two main categories: fixed-kernel and variable-kernel circuits. The first category includes the circuits designed to operate on kernel values fixed a priori, whereas the second category includes the hardware architectures designed to process arbitrary kernel values. In fixed-kernel convolvers, only multiplications by constants have to be executed, and these can be reduced to addition/subtraction and shift operations. Therefore optimisation algorithms able to efficiently synthesise fixed-kernel convolvers can be used [8–11]. The aim of these synthesis algorithms is to reduce the overall requirement of hardware resources by minimising multiplier adder cost and/or minimising multiplier logic depth and pipeline registers.

However, in the design of circuits for image and video processing, cost and performance are not the only concerns. In fact, flexibility also plays a crucial role in making the computing platform able to efficiently support a large class of existing and future applications [12]. For this reason, the design of variable-kernel convolvers is often desirable, even though they obviously imply higher costs than fixed-kernel convolvers.

Examples of efficient hardware architectures for 2D convolution are described in [3–7]. The 2D 3 × 3 convolver described in [3] can process kernels that contain only the values −4, −2, −1, 0, 1, 2, 4.
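The coefficient values listed for [3] are all zero or signed powers of two, so each product can in principle be formed by a shift and an optional negation; whether [3] implements it this way is not stated here. The sketch below, in Python with hypothetical function names, illustrates this and the more general reduction of constant multiplication to shifts and additions mentioned for fixed-kernel convolvers. It uses a plain binary expansion and is not one of the optimised synthesis algorithms of [8–11].

```python
def times_signed_power_of_two(pixel, coeff):
    """Multiply pixel by coeff in {-4, -2, -1, 0, 1, 2, 4}
    using one shift and an optional negation."""
    if coeff == 0:
        return 0
    shift = abs(coeff).bit_length() - 1  # 1 -> 0, 2 -> 1, 4 -> 2
    if abs(coeff) != 1 << shift:
        raise ValueError("coefficient is not a signed power of two")
    product = pixel << shift
    return -product if coeff < 0 else product


def times_constant_shift_add(pixel, constant):
    """Multiply pixel by a non-negative integer constant using only
    shifts and additions (one addition per set bit of the constant)."""
    result = 0
    bit = 0
    while constant:
        if constant & 1:
            result += pixel << bit
        constant >>= 1
        bit += 1
    return result


if __name__ == "__main__":
    print(times_signed_power_of_two(200, -4))  # -800
    print(times_constant_shift_add(37, 10))    # 370, i.e. (37 << 1) + (37 << 3)
```

In hardware, a shift by a fixed amount is only wiring, which is why restricting or fixing the kernel values lowers the cost relative to a general multiplier.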
In [4], it is shown how a 3 × 3 fully pipelined variable-kernel convolver for video applications can be efficiently implemented on FPGAs. In [5], a parameterised convolution core is proposed for the XILINX SPARTANXL device family, and characterisation results are given for the implementation of a 3 × 3 Laplace filter. Finally, in [7], several approaches for shifting a moving window over an image are presented for area-efficient 2D FPGA-based convolvers.

© The Institution of Engineering and Technology 2007
doi:10.1049/iet-cds:20070056
Paper first received 12th February and in revised form 28th May 2007
The authors are with the Department of Electronics, Computer Science and Systems, University of Calabria, Arcavacata di Rende, Rende (CS) 87036, Italy
E-mail: p.corsonello@unical.it
IET Circuits Devices Syst., 2007, 1, (4), pp. 263–269

This paper presents a new 2D 3 × 3 convolver purpose-designed to operate optimally on variable isotropic kernels, thus providing good flexibility in the chosen context of image and video applications. The approach proposed here is based on two observations: an isotropic kernel has several identical coefficients, and, in computing horizontally/vertically adjacent convolved pixels, several two-operand additions can be performed only once and their results re-used many times. As demonstrated in the following, owing to these properties, the number of operations required to process an input image is significantly reduced with respect to conventional