A Reconfigurable Interconnected Filter for Face Recognition Based on Convolution Neural Network

Shefa A. Dawwd 1, Basil Sh. Mahmood 2

Abstract— A dynamically reconfigurable hardware model for the Convolutional Neural Network (CNN) is presented. The modular prototyping system is based on XILINX FPGAs and can emulate hardware implementations of CNNs for the task of face recognition. By exploiting reconfiguration, the system emulates the complex structure of a CNN while occupying only a small chip area. For typical CNN applications, a speedup of about 88 is achieved with FPGA modules running at 50 MHz compared to a software implementation on a state-of-the-art personal computer.

Index Terms—neural implementation, CNN, VLSI.

I. INTRODUCTION

The CNN operates like a system of interconnected filters, and profitable comparisons may be made with other filtering systems, since the neural weights of a CNN operate like the taps of a system of finite impulse response (FIR) or wavelet filters. Thus a trained CNN may be thought of as a trainable filter system, custom made for a certain function mapping application. In addition, CNNs allow the processing of large spatially distributed arrays without a correspondingly large number of free parameters, which increases the chances of avoiding local minima and of generalizing well.

Different approaches have been used to implement the complex architecture of CNNs in hardware. Because the CNN neuron and layer models are complex, and different update rules may be used to adjust the neurons' weights and parameters, training is usually implemented in software (off-chip). The hardware implementation in turn focuses on the CNN architecture rather than on the training process.

Korekado et al. [1][2] proposed a convolutional network VLSI architecture using a hybrid approach composed of pulse-width modulation (PWM) and digital circuits. This approach was called a merged/mixed analog-digital architecture.
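Returning briefly to the filter analogy made at the start of this section, the correspondence between CNN weights and FIR taps can be illustrated with a minimal sketch. The Python code below is not taken from the paper. The function names fir_1d and conv2d_valid are hypothetical, and unit stride with no padding is assumed. It only shows that a convolutional feature map is a sliding weighted sum whose shared weights play the role of filter taps, so the number of free parameters depends on the kernel size and not on the image size.

```python
def fir_1d(signal, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * signal[n - k].

    Only the fully overlapped ("valid") outputs are returned.
    """
    k = len(taps)
    return [
        sum(taps[j] * signal[n - j] for j in range(k))
        for n in range(k - 1, len(signal))
    ]


def conv2d_valid(image, kernel):
    """2-D sliding weighted sum, as in one CNN feature map.

    Assumptions (not from the paper): stride 1, no padding, and the
    kernel is applied without flipping, as is usual for CNNs.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[a][b] * image[r + a][c + b]
                for a in range(kh)
                for b in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # 1-D case: a two-tap difference filter.
    print(fir_1d([1, 2, 3, 4], [1, -1]))  # [1, 1, 1]

    # 2-D case: the same 2x2 kernel (4 shared weights) is reused at
    # every position, whatever the image size.
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, -1]]
    print(conv2d_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```

In this sketch the 2×2 kernel contributes four weights whether the input is 3×3 or 480×480, which is the weight-sharing property that keeps the number of free parameters small.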
The VLSI includes PWM neuron circuits, PWM/digital converters, digital adder-subtracters, and digital memory. Nonlinear conversion and multiplication are performed by two MOSFETs. The VLSI chip was designed and fabricated using a 0.35 μm CMOS process. In this architecture, the neuron circuits are used repeatedly through time-sharing. The VLSI chip can perform 6-bit precision convolution calculations for an image of 100×100 pixels with a receptive field area of up to 20×20 pixels within 5 ms, which corresponds to a performance of 2 giga operations per second (GOPS).

Neveu and Kumar [3] presented an implementation of the neocognitron convolutional network on a SIMD parallel computer (the DECmppn 12000). This parallel computer consists of a matrix of 64×32 processing elements (PEs). Both cell and plane parallelism can be observed in this system. A network with a 16×16 pixel digit input image, three simple layers, and three complex layers was implemented. The network performed a recognition in 0.9 seconds.

In the work of Fieres et al. [4], the goal was to design and implement an electronic system capable of merging sensory input of different modalities to generate a perceptual representation of the outside world. The mixed-mode neural network ASIC (application specific integrated circuit) HAGEN serves as the basis of a complete software/hardware system and is used to implement the neocognitron convolutional network. Both analog and digital (ASIC/FPGA) techniques were used in this system. The analog computations are fully confined within the blocks. Inputs and outputs are interfaced digitally, and the synaptic weights are converted by digital-to-analog converters. Thus, all communication is digital, which ensures data integrity while still exploiting the advantage of the fast and highly integrable analog computing units that are embedded in HAGEN. All connected components were controlled by an FPGA control unit.
Because all external components and the FPGA's internal cores (PowerPC CPU and multi-gigabit transceivers) are connected to its programmable logic, the operation of the network module can be controlled completely just by configuring the FPGA, which can be reconfigured at any time if concepts or requirements change. For propagating a 480×480 pixel image through the network layers, the computing time of the HAGEN chip, including reading and writing the data from and to local memory on the PCI board, was only 730 ms.

FPGAs offer speed comparable to dedicated, fixed hardware systems for parallel algorithm acceleration while, as with a software implementation, retaining a high degree of flexibility for device reconfiguration as the application demands. Therefore, this paper proposes a complete, reconfigurable, digital FPGA implementation of a CNN.

1 Ph.D., Senior Lecturer, Computer Engg. Dept., College of Engg., Univ. of Mosul. E-mail: shefadawwd@yahoo.com
2 Ph.D., Dean of the College of Electronic, Univ. of Mosul. E-mail: basil_mahmood@yahoo.com
978-1-4244-5750-2/10/$26.00 ©2009 IEEE

To design and implement an efficient parallel architecture which