A reconfigurable interconnected Filter for
Face Recognition based on Convolution
Neural Network
Shefa A. Dawwd¹, Basil Sh. Mahmood²
Abstract— A dynamically reconfigurable hardware model for
Convolutional Neural Network (CNN) is presented. The modular
prototyping system is based on XILINX FPGAs and is capable of
emulating hardware implementations of CNN for the task of face
recognition. By exploiting reconfiguration, the system emulates the
complex structure of a CNN while occupying only a small chip area.
A speedup of about 88 is achieved with FPGA modules running at
50 MHz compared to a software implementation on a
state-of-the-art personal computer for typical applications of
CNNs.
Index Terms—neural implementation, CNN, VLSI.
I. INTRODUCTION
The CNN operates like a system of interconnected filters, and
profitable comparisons may be made with other filtering
systems, since the neural weights of a CNN operate like the
taps of a system of finite impulse response (FIR) or wavelet
filters. Thus a trained CNN may be thought of as a trainable
filter system, custom made for a certain function mapping
application. Finally, CNNs allow the processing of large
spatially distributed arrays without a correspondingly large
number of free parameters, increasing the chances of avoiding
local minima and of good generalization.
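The filter analogy can be made concrete with a minimal sketch. It
assumes a single 3×3 kernel applied without padding to a 6×6 input;
the function names, the sizes, and the moving-average taps are
illustrative choices, not values taken from this paper. The same taps
are reused at every position, so the number of free parameters does
not grow with the image, unlike a fully connected layer of the same
input and output size.

```python
def fir_filter_2d(image, taps):
    """Slide one shared set of taps over the image (no padding)."""
    kh, kw = len(taps), len(taps[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * taps[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_param_count(kernel_h, kernel_w):
    """Shared kernel weights plus one bias, independent of image size."""
    return kernel_h * kernel_w + 1


def dense_param_count(in_h, in_w, out_h, out_w):
    """One weight per input pixel plus a bias, for every output unit."""
    return (in_h * in_w + 1) * out_h * out_w


# Hypothetical 6x6 input and 3x3 moving-average taps (assumed values).
image = [[float(6 * r + c) for c in range(6)] for r in range(6)]
taps = [[1.0 / 9.0] * 3 for _ in range(3)]

feature_map = fir_filter_2d(image, taps)   # 4x4 output map
shared = conv_param_count(3, 3)            # 10 free parameters
dense = dense_param_count(6, 6, 4, 4)      # 592 free parameters
```

In a trained CNN the taps would be learned rather than fixed, but the
sliding, weight-sharing structure is the same.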
To implement the complex architecture of CNNs in
hardware, different implementation approaches are used.
Because the CNN neuron and layer models are complex, and
different updating rules may be used to adjust the neurons'
weights and parameters, training is usually implemented in
software (off-chip). The hardware implementation in turn
focuses on the CNN architecture rather than on the training
process.
Korekado et al. [1][2] proposed a convolutional network
VLSI architecture using a hybrid approach composed of
pulse-width modulation (PWM) and digital circuits. This
approach was called merged/mixed analog-digital
architecture. The VLSI includes PWM neuron circuits,
PWM/digital converters, digital adder-subtracters, and digital
memory. Nonlinear conversion and multiplication are
performed by two MOSFETs. The VLSI chip was designed
and fabricated using a 0.35 μm CMOS process.
In this architecture, neuron circuits are used repeatedly through
time-sharing operation. The VLSI chip can perform 6-bit precision
convolution calculations for an image of 100×100 pixels with
a receptive field area of up to 20×20 pixels within 5 ms, which
corresponds to a performance of 2 giga operations per second
(GOPS).
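The quoted throughput can be checked with a rough calculation. The
sketch below assumes one output value per input pixel and counts each
multiply-accumulate as two operations; neither counting convention is
stated in the text, and the function name is hypothetical. Under these
assumptions the figures give about 1.6 GOPS, the same order as the
quoted 2 GOPS.

```python
def ops_per_second(image_h, image_w, field_h, field_w, seconds,
                   ops_per_mac=2):
    """Estimate throughput for one full-image convolution.

    Assumes one output per input pixel and ops_per_mac operations
    (one multiply, one add) per multiply-accumulate.
    """
    macs = image_h * image_w * field_h * field_w
    return macs * ops_per_mac / seconds


# 100x100 image, 20x20 receptive field, 5 ms per convolution.
rate = ops_per_second(100, 100, 20, 20, 5e-3)
rate_gops = rate / 1e9   # about 1.6 under the stated assumptions
```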
Neveu and Kumar [3] presented an implementation of the
neocognitron convolutional network on a SIMD parallel
computer (the DECmppn 12000). This parallel computer
consists of a matrix of 64×32 processing elements (PEs). Both
cell and plane parallelism can be observed in this system. A
network with a 16×16 pixel digit input image, three simple
layers, and three complex layers was implemented. The network
performed one recognition in 0.9 seconds.
In the work of Fieres et al. [4], the goal was to design and
implement an electronic system capable of merging sensory
input of different modalities to generate a perceptual
representation of the outside world. The mixed-mode neural
network ASIC (application specific integrated circuit)
HAGEN is used as the basis within a complete
software/hardware system and used to implement the
neocognitron convolutional network. Both analog and
digital (ASIC/FPGA) techniques were used in this system.
The analog computations are fully confined within the blocks.
Inputs and outputs are interfaced digitally, and the synaptic
weights are converted by digital-to-analog converters. Thus,
all communication is digital, ensuring data integrity while still
exploiting the advantage of the fast and highly integrable analog
computing units embedded in HAGEN. All connected
components were controlled by an FPGA control unit.
Because all external components and the FPGA's internal cores
(PowerPC CPU and multi-gigabit transceivers) are connected
to its programmable logic, the operation of the network
module can be controlled completely just by configuring the
FPGA, which can be reconfigured at any time if concepts or
requirements change. When an image of 480×480 pixels was
propagated through the network layers, the computing time of
the HAGEN chip, including reading and writing the data from
and to local memory on the PCI board, was only 730 ms.
FPGAs offer speed comparable to dedicated and fixed
hardware systems for parallel algorithm acceleration while, as
with a software implementation, retaining a high degree of
flexibility for device reconfiguration as the application
demands. Therefore, this paper proposes a complete reconfigurable
digital FPGA implementation of a CNN. To
design and implement an efficient parallel architecture which
1 Ph.D, Senior Lecturer, Computer Engg. Dept/ College of Engg./ Univ. of
Mosul. E-mail:shefadawwd@yahoo.com
2 Ph.D, Dean of the College of Electronic/Univ. of Mosul
E-mail:basil_mahmood@yahoo.com
978-1-4244-5750-2/10/$26.00 ©2009 IEEE