This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS
Architecture of a Fully Pipelined Real-Time Cellular
Neural Network Emulator
Nerhun Yildiz, Member, IEEE, Evren Cesur, Member, IEEE, Kamer Kayaer,
Vedat Tavsanoglu, Senior Member, IEEE, and Murathan Alpay
Abstract—In this paper, the architecture of a Real-Time Cellular Neural Network (CNN) Processor (RTCNNP-v2) is presented and the implementation results are discussed. The proposed architecture has a fully pipelined structure capable of processing full-HD 1080p@60 (1920 × 1080 resolution at 60 Hz frame rate, 124.4 MHz visible pixel rate) video streams. It is implemented on both a high-end and a low-cost FPGA device, the Altera Stratix IV GX 230 and the Cyclone III C 25, respectively. Many features of the architecture are designed to be either pre-synthesis configurable or runtime programmable, which makes the processor extremely flexible, reusable, scalable, and practical.
Index Terms—Cellular neural networks, field programmable
gate arrays, real time systems, reconfigurable architectures.
I. INTRODUCTION
Cellular neural networks (CNNs) are a parallel computing paradigm [1] with many applications, such as image processing, artificial vision, and solving partial differential equations. A multi-dimensional, multi-layer CNN structure consists of a spatial grid of neural cells with the same number of dimensions, and each cell contains one memory node per layer. The spatio-temporal dynamics of the system are tuned for specific tasks by defining local spatial interconnections between the neural cells.
Generally, a 2-D, 1-layer CNN structure with space-invariant neural weights [2] is used in image processing applications, and this structure is the focus of this work. Extending the architecture proposed in this paper to support two- or multi-layer CNN structures is ongoing work and beyond the scope of this paper.
Manuscript received March 25, 2014; revised June 11, 2014; accepted July 15, 2014. This research was supported by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under project number 108E023. This paper was recommended by Associate Editor M. Frasca.
N. Yildiz and M. Alpay are with the Department of Electronics and Communications Engineering, Yildiz Technical University, 34220 Esenler, Istanbul, Turkey (e-mail: nerhuny@yildiz.edu.tr; ecesur@yildiz.edu.tr; malpay@yildiz.edu.tr).
E. Cesur was with the Department of Electronics and Communications Engineering, Yildiz Technical University, 34220, Esenler, Istanbul, Turkey. He is now with the Applied DSP and VLSI Research Group, University of Westminster, W1W 6UW, London, U.K.
K. Kayaer is with the Scientific and Technological Research Council of Turkey, 41470 Gebze, Kocaeli, Turkey (e-mail: kamerkayaer@gmail.com).
V. Tavsanoglu is with the Department of Electrical and Electronics Engineering, Isik University, 34398 Maslak, Istanbul, Turkey (e-mail: vtavsanoglu@isik.edu.tr).
Digital Object Identifier 10.1109/TCSI.2014.2345502
A continuous-time CNN (CT CNN) implementation has many advantages: a continuous-time circuit is by nature a fully parallel structure, whose convergence rate is generally much faster than that of a digital approximation. Furthermore, it is easier to combine the architecture with an imaging sensor and obtain a focal plane processor that directly processes the captured data and serves as a pre-processor or artificial retina. However, CT CNN implementations also have drawbacks. First, the highest number of cells implemented in a CT CNN processor to date is 176 × 144 [3]; hence even a low-resolution input comparable to QVGA (320 × 240) can only be processed by tiling, i.e., dividing the image into smaller overlapped “tiles” and processing them individually. Furthermore, tiling is not reliable for some CNN templates, so for large images these networks can only be simulated or emulated on a digital platform. Second, the bit depth of a CT CNN is limited to 7 bits due to the electrical noise and crosstalk of an analog implementation. Consequently, even a regular 256-level gray-scale result cannot be obtained with a CT CNN. Finally, as opposed to a digital implementation, modifying an analog IC design is such an extensive task that it can almost be considered a new project. As a result, digital implementations of CNN are preferable in most cases.
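The tiling approach mentioned above can be illustrated with a short sketch. It computes overlapped tile positions that cover a QVGA frame with tiles the size of the 176 × 144 cell array. The function names `tile_origins` and `tiles`, the orientation of the tile (176 wide, 144 high), and the 16-pixel minimum overlap are assumptions made for illustration only; the text does not specify an overlap.

```python
from typing import List, Tuple


def tile_origins(image_len: int, tile_len: int, min_overlap: int) -> List[int]:
    """Start offsets of tiles along one axis.

    Consecutive tiles overlap by at least `min_overlap` pixels and the last
    tile is clamped so that it ends exactly at the image edge.
    """
    if tile_len >= image_len:
        return [0]
    stride = tile_len - min_overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile length")
    origins = list(range(0, image_len - tile_len, stride))
    origins.append(image_len - tile_len)  # clamp the last tile to the edge
    return origins


def tiles(width: int, height: int, tile_w: int, tile_h: int,
          min_overlap: int) -> List[Tuple[int, int, int, int]]:
    """All tiles as (x, y, tile_w, tile_h) rectangles, row by row."""
    return [(x, y, tile_w, tile_h)
            for y in tile_origins(height, tile_h, min_overlap)
            for x in tile_origins(width, tile_w, min_overlap)]


# QVGA frame, tile size equal to the largest CT CNN array cited in the text.
for rect in tiles(320, 240, 176, 144, min_overlap=16):
    print(rect)
```

Under these assumptions a QVGA frame needs four tiles, each of which would have to be loaded, processed, and read back separately, which hints at why tiling scales poorly to larger images.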
The difference equation of the discrete-time CNN (DT CNN) is obtained by discretizing the differential equation of the CT CNN. The difference equation may then be solved on a software platform such as a PC, DSP, or GPU, or custom hardware can be implemented either on an FPGA device or as an ASIC. Software solutions are easier to design and modify, while hardware implementations provide high performance.
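As an illustration of what solving such a difference equation in software involves, the sketch below performs one discretized update of a small 2-D, 1-layer CNN. It assumes the standard space-invariant CNN model (state derivative equal to the negated state plus template-weighted neighbor outputs and inputs plus a constant bias, with a piecewise-linear output), a forward-Euler discretization with step size `h`, and zero-valued cells outside the array. The discretization method, the boundary condition, and all names (`output`, `euler_step`, `Grid`) are assumptions for this example, not details taken from the proposed architecture.

```python
from typing import List

Grid = List[List[float]]


def output(x: float) -> float:
    """Piecewise-linear output 0.5*(|x+1| - |x-1|), i.e. clip x to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def euler_step(x: Grid, u: Grid, a: Grid, b: Grid, z: float, h: float) -> Grid:
    """One forward-Euler update x_next = x + h * dx/dt for every cell.

    `a` and `b` are (2r+1) x (2r+1) feedback and input templates. Cells
    outside the array contribute zero (an assumed boundary condition).
    """
    rows, cols = len(x), len(x[0])
    r = len(a) // 2
    nxt = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = -x[i][j] + z
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += a[di + r][dj + r] * output(x[ii][jj])
                        acc += b[di + r][dj + r] * u[ii][jj]
            nxt[i][j] = x[i][j] + h * acc
    return nxt
```

Every cell performs the same multiply-accumulate pattern over its neighborhood, which is the kind of regular computation that maps naturally onto parallel hardware.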
Using an FPGA device for a DT CNN implementation is preferable in most cases, as it offers flexible parallel structures, is faster than software implementations, and is cheaper than ASIC solutions. Consequently, the most notable DT CNN implementations [4], [5] are realized on FPGA devices. An alternative FPGA architecture of DT CNN, named the real-time CNN processor (RTCNNP, RTCNNP-v1), was proposed in [6]. The architecture proposed in this paper is a second-generation RTCNNP called RTCNNP-v2 [7], [8]. The aim of this work is to design a real-time DT CNN implementation supporting not only higher frame rates but also higher resolutions, including full-HD 1080p.
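For reference, the 124.4 MHz visible pixel rate quoted for 1080p@60 follows directly from the resolution and the frame rate. A minimal sketch of the arithmetic is given below; the function name `visible_pixel_rate_hz` is hypothetical, and blanking intervals are deliberately ignored, so the result is the visible pixel rate only and not the full video clock.

```python
def visible_pixel_rate_hz(width: int, height: int, frame_rate_hz: int) -> int:
    """Visible pixels per second, ignoring horizontal and vertical blanking."""
    return width * height * frame_rate_hz


rate = visible_pixel_rate_hz(1920, 1080, 60)
print(f"{rate} pixels/s = {rate / 1e6:.1f} MHz")  # 124416000 pixels/s = 124.4 MHz
```

A fully pipelined design that accepts one pixel per clock cycle would therefore need to sustain at least this rate to keep up with the visible part of the stream.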
II. MATHEMATICAL OVERVIEW
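An $r$-neighborhood space-invariant CT CNN with an $M \times N$ rectangular array of cells is completely described [2] by the cell-state and output equation pair
$$\dot{x}_{i,j}(t) = -x_{i,j}(t) + \sum_{k=-r}^{r} \sum_{l=-r}^{r} a_{k,l}\, y_{i+k,j+l}(t) + \sum_{k=-r}^{r} \sum_{l=-r}^{r} b_{k,l}\, u_{i+k,j+l} + z \qquad (1)$$
$$y_{i,j}(t) = \tfrac{1}{2} \left( \left| x_{i,j}(t) + 1 \right| - \left| x_{i,j}(t) - 1 \right| \right) \qquad (2)$$
where $i$, $j$, with $1 \le i \le M$ and $1 \le j \le N$, are the spatial Cartesian coordinates, $x_{i,j}(t)$ is the cell state at time $t$, $u_{i,j}$ is the cell input, and $a_{k,l}$, $b_{k,l}$, with $-r \le k, l \le r$, are the feedback and input coefficients, respectively, and $z$ is the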
1549-8328 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.