Journal of VLSI Signal Processing 23, 239–266 (1999)
© 1999 Kluwer Academic Publishers. Manufactured in The Netherlands.

MOST-Based Design and Scaling of Synaptic Interconnections in VLSI Analog Array Processing CNN Chips

ANGEL RODRÍGUEZ-VÁZQUEZ, ELISENDA ROCA, MANUEL DELGADO-RESTITUTO, SERVANDO ESPEJO AND RAFAEL DOMÍNGUEZ-CASTRO
Instituto de Microelectrónica de Sevilla, Centro Nacional de Microelectrónica, Avda. Reina Mercedes s/n, Edificio CICA-CNM, E-41012 Sevilla, Spain

I. Introduction

After some decades of microelectronics and infotech revolution, today we marvel at how digital computers have infiltrated every corner of our lives. Particularly in signal processing applications, the adoption of digital solutions has resulted in newer generations of apparatus with higher performance and functionality, at lower cost, than their classical analog counterparts. Thus, over recent years we have witnessed a pervasive trend towards replacing the analog processing circuitry used classically for instrumentation, telecom, radio, etc. with digital circuitry. As a result, in most modern electronic systems the role of analog is basically limited to that of an interface between the real-world analog signals and the numbers handled by the core digital processors.

However, there are applications where such a division of tasks between analog and digital circuitry may lead to important bottlenecks. This is the case in real-time processing of multidimensional, interacting signals; particularly, in the processing of 2-D optical flows. For instance, a 3-colour, 512 × 512 camera delivers some F × 10^6 bytes/sec, where F is the frame rate. Digital computers can handle such a huge rate for auto-focus, image stabilization, luminance/chrominance control, and the like. However, they are cumbersome and vastly inefficient when implementing the algorithms needed for spatial-temporal processing operations.
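The camera data-rate figure quoted above can be checked with a short calculation. The sketch below is only illustrative: the function name `raw_data_rate`, the one-byte-per-colour-sample depth, and the 25 frames/s example rate are assumptions that do not come from the text.

```python
def raw_data_rate(width, height, channels, bytes_per_sample, frame_rate):
    """Raw (uncompressed) sensor output in bytes per second."""
    return width * height * channels * bytes_per_sample * frame_rate


# Assumption: one byte per colour sample (the text does not state the depth).
bytes_per_frame = raw_data_rate(512, 512, 3, 1, 1)

# Assumption: F = 25 frames/s, chosen only as an example frame rate.
rate_at_25_fps = raw_data_rate(512, 512, 3, 1, 25)

print(bytes_per_frame)   # 786432, i.e. roughly 0.8 x 10^6 bytes per frame
print(rate_at_25_fps)    # 19660800, i.e. roughly 20 x 10^6 bytes/sec
```

Under these assumptions each frame carries a little under 10^6 bytes, which is consistent with a rate on the order of F × 10^6 bytes/sec.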
Indeed, the array supercomputer shipped recently by INTEL contains almost 10,000 Pentium processors to achieve trillion-operations-per-second (TeraOPS) computing power. We need this speed for real-time processing of multi-dimensional sensory signals; however, we don't need the 32-bit floating point accuracy.

In order to overcome the limitations of conventional digital computing methods, a fundamentally different approach is needed. Such an approach may involve significant, even revolutionary, modifications to our way of thinking about the roles of digital and analog techniques. Presently, a variety of parallel analog computational architectures have been developed and are beginning to find their way into applications. However, industrial applications demand platforms where both analog and digital computation are exploited at their best, capable of flexible operation, with programmable features and standard interfacing to conventional equipment. The CNN Universal Machine (CNN-UM) is an example of a model architecture for this newer generation of analog/digital, general-purpose, parallel-processing platforms [1].

This paper deals with networks consisting of Arrays of locally-interacting, Programmable, Analog/digital Processors (APAPs) with embedded, distributed analog/digital cache memory for storage of intermediate results, and the capability to execute a sequence of user-selectable operations over the input and the intermediate results. The potential of these APAPs for high-speed multi-dimensional processing is fully realized only if they are implemented as mixed-signal VLSI chips. Figure 1 is representative of the advantages reported for these mixed-signal APAP chips as compared to purely digital solutions. To obtain this figure we have calculated the computing time needed by different processors to realize different processing tasks on a 128 × 128 pixel image.
The time expended for data transfer is included in the computing time. The processing tasks are well-known image operations; algorithm A consists of a sequence of 10 convolutions plus 10 erosions plus 1 heat diffusion operator; in algorithm B the number of heat diffusion operators is