Low Power Spatial Computing using Null Convention Logic

Kashfia Haque
School of Electrical & Computer Engineering, RMIT University
kashfia.haque@rmit.edu.au

Conrad Jakob
CrunchDSP Pty. Ltd., Carlton, Melbourne
Conrad@crunchdsp.com.au

Paul Beckett
School of Electrical & Computer Engineering, RMIT University
pbeckett@rmit.edu.au

Abstract: We describe and analyze the design and organization of a homogeneous asynchronous bit-level array based on Null Convention Logic. A bit-element (bel) array represents a "hybrid" bit-level processor that exhibits both Spatial and Temporal computing characteristics. Bels are connected in a 2D grid, with eight-way connectivity. Programs represented as Directed Acyclic Graphs can be mapped to the array either fully statically or with a small level of element reuse (dynamic mapping). Using a 28nm FDSOI process, we present some preliminary estimates of the area and power of a bel.

Keywords: spatial computing, NCL, directed acyclic graph

I. INTRODUCTION

The evolution of computer architecture might be characterized as a series of attempts to side-step the basic von Neumann separation of memory and processing. In this context, interest in "flow computing" or data flow has waxed and waned over the years. However, as device sizes continue to shrink, new micro-architectural solutions will be increasingly needed to overcome the problems of process, voltage and temperature (PVT) variability at advanced fabrication nodes. Asynchronous spatial computing [1, 2] has been suggested as one potential solution to these issues.

The term "dark silicon" [3] describes the exponentially reducing percentage of a chip that can switch at full frequency with each successive process generation. Power consumption is a direct function of switching activity, which in asynchronous components can be as high as in their clocked counterparts.
However, as asynchronous techniques are intrinsically event triggered, responding only to an incoming data value, their component gates will tend to be "dark" by default, consuming only standby current unless actually contributing to a computation. Thus, in general, asynchronous techniques will be a good choice to support massively parallel spatial computing systems at future advanced nodes.

In a spatial computing model, operations and their operators are connected in space rather than time. Where billions of transistors can be built on a single die, and where multiple chip stacking is becoming more common, spatial organizations can exploit this abundance of resources to expose parallelism available in a task [4] by executing instructions in-place rather than moving them and their operands from storage to processing sites, and back again. This will be the key to their application in the extreme nano-scale domain, where active devices are plentiful whereas interconnection costs are high. Computational functions, along with any intermediate variables, are created as and when required, while the procedural flow of the software definition is effectively "wired-in". Static data transfer operations are embedded in the fabric of the computer and represent wires rather than gates, and parallel operations are exposed more-or-less automatically.

Spatial computing systems proposed to date have tended to be synchronous, so their design has involved striking a balance between clock speed (performance) and power consumption. However, issues such as process variability at advanced nodes are beginning to have a significant impact on system timing closure, especially at the lower supply voltages of low power and portable systems. Here, asynchronous (clockless) techniques can exhibit an important advantage.
In particular, Null Convention Logic (NCL) has been shown to be robust in the face of extreme variability across supply voltage, temperature and processing parameters [5].

In this paper, we analyze a single-bit spatial computing bit-element (bel) built using NCL and wired into a homogeneous 2D array. Each bel supports simple Boolean processing on pairs of adjacent bits derived from its eight nearest neighbors. The array is entirely asynchronous, with no central clock, and is essentially "flat" in that it exhibits very little hierarchical structure. A bel can execute a small list of instructions sequentially under the control of the NCL handshaking. In this way, the bel processor exhibits both Spatial and Temporal computing characteristics. By Spatial, we mean that fragments of a Directed Acyclic Graph (DAG) description of a process can be mapped (in space) directly to the array. Temporal implies that the fabric can be re-used to handle different fragments of the DAG in a time-ordered sequence. As such, the bel architecture can be viewed as one in a long sequence of fine-grained homogeneous reconfigurable arrays (see [4] for a survey of fine and coarse grained reconfigurable architectures and [5] for a good historical survey).

We envisage the array being applied as a co-processor to a conventional computing platform to manage highly demanding multimedia applications such as video and audio streaming operations. Our working hypothesis here is that an asynchronous logic style such as NCL may offer lower power operation but will only be useful if performance can be maintained through the use of parallelism. The bel array is intended to be a research platform to explore these issues.

The remainder of the paper proceeds as follows. In Section II we cover the background to Spatial Computing and look at other homogeneous platforms.
In Section III, we describe the organization of the bel processor while in Section IV we show some representative area and power results for a trial imple-