A Parallel-friendly Majority Gate to Accelerate In-memory Computation

John Reuben
Chair of Computer Science 3 - Hardware Architecture
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
91058 Erlangen, Germany
johnreuben.prabahar@fau.de

Stefan Pechmann
Chair of Communications Electronics
Universität Bayreuth
95447 Bayreuth, Germany
stefan.pechmann@uni-bayreuth.de

Abstract: Efforts to combat the ‘von Neumann bottleneck’ have been strengthened by Resistive RAMs (RRAMs), which enable computation in the memory array. Majority logic can accelerate computation when compared to NAND/NOR/IMPLY logic due to its expressive power. In this work, we propose a method to compute majority while reading from a transistor-accessed RRAM array. The proposed gate was verified by simulations using a physics-based model (for the RRAM) and an industry-standard model (for the CMOS sense amplifier), and was found to tolerate reasonable variations in the RRAMs’ resistive states. Together with a NOT gate, which is also implemented in memory, the proposed gate forms a functionally complete Boolean logic, capable of implementing any digital logic. Computing is simplified to a sequence of READ and WRITE operations and does not require any major modifications to the peripheral circuitry of the array. The parallel-friendly nature of the proposed gate is exploited to implement an eight-bit parallel-prefix adder in the memory array. The proposed in-memory adder could achieve a latency reduction of 70% and 50% when compared to IMPLY and NAND/NOR logic-based adders, respectively.

Index Terms: Resistive RAM (RRAM), majority logic, majority gate, memristor, 1 Transistor-1 Resistor (1T–1R), von Neumann bottleneck, in-memory computing, compute-in-memory, processing-in-memory, parallel-prefix adder

I. INTRODUCTION

The movement of data between processing and memory units in present-day computing systems is their main performance and energy-efficiency bottleneck, often referred to as the ‘von Neumann bottleneck’ or ‘memory wall’. The emergence of non-volatile memory technologies like Resistive RAM (RRAM) has created opportunities to overcome the memory wall by enabling computing at the residence of data. RRAMs are two-terminal devices (usually a Metal-Insulator-Metal structure) capable of storing data as resistance. The change of resistance is due to the formation or rupture of a conductive filament, depending on the direction of the current flow through the structure. The word ‘memristor’ is also used by researchers to denote such a device, because it is essentially a resistor with memory. By connecting such RRAM devices in a certain manner, applying certain voltage patterns, or modifying the sensing circuitry, basic Boolean gates (NOR, NAND, XOR, IMPLY logic) have been demonstrated in RRAM arrays [1]–[6]. The motivation for such efforts is to perform Boolean operations on data stored in the memory array, without moving them out to a separate processing circuit, thus mitigating the von Neumann bottleneck. Reviews of such in-memory computing approaches are presented in [7], [8].

To construct a memory array using such devices, two configurations are common: 1Transistor–1Resistor (1T–1R) and 1Selector–1Resistor (1S–1R). The 1T–1R configuration uses a transistor as an access device for each cell, isolating the accessed cell from its neighbours in the array. The 1S–1R configuration uses a two-terminal device called a ‘selector’, which is fabricated in series with the memristive device. The 1S–1R configuration is area-efficient, but suffers from current leakage (the sneak-path problem) due to the inability to access a particular cell without interfering with its neighbours [9].
Majority logic, a type of Boolean logic, is defined to be true if more than half of the n inputs are true, where n is odd. Hence, a majority gate is a democratic gate and can be expressed in terms of Boolean AND/OR as MAJ(a, b, c) = a·b + b·c + a·c, where a, b, c are Boolean variables. Although majority logic has been known since 1960, there has been a revival in using it for computation in many emerging nanotechnologies (spin waves, magnetic Quantum-Dot cellular automata, nano magnetic logic, Single Electron Tunneling). Recent research [10]–[12] has confirmed that majority logic is to be preferred not only because a particular nanotechnology can realize it, but also because of its ability to implement arithmetic-intensive circuits with fewer gates. It must be emphasized that majority logic did not become the dominant logic for computing because, in CMOS technology, it was more efficient to implement a NAND/NOR gate than a majority gate. However, with many emerging nanotechnologies, this is no longer the case; therefore, majority logic needs to be re-evaluated for its computing efficiency. In [13]–[15], majority logic is implemented in RRAM by applying two of the inputs of the majority gate as voltages across the RRAM's terminals; the initial state of the RRAM, which serves as the third input, switches to evaluate the majority. Such an approach complicates the peripheral circuitry and is also not parallel-friendly, because two of the three inputs of a majority gate need to be applied as voltages at the wordline/bitline (see Fig. 1(a)). In this paper, we propose a majority gate whose structure is conducive to parallel processing in the memory array. By activating three rows of the array simultaneously, the

This is the author's version of the accepted paper.
For the published paper, see the proceedings of the 31st IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) at https://ieeexplore.ieee.org/. See the conference presentation (20 min video) at https://asap2020.cs.manchester.ac.uk/paper.php?id=72

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
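
The Boolean statements made so far about majority logic can be checked exhaustively: the AND/OR expression of the three-input majority given in the Introduction, and the abstract's claim that majority together with NOT is functionally complete. The Python sketch below is illustrative only. It models Boolean behaviour, not the proposed RRAM read operation, and the function names (`maj`, `maj_sop`, `and_via_maj`, `or_via_maj`, `check_majority_identities`) are hypothetical names chosen here, not taken from the paper. It relies on two standard identities: fixing one majority input to 0 gives AND, and fixing it to 1 gives OR. It also checks that the carry-out of a full adder equals the majority of its three inputs, which is one reason majority logic suits arithmetic circuits.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two of the inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0


def maj_sop(a: int, b: int, c: int) -> int:
    """Majority written as the OR of the three pairwise ANDs."""
    return (a & b) | (b & c) | (a & c)


def and_via_maj(a: int, b: int) -> int:
    """AND obtained by tying the third majority input to 0."""
    return maj(a, b, 0)


def or_via_maj(a: int, b: int) -> int:
    """OR obtained by tying the third majority input to 1."""
    return maj(a, b, 1)


def check_majority_identities() -> bool:
    """Exhaustively verify the identities over all input combinations."""
    for a, b, c in product((0, 1), repeat=3):
        # Threshold definition agrees with the AND/OR expression.
        assert maj(a, b, c) == maj_sop(a, b, c)
        # Full-adder carry-out is the majority of a, b and carry-in.
        assert (a + b + c) // 2 == maj(a, b, c)
    for a, b in product((0, 1), repeat=2):
        assert and_via_maj(a, b) == (a & b)
        assert or_via_maj(a, b) == (a | b)
    return True


if __name__ == "__main__":
    print(check_majority_identities())
```

Because AND and OR are both recoverable from a single majority gate with one constant input, adding NOT yields a complete set {AND, OR, NOT}, which is consistent with the functional-completeness claim in the abstract.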