An MRAM-based Deep In-Memory Architecture for Deep Neural Networks

Ameya D. Patil*, Haocheng Hua*, Sujan Gonugondla*, Mingu Kang† and Naresh R. Shanbhag*
*Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801
†IBM T. J. Watson Research Center, Yorktown Heights, NY 10598

Abstract—This paper presents an MRAM-based deep in-memory architecture (MRAM-DIMA) to efficiently implement multi-bit matrix-vector multiplication for deep neural networks using a standard MRAM bitcell array. The MRAM-DIMA achieves 4.5× and 70× lower energy and delay, respectively, compared to a conventional digital MRAM architecture. Behavioral models are developed to estimate the impact of circuit non-idealities, including process variations, on the DNN accuracy. An accuracy drop of ≤ 0.5% (≤ 1%) is observed for LeNet-300-100 on the MNIST dataset (a 9-layer CNN on the CIFAR-10 dataset), while tolerating 24% (12%) variation in cell conductance in a commercial 22 nm CMOS-MRAM process.

I. INTRODUCTION

There is a growing interest in implementing deep neural network (DNN)-based algorithms on a variety of edge platforms, such as smartphones and IoT sensors [1]. However, conventional digital implementations of DNNs consume large energy and delay per decision, primarily due to large data movement requirements [2], since memory accesses require at least 10× more energy/delay than multiply-accumulate (MAC) operations [3]. This inhibits the deployment of DNNs on resource-constrained, battery-operated platforms.

Recently, near-memory and in-memory computing approaches [4]–[18] have been widely explored to achieve significant energy-delay benefits over conventional digital implementations [19], [20].
These works use SRAM/MRAM arrays to realize binary matrix-vector computation within the memory array [4], [8]–[11], and/or employ modified bitcell circuits to achieve multi-bit computation at the expense of lower density [5], [7], [12]. As an exception, in [21], [22], an SRAM-based deep in-memory architecture was proposed to efficiently realize a multi-bit vector dot product without requiring any modification to the standard 6T SRAM bitcell. However, for state-of-the-art DNNs, on-chip SRAM may not be sufficient to store all the parameters, requiring energy- and latency-expensive DRAM accesses. Hence, it is necessary to extend such an approach to emerging high-density memories, such as MRAM. Works on multi-level RRAM/PCM devices have explored multi-bit in-memory computation [13]–[18]. However, realizing an in-memory MRAM architecture that implements a multi-bit matrix-vector multiplication (MVM) without modifying the bitcell structure remains a challenge.

In this paper, we propose an MRAM-based deep in-memory architecture (MRAM-DIMA) to achieve multi-bit MVM within the memory array. We employ a standard magnetic tunnel junction (MTJ)-based MRAM bitcell without requiring any modifications, thus preserving its density.

Fig. 1. Digital implementation of matrix-vector multiplication (MVM) with weights stored in a 1T-1MTJ MRAM bitcell array (BCA).

We propose modified peripheral circuits to achieve such multi-bit computation, even though, unlike RRAM/PCM, each bitcell stores only 1 bit. The proposed MRAM-DIMA achieves 4.5× and 70× lower energy and delay, respectively, compared to a digital MRAM implementation with the matrix stored in an identical MRAM array.
We further quantify the accuracy drop due to analog computation and MTJ process variations for LeNet-300-100 on the MNIST dataset and a 9-layer CNN on the CIFAR-10 dataset, and find it to be within 0.5% and 1%, respectively.

II. PRELIMINARIES

A. Notation

In this paper, we assume the following matrix-vector multiplication (MVM) computation needs to be executed:

    a = Wx    (1)

where W denotes an M × N weight matrix, and x and a denote the input and output vectors, respectively. Each weight is denoted as w_ij for all i ∈ {1,...,M}, j ∈ {1,...,N} and is quantized to B_w bits. The elements of vectors x and a are denoted as x_i and a_i, respectively, and are quantized to B_a bits.

B. Digital MRAM Architecture

Figure 1 shows a diagram of a conventional digital MRAM architecture with a bitcell array (BCA) of size N_row × N_col = M × N·B_w, which stores the weight matrix W. This BCA has its sourcelines (SLs) and wordlines (WLs) perpendicular to its bitlines (BLs). The MRAM bitcell consists of an NMOS access transistor and a magnetic tunnel junction (MTJ) (1T-1MTJ).
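To make the storage layout concrete, the following sketch (not from the paper; precision B_w = 4 and two's-complement encoding are illustrative assumptions) shows how each B_w-bit weight w_ij maps onto B_w adjacent 1-bit cells in row i of an M × N·B_w binary array, and how the reference MVM a = Wx of Eq. (1) is computed from those bit-slices:

```python
# Sketch: bit-sliced weight storage in a binary bitcell array (BCA) and the
# reference MVM a = Wx of Eq. (1). Each B_w-bit weight occupies B_w adjacent
# 1-bit cells, since an MRAM bitcell stores only one bit.

B_W = 4  # assumed weight precision (illustrative)

def to_bits(w, bw=B_W):
    """Two's-complement bit-slices of integer weight w, MSB first."""
    return [(w >> (bw - 1 - b)) & 1 for b in range(bw)]

def store(W, bw=B_W):
    """Flatten an M x N integer weight matrix into an M x (N*bw) binary BCA."""
    return [[bit for w in row for bit in to_bits(w, bw)] for row in W]

def mvm_from_bca(bca, x, bw=B_W):
    """Reconstruct each weight from its bit-slices, then compute a = Wx."""
    M, N = len(bca), len(bca[0]) // bw
    a = []
    for i in range(M):
        acc = 0
        for j in range(N):
            bits = bca[i][j * bw:(j + 1) * bw]
            # Two's complement: the MSB carries weight -2^(bw-1).
            w = -bits[0] * (1 << (bw - 1)) + sum(
                b << (bw - 2 - k) for k, b in enumerate(bits[1:]))
            acc += w * x[j]
        a.append(acc)
    return a

W = [[3, -2], [-1, 4]]   # M = 2, N = 2 toy weight matrix
x = [5, 6]
bca = store(W)           # 2 x 8 binary array for B_W = 4
print(mvm_from_bca(bca, x))  # matches [3*5 - 2*6, -1*5 + 4*6] = [3, 19]
```

A digital MRAM architecture must read all B_w columns of a weight before each MAC, which is the per-access cost that MRAM-DIMA's in-array analog accumulation aims to avoid.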