ACAM: Approximate Computing Based on Adaptive
Associative Memory with Online Learning
Mohsen Imani†, Yeseong Kim†, Abbas Rahimi‡, Tajana Rosing†
†Computer Science and Engineering, UC San Diego, La Jolla, CA 92093, USA
‡Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA 94720, USA
{moimani, yek048, tajana}@ucsd.edu, abbas@eecs.berkeley.edu
ABSTRACT
The Internet of Things (IoT) dramatically increases the amount of
data to be processed for many applications, including multimedia.
Unlike traditional computing environments, the workload of IoT
varies significantly over time. Thus, efficient runtime profiling is
required to extract highly frequent computations and pre-store them
for memory-based computing. In this paper, we propose an
approximate computing technique using a low-cost adaptive
associative memory, named ACAM, which utilizes runtime
learning and profiling. To recognize the temporal locality of data in
real-world applications, our design exploits a reinforcement
learning algorithm with a least recently used (LRU) strategy to select
images to be profiled; the profiler is implemented using an
approximate concurrent state machine. The profiling results are
then stored in ACAM for computation reuse. Since the selected
images represent the observed input dataset, we can avoid
redundant computations thanks to the high hit rates in the
associative memory. We evaluate ACAM on the recent AMD
Southern Islands GPU architecture, and the experimental results
show that the proposed design achieves 34.7% energy savings
for image processing applications with an acceptable quality of
service (i.e., PSNR > 30 dB).
Keywords
Approximate computing, Associative memory, Online learning,
Non-volatile memory
1. INTRODUCTION
The move toward the Internet of Things (IoT) and big data
computation significantly increases the size of the input data that
recent processors must handle. In this era, many IoT workloads will
run on GPUs, either in mobile devices or in the cloud (e.g., data
centers). In particular, multimedia processing, as an instance of an
IoT workload, has rapidly proliferated, and meeting its timing
demands requires acceleration on efficient massively parallel
processors [1, 2]. In addition, due to data locality, similar
computations occur repeatedly, which gives an opportunity to
significantly reduce the amount of computation through
memory-based computing [3]. To this end, an
associative memory in the form of a lookup table has been exploited
to reduce the number of redundant computations. A software
implementation pre-stores frequent patterns in a hash table and
retrieves them using a set of keys, replacing the original
computations. To enhance the performance of the lookup
table, associative memories can be implemented in hardware using
ternary content addressable memory (TCAM).
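The software lookup-table idea described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `LookupTable` class, the `expensive_kernel` function, and the sample operand tuples are hypothetical names chosen only to show how pre-stored patterns in a hash table can replace the original computation on a hit.

```python
import math


def expensive_kernel(a, b):
    """Hypothetical stand-in for a costly per-pixel computation."""
    return math.sqrt(a * a + b * b)


class LookupTable:
    """Hash-table lookup that reuses pre-stored results (illustrative only)."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.table = {}   # key: operand tuple, value: pre-computed result
        self.hits = 0
        self.misses = 0

    def prestore(self, patterns):
        # Fill the table with results for the frequent operand patterns.
        for operands in patterns:
            self.table[operands] = self.kernel(*operands)

    def compute(self, *operands):
        # On a hit, return the stored result instead of recomputing.
        if operands in self.table:
            self.hits += 1
            return self.table[operands]
        # On a miss, fall back to the original computation.
        self.misses += 1
        return self.kernel(*operands)


lut = LookupTable(expensive_kernel)
lut.prestore([(3, 4), (6, 8)])          # assumed "frequent" patterns
results = [lut.compute(3, 4), lut.compute(6, 8), lut.compute(1, 1)]
print(results, "hits:", lut.hits, "misses:", lut.misses)
```

In this sketch the hit rate depends entirely on how well the pre-stored patterns match the inputs seen at runtime, which is the reason profiling matters for a lookup-table approach.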
However, utilizing TCAMs for computation-with-memory [4] poses
two technical challenges. First, the system design has to
consider the actual workloads, which change rapidly across
contexts such as time, place, and application. Market
research shows significant growth in interactions with the external
environment through deployed sensors. Therefore, filling
associative memories with offline data at design time
cannot provide desirable hit rates [5]. Today’s interactive
IoT workloads need a context-aware associative
memory that adapts to the environment. Runtime profiling is
therefore one of the essential components of associative
memories for their practical deployment on parallel processors.
Second, CMOS-based TCAMs consume very high energy for the
search operation. This limits the applicability of these memories to
classification and IP routing [6]. Non-volatile memories (NVMs)
open a new avenue for efficient memory-based computation
[7]. Resistive random access memory (ReRAM) and spin-transfer
torque RAM (STT-RAM) are two kinds of low-leakage, dense
NVMs, based on memristive and magnetic tunneling
junction (MTJ) devices, respectively. Moreover, NVM-based
TCAMs can further reduce energy consumption by applying
voltage overscaling (VOS) [8] or by reducing the search switching
activity [9].
In this paper, we propose a novel approximate computing
framework using an adaptive associative memory, called ACAM,
with the capability of learning-based runtime profiling. The proposed
design also addresses the endurance and cost issues of associative
memories for online learning, thus providing a robust and practical
solution for a wide range of dynamic workloads on parallel
processor architectures. Our design goal is to find the input
data with the highest hit rate to adaptively fill the rows of an
associative memory and improve overall energy efficiency. The
learning-based profiling runs in the following steps: (i) A machine
learning algorithm finds the images of interest in the input dataset
based on pixel similarities. The algorithm identifies the most
representative data, which is likely to be used in the near future,
for profiling based on the proposed TD-LRU policy. (ii) We profile
the selected images of interest using a low-cost approximate
concurrent state machine that keeps track of the number of repeated
computations. The approximate
profiling is implemented using hash functions and a Bloom filter,
thus enhancing energy efficiency at the expense of minimal,
acceptable errors. In the circuit-level design, to address the
endurance and lifetime issues caused by frequent runtime
updates, ACAM exploits high-endurance, robust MTJ-based
TCAM and memory blocks. In addition, we apply approximation to
a selected part of the associative memory to balance the tradeoff
between energy and accuracy. Thanks to the proposed method with
efficient runtime profiling, parallel processors can efficiently
process a large and active dataset with the support of the adaptive
associative memory. Our evaluation shows that the proposed
ACAM improves the energy efficiency of a GPGPU by 34.7% with
an acceptable PSNR (peak signal-to-noise ratio) of more than 30 dB
for image processing applications.
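Step (ii) above counts repeated computations approximately with hash functions and a Bloom filter. As a rough software analogy only, the sketch below uses a counting Bloom filter; this is an assumed stand-in, not the approximate concurrent state machine of ACAM, and the class name `CountingBloomProfiler`, its parameters, and the sample operand tuples are all hypothetical.

```python
import hashlib


class CountingBloomProfiler:
    """Approximate repeat counter built on a counting Bloom filter.

    Illustrative assumption: the estimate can overcount when keys collide,
    but it never undercounts a recorded key.
    """

    def __init__(self, num_counters=1024, num_hashes=3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indices(self, key):
        # Derive several hash positions by salting one hash function.
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_counters

    def record(self, key):
        # Increment every counter the key hashes to.
        for i in self._indices(key):
            self.counters[i] += 1

    def estimate(self, key):
        # The minimum counter is an upper bound on the true count.
        return min(self.counters[i] for i in self._indices(key))

    def frequent(self, candidates, threshold):
        # Candidates whose estimated count reaches the threshold.
        return [k for k in candidates if self.estimate(k) >= threshold]


profiler = CountingBloomProfiler()
for _ in range(5):
    profiler.record((3, 4))      # a computation observed five times
profiler.record((1, 1))          # a computation observed once

hot = profiler.frequent([(3, 4), (1, 1)], threshold=3)
print("estimated count for (3, 4):", profiler.estimate((3, 4)))
```

The trade-off mirrors the one stated in the text: a fixed, small array of counters replaces an exact per-pattern table, at the cost of occasional overestimates when different patterns hash to the same counters.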
2. RELATED WORK
Non-volatile memories such as ReRAM and STT-RAM are good
candidates for designing efficient, low-leakage associative
memories [7] [10] [11]. Earlier efforts have used ReRAM and
STT-RAM technologies to design stable and efficient TCAMs.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
ISLPED '16, August 08-10, 2016, San Francisco Airport, CA, USA
© 2016 ACM. ISBN 978-1-4503-4185-1/16/08…$15.00
DOI: http://dx.doi.org/10.1145/2934583.2934595