INVITED: Building Robust Machine Learning Systems:
Current Progress, Research Challenges, and Opportunities
Jeff (Jun) Zhang*, Kang Liu*, Faiq Khalid†, Muhammad Abdullah Hanif†, Semeen Rehman†,
Theocharis Theocharides§, Alessandro Artussi§, Muhammad Shafique†, Siddharth Garg*
*New York University, U.S.A; †Technische Universität Wien (TU Wien), Austria;
§University of Cyprus, Cyprus
{sg175,jeffjunzhang,kang.liu}@nyu.edu; {ttheocharides,aartus01}@ucy.ac.cy
{muhammad.hanif,faiq.khalid,semeen.rehman,muhammad.shafique}@tuwien.ac.at
ABSTRACT
Machine learning, in particular deep learning, is being used in almost
all aspects of life to assist humans, especially in mobile and
Internet of Things (IoT)-based applications. Due to its state-of-the-art
performance, deep learning is also being employed in safety-critical
applications, for instance, autonomous vehicles. Reliability and
security are two of the key characteristics required for these applications
because of the impact they can have on human lives. Towards this end,
in this paper, we highlight the current progress, challenges, and
research opportunities in the domain of robust systems for machine
learning-based applications.
KEYWORDS
Machine Learning, Deep Learning, Reliability, Security, Robustness,
Permanent Faults, Timing Errors, Adversarial Attacks.
1 INTRODUCTION
Machine learning (ML) has emerged as a leading tool for data analysis
because of its ability to learn directly from raw data with minimal
human intervention. In particular, Deep Neural Networks (DNNs)
offer state-of-the-art accuracy for many ML applications. Current
ML research focuses on improving state-of-the-art DNNs and on
developing learning algorithms that learn the intended functionality
without bias, thereby improving the accuracy of ML-based
systems. Moreover, as DNNs are inherently compute-intensive,
researchers are also studying optimization methods that can significantly
reduce both the computational complexity and the memory requirements
of these algorithms. Beyond these objectives,
the digital system design community is developing efficient
hardware accelerators that can further boost the efficiency
gains by designing application-specific hardware for DNN-based
applications [25]. Moreover, driven by this progress, DNNs are
also being explored for safety-critical applications, for example,
autonomous driving [8] and smart healthcare [7]. These applications
have stringent robustness constraints, as defined by standardization
authorities, because of the risks involved in operating these
systems.
DAC ’19, June 2–6, 2019, Las Vegas, NV, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6725-7/19/06. . . $15.00
https://doi.org/10.1145/3316781.3323472
[Figure 1 panel labels: Process Variations; NBTI; Aging; HCID; Soft Errors (High-Energy Particle: Neutron or Proton); Side Channel Attacks; Hardware Trojans; Software Attacks; Training/Inference Attacks; Machine Learning-based System (Processing, Computations, Memory, Power Supply); Training; Inference.]
Figure 1: Reliability and security threats on machine learning-based
systems. (Source of Images: [23, 24])
Robustness refers to two main characteristics of a system: resilience
against reliability threats and resilience against security vulnerabilities.
Fig. 1 highlights several of these reliability and security threats,
which are discussed in Sections 2 and 3, respectively.
In this paper, we discuss the current state of the art in
DNN reliability and security. We also highlight the challenges
faced in building reliable and secure, yet efficient, DNNs.
2 RELIABLE MACHINE LEARNING
Reliability threats are mainly due to faults that arise at the hardware
layer of a system and can propagate all the way to the application
layer, potentially causing mispredictions. There are many types of
reliability faults, e.g., soft errors, timing faults, and permanent faults.
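To make the propagation from a hardware fault to a misprediction concrete, the sketch below models a soft error as a single bit flip in a DNN weight, assuming the weight is stored as an IEEE-754 float32 value. It is an illustration under that assumption, not the fault model used in the works cited here, and the helper names `flip_bit_float32` and `neuron_output` are hypothetical. Flipping the least significant mantissa bit barely changes a weight of 0.5, whereas flipping the most significant exponent bit (bit 30) turns it into 2^127.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Round `value` to IEEE-754 float32, invert one bit, and return the result.

    Bit 0 is the least significant mantissa bit, bits 23-30 hold the exponent,
    and bit 31 is the sign bit. Assumption for illustration: a soft error in the
    memory cell holding the weight inverts exactly one of these bits.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit index must be in [0, 31]")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask inverts only the selected bit.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def neuron_output(weights: list[float], inputs: list[float]) -> float:
    """Pre-activation of a single neuron: a plain dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.125]
    inputs = [1.0, 2.0, 4.0]
    print("fault-free:", neuron_output(weights, inputs))
    # Corrupt only the first weight, one bit position at a time.
    for bit in (0, 23, 30, 31):
        corrupted = [flip_bit_float32(weights[0], bit)] + weights[1:]
        print(f"bit {bit:2d} flipped:", neuron_output(corrupted, inputs))
```

A single upset in a high exponent bit thus changes the neuron's output by many orders of magnitude, which is one way a hardware-level fault can surface as a misprediction; which bits matter most depends on the number format an accelerator actually uses.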
Different techniques have been proposed for mitigating these faults.
However, most of them rely on redundancy, where spatial or temporal
redundancy is exploited to execute multiple instances of an
application whose outputs are voted on to ensure the correctness
of execution [29]. Given the compute-intensive nature of DNNs,
naively applying these approaches might negate many of the gains
obtained from hardware acceleration.
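As an illustration of the redundancy-based approach described above, the following sketch implements N-modular redundancy with majority voting on a classifier's predicted label. It models temporal redundancy (the same inference is re-run sequentially); the names `majority_vote`, `run_nmr`, and `simulated_classifier` are hypothetical, and a deployed system might instead vote in hardware or on intermediate results rather than only on final labels.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of the replicas.

    Raises RuntimeError when no value has a strict majority, i.e. the
    disagreement cannot be masked and must be reported instead.
    """
    if not outputs:
        raise ValueError("need at least one replica output")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def run_nmr(infer: Callable[[int], T], replicas: int = 3) -> T:
    """Temporal redundancy: execute `infer` `replicas` times and vote.

    `infer` receives the replica index so that a fault can be simulated in a
    single execution. With replicas=3 this is triple modular redundancy (TMR).
    """
    return majority_vote([infer(i) for i in range(replicas)])


def simulated_classifier(replica: int) -> str:
    """Hypothetical classifier whose second execution suffers a transient fault."""
    return "speed_limit" if replica == 1 else "stop_sign"


if __name__ == "__main__":
    # The faulty execution is outvoted by the two fault-free ones.
    print(run_nmr(simulated_classifier))
```

The voter masks one faulty replica, but only by executing the network three times; spatial redundancy would run the replicas concurrently on separate hardware units and feed the same voter. Either way, the multiplied compute is the overhead that can cancel out much of the benefit of a DNN accelerator.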
In this section, we highlight the impact of reliability faults on the
accuracy of several state-of-the-art networks for different datasets.
The details of the networks and the datasets are presented in Table 1.
Later in the section, we also discuss a few methods that can be
used for efficiently mitigating permanent faults and timing errors