INVITED: Building Robust Machine Learning Systems: Current Progress, Research Challenges, and Opportunities

Jeff (Jun) Zhang*, Kang Liu*, Faiq Khalid, Muhammad Abdullah Hanif, Semeen Rehman, Theocharis Theocharides§, Alessandro Artussi§, Muhammad Shafique, Siddharth Garg*
*New York University, U.S.A; Technische Universität Wien (TU Wien), Austria; §University of Cyprus, Cyprus
{sg175,jeffjunzhang,kang.liu}@nyu.edu; {ttheocharides,aartus01}@ucy.ac.cy; {muhammad.hanif,faiq.khalid,semeen.rehman,muhammad.shafique}@tuwien.ac.at

ABSTRACT
Machine learning, in particular deep learning, is being used in almost all aspects of life to assist humans, specifically in mobile and Internet of Things (IoT)-based applications. Due to its state-of-the-art performance, deep learning is also being employed in safety-critical applications, for instance, autonomous vehicles. Reliability and security are two of the key required characteristics for these applications because of the impact they can have on human lives. To this end, in this paper we highlight the current progress, challenges, and research opportunities in the domain of robust systems for machine learning-based applications.

KEYWORDS
Machine Learning, Deep Learning, Reliability, Security, Robustness, Permanent Faults, Timing Errors, Adversarial Attacks.

1 INTRODUCTION
Machine learning (ML) has emerged as a leading tool for data analysis because of its ability to learn directly from raw data, with minimal human intervention. In particular, Deep Neural Networks (DNNs) offer state-of-the-art accuracy for many ML applications. Current research in ML is focused on improving state-of-the-art DNNs to develop learning algorithms that can learn the intended functionality without bias and thereby improve the accuracy of ML-based systems.
Moreover, as DNNs are inherently compute intensive, optimization methods are also being studied which can significantly reduce the computational complexity as well as the memory requirements of these algorithms. Apart from the aforementioned objectives, the digital system design community is focusing on developing efficient hardware accelerators, which can further boost the efficiency gains by designing application-specific hardware for DNN-based applications [25]. Moreover, driven by the current progress, DNNs are also being explored for use in safety-critical applications, for example, autonomous driving [8] and smart healthcare [7]. These applications have stringent robustness constraints, as defined by the standardization authorities, because of the risks involved in the operation of these systems.

Robustness refers to two main characteristics of a system, i.e., resilience against reliability threats and resilience against security vulnerabilities. Several reliability and security vulnerabilities are highlighted in Fig. 1 and are discussed later in Sections 2 and 3.

In this paper, we discuss the current state of the art in DNN reliability and security. We also highlight the challenges faced in building DNNs that are reliable and secure yet efficient.

Figure 1: Reliability and security threats on machine learning-based systems. (Source of Images: [23, 24])
[Figure not reproduced. Labels recoverable from the figure: NBTI aging, HCID, process variations, soft errors, side-channel attacks, hardware Trojans, software attacks, and training/inference attacks, shown around the processing, memory, and power supply of a machine learning-based system.]

DAC '19, June 2–6, 2019, Las Vegas, NV, USA. © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6725-7/19/06. https://doi.org/10.1145/3316781.3323472

2 RELIABLE MACHINE LEARNING
Reliability threats are mainly due to faults that arise at the hardware layer of a system and can propagate all the way to the application layer, potentially causing mispredictions. There are many types of reliability faults, e.g., soft errors, timing faults, and permanent faults. Different techniques have been proposed for mitigating these faults. However, most of them rely on redundancy, where spatial or temporal redundancy is exploited to execute multiple instances of an application that vote to ensure the correctness of execution [29]. Given the compute-intensive nature of DNNs, naively applying these approaches might negate many of the gains obtained from hardware acceleration.

In this section, we highlight the impact of reliability faults on the accuracy of several state-of-the-art networks for different datasets. The details of the networks and the datasets are presented in Table 1. Later in the section, we also discuss a few methods which can be used for efficiently mitigating the permanent faults and timing errors