Analyzing the Robustness of Open-World Machine Learning

Vikash Sehwag (Princeton University), Arjun Nitin Bhagoji (Princeton University), Liwei Song (Princeton University), Chawin Sitawarin (University of California, Berkeley), Daniel Cullina (Pennsylvania State University), Mung Chiang (Purdue University), Prateek Mittal (Princeton University)

ABSTRACT

When deploying machine learning models in real-world applications, an open-world learning framework is needed to deal with both normal in-distribution inputs and undesired out-of-distribution (OOD) inputs. Open-world learning frameworks include OOD detectors that aim to discard input examples which are not from the same distribution as the training data of machine learning classifiers. However, our understanding of current OOD detectors is limited to the setting of benign OOD data, and an open question is whether they are robust in the presence of adversaries. In this paper, we present the first analysis of the robustness of open-world learning frameworks in the presence of adversaries by introducing and designing OOD adversarial examples. Our experimental results show that current OOD detectors can be easily evaded by slightly perturbing benign OOD inputs, revealing a severe limitation of current open-world learning frameworks. Furthermore, we find that OOD adversarial examples also pose a strong threat to adversarial-training-based defense methods in spite of their effectiveness against in-distribution adversarial attacks. To counteract these threats and ensure the trustworthy detection of OOD inputs, we outline a preliminary design for a robust open-world machine learning framework.
CCS CONCEPTS

• Computing methodologies → Machine learning; Neural networks; • Security and privacy → Intrusion/anomaly detection and malware mitigation;

KEYWORDS

Open-world recognition; adversarial examples; deep learning

ACM Reference Format:
Vikash Sehwag, Arjun Nitin Bhagoji, Liwei Song, Chawin Sitawarin, Daniel Cullina, Mung Chiang, and Prateek Mittal. 2019. Analyzing the Robustness of Open-World Machine Learning. In 12th ACM Workshop on Artificial Intelligence and Security (AISec '19), November 15, 2019, London, UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3338501.335737

Equal contribution

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). AISec '19, November 15, 2019, London, United Kingdom. © 2019 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6833-9/19/11. https://doi.org/10.1145/3338501.335737

1 INTRODUCTION

Machine learning (ML) models, especially deep neural networks, have become prevalent and are being widely deployed in real-world applications, such as image classification [40, 57], face recognition [51, 61], and autonomous driving [9, 16]. Motivated by the fact that real-world applications need to be resilient to arbitrary input data, an important line of work has developed the open-world learning framework, which checks whether inputs come from the same distribution as the training data (in-distribution examples), or from a different distribution, in which case they are referred to as out-of-distribution (OOD) examples [5, 6].
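As a concrete illustration of such a gate (a minimal sketch, not one of the specific detectors studied in this paper), a simple open-world pipeline can threshold the classifier's maximum softmax probability and forward only confident inputs to the classifier:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ood_gate(logits, threshold=0.9):
    """Return True if the input should be forwarded to the classifier,
    False if it is flagged as out-of-distribution."""
    confidence = softmax(logits).max(axis=-1)
    return confidence >= threshold

# In-distribution inputs tend to produce a peaked softmax; OOD inputs
# tend to produce a near-uniform one (hypothetical example logits).
in_dist_logits = np.array([8.0, 0.5, 0.2])   # confident prediction
ood_logits = np.array([1.1, 0.9, 1.0])       # near-uniform prediction

print(ood_gate(in_dist_logits))  # True  -> forwarded to the classifier
print(ood_gate(ood_logits))      # False -> rejected as OOD
```

The threshold and the confidence score itself are design choices; the detectors evaluated later in the paper use more sophisticated scores, but they share this accept/reject structure.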
State-of-the-art open-world learning systems equip machine learning classifiers with OOD detectors, and an input example is processed for classification only if it passes those detectors. In recent years, the research community has developed several OOD detection mechanisms that are effective in distinguishing OOD inputs [30, 42, 44]. However, a severe limitation of current open-world learning frameworks is that their development and investigation have been limited to the setting of benign (natural and unmodified) OOD data. Despite their good performance in detecting benign OOD inputs, an important open question is whether open-world learning frameworks are robust in the presence of adversaries. Specifically, can OOD detectors perform reliably when an adversary tries to evade them by maliciously perturbing OOD inputs?

In this paper, we thoroughly evaluate the performance of open-world machine learning models against such maliciously perturbed OOD inputs, which we refer to as OOD adversarial examples, motivated by the line of research on adversarial attacks against neural networks [8, 11, 62]. Our analysis shows that state-of-the-art OOD detectors [42, 44] are quite fragile: their detection performance drops drastically under small perturbations of out-of-distribution inputs. For example, as highlighted in Figure 1, benign OOD inputs can be reliably detected as out-of-distribution by current open-world learning systems, but OOD adversarial examples both evade the OOD detector and achieve targeted misclassification by the classifier.

Beyond revealing the lack of robustness of current OOD detectors, we further examine the behavior of OOD adversarial examples on state-of-the-art robustly trained classifiers [46, 65], which were designed for robustness against in-distribution adversarial examples.
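To make the attack structure concrete, the following sketch runs a PGD-style targeted attack on an OOD input against a toy linear classifier guarded by a max-softmax-confidence detector. Both models are hypothetical stand-ins, not the detectors of [42, 44]; the point is that driving the classifier toward a confident target prediction simultaneously raises the detector's confidence score, so one objective serves both evasion and targeted misclassification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a toy linear classifier and a max-softmax detector.
W = rng.normal(size=(3, 8))          # 3 classes, 8 input features

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def logits(x):
    return W @ x

def detector_accepts(x, tau=0.9):
    # Max-softmax-probability detector: accept only confident inputs.
    return softmax(logits(x)).max() >= tau

def ood_adversarial_example(x_ood, target=0, eps=1.0, step=0.05, iters=200):
    """PGD-style targeted attack on an OOD input: push the classifier
    toward a confident prediction of `target`, which also pushes the
    detector's confidence score toward its acceptance threshold."""
    x = x_ood.copy()
    onehot = np.eye(W.shape[0])[target]
    for _ in range(iters):
        p = softmax(logits(x))
        grad = W.T @ (p - onehot)                  # d CE(f(x), target) / dx
        x = x - step * np.sign(grad)               # signed gradient descent step
        x = x_ood + np.clip(x - x_ood, -eps, eps)  # project onto the L_inf ball
    return x

x_ood = rng.normal(size=8)                         # stand-in for a benign OOD input
x_adv = ood_adversarial_example(x_ood)
print("accepted before/after:", detector_accepts(x_ood), detector_accepts(x_adv))
print("predicted class:", np.argmax(logits(x_adv)))
```

Given a sufficient perturbation budget, the perturbed input is accepted by the confidence-based detector and classified as the attacker's target class; the same optimization structure carries over to attacks on deep detectors, with gradients obtained by backpropagation.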
This novel examination is critical because once the adversary manages to bypass the OOD detector, the open-world learning framework will pass that input to the relevant classifier. We find that, compared to in-distribution attacks, OOD adversarial examples achieve much higher attack success rates against robust classifiers.