Analyzing the Robustness of Open-World Machine Learning
Vikash Sehwag∗
Princeton University
Arjun Nitin Bhagoji∗
Princeton University
Liwei Song∗
Princeton University
Chawin Sitawarin
University of California, Berkeley
Daniel Cullina
Pennsylvania State University
Mung Chiang
Purdue University
Prateek Mittal
Princeton University
ABSTRACT
When deploying machine learning models in real-world applications, an open-world learning framework is needed to deal with both normal in-distribution inputs and undesired out-of-distribution (OOD) inputs. Open-world learning frameworks include OOD detectors that aim to discard input examples which are not from the same distribution as the training data of machine learning classifiers. However, our understanding of current OOD detectors is limited to the setting of benign OOD data, and an open question is whether they are robust in the presence of adversaries. In this paper, we present the first analysis of the robustness of open-world learning frameworks in the presence of adversaries by introducing and designing OOD adversarial examples. Our experimental results show that current OOD detectors can be easily evaded by slightly perturbing benign OOD inputs, revealing a severe limitation of current open-world learning frameworks. Furthermore, we find that OOD adversarial examples also pose a strong threat to adversarial training based defense methods in spite of their effectiveness against in-distribution adversarial attacks. To counteract these threats and ensure the trustworthy detection of OOD inputs, we outline a preliminary design for a robust open-world machine learning framework.
CCS CONCEPTS
· Computing methodologies → Machine learning; Neural net-
works; · Security and privacy → Intrusion/anomaly detection and
malware mitigation;
KEYWORDS
Open world recognition; Adversarial examples; Deep learning
ACM Reference Format:
Vikash Sehwag, Arjun Nitin Bhagoji, Liwei Song, Chawin Sitawarin, Daniel
Cullina, Mung Chiang, and Prateek Mittal. 2019. Analyzing the Robustness
of Open-World Machine Learning. In 12th ACM Workshop on Artificial
Intelligence and Security (AISec ’19), November 15, 2019, London, UK. ACM,
New York, NY, USA, 12 pages. https://doi.org/10.1145/3338501.335737
∗Equal contribution
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
AISec ’19, November 15, 2019, London, United Kingdom
© 2019 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6833-9/19/11.
https://doi.org/10.1145/3338501.335737
1 INTRODUCTION
Machine learning (ML) models, especially deep neural networks, have become prevalent and are being widely deployed in real-world applications, such as image classification [40, 57], face recognition [51, 61], and autonomous driving [9, 16]. Motivated by the fact that real-world applications need to be resilient to arbitrary input data, an important line of work has developed the open-world learning framework, which checks whether inputs are drawn from the same distribution as the training data (in-distribution examples) or come from a different distribution, in which case they are referred to as out-of-distribution (OOD) examples [5, 6]. State-of-the-art open-world learning systems equip machine learning classifiers with OOD detectors, and an input example is processed for classification only if it passes through those detectors. In recent years, the research community has developed several OOD detection mechanisms that are effective in distinguishing OOD inputs [30, 42, 44].
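The detector-then-classifier pipeline described above can be sketched as follows. This is an illustrative toy, not the mechanism of any specific detector from [30, 42, 44]: the max-softmax confidence score and the 0.5 threshold are our assumptions for the sketch, in the spirit of simple confidence-based baseline detectors.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def open_world_predict(logits, threshold=0.5):
    """Gate a classifier with a confidence-based OOD detector.

    Returns the predicted class index, or None when the input is
    flagged as out-of-distribution, i.e., when the maximum softmax
    probability falls below the threshold (illustrative sketch).
    """
    probs = softmax(logits)
    if probs.max() < threshold:
        return None  # rejected as OOD; the classifier never sees it
    return int(probs.argmax())

# A confident in-distribution prediction passes the gate...
print(open_world_predict(np.array([8.0, 1.0, 0.5])))   # 0
# ...while a low-confidence (likely OOD) input is rejected.
print(open_world_predict(np.array([1.0, 1.1, 0.9])))   # None
```

The key structural point is that the classifier's answer is only ever reported for inputs that survive the detector, which is exactly why an adversary who fools the detector gains direct access to the classifier.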
However, a severe limitation of current open-world learning frameworks is that their development and investigation have been limited to the setting of benign (natural and unmodified) OOD data. Despite their good performance in detecting benign OOD inputs, an important open question is whether open-world learning frameworks are robust in the presence of adversaries. Specifically, can OOD detectors perform reliably when an adversary tries to evade them by maliciously perturbing OOD inputs? In this paper, we thoroughly evaluate the performance of open-world machine learning models against such maliciously perturbed OOD inputs, which we refer to as OOD adversarial examples, motivated by the line of research on adversarial attacks against neural networks [8, 11, 62]. Our analysis shows that state-of-the-art OOD detectors [42, 44] are quite fragile: their detection performance drops drastically under perturbations to out-of-distribution inputs. For example, as highlighted in Figure 1, benign OOD inputs can be reliably detected as out-of-distribution by current open-world learning systems. However, OOD adversarial examples are able both to evade the OOD detector and to achieve targeted misclassification by the classifier.
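To make the threat concrete, the following sketch crafts such a perturbation with a PGD-style signed-gradient attack on a toy linear model; the model, step size, iteration count, and L-infinity budget are our illustrative assumptions, not the paper's actual attack or victim networks. Maximizing the classifier's confidence in a chosen target class simultaneously pushes the input past a confidence-threshold detector and produces the targeted misclassification:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ood_adversarial_example(x, W, target, eps=0.5, step=0.05, iters=40):
    """Perturb a benign OOD input x within an L-inf ball of radius eps
    so that the linear classifier with logits = W @ x becomes highly
    confident in class `target` (illustrative sketch, not the paper's
    exact attack)."""
    x_adv = x.copy()
    y = np.zeros(W.shape[0])
    y[target] = 1.0
    for _ in range(iters):
        p = softmax(W @ x_adv)
        grad = W.T @ (p - y)          # gradient of -log p_target w.r.t. x
        x_adv -= step * np.sign(grad)  # signed-gradient (PGD-style) step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # respect the budget
    return x_adv

# Toy 3-class model; the benign "OOD" input x sits at uniform confidence
# (max softmax probability 1/3), so a 0.5-threshold detector rejects it.
W = np.array([[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
x = np.zeros(2)
x_adv = ood_adversarial_example(x, W, target=0)
```

After the attack, the perturbed input's confidence in the target class exceeds the detection threshold while the perturbation stays inside the eps-ball, illustrating how a single objective defeats both the detector and the classifier at once.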
Beyond revealing the lack of robustness of current OOD detectors, we further examine the behavior of OOD adversarial examples on state-of-the-art robustly trained classifiers [46, 65], which were designed for robustness against in-distribution adversarial examples. This novel examination is critical because once the adversary manages to bypass the OOD detector, the open-world learning framework will pass that input to the relevant classifier. We find that, compared to in-distribution attacks, OOD adversarial examples result in much higher attack success rates against robust classifiers.