Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Anh Nguyen, University of Wyoming, anguyen8@uwyo.edu
Jason Yosinski, Cornell University, yosinski@cs.cornell.edu
Jeff Clune, University of Wyoming, jeffclune@uwyo.edu

Full Citation: Nguyen A, Yosinski J, Clune J. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In Computer Vision and Pattern Recognition (CVPR '15), IEEE, 2015.

Abstract

Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study [26] revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects. Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.

1. Introduction

Deep neural networks (DNNs) learn hierarchical layers of representation from sensory input in order to perform pattern recognition [1, 13]. Recently, these deep architectures have demonstrated impressive, state-of-the-art, and sometimes human-competitive results on many pattern recognition tasks, especially vision classification problems [15, 5, 27, 16]. Given the near-human ability of DNNs to classify visual objects, questions arise as to what differences remain between computer and human vision.

Figure 1. Evolved images that are unrecognizable to humans, but that state-of-the-art DNNs trained on ImageNet believe with 99.6% certainty to be a familiar object. This result highlights differences between how DNNs and humans recognize objects. Images are either directly (top) or indirectly (bottom) encoded.

A recent study revealed a major difference between DNN and human vision [26]. Changing an image, originally correctly classified (e.g. as a lion), in a way imperceptible to human eyes, can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). In this paper, we show another way that DNN and human vision differ: It is easy to produce images that are completely unrecognizable to humans (Fig. 1), but that state-of-the-art DNNs believe to be recognizable objects with over 99% confidence (e.g. labeling with certainty that TV static
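The gradient ascent procedure mentioned in the abstract — optimizing the *input pixels* so that a fixed classifier's confidence in a chosen class rises, with no constraint that the result look like anything — can be sketched in a few lines. The toy linear-softmax "classifier" below is purely an illustrative assumption; the paper's actual experiments use trained convolutional networks (on MNIST and ImageNet), but the ascent loop has the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier: one linear layer + softmax over
# 10 classes on flattened 8x8 "images" (64 inputs). This tiny random model
# is an assumption for illustration only, not the paper's trained DNNs.
W = rng.normal(scale=0.1, size=(10, 64))
b = np.zeros(10)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def confidence(x, target):
    """The model's softmax probability for class `target` on input x."""
    return softmax(W @ x + b)[target]

# Gradient ascent on the input: repeatedly nudge x in the direction that
# increases log p(target | x). Nothing encourages x to stay recognizable.
target = 3
x = rng.normal(scale=0.01, size=64)   # start from near-blank noise
lr = 1.0
for _ in range(500):
    p = softmax(W @ x + b)
    grad = W.T @ (np.eye(10)[target] - p)   # d log p(target) / dx
    x += lr * grad

print(f"confidence in class {target}: {confidence(x, target):.4f}")
```

Because the ascent is unconstrained, the optimized input ends up classified with near-total confidence even though it remains noise to a human observer — the core phenomenon the paper demonstrates on real DNNs.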