Published as a conference paper at ICLR 2021

SALIENCYMIX: A SALIENCY GUIDED DATA AUGMENTATION STRATEGY FOR BETTER REGULARIZATION

A. F. M. Shahab Uddin uddin@khu.ac.kr
Mst. Sirazam Monira monira@khu.ac.kr
Wheemyung Shin wheemi@khu.ac.kr
TaeChoong Chung∗† tcchung@khu.ac.kr
Sung-Ho Bae∗† shbae@khu.ac.kr

ABSTRACT

Advanced data augmentation strategies have been widely studied to improve the generalization ability of deep learning models. Regional dropout is one of the popular solutions that guides the model to focus on less discriminative parts by randomly removing image regions, resulting in improved regularization. However, such information removal is undesirable. On the other hand, recent strategies suggest randomly cutting and mixing patches and their labels among training images, to enjoy the advantages of regional dropout without having any pointless pixels in the augmented images. We argue that such randomly selected patches may not necessarily carry sufficient information about the corresponding object, and that mixing the labels according to an uninformative patch leads the model to learn unexpected feature representations. Therefore, we propose SaliencyMix, which carefully selects a representative image patch with the help of a saliency map and mixes this indicative patch with the target image, thus leading the model to learn more appropriate feature representations. SaliencyMix achieves the best known top-1 errors of 21.26% and 20.09% for the ResNet-50 and ResNet-101 architectures on ImageNet classification, respectively, and also improves model robustness against adversarial perturbations. Furthermore, models trained with SaliencyMix help to improve object detection performance. Source code is available at https://github.com/SaliencyMix/SaliencyMix.

1 INTRODUCTION

Machine learning has achieved state-of-the-art (SOTA) performance in many fields, especially in computer vision tasks.
This success can mainly be attributed to the deep architecture of convolutional neural networks (CNNs), which typically have 10 to 100 million learnable parameters. Such a huge number of parameters enables deep CNNs to solve complex problems. However, besides conferring powerful representation ability, a huge number of parameters increases the probability of overfitting when the number of training examples is insufficient, which results in poor generalization of the model.

In order to improve the generalization ability of deep learning models, several data augmentation strategies have been studied. Random feature removal is one of the popular techniques that guides CNNs not to focus on some small regions of the input images, or on a small set of internal activations, thereby improving model robustness. Dropout (Nitish et al., 2014; Tompson et al., 2015) and regional dropout (Junsuk & Hyunjung, 2019; Terrance & Graham, 2017; Golnaz et al., 2018; Singh & Lee, 2017; Zhun et al., 2017) are two established training strategies, where the former randomly turns off some internal activations and the latter removes and/or alters random regions of the input images. Both of them force a model to learn the entire object region rather than focusing on the most

∗ Department of Computer Science & Engineering, Kyung Hee University, South Korea.
† Corresponding author.

arXiv:2006.01791v2 [cs.LG] 27 Jul 2021
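The patch selection and label mixing described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the paper uses an off-the-shelf saliency detector, whereas here a simple gradient-magnitude proxy stands in for the saliency map, and the patch size is fixed by a hypothetical `patch_frac` parameter rather than sampled from a distribution.

```python
import numpy as np

def saliency_mix(src, tgt, y_src, y_tgt, patch_frac=0.5):
    """Cut the most salient patch of `src`, paste it onto `tgt` at the
    same location, and mix the labels by area ratio (illustrative sketch)."""
    H, W = src.shape[:2]
    # Saliency proxy: per-pixel gradient magnitude of the grayscale image.
    # (The paper uses a dedicated saliency detector; this is a stand-in.)
    gray = src.mean(axis=-1)
    gy, gx = np.gradient(gray)
    sal = np.hypot(gx, gy)
    # Center the patch on the saliency peak, clipped to stay inside the image.
    py, px = np.unravel_index(np.argmax(sal), sal.shape)
    ph, pw = int(H * patch_frac), int(W * patch_frac)
    y1 = int(np.clip(py - ph // 2, 0, H - ph))
    x1 = int(np.clip(px - pw // 2, 0, W - pw))
    mixed = tgt.copy()
    mixed[y1:y1 + ph, x1:x1 + pw] = src[y1:y1 + ph, x1:x1 + pw]
    # Label weight lambda = fraction of the image still belonging to `tgt`.
    lam = 1.0 - (ph * pw) / (H * W)
    y_mixed = lam * np.asarray(y_tgt) + (1.0 - lam) * np.asarray(y_src)
    return mixed, y_mixed
```

With `patch_frac=0.5`, a quarter of the target image is replaced, so the mixed label assigns weight 0.75 to the target class and 0.25 to the source class.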