Toward Improving the Robustness of Deep Learning Models via Model Transformation

Yingyi Zhang
State Key Laboratory of Communication Content Cognition, People’s Daily Online, Beijing, China 100733; College of Intelligence and Computing, Tianjin University
yingyizhang@tju.edu.cn

Zan Wang
State Key Laboratory of Communication Content Cognition, People’s Daily Online, Beijing, China 100733; College of Intelligence and Computing, Tianjin University
wangzan@tju.edu.cn

Jiajun Jiang
College of Intelligence and Computing, Tianjin University, Tianjin, China
jiangjiajun@tju.edu.cn

Hanmo You
College of Intelligence and Computing, Tianjin University, Tianjin, China
youhanmo@tju.edu.cn

Junjie Chen
College of Intelligence and Computing, Tianjin University, Tianjin, China
junjiechen@tju.edu.cn

ABSTRACT

Deep learning (DL) techniques have attracted much attention in recent years and have been applied to many application scenarios, including safety-critical ones. Improving the universal robustness of DL models is therefore vital, and many approaches have been proposed for this purpose over the past decades. Among existing approaches, adversarial training is the most representative: it advocates a post-training model tuning process that incorporates adversarial samples. Although successful, such approaches still suffer from generalizability issues, exhibiting unsatisfactory effectiveness in the face of various attacks. Targeting this problem, in this paper we propose a novel model training framework that aims to improve the universal robustness of DL models via model transformation, incorporated with a data augmentation strategy in a delta debugging fashion. We have implemented our approach in a tool, called Dare, and conducted an extensive evaluation on 9 DL models. The results show that our approach significantly outperforms existing adversarial training techniques.
Specifically, Dare has achieved the highest Empirical Robustness in 29 of 45 testing scenarios under various attacks, while the number drops to 5 of 45 for the best baseline approach.

CCS CONCEPTS

· Computing methodologies → Neural networks; · Software and its engineering → Software testing and debugging.

Jiajun Jiang is the corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ASE ’22, October 10–14, 2022, Rochester, MI, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9475-8/22/10...$15.00
https://doi.org/10.1145/3551349.3556920

KEYWORDS

Deep Neural Network, Delta Debugging, Model Robustness

ACM Reference Format:
Yingyi Zhang, Zan Wang, Jiajun Jiang, Hanmo You, and Junjie Chen. 2022. Toward Improving the Robustness of Deep Learning Models via Model Transformation. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3551349.3556920

1 INTRODUCTION

In recent years, deep learning (DL) techniques have attracted much attention from researchers and have been prevalently used in both industrial practice and academic research, such as image processing [83, 85], machine translation [32, 49], and software engineering [5, 42, 65, 76]. In particular, some application scenarios are safety-critical, such as autonomous driving [4, 43, 84, 92] and aircraft collision avoidance [31].
However, as reported by existing studies [9, 58, 63], DL models in practice are fragile when facing perturbations and can thus be easily attacked. For example, researchers from Tencent Keen Security Lab successfully tricked the lane detection system of a Tesla Model S with three small adversarial sticker images, making it swerve into the wrong lane without any warning [1]. Therefore, it is vital to ensure the safety of DL models and enhance their adversarial robustness in the face of potential adversarial attacks.

Unlike traditional handcrafted programs, which are deterministic with a fixed code logic defined by a set of executable machine instructions, deep learning models are built from a set of input examples. That is, given a set of training examples, a model with a set of parameters is learned according to a predefined neural network structure, which is expected to meet the functionality requirement, such as image classification. However, since the number of input examples is limited while the complete input space is usually enormous or even infinite in practice (also known as the incomplete specification issue in traditional software engineering tasks like programming by examples [14–18, 24, 33, 35]), the learned model may not work well on unseen inputs, especially samples decorated with crafted attacking features.
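To make this fragility concrete, the following is a minimal, self-contained sketch of how a small crafted perturbation can flip a model's prediction, in the spirit of the well-known fast gradient sign method (x_adv = x + eps · sign(∇x loss)). The toy logistic-regression model, its weights, and the `fgsm_perturb` helper are all hypothetical illustrations, not part of the Dare framework proposed in this paper.

```python
import numpy as np

# Hypothetical toy model: logistic regression with fixed weights.
# Purely illustrative of adversarial fragility on unseen, crafted inputs.
w = np.array([0.5, -1.0, 0.75, 0.25])
b = 0.1

def predict_prob(x):
    """Probability of class 1 under the toy model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm_perturb(x, y, eps):
    """One-step FGSM-style attack: nudge x in the direction that
    increases the loss on the true label y, bounded by eps per feature."""
    p = predict_prob(x)
    grad_x = (p - y) * w  # gradient of binary cross-entropy w.r.t. the input
    return x + eps * np.sign(grad_x)

# A clean input the model confidently labels as class 1.
x_clean = w / np.linalg.norm(w)
x_adv = fgsm_perturb(x_clean, y=1.0, eps=0.8)

print(predict_prob(x_clean) > 0.5)  # True: correct on the clean input
print(predict_prob(x_adv) > 0.5)    # False: flipped by a bounded perturbation
```

Even though each feature of the adversarial input differs from the clean one by at most eps, the prediction changes, which is exactly the kind of behavior that robustness-oriented training aims to suppress.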