Citation: Liang, G.; Guerrero, J.; Zheng, F.; Alsmadi, I. Enhancing Neural Text Detector Robustness with µAttacking and RR-Training. Electronics 2023, 12, 1948. https://doi.org/10.3390/electronics12081948

Received: 17 March 2023; Revised: 15 April 2023; Accepted: 19 April 2023; Published: 21 April 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Enhancing Neural Text Detector Robustness with µAttacking and RR-Training
Gongbo Liang 1,*, Jesus Guerrero 1, Fengbo Zheng 2 and Izzat Alsmadi 1,*

1 College of Arts and Sciences, Texas A&M University-San Antonio, San Antonio, TX 78224, USA
2 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
* Correspondence: gliang@tamusa.edu (G.L.); ialsmadi@tamusa.edu (I.A.)
Abstract: With advanced neural network techniques, language models can generate content that looks genuinely created by humans. Such progress benefits society in numerous ways. However, it may also bring threats that we have not seen before. A neural text detector is a classification model that separates machine-generated text from human-written text. Unfortunately, a pretrained neural text detector may be vulnerable to adversarial attacks that aim to fool the detector into making wrong classification decisions. In this work, we propose µAttacking, a mutation-based general framework that can be used to systematically evaluate the robustness of neural text detectors. Our experiments demonstrate that µAttacking identifies a detector's flaws effectively. Inspired by the insights revealed by µAttacking, we also propose the RR-training strategy, a straightforward but effective method for improving the robustness of neural text detectors through finetuning. Compared with normal finetuning, our experiments demonstrate that RR-training increases model robustness by up to 11.33% with little additional effort. We believe that µAttacking and RR-training are useful tools for developing and evaluating neural language models.
Keywords: machine learning security; neural text generation; machine text detection; mutation testing
1. Introduction
Since AlexNet [1], neural networks (NNs) have been the major driving force behind the rapid development of AI over the past decade and have shown promising results in various domains [2–10]. With these advanced techniques, NNs can produce textual and visual content that looks as if it were genuinely created by humans [11–13], which benefits society in many domains, from medical image processing [14–17] to speech-to-text conversion [18,19], machine translation [20,21], marketing communication [22,23], and so forth. However, such advanced technology also makes it easier to generate human-like content at large scale for nefarious activities, for instance, generating misinformation [24,25] and targeting specific groups for political agendas [26,27]. This newly emerging threat is even more troubling with the recent development of large language models [28,29], such as OpenAI's ChatGPT [30], GPT-4 [31], and Google's Bard [32]. Researchers are actively developing neural text detectors for distinguishing neural text (i.e., NN-generated text) from human-written text and have shown promising performance for long-sequence detection [33,34]. However, short-sequence detection remains challenging, with performance no better than random guessing [26]. More importantly, pretrained neural text detectors are vulnerable to adversarial attacks that aim to fool the detector into making wrong classification decisions. Unfortunately, systematically evaluating the robustness of a neural text detector is still non-trivial due to the lack of existing evaluation tools.
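The mutation-based evaluation idea can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the homoglyph mutation operator, the substitution table, and the toy detector are assumptions introduced for the sketch; the actual µAttacking operators are defined later in the paper.

```python
import random

# Illustrative mutation table (an assumption, not the paper's operator set):
# Latin characters mapped to visually similar Cyrillic look-alikes.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def mutate_homoglyph(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a fraction of substitutable characters with look-alikes."""
    rng = random.Random(seed)
    chars = list(text)
    candidates = [i for i, c in enumerate(chars) if c in HOMOGLYPHS]
    for i in rng.sample(candidates, int(len(candidates) * rate)):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

def robustness_score(detector, texts, mutate, **kwargs) -> float:
    """Fraction of samples whose predicted label survives mutation.

    A robust detector should assign the same label to a text before
    and after a semantics-preserving mutation; a low score indicates
    the mutation operator found a flaw.
    """
    kept = sum(detector(t) == detector(mutate(t, **kwargs)) for t in texts)
    return kept / len(texts)
```

In practice, `detector` would wrap a pretrained neural text detector, and the score would be computed per mutation operator to locate which perturbations the model is most sensitive to.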