Citation: Liang, G.; Guerrero, J.; Zheng, F.; Alsmadi, I. Enhancing Neural Text Detector Robustness with µAttacking and RR-Training. Electronics 2023, 12, 1948. https://doi.org/10.3390/electronics12081948

Received: 17 March 2023; Revised: 15 April 2023; Accepted: 19 April 2023; Published: 21 April 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Enhancing Neural Text Detector Robustness with µAttacking and RR-Training
Gongbo Liang 1,*, Jesus Guerrero 1, Fengbo Zheng 2 and Izzat Alsmadi 1,*

1 College of Arts and Sciences, Texas A&M University-San Antonio, San Antonio, TX 78224, USA
2 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
* Correspondence: gliang@tamusa.edu (G.L.); ialsmadi@tamusa.edu (I.A.)
Abstract: With advanced neural network techniques, language models can generate content that looks genuinely created by humans. Such progress benefits society in numerous ways. However, it may also bring threats that we have not seen before. A neural text detector is a classification model that separates machine-generated text from human-written text. Unfortunately, a pretrained neural text detector may be vulnerable to adversarial attacks that aim to fool the detector into making wrong classification decisions. In this work, we propose µAttacking, a mutation-based general framework that can be used to systematically evaluate the robustness of neural text detectors. Our experiments demonstrate that µAttacking identifies a detector's flaws effectively. Inspired by the insights revealed by µAttacking, we also propose the RR-training strategy, a straightforward but effective method for improving the robustness of neural text detectors through finetuning. Compared with normal finetuning, our experiments demonstrate that RR-training increases model robustness by up to 11.33% with little additional effort. We believe that µAttacking and RR-training are useful tools for developing and evaluating neural language models.
Keywords: machine learning security; neural text generation; machine text detection; mutation testing
1. Introduction
Since AlexNet [1], neural networks (NNs) have been the major driving force behind the rapid development of AI over the past decade and have shown promising results in various domains [2–10]. With these advanced techniques, NNs can produce textual and visual content that looks as if it were genuinely created by humans [11–13], which benefits society in many domains, from medical image processing [14–17] to speech-to-text conversion [18,19], machine translation [20,21], marketing communication [22,23], and so forth. However, such advanced technology also makes it easier to generate human-like content at large scale for nefarious activities, for instance, generating misinformation [24,25] and targeting specific groups for political agendas [26,27]. This newly emerging threat is even more troubling with the recent development of large language models [28,29], such as OpenAI's ChatGPT [30], GPT-4 [31], and Google's Bard [32]. Researchers are actively developing neural text detectors for distinguishing neural text (i.e., NN-generated text) from human-written text and have shown promising performance for long-sequence detection [33,34]. However, short-sequence detection remains challenging, with performance no better than random guessing [26]. More importantly, pretrained neural text detectors are vulnerable to adversarial attacks that aim to fool the detector into making wrong classification decisions. Unfortunately, systematically evaluating the robustness of a neural text detector is still non-trivial due to the lack of existing evaluation tools.
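The mutation-based evaluation idea can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the homoglyph mutation operator, the substitution table, and the toy detector are assumptions introduced for the sketch; the actual µAttacking operators are defined later in the paper.

```python
import random

# Illustrative mutation table (an assumption, not the paper's operator set):
# Latin characters mapped to visually similar Cyrillic look-alikes.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def mutate_homoglyph(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a fraction of substitutable characters with look-alikes."""
    rng = random.Random(seed)
    chars = list(text)
    candidates = [i for i, c in enumerate(chars) if c in HOMOGLYPHS]
    for i in rng.sample(candidates, int(len(candidates) * rate)):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

def robustness_score(detector, texts, mutate, **kwargs) -> float:
    """Fraction of samples whose predicted label survives mutation.

    A robust detector should assign the same label to a text before
    and after a semantics-preserving mutation; a low score indicates
    the mutation operator found a flaw.
    """
    kept = sum(detector(t) == detector(mutate(t, **kwargs)) for t in texts)
    return kept / len(texts)
```

In practice, `detector` would wrap a pretrained neural text detector, and the score would be computed per mutation operator to locate which perturbations the model is most sensitive to.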