Decamouflage: A Framework to Detect
Image-Scaling Attacks on CNN
Bedeuro Kim¹,², Alsharif Abuadbba²,³, Yansong Gao²,⁴, Yifeng Zheng²,⁵, Muhammad Ejaz Ahmed², Surya Nepal²,³, Hyoungshick Kim¹,²
¹Department of Electrical and Computer Engineering, Sungkyunkwan University, ²CSIRO’s Data61, ³Cybersecurity CRC, ⁴Nanjing University of Science and Technology, ⁵Harbin Institute of Technology
kimbdr@skku.edu, sharif.abuadbba@data61.csiro.au, yansong.gao@data61.csiro.au,
yifeng.zheng@hit.edu.cn, ejaz.ahmed@data61.csiro.au, surya.nepal@data61.csiro.au, hyoung@skku.edu
Abstract—Image scaling is a common operation applied to input images before they are fed into convolutional neural network (CNN) models. However, this operation is vulnerable to the recently revealed image-scaling attack. This work presents Decamouflage, an image-scaling attack detection framework consisting of three independent detection methods: scaling, filtering, and steganalysis, each of which detects the attack by examining distinct image characteristics. Decamouflage uses a pre-determined detection threshold that is generic: as we have validated, a threshold determined on one dataset is also applicable to other datasets. Extensive experiments show that Decamouflage achieves detection accuracy of 99.9% and 98.5% in the white-box and black-box settings, respectively. We also measured its running-time overhead on a PC with an Intel i5 CPU and 8 GB of RAM; the experimental results show that image-scaling attacks can be detected in milliseconds. Moreover, Decamouflage is highly robust against adaptive image-scaling attacks (e.g., variations in attack image size).
Keywords—Image-scaling attack, Adversarial detection, Backdoor detection
I. INTRODUCTION
Deep learning models have shown impressive success in solving various tasks [1], [2], [3], [4]. One representative domain is computer vision, which was the impetus for the current deep learning wave [1]. Convolutional neural network (CNN) models are widely used in the vision domain because of their superior performance [1], [5], [2]. However, it has been shown that deep learning models are vulnerable to various adversarial attacks. Hence, significant research efforts have been directed at defeating mainstream adversarial attacks such as adversarial examples [6], [7], backdooring [8], [9], and inference attacks [10], [11].
Xiao et al. [12] introduced a new attack called image-
scaling attack (also referred to as camouflage attack) that
potentially affects all applications using scaling algorithms as
an essential pre-processing step, where the attacker’s goal is to
create attack images presenting a different meaning to humans
before and after a scaling operation. This attack would be
a serious security concern for computer vision applications.
Below we first give a concise example of the image-scaling
attack and exemplify its severe consequences.
Image-scaling attack example. CNN models typically take fixed-size input images, such as 224 × 224 (height × width), to reduce computational complexity [2]. However, raw input images vary in size and can be much larger (e.g., 800 × 600) than this fixed size. Therefore, resizing or downscaling is a necessary step before feeding such larger images into an underlying CNN model. Xiao et al. [12] revealed that this image-scaling process is vulnerable to the image-scaling attack, in which an attacker intentionally crafts an attack image that is visually similar to a base image for humans but is recognized as a target image by the CNN model after an image-scaling function (e.g., resizing or downscaling) is applied to it. Figure 1 illustrates an example of image-scaling attacks: the ‘wolf’ image is delicately disguised within the ‘sheep’ base image to form an attack image. When the attack image is downsampled/resized, the ‘sheep’ pixels are discarded and the ‘wolf’ image is revealed. In general, the image-scaling attack abuses the inconsistent understanding of the same image between humans and machines.
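To make the mechanism concrete, the following is a minimal sketch (not from the paper) of how such an attack can work under nearest-neighbor downscaling: only the source pixels that the scaling kernel samples survive in the output, so an attacker can overwrite just those positions with target-image pixels while leaving the rest of the base image untouched. The image sizes and pixel values below are illustrative assumptions.

```python
import numpy as np

def nearest_neighbor_downscale(img, out_h, out_w):
    """Nearest-neighbor resize: each output pixel samples one source pixel."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[np.ix_(rows, cols)]

# Base ("sheep") image: uniform bright pixels, 8x8 (toy size).
base = np.full((8, 8), 200, dtype=np.uint8)
# Target ("wolf") image: a dark 4x4 pattern the model should see.
target = np.full((4, 4), 30, dtype=np.uint8)

# Craft the attack image: overwrite only the pixels that the
# nearest-neighbor kernel will sample (every 2nd row/column here).
attack = base.copy()
rows = np.arange(4) * 8 // 4
cols = np.arange(4) * 8 // 4
attack[np.ix_(rows, cols)] = target

# Only 16 of 64 pixels differ, so the attack image still looks
# mostly like the base at full resolution...
print((attack != base).sum())  # 16 modified pixels
# ...but after downscaling, the model sees exactly the target.
print(np.array_equal(nearest_neighbor_downscale(attack, 4, 4), target))  # True
```

With realistic sizes (e.g., 800 × 600 scaled to 224 × 224) the modified fraction is similarly small, which is why the perturbation is hard to spot by eye.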
Fig. 1: Example of image-scaling attacks presenting a deceiving effect. The left image shows what a human sees before the scaling operation, and the right image shows what the CNN model sees after the scaling operation.
The strength of the image-scaling attack is its independence of CNN models and data: it requires no knowledge of the training data or the model because it mainly exploits the image-scaling function used for pre-processing. Only knowledge of the image-scaling function in use is required. The attacker can obtain this information relatively easily because a small number of well-known image-scaling functions (e.g., nearest-neighbor, bilinear, and bicubic interpolation) are commonly used in real-world services, and a small number of input sizes (e.g., 224 × 224 and 32 × 32) are used by representative CNN models [12], as summarized in Table I. Furthermore, the pa-
2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
978-1-6654-3572-7/21/$31.00 ©2021 IEEE
DOI 10.1109/DSN48987.2021.00023