Two ICS Security Datasets and Anomaly Detection Contest on
the HIL-based Augmented ICS Testbed
Hyeok-Ki Shin∗
The Affiliated Institute of ETRI
Daejeon, Republic of Korea
hkshin721@nsr.re.kr
Woomyo Lee∗
The Affiliated Institute of ETRI
Daejeon, Republic of Korea
wmlee@nsr.re.kr
Jeong-Han Yun∗
The Affiliated Institute of ETRI
Daejeon, Republic of Korea
dolgam@nsr.re.kr
Byung Gil Min
The Affiliated Institute of ETRI
Daejeon, Republic of Korea
bgmin@nsr.re.kr
ABSTRACT
Security datasets that cover the varied operating characteristics and
abnormal situations of industrial control systems (ICSs) are essential
for developing artificial intelligence (AI)-based control system security
technology. In this study, we built a hardware-in-the-loop (HIL)-based
augmented ICS (HAI) testbed and developed ICS security datasets. Here,
we introduce the second dataset (HAI 21.03), which was developed in
response to user feedback on the first released version (HAI 20.07).
All HAI datasets are publicly available at
https://github.com/icsdataset/hai. HAI 21.03 expands HAI 20.07 with
additional data points and additional normal and attack scenarios. We
also held an AI-based anomaly detection contest (HAICon 2020) using the
HAI datasets developed so far, giving many AI researchers an opportunity
to discuss and share ideas for ICS anomaly detection research. This
paper presents the results of HAICon 2020. The results of the top teams
in the competition can serve as a baseline for performance comparison
when using HAI 21.03.
CCS CONCEPTS
· Information systems → Process control systems; · Computing
methodologies → Anomaly detection.
KEYWORDS
security dataset, industrial control system, testbed,
hardware-in-the-loop, anomaly detection, artificial intelligence
ACM Reference Format:
Hyeok-Ki Shin, Woomyo Lee, Jeong-Han Yun, and Byung Gil Min. 2021.
Two ICS Security Datasets and Anomaly Detection Contest on the HIL-
based Augmented ICS Testbed. In Cyber Security Experimentation and Test
Workshop (CSET ’21), August 9, 2021, Virtual, CA, USA. ACM, New York, NY,
USA, 5 pages. https://doi.org/10.1145/3474718.3474719
∗These authors contributed equally to this research.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
CSET ’21, August 9, 2021, Virtual, CA, USA
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9065-1/21/08. . . $15.00
https://doi.org/10.1145/3474718.3474719
1 INTRODUCTION
Detecting possible cyber attacks and unexpected failures in industrial
control systems (ICSs), such as water pumps, power grids, and nuclear
power plants, is crucial for preventing dire consequences [2]. Although
the number of studies on ICS security is growing, open datasets that can
be used for research are still lacking. Our goal is therefore to create
datasets for ICS security researchers working on anomaly detection. For
this purpose, we first implemented a hardware-in-the-loop (HIL)-based
augmented ICS (HAI) testbed [3] to generate accurate datasets for
various scenarios while minimizing human effort.
Table 1: Release overview of the HAI security datasets. HAI 20.07 is a
bug-fix release of the first version [4]; the second version, HAI 21.03,
was released in March 2021.
                          Training set                       Test set
Version    Data points    File    Duration  Size    File    Attack  Duration  Size
           (points/sec)   (CSV)   (hours)   (MB)    (CSV)   count   (hours)   (MB)
HAI 21.03  78             train1  60        110     test1   5       12        22
                          train2  63        116     test2   20      33        62
                          train3  229       246     test3   8       30        56
                                                    test4   5       11        20
                                                    test5   12      26        48
HAI 20.07  59             train1  86        127     test1   28      81        119
                          train2  91        98      test2   10      42        62
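As a quick check on the scale of each release, the per-file figures in Table 1 can be totalled. The sketch below only sums the durations and attack counts transcribed from the table; the dictionary layout and the names `TABLE_1` and `summarize` are illustrative and are not part of the dataset.

```python
# Durations (hours) and attack counts transcribed from Table 1.
TABLE_1 = {
    "HAI 21.03": {
        "train_hours": [60, 63, 229],
        "test_hours": [12, 33, 30, 11, 26],
        "attacks": [5, 20, 8, 5, 12],
    },
    "HAI 20.07": {
        "train_hours": [86, 91],
        "test_hours": [81, 42],
        "attacks": [28, 10],
    },
}


def summarize(table):
    """Return total training hours, test hours, and attacks per version."""
    return {
        version: {key: sum(values) for key, values in files.items()}
        for version, files in table.items()
    }


summary = summarize(TABLE_1)
for version, totals in summary.items():
    print(version, totals)
```

By these sums, HAI 21.03 provides 352 hours of training data and 112 hours of test data containing 50 attacks, compared with 177 hours, 123 hours, and 38 attacks for HAI 20.07.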
An HAI dataset (HAI 20.07¹) [4] was released at
https://github.com/icsdataset/hai. After the first release of this
dataset, we developed a new version (HAI 21.03) using the HAI testbed.
Based on user feedback on the first dataset, we focused on three key
issues for ICS anomaly detection research.
• Reconfiguration of the testbed: The scaling and biasing factors of
the analog signals between the HIL simulator and the physical system
were reconfigured to increase their mutual influence, yielding a new
testbed with new response characteristics.
• Causality of the dataset: Data collection points (e.g., set points,
process variables, and control outputs) were added so that the causal
relationships of the control process can be interpreted clearly.
• Various normal and attack scenarios: Additional scenarios were
developed on the reconfigured testbed for training and for evaluating
anomaly detection performance.
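To illustrate the first item, a scaling factor and a biasing factor can be read as a linear map applied to a signal exchanged between the simulator and the physical system. The sketch below is a generic illustration under that assumption, not the testbed's actual configuration: the function name `to_analog`, the 4 to 20 range, and all numeric values are hypothetical.

```python
def to_analog(sim_value, scale, bias, lo=4.0, hi=20.0):
    """Map a simulator value to an analog level: scale * value + bias.

    The result is clamped to [lo, hi]. The default range is a
    hypothetical choice made only for this illustration.
    """
    level = scale * sim_value + bias
    return max(lo, min(hi, level))


# The same simulator value under two hypothetical configurations.
before = to_analog(0.5, scale=8.0, bias=4.0)
after = to_analog(0.5, scale=16.0, bias=4.0)
print(before, after)
```

With the larger scaling factor, the same change on the simulator side produces a larger swing in the analog level, which is one way the two sides of a coupled system can be made to influence each other more strongly.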
¹ The initial version name of the HAI dataset was HAI 1.0 [4], but the
version numbering scheme was changed to specify the release date of the
dataset.