Two ICS Security Datasets and Anomaly Detection Contest on the HIL-based Augmented ICS Testbed

Hyeok-Ki Shin, The Affiliated Institute of ETRI, Daejeon, Republic of Korea, hkshin721@nsr.re.kr
Woomyo Lee, The Affiliated Institute of ETRI, Daejeon, Republic of Korea, wmlee@nsr.re.kr
Jeong-Han Yun, The Affiliated Institute of ETRI, Daejeon, Republic of Korea, dolgam@nsr.re.kr
Byung Gil Min, The Affiliated Institute of ETRI, Daejeon, Republic of Korea, bgmin@nsr.re.kr

ABSTRACT
Security datasets that capture the various operating characteristics and abnormal situations of industrial control systems (ICSs) are essential for developing artificial intelligence (AI)-based control system security technology. In this study, we built a hardware-in-the-loop (HIL)-based augmented ICS (HAI) testbed and developed ICS security datasets. Here, we introduce the second dataset (HAI 21.03), which was developed using user feedback on the first released version (HAI 20.07). All HAI datasets are publicly available at https://github.com/icsdataset/hai. HAI 21.03 extends HAI 20.07 with additional data points and additional normal and attack scenarios. We also held an AI-based anomaly detection contest (HAICon 2020) using the HAI datasets developed so far, giving many AI researchers an opportunity to discuss and share ideas for ICS anomaly detection research. This paper presents the results of HAICon 2020. The results of the top teams in the competition can serve as a reference for performance comparison when using HAI 21.03.

CCS CONCEPTS
• Information systems → Process control systems; • Computing methodologies → Anomaly detection.

KEYWORDS
security dataset, industrial control system, testbed, hardware-in-the-loop, anomaly detection, artificial intelligence

ACM Reference Format:
Hyeok-Ki Shin, Woomyo Lee, Jeong-Han Yun, and Byung Gil Min. 2021. Two ICS Security Datasets and Anomaly Detection Contest on the HIL-based Augmented ICS Testbed.
In Cyber Security Experimentation and Test Workshop (CSET ’21), August 9, 2021, Virtual, CA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3474718.3474719

These authors contributed equally to this research.

© 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-9065-1/21/08...$15.00

1 INTRODUCTION
Detecting possible cyber attacks or unexpected failures in industrial control systems (ICSs), such as those operating water pumps, power grids, and nuclear power plants, is crucial for preventing dire consequences [2]. Although a growing number of studies address ICS security, open datasets that can be used for research remain scarce. Our goal is therefore to create datasets for ICS security researchers working on anomaly detection. To this end, we first implemented a hardware-in-the-loop (HIL)-based augmented ICS (HAI) testbed [3] to generate accurate datasets for various scenarios while minimizing human effort.

Table 1: Release overview of HAI security datasets. HAI 20.07 is a bug-fix release of the first version [4], and the second version, HAI 21.03, was released in March 2021.
Version    Data points   | Training set                        | Test set
           (points/sec)  | File (CSV)  Duration (h)  Size (MB) | File (CSV)  Attack count  Duration (h)  Size (MB)
HAI 21.03  78            | train1      60            110       | test1       5             12            22
                         | train2      63            116       | test2       20            33            62
                         | train3      229           246       | test3       8             30            56
                         |                                     | test4       5             11            20
                         |                                     | test5       12            26            48
HAI 20.07  59            | train1      86            127       | test1       28            81            119
                         | train2      91            98        | test2       10            42            62

The first HAI dataset (HAI 20.07¹) [4] was released at https://github.com/icsdataset/hai. After this first release, we developed a new version of the dataset (HAI 21.03) using the HAI testbed. Taking into account user feedback on the first dataset, we focused on three key issues for ICS anomaly detection research:

• Reconfiguration of the testbed: The scaling and biasing factors of the analog signals between the HIL simulator and the physical system were reconfigured to increase their mutual influence, which effectively established a new testbed with new response characteristics.
• Causality of the dataset: Data collection points (e.g., set points, process variables, and control outputs) were added so that the causal relationships of the control process can be clearly interpreted.
• Various normal and attack scenarios: Additional scenarios were developed on the reconfigured testbed for model training and for evaluating anomaly detection performance.

¹ The initial version of the HAI dataset was named HAI 1.0 [4], but the version numbering scheme was changed to reflect the dataset's release date.
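As a quick consistency check on Table 1, the per-file figures can be encoded as a small data structure and summed per split. The sketch below is illustrative only: the `CsvFile` record and the `summarize` and `expected_rows` helpers are hypothetical names, not part of the HAI release, and `expected_rows` assumes one CSV row per second, which the "(points/sec)" column suggests but which should be verified against the released files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CsvFile:
    """One CSV file as listed in Table 1 (hypothetical record type)."""
    name: str
    hours: int          # duration in hours, as reported (rounded) in Table 1
    size_mb: int        # file size in MB
    attacks: int = 0    # Table 1 gives attack counts only for test files


# Per-file figures transcribed from Table 1.
HAI_21_03: Dict[str, List[CsvFile]] = {
    "train": [CsvFile("train1", 60, 110), CsvFile("train2", 63, 116),
              CsvFile("train3", 229, 246)],
    "test": [CsvFile("test1", 12, 22, 5), CsvFile("test2", 33, 62, 20),
             CsvFile("test3", 30, 56, 8), CsvFile("test4", 11, 20, 5),
             CsvFile("test5", 26, 48, 12)],
}
HAI_20_07: Dict[str, List[CsvFile]] = {
    "train": [CsvFile("train1", 86, 127), CsvFile("train2", 91, 98)],
    "test": [CsvFile("test1", 81, 119, 28), CsvFile("test2", 42, 62, 10)],
}


def summarize(release: Dict[str, List[CsvFile]]) -> Dict[str, Dict[str, int]]:
    """Sum duration, size, and attack count over the files of each split."""
    return {
        split: {
            "hours": sum(f.hours for f in files),
            "size_mb": sum(f.size_mb for f in files),
            "attacks": sum(f.attacks for f in files),
        }
        for split, files in release.items()
    }


def expected_rows(hours: int, rows_per_sec: int = 1) -> int:
    """Rows implied by a duration, ASSUMING one CSV row per second by default."""
    return hours * 3600 * rows_per_sec


if __name__ == "__main__":
    print(summarize(HAI_21_03))
    print(summarize(HAI_20_07))
    print(expected_rows(60))
```

Summed this way, HAI 21.03 has 352 training hours (472 MB) and 112 test hours (208 MB) containing 50 attacks, while HAI 20.07 has 177 training hours and 123 test hours containing 38 attacks. Because Table 1 reports rounded durations, `expected_rows` offers only an approximate check against the actual file lengths.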