Proceedings of the 2017 Winter Simulation Conference
W. K. V. Chan, A. D’Ambrogio, G. Zacharewicz, N. Mustafee, G. Wainer, and E. Page, eds.

MELODY: SYNTHESIZED DATASETS FOR EVALUATING INTRUSION DETECTION SYSTEMS FOR THE SMART GRID

Vignesh Babu, Rakesh Kumar, Hoang Hai Nguyen, David M. Nicol, Kartik Palani, Elizabeth Reed

Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Urbana, IL 61801, USA

ABSTRACT

As smart grid systems become increasingly reliant on networks of control devices, attacks that exploit their inherent security vulnerabilities could lead to catastrophic system failures. Network Intrusion Detection Systems (NIDS) detect such attacks by learning traffic patterns and finding anomalies in them. However, data for robust training and evaluation of NIDS is rarely available because of the operational and security risks of sharing it. Consequently, we present Melody, a scalable framework for synthesizing such datasets. Melody models both the cyber and physical components of the smart grid by integrating a simulated physical network with an emulated cyber network, using virtual time for high temporal fidelity. We present a systematic approach to generate traffic representing multi-stage attacks, where each stage is either emulated or recreated with a mechanism to replay arbitrary packet traces. We describe Melody's datasets and evaluate their suitability for intrusion detection by analyzing the extent to which the temporal accuracy of pertinent features is maintained.

1 INTRODUCTION

The smart grid is representative of a cyber-physical system: it uses a networked set of devices that sense its state and make appropriate control decisions (e.g., open or close a circuit breaker). Due to vulnerabilities in the communication protocols, end-host firmware, and control algorithms, the smart grid’s control network becomes a potential attack vector.
The recent trend in attacks on power grids indicates the use of sophisticated attack campaigns characterized by multi-stage exploits (Falliere, Murchu, and Chien 2011; Bencsáth, Pék, Buttyán, and Felegyhazi 2012; Assante and Lee 2015). In a typical attack campaign, the attacker creeps through different network layers by stealing legitimate credentials and/or exploiting vulnerabilities in the network services, and progressively acquires more privileged access to one or more of the “critical assets” before finally delivering the attack, e.g., opening multiple circuit breakers at once (Lee, Assante, and Conway 2016). Such multi-stage attacks are observable in both the cyber (e.g., packet counts measured at network devices) and physical (e.g., power values measured by a phasor measurement unit) attributes of the smart grid system. Machine learning based network intrusion detection systems (NIDS) can detect such multi-stage attacks by observing statistical patterns in these attributes. These systems are trained with historical data comprising normal background traffic and attack traffic. Two kinds of training occur: one on normal traffic, so that abnormalities can be detected as deviations from the norm, and another on specific patterns from known attacks, so that those specific attacks can be detected. We are concerned with both kinds of training.

978-1-5386-3428-8/17/$31.00 ©2017 IEEE

The accuracy of a