Two-tier Data-Driven Intrusion Detection for Automatic Generation Control in Smart Grid

Muhammad Qasim Ali†§, Reza Yousefian, Ehab Al-Shaer, Sukumar Kamalasadan, Quanyan Zhu

Department of Software and Information Systems
Department of Electrical and Computer Engineering
University of North Carolina Charlotte, Charlotte, NC
Princeton University, Princeton, NJ
§ Symantec Corporation, Mountain View, CA
Email: {mali12, ryousefi, ealshaer, skamalas}@uncc.edu, quanyanz@princeton.edu

Abstract—Legacy energy infrastructures are being replaced by modern smart grids. Smart grids provide bi-directional communications for efficient energy and load management. In addition, energy generation is adjusted based on load feedback. However, because it depends on the cyber infrastructure for load monitoring and reporting, generation control is inherently vulnerable to attacks. Recent studies have shown that data integrity attacks on generation control can significantly disrupt the energy system. In this work, we present a simple yet effective data-driven two-tier intrusion detection system for automatic generation control (AGC). The first tier is a short-term adaptive predictor for system variables, such as load and area control error (ACE). It provides a real-time measurement predictor that adapts to the changing underlying behavior of these system variables and flags abnormal behavior in each variable independently. The second tier provides deep state inspection to investigate the presence of anomalies by incorporating the overall correlation among system variables using Markov models. Moreover, we expand our second-tier inspection to include a multi-AGC environment, where the behavior of one AGC is validated against the behavior of the interconnected AGC.
The combination of tier-1 lightweight prediction and tier-2 offline deep state inspection balances the accuracy and real-time requirements of intrusion detection for the AGC environment. Our results show high detection accuracy (95%) under different multi-attack scenarios. The second tier successfully verified all the injected intrusions.

I. INTRODUCTION

Smart grids have been replacing the legacy power infrastructure because they provide efficient energy management through bi-directional communications. Bi-directional communications enable the smart grid to collect different sensor measurements over the cyber infrastructure in order to control power generation, transmission, and distribution effectively and in real time. These communications are associated with the supervisory control and data acquisition (SCADA) system. An important task of SCADA is automatic generation control, which is responsible for adjusting the power generation according to the load in the area.

Several threats have targeted the SCADA system because of its dependency on the cyber infrastructure. According to a recent report by the Bipartisan Policy Center, a Washington, D.C. think tank, more than 150 cyber attacks targeted the energy sector in 2013 [1]. An attacker has several possible entry points into the SCADA system and/or control center, including a malware attachment in an email, malware on a storage device, and a WiFi-enabled system in the SCADA system and/or control center. Moreover, SCADA systems and control centers are connected to the corporate offices through a virtual private network; therefore, anybody with access to the corporate office can access the system.

Attacks can be launched by two types of attacker, i.e., naive and experienced/knowledgeable. Naive attackers lack working knowledge of the smart grid system.
On the contrary, experienced attackers may manipulate the generation control measurements such that they still satisfy the constraints of the smart grid environment and look benign/normal. Although bad data detection algorithms provide some security by identifying data integrity attacks, recent studies have shown that these algorithms can be bypassed by experienced attackers [15]. Moreover, attacks whose attack vector lies after state estimation, i.e., in AGC, cannot be detected by these detection algorithms.

To this end, we present a data-driven two-tier intrusion detection approach. The first tier is an online short-term adaptive predictor for both the load and the Area Control Error (ACE), which are system variables in AGC. Load measurements are taken by the field sensors, whereas ACE is calculated in AGC using the frequency and tie-line flow measurements. Generation control takes these measurements every few seconds. The basic hypothesis is that both the load and ACE behave differently at different times of the day; therefore, over short intervals they exhibit a certain level of temporal dependence, which can be used to predict their future behavior. The proposed predictor has the ability to adapt to changes in the behavior of the variables. Since load and ACE form the basis for calculating set points and for lowering/raising the generation, respectively, we use a data-driven approach to predict these variables. We show the prediction accuracy of the proposed predictor under normal conditions in a well-known and widely used two-area power system model. Since the predictor shows high accuracy under normal conditions, deviations from the prediction can be flagged as anomalous.

Prediction is done independently for each variable and does not take into account the other AGC system variables. Therefore, we build a Markov model of AGC using its system variables in the second tier of the intrusion detection system. The model incorporates system-wide knowledge to detect anomalies.
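
The tier-1 idea described above (predict each variable from its recent history, adapt as its behavior drifts, and flag large deviations) can be illustrated with a minimal sketch. This is not the predictor used in this work, which is not specified at this point in the text: the sketch substitutes a plain exponentially weighted moving average with an adaptive error band. The class name `AdaptivePredictor` and the parameters `alpha`, `k`, and `warmup` are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptivePredictor:
    """EWMA one-step-ahead predictor with an adaptive deviation band.

    Illustrative stand-in for a tier-1 predictor; all parameter values
    below are assumptions, not values taken from the paper.
    """
    alpha: float = 0.3              # smoothing factor (assumed)
    k: float = 4.0                  # band width in multiples of the mean abs. error (assumed)
    warmup: int = 10                # samples observed before any flagging (assumed)
    level: Optional[float] = None   # current one-step-ahead prediction
    mae: float = 0.0                # running mean absolute prediction error
    seen: int = 0                   # number of samples consumed so far

    def update(self, value: float) -> bool:
        """Consume one measurement; return True if it deviates from the prediction."""
        if self.level is None:
            # First sample only initializes the prediction.
            self.level = value
            self.seen = 1
            return False
        error = abs(value - self.level)
        anomalous = self.seen >= self.warmup and error > self.k * max(self.mae, 1e-9)
        if not anomalous:
            # Adapt only on samples judged normal, so that a flagged
            # measurement does not pull the prediction toward itself.
            self.mae = (1 - self.alpha) * self.mae + self.alpha * error
            self.level = (1 - self.alpha) * self.level + self.alpha * value
        self.seen += 1
        return anomalous


# Usage: one predictor instance per variable (e.g. one for load, one for ACE).
load_predictor = AdaptivePredictor()
for i in range(40):
    load_predictor.update(100 + 0.5 * (i % 4))   # slowly varying synthetic load
spike_flagged = load_predictor.update(130.0)     # injected measurement
```

Each variable gets its own predictor instance, mirroring the statement that tier-1 prediction is done independently per variable; cross-variable consistency is left to the second tier.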
The Markov model observes the state transitions, where a state is defined using multiple system variables, and calculates the probability of each individual variable given the system state. If the probability does not fall in the