Discovering and Labeling Power System Events in Synchrophasor Data with Matrix Profile

Jie Shi, Nanpeng Yu, Eamonn Keogh
University of California Riverside
Riverside, California 92521
Email: nyu@ece.ucr.edu

Heng (Kevin) Chen
Commonwealth Edison
Oakbrook Terrace, Illinois 60181
Email: heng.chen@comed.com

Koji Yamashita
Michigan Technological University
Houghton, Michigan 49931
Email: kyamashi@mtu.edu

Abstract—An increasing number of phasor measurement units (PMUs) are being installed to improve power systems’ reliability and visibility throughout the world. Due to the high sampling speed, PMUs generate a large volume of streaming synchrophasor data. This huge dataset calls for robust and efficient data analytic tools to discover and label system events, which will greatly enhance the stability of power systems. In this work, we introduce a novel event discovery and labeling framework based on matrix profile. This framework is model-free, fast, scalable, and only requires one user-defined parameter. Since matrix profiles are built by measuring the similarities between subsequences of a time series, our approach has great potential in automatically labeling the system events in the synchrophasor data. Case studies are carried out on real-world PMU data to validate the effectiveness of the proposed framework.

Index Terms—PMU, synchrophasor data, anomaly detection, matrix profile, time series.

I. INTRODUCTION

Phasor measurement units (PMUs) are devices that provide time-synchronized measurements of different variables in a power grid. PMU data are also called synchrophasor data and are taken at high temporal resolutions such as 30 to 60 records per second [1]. This sampling speed is a significant improvement over the traditional supervisory control and data acquisition (SCADA) system which takes measurements every 2 to 4 seconds.
Thanks to the fast streaming speed and high quality of synchrophasor data, the broad deployment of PMUs has greatly enhanced power systems’ visibility and reliability. For example, PMUs have been utilized to improve wide-area situational awareness [2], state estimation [3], system protection [4], and control [5]. Meanwhile, the high sampling frequency of PMUs gives grid operators unprecedented quantities of data describing the system conditions with high temporal resolution. This huge streaming dataset calls for robust and efficient data analytic tools to discover hidden information in a timely manner. Valuable data-driven applications can be built upon these tools, thus benefiting power system operation. In this work, we investigate the use of a novel data analytic tool called matrix profile (MP) to discover and label system events in real-world PMU data.

Event (fault, anomaly, or disturbance) discovery based on PMU data has been extensively studied in the past decade. The existing approaches can be divided into model-based methods and model-free methods. Many of the model-based approaches develop estimates of the power system states based on the given model information. An anomaly is detected if the differences between the raw measurements and the estimates exceed certain thresholds. See [6] for an example. The performance of model-based methods relies heavily on the accuracy of the model parameters. This dependency renders them less effective in real-world applications, where model information is typically noisy [7]. To the best of the authors’ knowledge, most commercial software does not use model-based methods for anomaly detection. In addition, these approaches do not provide useful information for further labeling purposes. As a consequence, considerably more research effort has been devoted to model-free analyses.
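The residual test used by model-based methods can be illustrated with a minimal sketch. It is not the method of [6]; the function name `residual_anomalies`, the per-unit voltage values, and the threshold of 0.05 are all hypothetical choices made only for illustration. The sketch flags samples whose absolute difference from the model-based estimate exceeds the threshold, and then repeats the test with a deliberately biased estimate.

```python
import numpy as np

def residual_anomalies(measured, estimated, threshold):
    """Return indices where |measured - estimated| exceeds threshold."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    residual = np.abs(measured - estimated)
    return np.flatnonzero(residual > threshold)

# Hypothetical per-unit voltage magnitudes; sample 3 contains a dip.
measured = [1.00, 1.01, 0.99, 0.90, 1.00]
# Hypothetical estimates from an accurate model.
estimated = [1.00, 1.00, 1.00, 1.00, 1.00]

flagged = residual_anomalies(measured, estimated, threshold=0.05)
print(flagged)  # [3]

# Same measurements, but the model now carries a constant 0.08 p.u. bias.
biased = [e + 0.08 for e in estimated]
flagged_biased = residual_anomalies(measured, biased, threshold=0.05)
print(len(flagged_biased))  # 5
```

With the accurate estimate only the dip is flagged, whereas the biased estimate flags every sample. This is a toy instance of the dependence on model accuracy described above.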
The model-free methods can be further categorized into two classes: signal processing based approaches and machine learning based approaches. The key idea of signal processing based approaches is to monitor the coefficients of certain basis functions obtained through, for example, wavelet analysis [8], [9], [10]. A system event is detected if the ranges of certain coefficients (or some indices calculated from these coefficients) exceed the thresholds. This mechanism can work well with a proper selection of wavelets, time and frequency resolution, threshold, etc. However, power system events are complex and diverse, making it difficult to select proper settings for different scenarios. Still, many researchers are working on these methods because they extract features from the raw event data that can be very useful for further clustering and labeling (classification).

The machine learning based approaches can be further divided into supervised learning methods and unsupervised learning methods. Typical supervised learning methods used in anomaly detection include decision trees [11], K-nearest neighbor [12], support vector machines [13], extreme learning machines [14], and artificial neural networks [15]. These works usually discover system events and classify them into different groups, which requires a sufficient amount of labeled training data. In practice, however, there are two obstacles. First, most PMU datasets are still weakly labeled, since manually labeling them would incur considerable labor cost. Second, the low frequency of system events over the entire time horizon makes the training data extremely imbalanced. These two issues need to be addressed before supervised learning methods can be effectively employed in real-world applications. The unsupervised learning methods,