Machine Learning-Based Clustering of Load Profiling to Study the Impact of Electric Vehicles on Smart Meter Applications

Saeed Ahmed, Zafar Ali Khan
Mirpur University of Science & Technology, Pakistan
saeed.ahmed@must.edu.pk

Noor Gul
University of Peshawar, Pakistan
noor@uop.edu.pk

Junsu Kim, Su Min Kim
Korea Polytechnic University, Korea
suminkim@kpu.ac.kr

Abstract—The data collected from advanced metering infrastructure enable electric utilities to develop deep insight into the energy consumption behavior of consumers. However, load signatures and consumption patterns vary with the addition of new types of loads, such as electric vehicles (EVs). It therefore becomes imperative to examine these variations more closely. To this end, this paper investigates the impact of inserting EV profiles into household-level smart meter data. The Irish CER dataset and EV data from the NREL residential PEV dataset are used in this study to classify users with and without EV loads. The results show that the change in cluster membership can help to separate consumers with an EV load from stand-alone consumers without one.

Keywords—Data clustering; electric vehicles; load profiling; smart meter

I. INTRODUCTION

Smart meters deliver detailed knowledge about individual consumers' load patterns, which can be further used to control loads even at the level of an individual household [1, 2]. The challenges posed by the high dimensionality of the data can be managed by classifying consumers into different classes and extracting typical load profiles with machine learning-based techniques. Extracting load patterns from smart meter data is a cumbersome process that can be tackled with supervised or unsupervised machine learning (ML) techniques [3].
Multiple studies have been carried out to classify patterns in unlabelled smart meter data; however, the impact of integrating electric vehicles (EVs) on consumer classification remains a promising area of study. Information on EV charging may help electric utilities to predict load and to understand its temporal and spatial aspects in order to 1) schedule loads and 2) avoid distribution network renovations [4]. Consumers with EVs usually do not disclose the purchase to the utility, so their energy consumption pattern shifts without the utility's knowledge, which leads to wrong categorization of such consumers. Non-intrusive load monitoring is broadly employed to disaggregate load for EV detection. However, it is a complicated technique that requires highly granular data, sampled at intervals of seconds [5].

In this paper, we investigate the impact of including EVs at the consumer level, considering different diffusion levels of EV charging profiles. The smart meter data from the Irish CER dataset [6], which have a 30-minute resolution, are interpolated to a 10-minute resolution so that EV charging profiles with a 10-minute resolution can be embedded. The profiles with and without EVs are clustered, and the change in cluster membership due to the inclusion of EVs is used to investigate the impact of EVs.

The rest of the paper is arranged as follows. Section II explains the proposed scheme. The case studies and results, together with their analysis, are presented in Section III. The paper is concluded in Section IV.

II. METHODOLOGY

This paper aims to investigate the clustering of load profiles that include EVs. The Irish CER smart meter dataset employed in this work [6] contains readings at 30-minute intervals for more than 5,000 residential and small business consumers over a period of 18 months. The EV charging data used in this case are from the 2009 RECS dataset provided by NREL [7].
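
The resampling and embedding step, in which 30-minute smart meter readings are interpolated to a 10-minute grid and an EV charging profile is added, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not name the interpolation method, so linear interpolation is assumed; the readings are treated as average power in kW; and the function names and sample values are hypothetical.

```python
import numpy as np


def upsample_30_to_10(load_30min):
    """Linearly interpolate a 30-minute series onto a 10-minute grid.

    Assumption: linear interpolation; the paper does not state the method.
    """
    load_30min = np.asarray(load_30min, dtype=float)
    t_30 = np.arange(load_30min.size) * 30      # original timestamps (minutes)
    t_10 = np.arange(0, t_30[-1] + 1, 10)       # target 10-minute timestamps
    return np.interp(t_10, t_30, load_30min)


def embed_ev(load_10min, ev_10min):
    """Add an EV charging profile to a household profile of equal length."""
    load_10min = np.asarray(load_10min, dtype=float)
    ev_10min = np.asarray(ev_10min, dtype=float)
    if load_10min.shape != ev_10min.shape:
        raise ValueError("profiles must share the same 10-minute grid")
    return load_10min + ev_10min


# Made-up household readings (kW) at 0, 30, 60 and 90 minutes.
household = np.array([0.4, 0.6, 0.3, 0.9])
load_10 = upsample_30_to_10(household)

# Made-up EV profile: a 3.3 kW charger switched on from minute 60 onwards.
ev = np.zeros_like(load_10)
ev[6:] = 3.3

combined = embed_ev(load_10, ev)
```

The original 30-minute samples are preserved at every third point of the upsampled series, so the household profile is unchanged at the measured instants and the EV load is simply superimposed on it.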
Two hundred customers are selected at random from the smart meter dataset and, similarly, 30 EV charging profiles are selected from the NREL dataset for the case studies. A flowchart of the proposed scheme adopted for the case studies is given in Figure 1.

Figure 1: Flowchart of the proposed scheme

The proposed scheme is explained as follows:

A. Data Pre-processing

In the data pre-processing stage, the first step is to ensure the quality of the data. To this end, outliers are removed and the data are cleansed of erroneous values. Potential hardware failures in the first month can lead to zero kWh readings; therefore, all such readings are removed

978-1-7281-6476-2/21/$31.00 ©2021 IEEE ICUFN 2021