Low Voltage Customer Phase Identification Methods Based on Smart Meter Data

Alexander Hoogsteyn*, Marta Vanin*†, Arpan Koirala*†, Dirk Van Hertem*†
* KU Leuven, Dept. of Electrical Engineering (ESAT), Kasteelpark Arenberg 10, 3001, Heverlee, Belgium
† EnergyVille, Thor Park 8310, 3600, Genk, Belgium

Abstract—The increased deployment of distributed energy generation and the integration of new, large electric loads such as electric vehicles and heat pumps challenge the correct and reliable operation of low voltage distribution systems. To tackle potential problems, active management solutions have been proposed in the literature. These require distribution system models that include the phase connectivity of all consumers in the network. In practice, however, information on the phase connectivity is often unavailable. In this work, several voltage and power measurement-based phase identification methods from the literature are implemented. The methods are compared consistently across different smart meter accuracy classes and smart meter penetration levels using publicly available data. Furthermore, a novel method is proposed that makes use of ensemble learning and can combine data from different measurement campaigns. The results indicate that voltage data generally yield better results than power data from smart meters of the same accuracy class. If power data are also available, the novel ensemble method can improve the accuracy of the phase identification obtained from voltage data alone.

Index Terms—clustering, ensemble learning, low voltage distribution system, phase identification, smart meter

I. INTRODUCTION

The shift to more distributed generation [1] and the integration of new large loads such as electric vehicles and heat pumps can lead to various problems in distribution systems (DS), such as increased imbalance and overvoltages.
Historically, imbalance was not considered a problem, since DS were typically underutilized and characterized by modest and unidirectional power flows. The emergence of photovoltaic (PV) installations and larger loads implies higher utilization and less predictability, which might push the DS state beyond the acceptable operational limits.

To cope with these issues, recent research on DS advocates improved system exploitation and control through so-called Active DS Management. Examples are optimal voltage control by means of tap changes [2] or resilient restoration [3]. A distribution system operator (DSO) could also preventively reconfigure the phase connections of its grid to obtain a more balanced configuration. For such applications, it is necessary to identify the present phase connectivity.

Phase identification methods aim to determine the phase(s) to which each consumer is connected, usually with the help of field measurements. Manual determination of the phase labeling is often not viable because of the cost of this labour-intensive approach, especially since the majority of the cables are underground. Furthermore, cables are not color-coded.

In the literature, a wide variety of data-driven approaches have been proposed. In [4], [5], field measurement-based methods are described that use communicating phasor measurement units (PMUs) between the consumer and the transformer. However, these devices are quite expensive and unlikely to be widely available in DS in the near future, especially at low voltage levels. Smart meters, which are already being rolled out, are a more realistic option for now [6]–[8]. Developing practically applicable methods that rely on realistic assumptions about the available smart meter data is still an open research topic.

The remainder of the paper is structured as follows: In Section II, the current literature on smart meter (SM) data-based phase identification methods is reviewed.
In Section III, the mathematical formulation of the phase identification methods used in the present work is given, including our novel ensemble learning method. In Section IV, the performance of the different methods is illustrated under a broad range of scenarios. Finally, conclusions are drawn.

II. RELATED WORK

Phase identification methods that use SM data can be categorized into three groups based on whether they use: a) mixed-integer programming (MIP) approaches, typically using power data, b) machine learning with voltage data (MLV), or c) machine learning with power data (MLP).

In [9]–[13], a MIP approach is used to solve the phase identification problem. These optimization-based implementations need measurements of the consumers' demand and the distribution transformer's supply, and use the principle of conservation of power to determine which consumer is connected to which phase. The algorithm optimizes the allocation of the set of consumers to the three phases by minimizing the difference between the total feeder demand and the transformer supply. These MIP methods are inherently less capable of handling missing data or incomplete datasets, and typically assume that all users have a smart meter and that no electricity theft or similar phenomena occur [11].

MIP-based methods that require both voltage and power data have also been proposed [12]. The authors first allocate the phases using voltage correlation; an optimization problem is then initialized with this voltage-based solution and solved iteratively. To make these methods more rigorous, line losses should be modelled. However, this requires knowledge of line impedance
arXiv:2204.06372v1 [eess.SY] 13 Apr 2022