KNN Regression as Geo-Imputation Method for Spatio-Temporal Wind Data

Jendrik Poloczek, Nils André Treiber, Oliver Kramer
Computational Intelligence Group, Carl von Ossietzky University, 26111 Oldenburg, Germany

Abstract. The shift from traditional energy systems to distributed systems of energy suppliers and consumers, together with the volatility of power from renewable sources, implies the need for effective short-term prediction models. These machine learning models are based on measured sensor information. In reality, sensors might fail for several reasons. The prediction models cannot naturally cope with missing data, so a bias is introduced. The objective of this work is to propose kNN regression as a geo-imputation preprocessing step for pattern-label-based short-term wind prediction on spatio-temporal wind data sets. The approach is compared to three other methods. The evaluation is based on four turbines and their neighbors from the NREL Western Wind Data Set, with the missing values uniformly distributed. The results show that kNN regression is superior to the other methods for imputation.

Keywords: missing data, imputation, k-nearest-neighbor regression, short-term wind prediction, spatio-temporal

1 Introduction

In recent years, there has been a significant increase in sustainable wind power plants. While these renewable energy resources are very appealing from an environmental perspective, their volatility makes integration into the overall energy system difficult. Effective forecast systems allow balancing and integration of multiple volatile power sources, see [1]. One class of forecast systems is short-term wind prediction systems. An overview of various approaches is given by [2]. Generally, machine learning models for short-term wind prediction are based on sensor information, see [3–7]. In reality, sensors might fail for several reasons.
Usually, machine learning methodologies cannot naturally cope with missing data, hence it is possible that a bias is introduced into the training set, see [8]. The objective of this work is to propose kNN regression as a geo-imputation preprocessing step for pattern-label-based short-term wind prediction on spatio-temporal wind data sets. The approach is evaluated by comparing it with three different imputation methods: last observation carried forward (LOCF), linear interpolation, and multiple linear regression. The evaluation is based on artificially damaged time series in which the missing values are uniformly distributed. This type of missing data is also known as missing at random (MAR).
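
To make the comparison concrete, the following is a minimal sketch of the four imputation methods applied to an artificially damaged series. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic (not the NREL Western Wind Data Set), the function names, the damage rate of 0.3, and k = 5 are hypothetical choices, and the geo-imputation is read here as predicting the target turbine's missing value from the simultaneous readings of its neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spatio-temporal wind data (assumption, not NREL data):
# one target turbine and three neighbors that share a common signal plus noise.
t = np.arange(500)
base = 10.0 + 3.0 * np.sin(2 * np.pi * t / 100.0)
neighbors = np.column_stack(
    [base + rng.normal(0.0, 0.3, t.size) for _ in range(3)]
)
target = base + rng.normal(0.0, 0.3, t.size)


def damage(series, rate, rng):
    """Remove values uniformly at random; return the damaged copy and the mask."""
    mask = rng.random(series.size) < rate
    mask[0] = False  # keep the first value so LOCF has a starting observation
    damaged = series.copy()
    damaged[mask] = np.nan
    return damaged, mask


def locf(series):
    """Last observation carried forward."""
    idx = np.where(~np.isnan(series), np.arange(series.size), 0)
    idx = np.maximum.accumulate(idx)  # index of the latest observed value
    return series[idx]


def linear_interpolation(series):
    """Linear interpolation in time between the surrounding observed values."""
    observed = ~np.isnan(series)
    x = np.arange(series.size)
    return np.interp(x, x[observed], series[observed])


def linear_regression_impute(series, neighbors):
    """Multiple linear regression of the target on its neighbors (with intercept)."""
    observed = ~np.isnan(series)
    design = np.column_stack([np.ones(series.size), neighbors])
    coef, *_ = np.linalg.lstsq(design[observed], series[observed], rcond=None)
    filled = series.copy()
    filled[~observed] = design[~observed] @ coef
    return filled


def knn_impute(series, neighbors, k=5):
    """kNN regression: average the target over the k most similar neighbor patterns."""
    observed = ~np.isnan(series)
    patterns, labels = neighbors[observed], series[observed]
    filled = series.copy()
    for i in np.flatnonzero(~observed):
        dist = np.linalg.norm(patterns - neighbors[i], axis=1)
        nearest = np.argsort(dist)[:k]
        filled[i] = labels[nearest].mean()
    return filled


def mse(filled, truth, mask):
    """Mean squared error, measured on the imputed positions only."""
    return float(np.mean((filled[mask] - truth[mask]) ** 2))


damaged, mask = damage(target, rate=0.3, rng=rng)
results = {
    "LOCF": locf(damaged),
    "linear interpolation": linear_interpolation(damaged),
    "multiple linear regression": linear_regression_impute(damaged, neighbors),
    "kNN regression": knn_impute(damaged, neighbors, k=5),
}
for name, filled in results.items():
    print(f"{name}: MSE = {mse(filled, target, mask):.4f}")
```

LOCF and linear interpolation use only the damaged series itself, whereas the two regression variants use the neighboring turbines as patterns; the errors printed for this synthetic example say nothing about the results on real wind data.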