Similarity indices of meteo-climatic gauging stations for missing data handling: definition and comparison with the MICE method
E. Barca 1,*, G. Passarella 1
1 Water Research Institute of the National Research Council, Department of Bari, Viale F. De Blasio, 5 70123 Bari, Italy; emanuele.barca@ba.irsa.cnr.it; giuseppe.passarella@ba.irsa.cnr.it
* Corresponding author
Abstract. Meteo-climatic datasets underpin a great many studies of the state of the environment and its management. Accurate and reliable analysis therefore requires these datasets to be complete. Unfortunately, completeness is rare in practice, so a preliminary treatment is needed to fill in all the gaps. In this work, two intuitive and simple procedures for handling missing data are presented, both based on the “similarity station” concept. Finally, the proposed methods are compared with the Multiple Imputation Chained Equations, which is the state of the art in the field of missing data handling.
Keywords. Missing data; Time series; Multiple Imputation Chained Equations; Similarity methods.
1 Introduction
Climatic series are rarely complete, usually because of instrument malfunction or the effects of extreme events on the probes. Consequently, a preliminary formal treatment of the time series is needed in order to fill in all the gaps. Such a treatment is critical mostly because it is (i) inherently time consuming, particularly for long time series and large amounts of missing data; (ii) affected by a high level of uncertainty, particularly for variables irregularly distributed in space and time; (iii) strongly dependent on the missing data mechanism ([2]); (iv) a blind estimation, whose reliability can be assessed only globally, by means of population statistics.
At present, a number of robust and powerful methods exist for missing data handling, such as the Multiple Imputation Chained Equations (MICE) ([3]) and the Expectation-Maximization (EM) ([1]), which have been designed so that the estimation takes into account the available numerical and distributional information. Such methods have proved effective even when the percentage of missing data is particularly severe and exceeds the critical threshold of 15-20%; nevertheless, some authors still call for further investigation before their reliability can be definitively stated ([4]). Furthermore, these methods are difficult to implement in practice and not very intuitive.
In a previous work ([2]), a methodological proposal was presented for a quick and reliable estimation of missing climatic data based on the concept of twin gauging stations. The proposed method is based on the intuitive concept of the persistence, in time, of the spatial continuity of climatic processes. On this basis, a refined and improved methodology is presented for determining similar gauging stations, from which missing values can be estimated. Statistical and topographic properties are combined in order to determine a “similarity matrix”. Given a gauging station whose time series is affected by missing values, these are estimated by “combining” the corresponding values of the n most similar stations. The proposed method and MICE were both applied to the rainfall gauging network of the Apulia Region (South-Eastern Italy). Statistical tests on the two estimated time series confirmed that the results of the two methods are substantially identical.
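As an illustration only, the idea of estimating a gap from the n most similar stations can be sketched in Python. The specific choices below are assumptions made for the sketch and are not the definitions adopted in this work: the statistical component is taken as the Pearson correlation on jointly observed time steps, the topographic component as a linear decay with planar distance, the two are blended with a hypothetical weight `w_stat`, and the “combination” is a similarity-weighted mean. The function names `similarity_matrix` and `impute` are likewise hypothetical.

```python
import numpy as np


def similarity_matrix(series, coords, w_stat=0.5):
    """Blend a statistical and a topographic similarity (assumed forms).

    series: (T, S) array of observations, NaN where data are missing.
    coords: (S, 2) array of station coordinates.
    Returns an (S, S) matrix; off-diagonal entries lie in [0, 1],
    the diagonal is -inf so a station is never its own donor.
    """
    n_stations = series.shape[1]

    # Statistical part (assumption): Pearson correlation computed on the
    # time steps observed at both stations, with negatives clipped to 0.
    corr = np.zeros((n_stations, n_stations))
    for i in range(n_stations):
        for j in range(n_stations):
            both = ~np.isnan(series[:, i]) & ~np.isnan(series[:, j])
            if both.sum() > 2:
                r = np.corrcoef(series[both, i], series[both, j])[0, 1]
                corr[i, j] = max(r, 0.0) if np.isfinite(r) else 0.0

    # Topographic part (assumption): similarity decays linearly with distance.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    topo = 1.0 - dist / dist.max() if dist.max() > 0 else np.ones_like(dist)

    sim = w_stat * corr + (1.0 - w_stat) * topo
    np.fill_diagonal(sim, -np.inf)
    return sim


def impute(series, sim, n=2):
    """Fill each gap with a similarity-weighted mean of up to n donor stations."""
    filled = series.copy()
    for t, s in zip(*np.where(np.isnan(series))):
        # Rank the other stations by similarity and keep those observed at time t.
        order = np.argsort(sim[s])[::-1]
        donors = [d for d in order if not np.isnan(series[t, d])][:n]
        if donors:
            w = sim[s, donors]
            vals = series[t, donors]
            total = w.sum()
            # Fall back to a plain mean if all donor weights are zero.
            filled[t, s] = np.dot(w, vals) / total if total > 0 else vals.mean()
    return filled
```

Gaps for which no station has an observation at the same time step are left as NaN in this sketch; a real application would need an explicit rule for such cases.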