A fuzzy clustering approach to improve the accuracy of Italian students’ data. An experimental procedure to correct the impact of the outliers on assessment test scores

Claudio Quintano 1, Rosalia Castellano 2, Sergio Longobardi 3

1 University of Naples “Parthenope”, e-mail: claudio.quintano@uniparthenope.it
2 University of Naples “Parthenope”, e-mail: lia.castellano@uniparthenope.it
3 University of Naples “Parthenope”, e-mail: sergio.longobardi@uniparthenope.it

Abstract

The paper describes an experimental procedure for improving the accuracy of data collected by the Italian National Evaluation Institute of the Ministry of Education (INVALSI). The INVALSI survey is a national standardised assessment that evaluates, every year, students’ knowledge of reading, mathematics and science at primary and secondary level. The paper focuses on the presence of outlier units, at class level, that may introduce an upward bias in the distribution of the average scores by class. We propose a two-stage method for evaluating and correcting the overestimation of children’s ability that has been found in primary classes. At the first stage, classes of students with both a very high average score and within-class variability close to zero are detected through a factorial analysis. The second stage consists of implementing a weighting system that assigns a weight to every class based on its probability of belonging to the set of outlier units, which is calculated by a fuzzy clustering algorithm. The final output of this procedure is a modified distribution that shows a decrease in the mean, median and mode with respect to the original one. Moreover, the correction factor improves the skewness and smooths the data distribution. Finally, the main features of units with a high probability of being classified as outliers are analyzed in order to evaluate the relationship between the geographical distribution of classes and the presence of outliers.

Keywords: correction of outlier data, data accuracy, assessment test scores

1. Introduction

Outliers are generally identified as observations which appear to be inconsistent with the remainder of the data (Barnett and Lewis, 1994). Many studies focus on the detection of outlier units (Hawkins, 1980; Hodge and Austin, 2004) and propose several methods to deal with this problem (Iglewicz and Hoaglin, 1993). In this paper, we introduce a new approach to outlier analysis in which the detection is carried out on data with a hierarchical structure and a complex pattern of variability, e.g. pupils in classes, employees in firms, etc. In particular, we analyze students’ data in which the