A MACHINE LEARNING TOOL TO FORECAST PM10 LEVEL

Giovanni Raimondo*, Alfonso Montuori, Walter Moniaci, Eros Pasero and Esben Almkvist
Polytechnic of Turin, Italy and Earth Science Centre of Gothenburg, Sweden

* Corresponding author address: Giovanni Raimondo, Polytechnic of Turin, Dept. of Electronics, C.so Duca Degli Abruzzi, 24, Torino (Italy), e-mail: giovanni.Raimondo@polito.it

ABSTRACT

The research activity described in this paper concerns the study of the phenomena responsible for urban and suburban air pollution. The analysis carries on the work already developed by the NeMeFo (Neural Meteo Forecasting) research project for short-term forecasting of meteorological data, Pasero (2004). The study analyzed the principal causes of air pollution and identified the best subset of features (meteorological data and air-pollutant concentrations) for each air pollutant in order to predict its medium-term concentration (in particular for PM10). The selection of the best subset of features was implemented by means of a backward selection algorithm based on the information-theoretic notion of relative entropy. The final aim of the research is the implementation of a prognostic tool able to reduce the risk that air-pollutant concentrations rise above the alarm thresholds fixed by law. This tool will be implemented using the most widespread statistical data-learning techniques (Artificial Neural Networks, ANN, and Support Vector Machines, SVM).

1. INTRODUCTION

Compliance with the European laws concerning urban and suburban air pollution requires the analysis and implementation of automatic operating procedures in order to prevent the risk that the principal air pollutants rise above the alarm thresholds. The aim of the analysis is the medium-term forecasting of the mean and maximum values of the air pollutants by means of actual and forecasted meteorological data.
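As a rough illustration of the feature-selection idea mentioned in the abstract (not the authors' implementation), relative entropy can drive a backward elimination: the mutual information between a feature and the target is exactly the relative entropy (Kullback-Leibler divergence) between their joint distribution and the product of the marginals, and features contributing the least information are dropped first. The function names and the histogram-based estimator below are our assumptions, sketched in Python with NumPy:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    # I(X;Y) = D_KL( p(x,y) || p(x)p(y) ), estimated from a 2-D histogram.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # skip empty cells (0 log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def backward_select(X, y, n_keep, bins=8):
    # Greedy backward elimination: repeatedly drop the feature whose
    # estimated mutual information with the target is smallest.
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        scores = [mutual_information(X[:, j], y, bins) for j in kept]
        kept.pop(int(np.argmin(scores)))
    return kept
```

On synthetic data where one column is a noisy copy of the target and the others are independent noise, the selection retains the informative column.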
Critical air pollution events frequently occur where the geographical and meteorological conditions do not permit an easy circulation of air and a large part of the population moves frequently between distant parts of a city. These events require drastic measures such as the closing of schools and factories and the restriction of vehicular traffic. Forecasting such phenomena up to two days in advance would make it possible to take more efficient countermeasures to safeguard citizens' health. In all the cases in which we can assume that the emission and dispersion processes of the air pollutants are stationary, the problem can be solved by means of statistical learning algorithms that do not require an explicit prediction model. When the stationarity conditions are not verified, a prognostic dispersion model becomes necessary: for example, when the variation in air-pollutant concentration due to a large change in the emission of a source, or to the presence of a new source, must be forecasted, or when a prediction must be evaluated in an area with no measurement points. Artificial Neural Networks (ANN) and Support Vector Machines (SVM) have often been used as prognostic tools for air pollution, Benvenuto (2000), Perez (2000), Božnar (2004). In particular, SVMs are a recent statistical learning technique, based on computational learning theory, which implements a simple idea and can be regarded as a method for minimizing the structural risk, Vapnik (1995). Even though we refer to these approaches as black-box methods, inasmuch as they are not based on an explicit model, their generalization capabilities make their application to non-stationary situations possible. In particular, the combination of the predictions of a set of models to improve the final prediction represents an important research topic, known in the literature as stacking.
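The stacking scheme described above can be sketched as follows. This is a minimal illustration, not the cited implementations: the base learners are plain least-squares models, the helper names are our own, and Wolpert's cross-validated generation of level-1 training data is omitted for brevity.

```python
import numpy as np

def fit_linear(X, y):
    # Least-squares fit with a bias term.
    Xb = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

def stack(base_feature_sets, X, y):
    # Level 0: fit one model per feature subset.
    base = [fit_linear(X[:, cols], y) for cols in base_feature_sets]
    # Level 1: fit a combiner on the base models' predictions.
    Z = np.column_stack([predict_linear(w, X[:, cols])
                         for w, cols in zip(base, base_feature_sets)])
    meta = fit_linear(Z, y)
    return base, meta

def stacked_predict(base, meta, base_feature_sets, X):
    # Feed the base predictions to the level-1 combiner.
    Z = np.column_stack([predict_linear(w, X[:, cols])
                         for w, cols in zip(base, base_feature_sets)])
    return predict_linear(meta, Z)
```

When the target is a linear combination of what the base models capture separately, the level-1 combiner recovers it even though no single base model can.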
A general formalism describing this technique can be found in Wolpert (1992). The approach consists of iterating a procedure that combines measured data with data obtained by means of prediction algorithms, in order to use them all as the input to a new prediction algorithm. This technique was used in Canu (2001), where the prediction of the maximum ozone concentration 24 hours in advance, for the urban area of Lyon (France), was implemented by means of a set of nonlinear models identified by different SVMs. The choice of the proper model was based on the meteorological conditions (geopotential label). For each model, the forecast of the mean ozone concentration for a specific day took as input variables the maximum ozone concentration and the maximum air temperature observed on the previous day, together with the maximum forecasted air temperature for that specific day. The first step in the implementation of a prognostic neural network or SVM is the selection of the best subset of features to be used as the input to the forecasting tool. The potential benefits of the feature selection process are many: facilitating data visualization and understanding, reducing the measurement and storage requirements, reducing training and