P 3.7 AIRBORNE POLLEN FORECASTING: EVALUATION OF ARIMA AND NEURAL NETWORK MODELS

Bachisio Arca* (a), Grazia Pellizzaro (a), Annalisa Canu (a), Giuseppe Vargiu (b)
(a) CNR - IBIMET, Institute of Biometeorology – Section of Monitoring Agroecosystems, Sassari, Italy
(b) Osservatorio Aerobiologico SS1, Sassari, Italy

* Corresponding author address: Bachisio Arca, CNR, Institute of Biometeorology, Section Monitoring Agroecosystems, Via Funtana di Lu Colbu 4A, 07100 Sassari, Italy; e-mail: arca@imaes.ss.cnr.it

1. INTRODUCTION

Early forecasting of pollen concentration in the atmosphere is very important for medical applications because of the increasing occurrence of allergic diseases induced by allergenic pollen. Moreover, flowering and pollen dispersion are of great interest for agronomic studies of plant production. Several statistical techniques have been used to forecast pollen concentration in the air; they address the prediction of the start of the pollen season, the maximum airborne pollen concentration, and the date on which it occurs. Some forecasting techniques are based on the analysis of airborne pollen time series, exploiting the autocorrelation of the daily pollen count and, in addition, of the meteorological variables involved in the phenomenon (Moseholm et al., 1987; Katial, 1997).

A time series is a set of measurements of a variable taken at equally spaced time intervals. The most frequently used time series models include the autoregressive integrated moving average (ARIMA) models (Box and Jenkins, 1976); recently, artificial neural networks (ANN) have been applied to time series modeling and forecasting (Arizmendi, 1993; Patterson, 1996; Luk et al., 2000) because of their good performance with complex and non-linear phenomena (Smith, 1992).

The aims of this study are (I) to develop both ARIMA and neural network models for forecasting the daily values of airborne pollen and the day of the maximum pollen concentration, (II) to analyze and compare the performance of these models, and (III) to improve the accuracy of airborne pollen forecasting for the principal allergenic plants of the Mediterranean area.

2. MATERIALS AND METHODS

The study was carried out on aerobiological and meteorological data collected from 1986 to 2001 in the urban area of Sassari (40° 44’ N, 8° 32’ E, 150 m a.s.l.), Italy; the pollen sampling device was a Burkard seven-day recording volumetric spore trap. The meteorological data, collected by a weather station of the Sardinian Agrometeorological Service (S.A.R.) located near the spore trap, were air temperature, air relative humidity, wind speed and direction, and rain intensity.

The data set was divided into two sections: the first, composed of thirteen years of data, was used to calibrate (1986-1995) and to test (1996-1998) the models; the second, composed of three years (1999-2001), was used to validate the models. The analysis was performed on Graminacee and Oleaceae, the most important allergenic families in Sardinia (Atzei and Vargiu, 1990; Atzei et al., 1993).

The main pollen season was defined as the time interval between the dates on which the cumulative sum of the daily concentration reaches 5 % and 95 % of the total annual sum; the daily pollen count was then normalized to the annual sum of the pollen count, and a 5-day moving average was calculated. In addition, in order to stabilize the variance, the Oleaceae daily pollen count was transformed into natural logarithmic values.

For each family, the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) were used to detect significant autocorrelation in the airborne pollen data and to identify the components of the ARIMA models. ANN models were built with a three-layer feed-forward topology and trained with the backpropagation learning algorithm.
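The pre-processing steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of NumPy, the choice of a moving average that keeps only fully covered 5-day windows, and the offset of 1 added before taking the logarithm (to cope with zero counts) are all assumptions made for illustration, and the example series is invented.

```python
import numpy as np

def main_pollen_season(daily_counts, lower=0.05, upper=0.95):
    """Return (start, end) day indices of the main pollen season.

    The season runs from the first day on which the cumulative count
    reaches `lower` of the annual sum to the first day on which it
    reaches `upper` of the annual sum.
    """
    counts = np.asarray(daily_counts, dtype=float)
    cumulative = np.cumsum(counts)
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("annual pollen sum must be positive")
    # np.argmax on a boolean array returns the index of the first True.
    start = int(np.argmax(cumulative >= lower * total))
    end = int(np.argmax(cumulative >= upper * total))
    return start, end

def normalize_and_smooth(daily_counts, window=5):
    """Normalize counts to the annual sum, then apply a moving average.

    Assumption: only windows fully inside the series are kept, so the
    result has len(daily_counts) - window + 1 values.
    """
    counts = np.asarray(daily_counts, dtype=float)
    normalized = counts / counts.sum()
    kernel = np.ones(window) / window
    return np.convolve(normalized, kernel, mode="valid")

def log_transform(daily_counts, offset=1.0):
    """Natural-log transform; `offset` is an assumed guard for zero counts."""
    return np.log(np.asarray(daily_counts, dtype=float) + offset)

# Invented one-season series, used only to exercise the functions.
example = [0, 1, 2, 5, 10, 20, 30, 20, 8, 3, 1, 0]
start, end = main_pollen_season(example)
smoothed = normalize_and_smooth(example)
```

For a real series the same functions would be applied year by year, since both the 5 % and 95 % thresholds and the normalization refer to the annual sum.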
The time series used for training, testing and validating the ANN were transformed into a series of seven-day vectors by the embedding method. The best ARIMA and ANN models were identified by calculating the t statistic (t) and the correlation coefficient (r) between observed and predicted values.

3. RESULTS AND DISCUSSION

The analysis of the ACF and PACF indicated that the pollen time series follows a seasonal pattern with a time lag of 365 days; a strong correlation at a lag of one day was also found. Therefore, the seasonal ARIMA(1,1,1)365 and ARIMA(2,1,1)365 models were tested; in this identification phase the ARIMA(1,1,1)365 model gave the best results, given the lower number of parameters it uses. The pollen time series was used to forecast the daily values of airborne pollen concentration for three years (1999-2001).

The ARIMA model showed good accuracy on Graminacee, where the difference between observed and expected dates of maximum airborne pollen concentration (peak dates) ranged from 1 to 4 days (Table 1); the model performed less well on Oleaceae (5-7 days) in all years of the validation data set. Table 2 shows the statistics for the values of daily pollen count predicted by the ARIMA model. The Graminacee pollen count was very highly correlated