ELSEVIER
Aerobiologia (International Journal of Aerobiology) 14 (1998) 179-184

Predictive models in aerobiology: data transformation

Francisco Javier Toro, Marta Recio, Maria del Mar Trigo *, Baltasar Cabezudo

Departamento de Biologia Vegetal, Universidad de Málaga, Apartado de Correos 59, E-29080 Málaga, Spain

Received 5 November 1996; received in revised form 14 July 1997; accepted 10 April 1998

Abstract

This paper attempts to evaluate the effect of mathematical transformations of the pollen and meteorological data used in aerobiological forecasting models. Stepwise multiple regression equations were developed in order to facilitate short-term forecasts during the pre-peak period. The daily mean pollen data ($x_i$), expressed as the number of pollen grains per cubic metre of air, were used directly and transformed into different scales: $\log(x_i + 1)$, $\ln((x_i \cdot 1000/\sum p) + 1)$ and $\sqrt{x_i}$, where $\sum p$ is the sum of the daily mean values throughout the season. Thirteen meteorological parameters and the variable time were used as forecasting variables. The most reliable forecasts were obtained with data transformed by 'square root' and with untransformed data. Based on the results obtained, we recommend that the data be transformed by means of the square root if they do not show a normal distribution, and that non-linear statistics be used in this kind of study. © 1998 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Aerobiology; Pollen; Forecasting; Mathematical transformation of data

1. Introduction

One of the aims of aerobiology is to obtain statistical models that forecast short- or medium-term airborne pollen concentrations, so that doctors and susceptible patients can be warned of the severity of the pollen season. The prediction models usually developed in aerobiological studies are based on receptor-type models (Norris-Hill, 1995).
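As a concrete illustration of the four data scales named in the abstract, the following minimal Python sketch applies each transformation to a list of daily mean pollen counts and converts the values back to grains per cubic metre. It is not the authors' code: the function names `transform` and `back_transform`, the scale labels and the sample counts are hypothetical, and `log` is assumed here to mean the base-10 logarithm (the abstract distinguishes it from `ln` but does not state the base).

```python
import math


def transform(counts):
    """Express daily mean pollen counts (grains per cubic metre) on four scales."""
    season_total = sum(counts)  # sum of the daily means over the season
    if season_total <= 0:
        raise ValueError("season total must be positive")
    return {
        "raw": [float(x) for x in counts],
        # log(x_i + 1); base 10 is an assumption, not stated in the source
        "log10": [math.log10(x + 1) for x in counts],
        # ln((x_i * 1000 / season total) + 1)
        "ln_rel": [math.log(x * 1000 / season_total + 1) for x in counts],
        # square root of x_i
        "sqrt": [math.sqrt(x) for x in counts],
    }


def back_transform(values, scale, season_total):
    """Convert values on a given scale back to grains per cubic metre."""
    if scale == "raw":
        return list(values)
    if scale == "log10":
        return [10 ** v - 1 for v in values]
    if scale == "ln_rel":
        return [(math.exp(v) - 1) * season_total / 1000 for v in values]
    if scale == "sqrt":
        return [v ** 2 for v in values]
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical daily means, for illustration only
counts = [0, 4, 25, 120, 51]
scales = transform(counts)
restored = back_transform(scales["sqrt"], "sqrt", sum(counts))
```

Note that the relative scale needs the seasonal total, which is only known once the season is over; how that total would be obtained for a forecast made during the season is left open in this sketch.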
Forecasts can be made on a short-term basis, one or two days ahead, or for longer periods in an attempt to forecast other events, such as the dates of the beginning and end of the season or the peak day therein (Larsson, 1993). In aerobiology, most studies on forecasting models have concentrated on the development of long-term models. The most representative works along these lines might include those of Davies and Smith (1973), Bringfelt et al. (1982), Frenguelli et al. (1989), Driessen et al. (1989, 1990), Emberlin et al. (1990, 1993), Arobba et al. (1992) and Galán et al. (1995). The models themselves range from those based on correlations (Ljungkvist et al., 1977; Mäkinen, 1977) to those involving non-linear univariate models (Antépara et al., 1995; Norris-Hill, 1995), although most use linear and parametric statistics (usually stepwise multiple regression).

* Corresponding author. Tel.: +34 95 2131912; fax: +34 95 2131944; e-mail: aerox@uma.es

In general, the models developed so far formulate a forecasting equation from the data of a series of years and then apply that equation in order to show the goodness of the algorithm obtained. Such models provide excellent forecasts, which are usually represented graphically. They involve first subjecting the recorded pollen counts to various mathematical transformations in order to normalize the data, to permit the use of parametric statistics or to obtain a better fit to the equation. Once the forecast has been made, the results are transformed back to the number of pollen grains per cubic metre of air. However, it may be asked whether a higher value of the coefficient of determination (adj. $R^2$) better reflects the actual situation.

The aim of this study is to evaluate the different transformations usually applied to pollen data in the construction of forecasting models and to attempt to

0393-5965/98/$ - see front matter © 1998 Elsevier Science Ireland Ltd. All rights reserved.
PII S0393-5965(98)00025-0