Prediction of CO concentrations based on a hybrid Partial Least Square and Support Vector Machine model

B. Yeganeh a,*, M. Shafie Pour Motlagh b, Y. Rashidi b, H. Kamalan c

a Faculty of Civil Engineering, K. N. Toosi University of Technology, No.1346, Vali Asr Street, Mirdamad Intersection, Tehran 1996715433, Iran
b Faculty of Environment, University of Tehran, Tehran, Iran
c Faculty of Civil Engineering, Islamic Azad University, Pardis Branch, Tehran, Iran

Article info
Article history: Received 30 November 2011; received in revised form 24 February 2012; accepted 27 February 2012
Keywords: CO concentration; Machine learning; Support Vector Machine; Partial Least Square; Hybrid models

Abstract

Due to the health impacts caused by exposure to air pollutants in urban areas, monitoring and forecasting of air quality parameters have become an important topic in atmospheric and environmental research. Knowledge of the dynamics and complexity of air pollutant behavior has made artificial intelligence models a useful tool for more accurate prediction of pollutant concentrations. This paper focuses on an innovative method of daily air pollution prediction that combines a Support Vector Machine (SVM) as the predictor with Partial Least Square (PLS) as a data selection tool, based on measured values of CO concentrations. CO concentrations from the Rey monitoring station in the south of Tehran, from Jan. 2007 to Feb. 2011, have been used to test the effectiveness of this method. The hourly CO concentrations have been predicted using the SVM and the hybrid PLS-SVM models. Similarly, daily CO concentrations have been predicted based on the aforementioned four years of measured data. Results demonstrated that both models have good prediction ability; however, the hybrid PLS-SVM model has better accuracy.
In the analysis presented in this paper, statistical estimators including relative mean errors, root mean squared errors and the mean absolute relative error have been employed to compare the performance of the models. It has been concluded that the errors decrease after size reduction and that the coefficients of determination increase from 56-81% for the SVM model to 65-85% for the hybrid PLS-SVM model. It was also found that the hybrid PLS-SVM model required less computational time than the SVM model, as expected, which supports the more accurate and faster prediction ability of the hybrid PLS-SVM model.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

In establishing ambient air quality standards, regulations have been introduced in order to set limits on the emissions of pollutants. To achieve these limits, consideration has been given to mathematical and computer modeling of air pollution. Accurate air pollutant prediction models are therefore needed to forecast and diagnose potential compliance or non-compliance in both short- and long-term applications. Hence, it is widely agreed that air quality models are indispensable tools for assessing the impact of air pollutants on human health and the urban environment (Chan and Chan, 2000; Gokhale and Khare, 2004; Sánchez et al., 2011).

A large number of atmospheric dispersion models that aim to simulate the physical and chemical processes in the atmosphere have been used for air pollutant forecasting (Moussiopoulos et al., 1995; Yi and Prybutok, 1996). However, such models are unsuitable in many operational settings because they require significant computational effort and a large volume of different input data. Owing to these inherent difficulties, stochastic models have been widely employed as an alternative to deterministic models to forecast air pollutant concentrations.
Many linear (Robeson and Steyn, 1990) and non-linear regression models for concentration forecasting have also been reported (Chaloulakou et al., 2003; He et al., 2009). In recent years, based on the emission and meteorological data collected from air-monitoring networks around the world, the derivation of Soft Computing Models (SCMs) using techniques such as Artificial Neural Network (ANN), Mixture Model and Support Vector Machine (SVM) has become popular for air quality prediction (Heo and Kim, 2004; Lu and Wang, 2008).

* Corresponding author. Tel.: +98 9122583286; fax: +98 21 88261079. E-mail address: bijan.yeganeh@yahoo.com (B. Yeganeh).
Atmospheric Environment 55 (2012) 357-365. Journal homepage: www.elsevier.com/locate/atmosenv
1352-2310/$ - see front matter. doi:10.1016/j.atmosenv.2012.02.092

Artificial neural network methods, which have become very popular in recent years, are regarded as cost-effective for time-series prediction of air pollutants (Gómez-Sanchis