1. INTRODUCTION

Air quality monitoring is carried out to detect significant pollutant concentrations that may have adverse effects on human health. However, the analysis is often hampered by large proportions of missing observations, caused by mechanical failure, routine maintenance, changes in the siting of monitors, human error and other factors. Three major problems may arise when dealing with incomplete data. First, there is a loss of information and, as a consequence, a loss of efficiency. Second, there are several complications related to data handling, computation and analysis, due to the irregularities in data structure and the impossibility of using standard software. Third, and most important, any further analysis may be biased due to systematic differences between observed and unobserved data.

At present, some statistical software packages, such as SPSS (SPSS Incorporated, 2000), can perform limited replacement of missing values. One approach to solving incomplete data problems is the adoption of imputation techniques (Little and Rubin, 1987). This study therefore compares several interpolation techniques to determine which is best for replacing missing values.

2. MATERIALS AND METHOD

2.1 Data

Annual hourly monitoring records for PM10 in Seberang Perai, Pulau Pinang were selected to carry out the simulation of missing data. The test dataset consisted of hourly averaged particulate matter (PM10) concentrations for one year. Table 1 below gives the summary of the particulate matter (PM10) data. 8757 hourly concentrations are available, with 0.03 percent (3 observations) missing. The standard deviation (58.5 μg/m³) shows some variability in PM10 concentration, which is confirmed by the range of values, from 8 μg/m³ to 718 μg/m³. The data are skewed to the right, showing some occurrence of high PM10 concentrations.
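The simulation of missing data can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`simulate_missing`, `linear_interpolate`, `mae`, `rmse`) are hypothetical, the hourly series is synthetic rather than the Seberang Perai record, and only linear interpolation with two common error measures is shown. The sketch removes a random fraction of values, fills the gaps by straight-line interpolation between the nearest observed neighbours, and compares the estimates with the values that were removed.

```python
import math
import random


def simulate_missing(values, fraction, seed=0):
    """Return a copy of values with round(fraction * n) interior points set to None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(values))
    # Keep the first and last points so every gap has an observed value on both sides.
    candidates = range(1, len(values) - 1)
    missing_idx = set(rng.sample(candidates, n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def linear_interpolate(series):
    """Fill None gaps on a straight line between the nearest observed neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def mae(observed, predicted):
    """Mean absolute error between paired observed and predicted values."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Synthetic hourly series (assumption): a daily cycle around 50 ug/m3 over 10 days.
observed = [50 + 20 * math.sin(2 * math.pi * h / 24) for h in range(240)]

for fraction in (0.05, 0.10, 0.15, 0.25, 0.40):
    gappy = simulate_missing(observed, fraction)
    filled = linear_interpolate(gappy)
    # Score only the points that were removed.
    idx = [i for i, v in enumerate(gappy) if v is None]
    obs = [observed[i] for i in idx]
    pred = [filled[i] for i in idx]
    print(f"{fraction:.0%} missing: MAE={mae(obs, pred):.3f}, RMSE={rmse(obs, pred):.3f}")
```

Errors are computed only at the removed positions, because the retained observations are reproduced exactly by any interpolation method and would otherwise dilute the error measures.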
ESTIMATION OF MISSING VALUES FOR AIR POLLUTION DATA USING INTERPOLATION TECHNIQUE

M.N. Norazian¹ & A. Mohd Mustafa Al Bakri²
¹School of Environmental Engineering, ²School of Material Engineering, Kolej Universiti Kejuruteraan Utara Malaysia, Perlis

Y. Ahmad Shukri & R. Nor Azam
School of Civil Engineering, Universiti Sains Malaysia

ABSTRACT: Air pollution data such as PM10, sulphur dioxide, ozone and carbon monoxide are usually obtained using automated machines located at different sites. Missing values in these data are usually due to mechanical failure, routine maintenance, changes in the siting of monitors and human error. The occurrence of missing values requires special attention when analyzing the data. Incomplete datasets can cause bias due to systematic differences between observed and unobserved data. Finding the best way to estimate missing values is therefore very important to ensure that the analyzed data are of high quality. In this study, four imputation techniques, namely linear, quadratic, cubic and nearest neighbour interpolation, were used to replace the missing values. Annual hourly monitoring data for PM10 were used to generate missing values. Five percentages of randomly simulated missing data (5%, 10%, 15%, 25% and 40%) were evaluated in order to test the efficiency of the methods used. Four performance indicators, namely mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²) and prediction accuracy (PA), were calculated to describe the goodness of fit of all the methods. Of all the methods applied, linear interpolation was found to be the best for estimating data at all percentages of simulated missing values.

Key words: air pollution, interpolation, performance indicators.