Contents lists available at ScienceDirect
Environmental Research
journal homepage: www.elsevier.com/locate/envres

Enhanced data validation strategy of air quality monitoring network

Mohamed-Faouzi Harkat a, Majdi Mansouri b,*, Mohamed Nounou a, Hazem Nounou b

a Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar
b Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar

ARTICLE INFO
Keywords: Data validation; Air quality monitoring network; Exponentially weighted moving average; Generalized likelihood ratio test; Midpoint-radii; Principal component analysis

ABSTRACT
Quick validation and detection of faults in measured air quality data is a crucial step towards achieving the objectives of air quality networks. The objectives of this paper are therefore threefold: (i) to develop a modeling technique that can predict the normal behavior of air quality variables and help provide an accurate reference for monitoring purposes; (ii) to develop a fault detection method that can effectively and quickly detect any anomalies in measured air quality data. For this purpose, a new fault detection method based on the combination of the generalized likelihood ratio test (GLRT) and the exponentially weighted moving average (EWMA) will be developed. GLRT is a well-known statistical fault detection method that relies on maximizing the detection probability for a given false alarm rate. In this paper, we propose a GLRT-based EWMA fault detection method that is able to detect changes in the values of certain air quality variables; (iii) to develop a fault isolation and identification method that determines the fault source(s) so that appropriate corrective actions can be applied. To this end, a reconstruction approach based on a Midpoint-Radii Principal Component Analysis (MRPCA) model will be developed to handle the types of data and models associated with air quality monitoring networks.
All air quality modeling, fault detection, fault isolation and reconstruction methods developed in this paper will be validated using real air quality data (such as measurements of particulate matter, ozone, and nitrogen and carbon oxides).

1. Introduction

Maintaining high air quality is a major environmental concern that has a profound impact on human health and the ecosystem. Various industrial effluents, human activities, and meteorological factors contribute to air pollution by pollutants such as carbon oxides, nitrogen oxides, ozone, and particulate matter. Air quality monitoring networks are usually used to monitor air quality, not only to make sure that air quality standards are maintained, but also to allow any necessary preventive or corrective measures to be taken to minimize the effect of undesirable changes in the levels of these pollutants. Proper data validation is crucial for air quality networks to achieve this purpose. Therefore, the objective of this paper is to develop a general framework that enhances the data validation of air quality networks by developing: a modeling technique that can accurately predict the behavior of air quality monitoring networks and any changes in pollution and/or meteorological conditions using different types of air quality data; a monitoring technique that can quickly detect sensor faults or serious anomalies in air quality data; a fault isolation method that can identify the root cause(s) of the detected fault(s); and fault estimation and data correction methods that provide meaningful information about the detected fault(s), which can ultimately be shared with the public.

Modeling and monitoring of air quality networks are crucial to ensure the safety and protection of humans and the environment. In general, monitoring approaches (Venkatasubramanian et al., 2003a, 2003b) can be classified as either model-based or data-driven.
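The abstract describes the GLRT-based EWMA detector only at a high level. The following is a minimal sketch of one common way to combine the two ideas, not the authors' implementation. It assumes that model residuals (measurement minus model prediction) are independent and Gaussian with a known standard deviation `sigma` when fault-free. Under that assumption, the maximum-likelihood estimate of an unknown mean shift is the EWMA-filtered residual itself, so the GLRT statistic reduces to its squared norm divided by the steady-state EWMA variance $\sigma^2 \lambda / (2 - \lambda)$. The helper names `ewma` and `glrt_ewma_statistic`, the smoothing constant `lam = 0.2`, the empirical 99% threshold, and the injected bias fault are all hypothetical, illustrative choices.

```python
import numpy as np


def ewma(residuals, lam=0.2):
    """Column-wise EWMA: z_t = lam * e_t + (1 - lam) * z_{t-1}, with z_0 = 0."""
    residuals = np.asarray(residuals, dtype=float)
    z = np.empty_like(residuals)
    prev = np.zeros(residuals.shape[1])
    for t, e in enumerate(residuals):
        prev = lam * e + (1.0 - lam) * prev
        z[t] = prev
    return z


def glrt_ewma_statistic(residuals, sigma, lam=0.2):
    """GLRT statistic for an unknown mean shift, computed on EWMA-filtered residuals.

    Assumption: fault-free residuals are i.i.d. N(0, sigma^2 I). The ML estimate of
    the unknown shift is the filtered vector z_t itself, so the GLRT reduces to
    ||z_t||^2 divided by the steady-state EWMA variance sigma^2 * lam / (2 - lam).
    """
    z = ewma(residuals, lam)
    var_z = sigma ** 2 * lam / (2.0 - lam)
    return np.sum(z ** 2, axis=1) / var_z


# Synthetic illustration (hypothetical data, not air quality measurements).
rng = np.random.default_rng(0)
sigma = 1.0

# Threshold: empirical 99th percentile of the statistic on fault-free residuals.
train = rng.normal(0.0, sigma, size=(500, 3))
threshold = np.quantile(glrt_ewma_statistic(train, sigma), 0.99)

# Test residuals with a hypothetical +2*sigma bias on variable 0 from sample 100 on.
test = rng.normal(0.0, sigma, size=(200, 3))
test[100:, 0] += 2.0 * sigma
alarms = glrt_ewma_statistic(test, sigma) > threshold
first_alarm = 100 + int(np.argmax(alarms[100:]))
print(f"threshold={threshold:.2f}, first alarm after fault at sample {first_alarm}")
```

A smaller `lam` gives more weight to past residuals. This helps the detector catch small, persistent sensor biases, at the cost of a slower reaction to abrupt changes. The threshold could instead be taken from a chi-square distribution with as many degrees of freedom as residual variables, but the empirical quantile avoids relying on the Gaussian assumption for the tail.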
http://dx.doi.org/10.1016/j.envres.2017.09.023
Received 15 March 2017; Received in revised form 19 September 2017; Accepted 20 September 2017
* Corresponding author. E-mail address: majdi.mansouri@qatar.tamu.edu (M. Mansouri).
Environmental Research 160 (2018) 183–194
0013-9351/© 2017 Elsevier Inc. All rights reserved.

Model-based monitoring approaches use the predictions of process models to decide whether or not faults are present (Kinnaert, 2003; Nyberg and Nyberg, 1999). Hence, the effectiveness of such approaches is greatly influenced by the quality of the process models. When the difference between the model prediction and the process measurement is relatively small, this indicates that the process is operating normally and that no fault exists. However, when this difference is relatively large, it indicates that a fault has occurred (Kinnaert, 2003; Nyberg and Nyberg, 1999). Several model-based monitoring approaches have been developed, such as the parity space approaches (Staroswiecki, 2001; Ding and Frank, 1990; Patton