Research Article
Environmetrics

Received: 5 May 2015, Revised: 21 October 2015, Accepted: 18 December 2015, Published online in Wiley Online Library: 17 January 2016

(wileyonlinelibrary.com) DOI: 10.1002/env.2382

New control chart for monitoring and classification of environmental data

Christian Paroissin a, Laura Penalva b, Agnès Pétrau b and Ghislain Verdier a*

The on-line monitoring of water quality is of crucial interest, and control charts are well suited to this task. However, these statistical methods need to be adapted to the particularities of environmental data. In this paper, new control charts are developed for the case of a French river for which the parameter of interest, the Dissolved Oxygen Concentration (DOC), is characterized by a non-stationary and seasonal time evolution. The principle is to construct a test statistic that is not based directly on the variable of interest, but rather on its seasonal regularity when the system is under control. The methods are studied through numerical simulation and applied to real data. The ability of the control chart to be used retrospectively as a classifier is also studied. Copyright © 2016 John Wiley & Sons, Ltd.

Keywords: control charts; statistical process control; water quality monitoring

1. INTRODUCTION

Nowadays, process control is paramount, especially in environmental applications. The development of techniques able to supervise dynamic systems, and thus to detect problems or anomalies such as pollution, is of great interest. The tools provided by Statistical Process Control theory, notably control charts (see Montgomery (1996) for an overview), are well suited to this type of problem and are often used for water-quality monitoring (Lee et al., 2013; Maurer et al., 1999; Zimmerman et al., 1996).
The aim of this paper is to study a new statistical approach for the continuous monitoring of the physico-chemical quality of aquatic environments (rivers, lakes, etc.). Such monitoring aims to highlight potential impacts of urban effluents (from industry, waste water, etc.) that could have consequences for the environment: production of drinking water, human activities (such as bathing), and so on. Several parameters can be monitored, such as water temperature, DOC, pH, conductivity, and turbidity (Gonçalves and Alpuim, 2011). All these parameters are characterized (see later) by a non-stationary dynamic evolution with a seasonal component. This characteristic must be taken into account by the statistical approaches, whose objective is twofold: on the one hand, the procedure must be able to detect, on-line and as rapidly as possible, an abrupt change in the time evolution of the variable of interest (passage from a state considered “normal” to an “abnormal” state); on the other hand, the procedure must be usable retrospectively to analyse the data over a given period (for example, annually).

A control chart (see Oakland (2007) for a comprehensive review) can be seen as a sequential statistical test between two hypotheses, $H_0$: “the system is under control” against $H_1$: “the system is out of control”. Starting from a learning sample of observations collected during a “normal” functioning mode, the principle is to construct a test statistic (i.e., a random variable), which is compared, on-line, with a control region (typically two control limits, an upper and a lower limit) in order to detect a change in the distribution of the observations. The objective is to detect a change as rapidly as possible while maintaining an acceptable false alarm rate. The first control charts were proposed by Shewhart (1931) with the objective of detecting a change in the mean or the variance of a series of independent Gaussian observations.
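As an illustration of the principle just described, the following is a minimal sketch of a classical Shewhart-type chart for individual observations; it is not the chart proposed in this paper. The function names `shewhart_limits` and `first_alarm`, the multiplier `k = 3`, and the simulated Gaussian data (a level of 8 dropping to 5) are assumptions introduced purely for the example.

```python
import numpy as np

def shewhart_limits(learning_sample, k=3.0):
    """Control limits estimated from in-control data: mean -/+ k standard deviations."""
    x = np.asarray(learning_sample, dtype=float)
    mu = x.mean()
    sigma = x.std(ddof=1)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma

def first_alarm(observations, lower, upper):
    """Index of the first observation outside [lower, upper], or None if no alarm."""
    for t, x in enumerate(observations):
        if x < lower or x > upper:
            return t
    return None

# Hypothetical data: independent Gaussian observations, in control around 8.0.
rng = np.random.default_rng(0)
learning = rng.normal(loc=8.0, scale=0.5, size=500)
lower, upper = shewhart_limits(learning)

# On-line phase: 50 in-control observations, then an abrupt drop of the mean.
online = np.concatenate([rng.normal(8.0, 0.5, 50), rng.normal(5.0, 0.5, 50)])
alarm = first_alarm(online, lower, upper)
print("control limits:", (lower, upper), "first alarm index:", alarm)
```

The multiplier `k` governs the trade-off mentioned above: a larger value lowers the false alarm rate but delays the detection of small changes. The sketch relies on independent, stationary Gaussian observations, which is precisely the assumption that seasonal, non-stationary data such as the DOC violate.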
Control charts were soon extended to treat more complex problems: non-Gaussian data (see Chakraborti et al. (2011) for an overview), multivariate observations (with the famous Hotelling $T^2$ control chart (Hotelling, 1947)), or autocorrelated processes. Concerning the latter problem, two main approaches have been developed in the literature. The first is the residual-based control chart (see, for example, Alwan and Roberts (1988), Montgomery and Mastrangelo (1991), and Pan and Chen (2008)), in which a model (an ARMA model or other) is fitted to the data. The monitoring is then applied to the residuals, which are identically distributed if the model is exact and can therefore be treated by traditional methods. The second approach (see Apley and Tsung (2002) for an example) consists of forming a $p$-dimensional vector from the present observation and the $p - 1$ previous observations, and applying a multivariate control chart (such as the Hotelling $T^2$) to this vector. Most of the time, such methods require a stationarity assumption on the observations. Few works have addressed non-stationary processes (Raza et al., 2015), which is the case for the real data treated in this paper.

* Correspondence to: Ghislain Verdier, Laboratoire de Mathématiques et de leurs Applications - UMR CNRS 5142, Université de Pau et des Pays de l’Adour, Avenue de l’Université, 64013 Pau cedex, France. E-mail: ghislain.verdier@univ-pau.fr

a Laboratoire de Mathématiques et de leurs Applications - UMR CNRS 5142, Université de Pau et des Pays de l’Adour, Avenue de l’Université, 64013 Pau cedex, France

b Rivages Pro Tech SUEZ Eau France - LDE, 2 Allée Théodore Monod, Bâtiment Hanami, Technopôle Izarbel, 64210 Bidart, France

Environmetrics 2016; 27: 182–193
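The residual-based approach for autocorrelated data described in the introduction can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any of the cited papers: the model is restricted to an AR(1) fitted by ordinary least squares, and the names `fit_ar1` and `residual_alarms`, the threshold `k = 3`, and the simulated series are all hypothetical.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]; returns (c, phi, sigma)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    resid = x[1:] - design @ coef
    # Two parameters were estimated, hence ddof=2 for the residual scale.
    return coef[0], coef[1], resid.std(ddof=2)

def residual_alarms(x, c, phi, sigma, k=3.0):
    """Indices t >= 1 whose one-step-ahead residual exceeds k * sigma in absolute value."""
    x = np.asarray(x, dtype=float)
    resid = x[1:] - (c + phi * x[:-1])
    return (np.flatnonzero(np.abs(resid) > k * sigma) + 1).tolist()

# Hypothetical stationary AR(1) series with an abrupt level shift at t = 500.
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, 600)
series = np.zeros(600)
for t in range(1, 600):
    series[t] = 0.7 * series[t - 1] + noise[t]
series[500:] += 6.0

c, phi, sigma = fit_ar1(series[:400])                  # learning sample
alarms = residual_alarms(series[400:], c, phi, sigma)  # monitored sample
print("estimated phi:", phi, "alarm indices:", alarms)
```

Applying Shewhart-type limits to the residuals rather than to the raw series removes the serial dependence, provided the fitted model is adequate. The sketch also relies on stationarity of the in-control series, which is the limitation noted above for seasonal, non-stationary data.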