Special Issue Paper
Environmetrics

Received: 30 April 2012, Revised: 21 October 2012, Accepted: 23 October 2012, Published online in Wiley Online Library: 20 November 2012
(wileyonlinelibrary.com) DOI: 10.1002/env.2185

Functional clustering of water quality data in Scotland

R. A. Haggarty a*, C. A. Miller a, E. M. Scott a, F. Wyllie b and M. Smith b

Assessing quality and quantity of water is of crucial importance to identify risks to the environment, society and human health. The European Community Water Framework Directive establishes guidelines for the classification of all water bodies across Europe and requires that all sites attain ‘good’ status by 2015. Classifications are made on the basis of a range of chemical and biological determinands. Within the directive, standing waters can be grouped, and the classifications of all members of the group are then based on the classification of a single representative lake within that grouping. A key question is therefore how to determine ‘appropriate’ groups. We investigate and develop univariate and multivariate functional clustering models to explore the spatiotemporal structure of determinands in a set of 21 Scottish lakes. These approaches enable sites to be grouped on the basis of one or more determinands; however, unlike standard clustering methods, they also take the temporal dynamics of the determinands into account in the formation of the groups. Copyright © 2012 John Wiley & Sons, Ltd.

Keywords: cluster analysis; functional data analysis; monitoring; Water Framework Directive

1. INTRODUCTION

Regular assessment of water quality is required to protect the environment, society and human health. A deterioration in water quality through issues such as eutrophication and cyanobacterial blooms presents a substantial risk to human and animal health and to plant and animal life, and can have detrimental effects on the local economy.
Long-term data records across multiple sites can be used to investigate water quality and risk factors statistically (Ferguson et al., 2008; Carvalho et al., 2011). However, continuous monitoring of all lakes is neither logistically nor financially feasible. This is of particular importance given the most recent Intergovernmental Panel on Climate Change technical report on water (Bates et al., 2008), which highlighted the importance of monitoring data but acknowledged that some observational networks were shrinking.

The European Union Water Framework Directive (WFD) (European Parliament, 2000) was introduced in 2003 to set compliance standards for water bodies across Europe, with the aim of preventing deterioration and ensuring that all sites reach ‘good’ status by 2015. It is a wide-ranging piece of legislation and has several implications for how monitoring networks are defined and implemented. The status of a surface water body is determined by the poorer of its chemical and ecological status. Chemical status describes whether or not the concentration of any pollutant exceeds European Commission standards. Ecological status is principally a measure of the cumulative effects of human activities on river, lake, estuary or coastal water ecosystems. Each ecological status class (high, good, moderate, poor and bad) defined by the Directive represents a different level of disturbance from a reference state.

One of the features of the WFD is that lakes can be grouped together and the classifications of all members of the group can then be based on the classification of a single representative site, so that water quality at the remaining sites can be predicted without monitoring them. Consequently, before any monitoring is carried out, lake groupings have to be established and representative sites identified.
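As an illustrative sketch only (it is not part of the analysis in this paper), the grouping rule described above can be written in a few lines of Python. The lake names, group labels and status values below are hypothetical, and the function name `classify_by_group` is an assumption introduced here for illustration.

```python
# Ecological status classes named in the text, ordered from worst to best.
STATUS_CLASSES = ["bad", "poor", "moderate", "good", "high"]


def classify_by_group(groups, representative, monitored_status):
    """Assign every lake the status of its group's representative site.

    groups: dict mapping a group label to the list of lakes in that group.
    representative: dict mapping a group label to its representative lake.
    monitored_status: dict mapping each representative lake to its status.
    """
    assigned = {}
    for group_id, lakes in groups.items():
        rep = representative[group_id]
        if rep not in lakes:
            raise ValueError(f"{rep} is not a member of group {group_id}")
        status = monitored_status[rep]
        if status not in STATUS_CLASSES:
            raise ValueError(f"unknown status class: {status}")
        # Only the representative is monitored; all members inherit its class.
        for lake in lakes:
            assigned[lake] = status
    return assigned


# Hypothetical example: five lakes in two groups, two monitored sites.
groups = {"G1": ["lake_a", "lake_b", "lake_c"], "G2": ["lake_d", "lake_e"]}
representative = {"G1": "lake_a", "G2": "lake_e"}
monitored_status = {"lake_a": "good", "lake_e": "moderate"}

assigned = classify_by_group(groups, representative, monitored_status)
for lake, status in sorted(assigned.items()):
    print(lake, status)
```

In this sketch, every unmonitored lake simply inherits the class of its representative, so the groups and representatives chosen directly determine the class assigned to each unmonitored lake.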
Grouping sites that are similar in terms of these determinands is therefore of great importance, as wrongly specifying either the groups or the representative site within each group could result in misclassification of all members and hence could miss potential environmental risks. In addition to deciding upon a group structure, there is also the question of the optimal number of groups. If there is a large degree of variability amongst the lakes, then many groups may be required to capture this. If, however, the lakes are similar, there may be scope to reduce the number of groups, which would consequently reduce the amount of monitoring required to adequately reflect differences between lakes.

The potential misclassification of sites motivates a group structure that is based on the observed determinands used for classification. The aim of this paper is therefore to demonstrate and develop a statistical methodology that clusters sites on the basis of similar temporal patterns in the chemical determinands of interest. We propose using functional data analysis (Ramsay and Silverman, 1997) and, in particular, functional clustering techniques.

* Correspondence to: R. A. Haggarty, School of Mathematics and Statistics, University of Glasgow, 15 University Gardens, Glasgow G12 8QW, U.K. E-mail: r.haggarty.1@research.gla.ac.uk
a School of Mathematics and Statistics, University of Glasgow, 15 University Gardens, Glasgow G12 8QW, U.K.
b Scottish Environment Protection Agency, Clear Water House, Heriot-Watt Research Park, Edinburgh EH14 4AP, U.K.
This article is published in Environmetrics as a special issue on Modern quantitative methods for environmental risk assessment, edited by Lelys Bravo de Guenni, Cómputo Científico y Estadística, Universidad Simón Bolívar, Valle de Sartenejas, Carretera Baruta-Hoyo de La Puerta, Caracas, Miranda 1080-A, Venezuela, and Susan J. Simmons, Mathematics and Statistics, UNCW, 601 South College Road, Wilmington, NC 28403, U.S.A.
Environmetrics 2012; 23: 685–695

Both nonprobabilistic-based and model-based functional clustering