Development of models for predicting toxicity from sediment chemistry by partial least squares-discriminant analysis and counter-propagation artificial neural networks

Manuel Alvarez-Guerra a, Davide Ballabio b, José Manuel Amigo c, Rasmus Bro c, Javier R. Viguri a,*

a Department of Chemical Engineering and Inorganic Chemistry, ETSIIT, University of Cantabria, Avda. de los Castros s/n, 39005 Santander, Spain
b Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy
c Department of Food Science, Quality and Technology, Faculty of Life Sciences, University of Copenhagen, Rolighedsvej 30, 1958 Frederiksberg C, Denmark

Models for predicting toxicity based on amphipod tests, derived using PLS-DA and CP-ANN, can be useful aids for screening-level sediment quality assessment.

Article info

Article history: Received 29 May 2009; Received in revised form 12 August 2009; Accepted 16 August 2009

Keywords: Sediment; Quality assessment; Toxicity; Prediction; Mathematical models

Abstract

There is strong interest in developing tools to link chemical concentrations of contaminants to the potential for observing sediment toxicity that can be used in initial screening-level sediment quality assessments. This paper presents new approaches for predicting toxicity in sediments, based on 10-day survival tests with marine amphipods, from sediment chemistry, by means of the application of Partial Least Squares-Discriminant Analysis (PLS-DA) and Counter-propagation Artificial Neural Networks (CP-ANNs) to large historical databases of chemical and toxicity data. The exploration of the internal structure of the developed models revealed inherent limitations of predicting toxicity from common chemical analyses of bulk contaminant concentrations.
However, the results obtained in the validation of these models combined relevant values of non-error classification rate, sensitivity and specificity of, respectively, 76, 87 and 73% with PLS-DA and 92, 75 and 97% with CP-ANNs, outperforming the results reported for previous approaches.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Sediments are an important environmental resource (Adams et al., 1992). However, they are also complex matrices that act as sinks of multiple chemicals, making risk assessment and sustainable management difficult and challenging (Apitz et al., 2005a). The assessment of sediment quality should thus be carried out through tiered decision-making frameworks in sequential steps of increasing complexity and cost (Chapman and Anderson, 2005; Chapman, 2007). Therefore, the development of tools to link chemical concentrations of contaminants to the potential for observing toxicity is a task of great interest, since these tools can be very useful in initial screening-level assessments that involve prioritizing samples and deciding an efficient allocation of limited resources for subsequent management steps.

According to this interest, different approaches that link toxicity to chemical concentrations in sediments have been developed. For example, diverse sediment quality guidelines (SQGs) (Wenning et al., 2005; Alvarez-Guerra et al., 2007), which relate the concentrations of contaminants in sediments to some predicted frequency or intensity of biological effects, are widely used, and their predictive ability has been evaluated in numerous studies (e.g. Long et al., 1998; Fairey et al., 2001; Vidal and Bay, 2005; McCready et al., 2006a). Other works have also conducted correlation analyses between toxicological results and sediment contamination, using comparisons of contaminant concentrations to SQGs (Thompson et al., 1999; McCready et al., 2006b).
A further approach is based on the development of individual chemical logistic regression models (LRMs) that relate chemical concentration to a predicted probability of toxicity (Field et al., 1999, 2002). Models based on multiple logistic regression have also been developed (Smith et al., 2003), in which chemicals were combined using Principal Component Analysis or stepwise logistic regression to form an overall concentration, and this weighted average of chemicals was then used to estimate the probability of toxicity, although the effects of interactions between individual chemicals were not included.

* Corresponding author. Tel.: +34 942 201589; fax: +34 942 201591. E-mail address: vigurij@unican.es (J.R. Viguri).

Environmental Pollution 158 (2010) 607–614. ISSN 0269-7491. doi:10.1016/j.envpol.2009.08.007. Journal homepage: www.elsevier.com/locate/envpol

In fact, current tools like SQGs or LRMs focus on estimating toxicity based on individual-contaminant models, and this can be an