Environ Monit Assess (2012) 184:845–875
DOI 10.1007/s10661-011-2005-y

On the use of multivariate statistical methods for combining in-stream monitoring data and spatial analysis to characterize water quality conditions in the White River Basin, Indiana, USA

Andrew Gamble · Meghna Babbar-Sebens

Received: 27 October 2010 / Accepted: 14 March 2011 / Published online: 1 April 2011
© Springer Science+Business Media B.V. 2011

A. Gamble · M. Babbar-Sebens (✉)
Department of Earth Science, Indiana University Purdue University Indianapolis, 723 W. Michigan St., Indianapolis, IN 46202, USA
e-mail: mbabbars@iupui.edu

Abstract Mechanistic hydrologic and water quality models provide useful alternatives for estimating water quality in unmonitored streams. However, developing these elaborate models for large watersheds can be time-consuming and expensive, in addition to challenges that arise during calibration when there is limited spatial and/or temporal monitored in-stream water quality data. The main objective of this research was to investigate different approaches for developing multivariate analysis models as alternative methods for rapidly assessing relationships between spatio-temporal physical attributes of the watershed and water quality conditions in monitored streams, and then using the developed relationships for estimating water quality conditions in unmonitored streams. The study compares the use of various statistical estimates (mean, geometric mean, trimmed mean, and median) of monitored water quality variables to represent annual and seasonal water quality conditions. The relationship between these estimates and the spatial data is then modeled via linear and non-linear multivariate methods. Overall, the non-linear techniques for classification outperformed the linear techniques with an average cross-validation accuracy of 79.7%.
Additionally, the geometric-mean-based models outperformed models based on other statistical indicators with an average cross-validation accuracy of 80.2%. Dividing the data into annual and quarterly datasets also offered important insights into the behavior of certain water quality variables impacted by seasonal variations. The research provides useful guidance on the use and interpretation of the various statistical estimates and statistical models for multivariate water quality analyses.

Keywords Water quality · Principal component analysis · Linear discriminant analysis · Kohonen self-organizing map · Support vector machine · Cluster analysis

Introduction

In-stream water quality monitoring is expensive, and it can be impractical to install monitoring stations on every stream in a watershed. In 2009, the U.S. Environmental Protection Agency spent 52% of its budget on its Clean and Safe Water Program. Geographic information systems (GIS) and remote sensing technology provide a means to readily assess various spatial characteristics (e.g., land cover, climate, geology, and ecology) that affect the hydrology and the fate and