Environ Monit Assess (2012) 184:845–875
DOI 10.1007/s10661-011-2005-y
On the use of multivariate statistical methods
for combining in-stream monitoring data and spatial
analysis to characterize water quality conditions
in the White River Basin, Indiana, USA
Andrew Gamble · Meghna Babbar-Sebens
Received: 27 October 2010 / Accepted: 14 March 2011 / Published online: 1 April 2011
© Springer Science+Business Media B.V. 2011
A. Gamble · M. Babbar-Sebens (✉)
Department of Earth Science, Indiana University
Purdue University Indianapolis, 723 W. Michigan St.,
Indianapolis, IN 46202, USA
e-mail: mbabbars@iupui.edu
Abstract Mechanistic hydrologic and water quality
models provide useful alternatives for estimating
water quality in unmonitored streams. However,
developing these elaborate models for large
watersheds can be time-consuming and expensive,
and calibration becomes challenging when monitored
in-stream water quality data are spatially and/or
temporally limited. The main objective of this
research was to investigate different approaches
for developing multivariate analysis models as
alternative methods for rapidly assessing
relationships between spatio-temporal physical
attributes of the watershed and water quality
conditions in monitored streams, and then using
the developed relationships to estimate water
quality conditions in unmonitored streams. The
study compares the use of various statistical
estimates (mean, geometric mean, trimmed mean,
and median) of monitored water quality variables
to represent annual and seasonal water quality
conditions. The relationship between these
estimates and the spatial data is then modeled
via linear and non-linear multivariate methods.
Overall, the non-linear classification techniques
outperformed the linear techniques, with an
average cross-validation accuracy of 79.7%.
Additionally, the geometric-mean-based models
outperformed models based on the other statistical
indicators, with an average cross-validation
accuracy of 80.2%. Dividing the data into annual
and quarterly datasets also offered important
insights into the behavior of certain water
quality variables affected by seasonal variations.
The research provides useful guidance on the use
and interpretation of the various statistical
estimates and statistical models for multivariate
water quality analyses.
Keywords Water quality · Principal component
analysis · Linear discriminant analysis · Kohonen
self-organizing map · Support vector machine ·
Cluster analysis
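As a minimal illustration of the four statistical estimates compared in the abstract (mean, geometric mean, trimmed mean, and median), the sketch below computes each one for a single monitored variable. The sample values are made up for illustration and are not data from this study, the helper names `trimmed_mean` and `summarize` are hypothetical, and the 10% trimming proportion is an assumption, since the abstract does not state the value used.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the samples from each tail.

    The 10% default is an assumption made for this sketch only.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of samples cut per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


def summarize(values, proportion=0.1):
    """Return the four candidate estimates for one water quality variable."""
    return {
        "mean": statistics.mean(values),
        # The geometric mean requires strictly positive values.
        "geometric_mean": statistics.geometric_mean(values),
        "trimmed_mean": trimmed_mean(values, proportion),
        "median": statistics.median(values),
    }


# Hypothetical concentrations at one station (illustrative values only),
# including a single high reading such as a storm-event sample.
samples = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 60.0]
summary = summarize(samples)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With one extreme reading in the series, the arithmetic mean is pulled well above the other three estimates, which is the kind of difference that makes the choice of estimate matter when summarizing annual or seasonal conditions.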
Introduction
In-stream water quality monitoring is expensive,
and it can be impractical to install monitoring
stations on every stream in a watershed. In 2009,
the U.S. Environmental Protection Agency spent
52% of its budget on its Clean and Safe
Water Program. Geographic information systems
(GIS) and remote sensing technology provide a means
to readily assess various spatial characteristics
(e.g., land cover, climate, geology, and ecology)
that affect the hydrology and the fate and