Dimensionality reduction in drought modelling
João Filipe Santos,1* Maria Manuela Portela2 and Inmaculada Pulido-Calvo3
1 Departamento Engenharia, ESTIG, Instituto Politécnico de Beja, Rua Afonso III, 7800-050 Beja, Portugal
2 Departamento Engenharia Civil, SHRH, Instituto Superior Técnico (Lisboa), Portugal, Avda. Rovisco Pais, 1049-001 Lisboa, Portugal
3 Departamento Ciencias Agroforestales, Escuela Técnica Superior de Ingeniería, Campus La Rábida, Universidad de Huelva, 21819 Palos de la Frontera, Huelva, Spain
Abstract:
For monitoring hydrological events characterized by high spatial and temporal variability, the number and location of recording
stations must be carefully selected to ensure that the necessary information is collected. Depending on the characteristics of each
natural process, certain stations may be spurious or redundant, whereas others may provide most of the relevant data. With the
objective of reducing the costs of the monitoring system and, at the same time, improving its operational effectiveness, three
procedures were applied to identify the minimum network of rain gauge stations able to capture the characteristics of droughts in
mainland Portugal. Drought severity is characterized by the standardized precipitation index applied to the timescales of 1, 3, 6
and 12 consecutive months. The three techniques used to reduce the dimensionality of the network of rain gauges were as
follows: (i) artificial neural networks with sensitivity analysis, (ii) application of the mutual information criterion and (iii)
K-means cluster analysis using Euclidean distances. The results demonstrated that the best dimensionality reduction method was
case dependent in the three regions of Portugal (northern, central and southern) previously identified by cluster analysis. All the
reduction techniques led to the selection of a subset of rain gauges capable of reproducing the original temporal patterns of
drought. For specific severe drought events in Portugal in the past, the comparison between drought spatial patterns obtained with
the original stations and the selected subset indicated that the subset produced statistically satisfactory results (correlation
coefficients higher than 0.6 and efficiency coefficients higher than 0.5). Copyright © 2012 John Wiley & Sons, Ltd.
KEY WORDS rain gauge network; drought monitoring; standardized precipitation index; mutual information; sensitivity analysis;
artificial neural network; Portugal
Received 27 October 2011; Accepted 2 March 2012
INTRODUCTION
Drought is a recurrent natural phenomenon that can lead
to a severe reduction in the availability of fresh water for
a certain period and can affect wide areas. Although the
phenomenon is complex, attempts are made to characterize
it, generally by calculating drought indices derived from
meteorological and hydrological records. These provide
information about historical droughts and can also be
used to monitor current conditions. According to Tsakiris
(2008), such indices are useful for planning and
management purposes as they provide standardized
measures of the deviation of the water availability from
normal conditions. Among these drought indices, the
most widely used is the standardized precipitation index
(SPI), which is based on precipitation data, as implied by
the name (McKee et al. 1993; Santos et al. 2010).
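As a rough illustration of what an SPI-type index computes, the sketch below accumulates monthly precipitation over k consecutive months and maps each total to a standard normal quantile. It is a simplified, nonparametric variant: it uses empirical plotting positions rather than the fitted probability distribution of the standard SPI, and it does not treat calendar months separately. The function names, the Gringorten plotting position and the sample values are assumptions made here for illustration only.

```python
from statistics import NormalDist


def rolling_sums(precip, k):
    """Total precipitation over each window of k consecutive months."""
    return [sum(precip[i - k + 1:i + 1]) for i in range(k - 1, len(precip))]


def empirical_spi(precip, k):
    """SPI-like index: rank -> plotting position -> standard normal quantile.

    Assumption: a nonparametric stand-in for the fitted distribution
    used in the standard SPI.
    """
    totals = rolling_sums(precip, k)
    n = len(totals)
    std_normal = NormalDist()
    index = []
    for x in totals:
        # Rank of x among all totals; tied values share the highest rank.
        rank = sum(1 for v in totals if v <= x)
        # Gringorten plotting position keeps p strictly inside (0, 1).
        p = (rank - 0.44) / (n + 0.12)
        index.append(std_normal.inv_cdf(p))
    return index


# Hypothetical monthly precipitation totals (mm), for illustration only.
monthly = [80, 65, 40, 30, 12, 5, 2, 3, 25, 70, 90, 110]
spi3 = empirical_spi(monthly, 3)  # 3-month timescale
```

Under this sketch, negative values mark k-month totals that are low relative to the rest of the record, and positive values mark totals that are high.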
Various authors, such as Bonaccorso et al. (2003), have
emphasized that the monitoring of meteorological and
hydrological events characterized by high spatial and
temporal variability, such as droughts, requires careful
selection of the optimal number of gauge stations able to
describe the phenomenon within the area under study.
One of the most widely used criteria for network design is
based on geostatistical techniques (Bastin et al. 1984;
Bogardi and Bardossy 1985; Pardo-Igúzquiza 1998; and,
more recently, Chen et al. 2008), with the aim of
improving the operational performance of a monitoring
network with fewer gauges by selecting just the most
important stations. In the case of drought monitoring,
effective modelling of the phenomenon with the most
common methods still depends on a relatively large
network of meteorological stations with long time series.
For forecasting purposes, the size and the quality of the
input network of a model are crucial, as reported by many
authors including Murphy (1991), Zheng and Billings
(1996), Maier and Dandy (2000), Back and Trappenberg
(2001) and, more recently, Guest and Smith-Genut
(2010). If relevant inputs (independent variables) are
omitted, the model cannot fully capture the input–output
pattern (i.e. the model is underspecified). On the other
hand, if the model includes redundant or unnecessary
inputs (i.e. the model is overspecified), one or more of the
following may occur: (i) the size, computational complexity
and memory requirements of the model increase; (ii) the
calibration of the model becomes more difficult due to an
increase in the size of the search space and the greater
number of local optima; (iii) the interpretation of the
physical meaning of results from calibrated models becomes
more difficult; and (iv) more data are needed to efficiently
estimate the optimal values of the model parameters.
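The notion of a redundant input can be made concrete with mutual information, one of the criteria named in the abstract. The sketch below is a minimal histogram-based estimate, not the authors' implementation; the equal-width binning, the number of bins and the function names are assumptions chosen for illustration. A candidate input that shares high mutual information with an input already selected adds little new information and is a candidate for removal.

```python
from collections import Counter
from math import log


def discretize(series, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # constant series: single bin
    return [min(int((x - lo) / width), n_bins - 1) for x in series]


def mutual_information(x, y, n_bins=4):
    """Histogram estimate of I(X;Y) in nats for two equal-length series."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))   # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written with counts
        mi += (c / n) * log(c * n / (px[i] * py[j]))
    return mi


# Hypothetical index series at three gauges, for illustration only.
gauge_a = [0, 1, 2, 3, 4, 5, 6, 7]
gauge_b = [0, 1, 2, 3, 4, 5, 6, 7]   # duplicates gauge_a
gauge_c = [0, 7, 0, 7, 0, 7, 0, 7]   # unrelated to gauge_a in this binning

mi_redundant = mutual_information(gauge_a, gauge_b)
mi_unrelated = mutual_information(gauge_a, gauge_c)
```

In this toy case the duplicated gauge attains the maximum possible value for four equally filled bins, whereas the unrelated gauge yields zero. With real records the estimate depends on the binning and on the record length, so the values are best compared across gauges rather than read in absolute terms.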
*Correspondence to: João Filipe Santos, Departamento Engenharia, ESTIG,
Instituto Politécnico de Beja, Rua Afonso III, 7800-050 Beja, Portugal.
E-mail: joaof.santos@estig.ipbeja.pt
HYDROLOGICAL PROCESSES
Hydrol. Process. 27, 1399–1410 (2013)
Published online 17 April 2012 in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/hyp.9300