Estimation and potential improvement of the quality of legacy soil samples for digital soil mapping
F. Carré a,⁎, Alex B. McBratney b, B. Minasny b
a European Commission, DG Joint Research Centre, Institute of Environment and Sustainability, Land Management Unit, TP 280, 21020 Ispra (Va), Italy
b Australian Centre for Precision Agriculture, Faculty of Agriculture, Food & Natural Resources, The University of Sydney, NSW 2006, Australia
Received 19 January 2006; received in revised form 24 August 2006; accepted 24 January 2007
Available online 29 May 2007
Abstract
Legacy soil data form an important resource for digital soil mapping and are essential for calibrating models that predict soil properties from environmental variables. Such data arise from traditional soil survey. Methods of soil survey are generally empirical and based on the surveyor's mental model of the landscape, correlating soil with underlying geology, landforms, vegetation and air-photo interpretation. Traditional soil sampling follows no statistical criteria, and this may bias the areas being sampled. The challenge is to test whether legacy data can be used for large-area mapping (e.g. at national or continental extents), so as to limit the cost of field survey. The problem is then to assess the reliability and quality of legacy soil databases, which have been populated mainly by traditional soil survey, and, if additional funding for sampling is available, to determine where new sampling units should be located. This additional sampling can be used to improve and validate the prediction model.
Latin hypercube sampling (LHS) has been proposed as a sampling design for digital soil mapping when there is no prior sample. We use the
principle of hypercube sampling to assess the quality of existing soil data and guide us to locations that need to be sampled.
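The basic idea of Latin hypercube sampling can be illustrated with a short sketch. The Python below assumes NumPy, and the function names `latin_hypercube` and `to_covariate_space` are hypothetical. It draws n points so that each of the k margins is split into n equal-probability strata with exactly one point per stratum, then maps the unit-cube draws to covariate values through empirical quantiles. It is plain, unconditioned LHS: in digital soil mapping one would usually select actual grid cells (as in conditioned LHS), which this sketch does not attempt.

```python
import numpy as np


def latin_hypercube(n, k, rng=None):
    """Draw n points in the k-dimensional unit hypercube by Latin hypercube sampling.

    Each margin is cut into n equal-probability strata, and every stratum of
    every margin receives exactly one point.
    """
    rng = np.random.default_rng(rng)
    # Row i holds one uniform draw inside stratum [i/n, (i+1)/n) for every dimension.
    u = (np.arange(n)[:, None] + rng.random((n, k))) / n
    # Shuffle each column independently so strata are paired at random across dimensions.
    for j in range(k):
        u[:, j] = rng.permutation(u[:, j])
    return u


def to_covariate_space(u, covariates):
    """Map unit-cube draws to covariate values via the empirical quantiles of each column.

    covariates: array of shape (m, k), e.g. covariate values on the grid cells.
    """
    return np.column_stack(
        [np.quantile(covariates[:, j], u[:, j]) for j in range(u.shape[1])]
    )


# Example: 10 draws over 3 covariates, reproducible with a fixed seed.
u = latin_hypercube(10, 3, rng=0)
```

Shuffling each column separately is what distinguishes LHS from a regular grid: every marginal stratum is covered once, but the combinations across covariates are random.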
First, an area is defined and the environmental data layers (covariates) are assembled on a regular grid. The existing soil data are matched with the environmental variables. The HELS (Hypercube Evaluation of a Legacy Sample) algorithm is used to check the occupancy of the legacy sampling units in the hypercube formed by the quantiles of the covarying environmental data. This determines whether the legacy soil survey data occupy the hypercube uniformly or whether some partitions of the hypercube are over- or under-observed. It also allows posterior estimation of the apparent probability that a sampling unit was surveyed. From this information, further sampling can be designed. The methods are illustrated using legacy soil samples from Edgeroi, New South Wales, Australia, and from a large part of the Danube Basin. One third of the total number of sampling units is added to the original dataset. These new sampling units improve the representation of the covariate feature space, and the standard deviation of the overall density is consequently smaller.
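The occupancy check and the follow-up sampling can be sketched as below. This is a simplified reading, not the authors' HELS implementation. It assumes that the strata are the marginal quantile strata of each covariate computed over the grid, and that occupancy is measured as the ratio of the proportion of legacy units to the proportion of grid cells in each stratum (above 1 suggesting over-observation, below 1 under-observation). The greedy selection rule in the hypothetical `propose_new_units` is likewise an illustrative assumption, not the paper's procedure; all function names are hypothetical.

```python
import numpy as np


def quantile_edges(grid, n_strata):
    """Quantile break points of each covariate over the grid, shape (n_strata + 1, k)."""
    probs = np.linspace(0.0, 1.0, n_strata + 1)
    return np.quantile(grid, probs, axis=0)


def strata_of(x, edges):
    """Marginal quantile stratum (0 .. n_strata - 1) of each value, per covariate."""
    idx = np.empty(x.shape, dtype=int)
    for j in range(x.shape[1]):
        # Search only the interior edges, so values outside the grid range
        # fall into the first or last stratum.
        idx[:, j] = np.searchsorted(edges[1:-1, j], x[:, j], side="right")
    return idx


def density_ratio(grid, legacy, n_strata):
    """Proportion of legacy units over proportion of grid cells in each stratum.

    Returns shape (n_strata, k): > 1 over-observed, < 1 under-observed,
    NaN where a stratum holds no grid cells (heavily tied covariates).
    """
    edges = quantile_edges(grid, n_strata)
    g = strata_of(grid, edges)
    s = strata_of(legacy, edges)
    ratio = np.empty((n_strata, grid.shape[1]))
    for j in range(grid.shape[1]):
        p_grid = np.bincount(g[:, j], minlength=n_strata) / len(grid)
        p_legacy = np.bincount(s[:, j], minlength=n_strata) / len(legacy)
        ratio[:, j] = np.divide(p_legacy, p_grid,
                                out=np.full(n_strata, np.nan), where=p_grid > 0)
    return ratio


def propose_new_units(grid, legacy, n_strata, n_new):
    """Hypothetical greedy rule: add, one at a time, the grid cell whose strata
    are most under-observed, recomputing the ratios after each addition."""
    g = strata_of(grid, quantile_edges(grid, n_strata))
    cols = np.arange(grid.shape[1])
    sample, chosen = legacy.copy(), []
    for _ in range(n_new):
        r = density_ratio(grid, sample, n_strata)
        # Score = sum over covariates of (1 - ratio) for the cell's strata.
        score = np.nansum(1.0 - r[g, cols], axis=1)
        score[chosen] = -np.inf  # never pick the same cell twice
        best = int(np.argmax(score))
        chosen.append(best)
        sample = np.vstack([sample, grid[best]])
    return np.array(chosen)
```

With a legacy sample confined to the lower half of one covariate, one would expect the ratios of that covariate's upper strata to be near zero, the proposed units to fall in those strata, and the standard deviation of the ratios to shrink once the new units are added.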
© 2007 Published by Elsevier B.V.
Keywords: Legacy soil data; Soil sampling; Hypercube sampling; Pedometrics; Soil survey; Digital soil mapping
1. Introduction
Legacy soil data arise from traditional soil survey (Bui and Moran, 2001). Methods of soil survey are generally empirical and based on the surveyor's mental model of the landscape, correlating soil with underlying geology, landforms, vegetation and air-photo interpretation. Traditional soil sampling follows no statistical criteria; this may bias the areas being sampled.
de Gruijter et al. (2006) offer some very thoughtful definitions in relation to sampling, which we paraphrase here and use subsequently. Sampling sensu lato comprises selecting parts from a universe with the purpose of taking observations on them. The selected parts may be observed in situ, or material may be taken from them for subsequent measurement in a laboratory. The collection of selected parts is referred to as the sample. A single part that is, or could be, selected is referred to as a sampling unit. The total number of sampling
Geoderma 141 (2007) 1 – 14
www.elsevier.com/locate/geoderma
⁎ Corresponding author. Tel.: +39 0332 78 65 46; fax: +39 0332 78 63 94.
E-mail address: Florence.Carre@jrc.it (F. Carré).
0016-7061/$ - see front matter © 2007 Published by Elsevier B.V.
doi:10.1016/j.geoderma.2007.01.018