Mapping numerically classified soil taxa in Kilombero Valley, Tanzania using machine learning
Boniface H.J. Massawe a,b,⁎, Sakthi K. Subburayalu a, Abel K. Kaaya b, Leigh Winowiecki c, Brian K. Slater a
a School of Environment and Natural Resources, The Ohio State University, 210 Kottman Hall, 2021 Coffey Road, Columbus, OH 43210, USA
b Department of Soil and Geological Sciences, Sokoine University of Agriculture, PO Box 3008, Morogoro, Tanzania
c World Agroforestry Centre, United Nations Avenue, Gigiri, Nairobi, Kenya
Article info
Article history:
Received 21 March 2016
Received in revised form 11 November 2016
Accepted 14 November 2016
Available online xxxx
Abstract
Inadequate spatial soil information is one of the factors limiting evidence-based decisions to improve food
security and land management in developing countries. Various digital soil mapping (DSM) techniques have been
applied in many parts of the world to improve the availability and usability of soil data, but less has been
done in Africa, particularly in Tanzania, and at the scale necessary to make farm management decisions. The
Kilombero Valley has been identified for intensified rice production. However, the valley lacks detailed and
up-to-date soil information for decision-making. The overall objective of this study was to develop a predictive
soil map of a portion of the Kilombero Valley using DSM techniques. Two widely used decision tree algorithms and
three sources of digital elevation models (DEMs) were evaluated for their predictive ability. First, a numerical
classification was performed on the collected soil profile data to arrive at soil taxa. Second, the derived taxa
were spatially predicted and mapped following the SCORPAN framework using the Random Forest (RF) and J48 machine
learning algorithms. Datasets to train the models were derived from a legacy soil map, a RapidEye satellite image
and three DEMs: 1 arc-second SRTM, 30 m ASTER, and 12 m WorldDEM. Separate predictive models were built using
each DEM source. Mapping showed that RF was less sensitive than J48 to the training set sampling intensity.
Results also showed that predictions of soil taxa using the 1 arc-second SRTM and the 12 m WorldDEM were
identical. We suggest the combination of the RF algorithm and the freely available SRTM DEM for mapping the soils
of the whole Kilombero Valley. This combination can be tested and applied in other areas that have relatively
flat terrain like the Kilombero Valley.
© 2016 Elsevier B.V. All rights reserved.
Keywords:
Kilombero Valley
Numerical classification
Machine learning
Soil mapping
Decision tree analysis
DEM
1. Introduction
The Kilombero Valley in Tanzania presents great potential for the expansion and intensification of rice
production. This valley, covering an area of about 11,600 km² (Kato, 2007), has been identified by the
Government of Tanzania for financial and technological investments to expand and intensify rice production
(TIC, 2013). Rice is the second most important cereal crop in Tanzania after maize (Bucheyeki et al., 2011),
and demand for it has been increasing following a shift in preference among the local population from
traditional staples to rice, and increased market demand from neighboring countries. To develop and promote
sustainable intensification of rice production, farmers and policy makers need to identify the most suitable
areas and their respective management options. However, the updated and detailed soil information needed to
support this decision-making process is currently lacking.
Accurate soil information is crucial for informing management recommendations aimed at increasing agricultural
productivity and overall food security, especially in developing countries where GDP is heavily dependent on
the agricultural sector (Cook et al., 2008; Msanya et al., 2002). Gathering such information through
conventional soil inventory takes a relatively long time and generally requires a large amount of resources
(McBratney et al., 2003). Recent developments in remote and proximal sensing, computational methods and
information technology have provided means by which soil information can be collected, shared, communicated
and updated more efficiently (Malone, 2013; McBratney et al., 2003; Scull et al., 2003; Vågen et al., 2013;
Vågen et al., 2016; Winowiecki et al., 2016a, 2016b). Predictive soil landscape model frameworks such as the
SCORPAN approach (McBratney et al., 2003) could be used to predict continuous soil classes and soil attributes
that better represent soil spatial variability. The increased availability of high-resolution digital
elevation models (DEMs) that provide predictive variables in digital soil mapping, together with advances in
machine learning techniques, adds to the ease of generating spatial soil information and depicting uncertainty
(Hansen et al., 2009; Haring et al., 2012; Subburayalu and Slater, 2013; Subburayalu et al., 2014).
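To make the SCORPAN idea concrete, the framework of McBratney et al. (2003) expresses a soil class at a location as a function of co-located covariates, S = f(s, c, o, r, p, a, n), where the factors stand for soil, climate, organisms, relief, parent material, age and spatial position. The sketch below illustrates only the relief factor: it derives slope from a small DEM grid, stacks elevation and slope as covariates, and assigns each cell the taxon of the most similar sampled cell. Everything in it is an assumption made for illustration: the toy elevations, the 30 m cell size, the labels taxon_A and taxon_B, and the function names are hypothetical, and the nearest-neighbour rule is a deliberately simple stand-in for a classifier, not the RF or J48 models used in this study.

```python
import math
import statistics


def slope_percent(dem, cell_size):
    """Slope (%) per cell from finite differences; needs at least a 2 x 2 grid.

    Interior cells use central differences, edge cells one-sided differences.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
            dz_dy = (dem[i1][j] - dem[i0][j]) / ((i1 - i0) * cell_size)
            dz_dx = (dem[i][j1] - dem[i][j0]) / ((j1 - j0) * cell_size)
            out[i][j] = 100.0 * math.hypot(dz_dx, dz_dy)
    return out


def standardize(values):
    """Rescale to zero mean and unit variance so covariates are comparable."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # guard against a constant covariate
    return [(v - mean) / sd for v in values]


def predict_taxa(dem, cell_size, training):
    """Label every cell with the taxon of the nearest training cell in
    covariate space (elevation, slope). A stand-in classifier, not RF or J48."""
    rows, cols = len(dem), len(dem[0])
    slope = slope_percent(dem, cell_size)
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    elev_z = standardize([dem[i][j] for i, j in cells])
    slope_z = standardize([slope[i][j] for i, j in cells])
    covariates = {cell: (e, s) for cell, e, s in zip(cells, elev_z, slope_z)}

    predicted = [[None] * cols for _ in range(rows)]
    for (i, j), x in covariates.items():
        nearest = min(training, key=lambda cell: math.dist(x, covariates[cell]))
        predicted[i][j] = training[nearest]
    return predicted


# Hypothetical toy DEM (metres): flat floodplain on the left, rising to the right.
DEM = [
    [250, 250, 252, 262],
    [250, 250, 253, 264],
    [250, 251, 254, 266],
    [250, 251, 255, 268],
]
CELL_SIZE = 30.0  # assumed grid spacing in metres

# Hypothetical sampled profiles: (row, col) -> taxon from a prior classification.
TRAINING = {(0, 0): "taxon_A", (3, 3): "taxon_B"}

predicted = predict_taxa(DEM, CELL_SIZE, TRAINING)
for row in predicted:
    print(" ".join(row))
```

In a real workflow the covariate stack would also carry the other SCORPAN factors (for example, spectral bands for organisms or legacy map units for soil), and the stand-in rule would be replaced by a trained classifier evaluated on held-out profiles.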
Geoderma xxx (2016) xxx–xxx
⁎ Corresponding author at: Department of Soil and Geological Sciences, Sokoine
University of Agriculture, PO Box 3008, Morogoro, Tanzania.
E-mail addresses: bonmass@yahoo.com (B.H.J. Massawe),
L.A.WINOWIECKI@CGIAR.ORG (L. Winowiecki).
http://dx.doi.org/10.1016/j.geoderma.2016.11.020