Hyper-scale digital soil mapping and soil formation analysis

Thorsten Behrens a,*, Karsten Schmidt a, Leonardo Ramirez-Lopez a, John Gallant b, A-Xing Zhu c,d, Thomas Scholten a

a Department of Geosciences, Physical Geography and Soil Science, University of Tübingen, D-72074 Tübingen, Germany
b CSIRO Land and Water, Black Mountain, Canberra, ACT 2601, Australia
c State Key Laboratory of Environment and Resources Information System, Institute of Geographical Science and Resources Research, Chinese Academy of Sciences, Beijing 100101, China
d Department of Geography, University of Wisconsin-Madison, Madison, WI 53706, USA

Article info

Article history: Received 15 June 2012; received in revised form 27 June 2013; accepted 27 July 2013; available online 7 October 2013

Keywords: Hyper-scale analysis; Digital soil mapping; Soil formation; Digital terrain analysis; Pedology; Geomorphic signature; Data mining; ConStat

Abstract

Landscape characteristics show local, regional and supra-regional components. As a result pedogenesis and the spatial distribution of soil properties are both influenced by features emerging at multiple scales. To account for this effect in a predictive model, descriptors of the geomorphic signature are required at multiple scales. In this study, we present a new hyper-scale terrain analysis approach, referred to as Contextual Statistical Mapping (ConStat), which is based on statistical neighborhood measures derived for growing sparse circular neighborhoods. The statistical measures tested comprise basic descriptors such as the minimum, maximum, mean, standard deviation, and skewness, as well as statistical terrain attributes and directional components. We propose a data mining framework to determine the relevant statistical measures at the relevant scales to analyze and interpret the influence of these statistical measures and to map the geomorphic structures influencing soil formation and the regions where a statistical measure shows influence.
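To make the idea of statistical measures on growing sparse circular neighborhoods more concrete, the following is a minimal sketch, not the authors' implementation. It samples a fixed number of cells on circles of increasing radius around every cell of an elevation grid and computes the five basic descriptors named above. The function names `sparse_circle_offsets` and `neighborhood_stats`, the choice of eight sample points per circle, the example radii, and the edge padding at the grid border are all assumptions made for illustration; the statistical terrain attributes and directional components are not covered.

```python
import numpy as np


def sparse_circle_offsets(radius, n_points=8):
    """Integer (row, col) offsets of n_points spread evenly on a circle.

    Hypothetical helper: the sampling density is an assumption.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    rows = np.rint(radius * np.sin(angles)).astype(int)
    cols = np.rint(radius * np.cos(angles)).astype(int)
    return rows, cols


def neighborhood_stats(dem, radii, n_points=8):
    """Per-cell statistics of elevations sampled on circles of growing radius.

    Returns a dict mapping each radius (in cells) to an array of shape
    (5, H, W) holding min, max, mean, standard deviation and skewness.
    """
    dem = np.asarray(dem, dtype=float)
    h, w = dem.shape
    out = {}
    for r in radii:
        # Assumption: repeat border values so every cell has a full circle.
        padded = np.pad(dem, r, mode="edge")
        rows, cols = sparse_circle_offsets(r, n_points)
        # One shifted copy of the grid per sample point on the circle.
        samples = np.stack([
            padded[r + dr:r + dr + h, r + dc:r + dc + w]
            for dr, dc in zip(rows, cols)
        ])
        mean = samples.mean(axis=0)
        std = samples.std(axis=0)
        # Population skewness; defined as 0 where all samples are equal.
        safe_std = np.where(std > 0, std, 1.0)
        skew = ((samples - mean) ** 3).mean(axis=0) / safe_std ** 3
        out[r] = np.stack(
            [samples.min(axis=0), samples.max(axis=0), mean, std, skew]
        )
    return out


if __name__ == "__main__":
    # Synthetic tilted plane standing in for a digital elevation model.
    dem = np.tile(np.arange(11.0), (11, 1))
    stats = neighborhood_stats(dem, radii=[2, 4])
    print({r: s.shape for r, s in stats.items()})
```

Stacking the arrays for all radii would give one candidate covariate per statistic and scale, which is the kind of feature set a data mining step could then screen for relevance.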
We introduce ConStat on two landscape-scale DSM examples with different soil genesis regimes where the ConStat terrain features serve as proxies for multi-scale variations of climate and parent material conditions. The results show that ConStat provides high predictive power. The cross-validated R² values range from 0.63 for predicting topsoil clay content in the Piracicaba area (Brazil) to 0.68 for topsoil silt content in the Rhine-Hesse area (Germany). The results obtained from data mining analysis allow for interpretations beyond conventional concepts and approaches to explain soil formation. As such, ConStat overcomes the trade-off between accuracy and interpretability of soil property predictions.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Landscape characteristics and digital soil mapping

Due to the economic and ecological pressure to estimate and handle the impacts of global climate change, population growth, food security, and bioenergy, the demand for fine-resolution soil property data for large areas is strong and growing (Banwart, 2011; Hartemink, 2008). Hence, new and powerful approaches are needed to regionalize soil information as accurately as possible. This comprises the generation of new covariates covering all relevant landscape characteristics to describe soil formation (e.g., Behrens et al., 2010a; McBratney et al., 2003). Such new environmental covariates are needed because, in pedology, soilscapes are characterized by spatial and taxonomic relations between soils, as well as by the relation between landform and landscape characteristics and the soils (Gerrard, 1981; Hole, 1978). These landscape characteristics, as driving forces for soil formation, show local, regional and supra-regional components. As a result of these different components, the soil forming factors influence pedogenesis at different scales.
Therefore, the spatial distribution of soil properties can also vary at different scales and in different directions (Kerry and Oliver, 2011), a fact that traditional qualitative and quantitative state factor concepts have not yet accounted for, although it is often described as relevant in pedological and pedometrical studies (e.g., Behrens et al., 2010a,b; Gerrard, 1981; Hole, 1978; Jenny, 1941, 1961; Kerry and Oliver, 2011; McBratney et al., 2003).

In most cases, complex associations between soils and landscapes can only be described approximately, because important data on landscape characteristics are too scarce and incomplete to provide accurate predictions of soils and their properties, and because appropriate methods that allow for integrating over multiple scales are largely missing (Behrens et al., 2010a,b; Lagacherie, 2008; MacMillan, 2004). Such multi- or hyper-scale approaches of landscape description are rarely documented but can be regarded as the missing counterpart to the current data explosion we are facing due to new hyper-spectral remote sensing data (e.g., Hyperion) as well as traditional map sources (geology, terrain attributes, etc.), which are currently becoming digitally available for each point of a landscape.

Geoderma 213 (2014) 578–588
* Corresponding author. Tel./fax: +49 7071 29 78943. E-mail address: thorsten.behrens@uni-tuebingen.de (T. Behrens).
http://dx.doi.org/10.1016/j.geoderma.2013.07.031

What is required are operational methods that provide measures of the entire physical landscape. Pike (1988) calls these the geomorphic signature. Terrain analysis generally provides a subset of the geomorphic signature: the geometric signature (Pike, 1988). Pike (1988)