The power of Random Forest for the identification and quantification of technogenic substrates in urban soils on the basis of DRIFT spectra *

Jannis Heil a,*, Xandra Michaelis b, Bernd Marschner b, Britta Stumpe a

a Department of General Geography/Human-Environment Research, Institute of Geography, University of Wuppertal, 42119 Wuppertal, Germany
b Department of Soil Science/Soil Ecology, Institute of Geography, Ruhr-University Bochum, 44780 Bochum, Germany

Article info
Article history: Received 13 January 2017; Received in revised form 19 May 2017; Accepted 27 June 2017
Keywords: Urban soils; Technogenic substrates; Diffuse reflectance spectroscopy; Data mining; Random Forest

1. Introduction

In today's industrialized world, urban settlements produce large amounts of domestic waste and unwanted by-products from industrial processes. Two examples of these waste products are municipal solid waste incineration ashes and slags from metallurgical processes. The incineration of municipal solid waste has been increasingly adopted around the world as a practice to deal with the ever-growing amounts of waste, especially in regions where land availability is scarce or environmental regulations encourage incineration (Santos et al., 2013). Although incineration reduces the volume of waste by up to 90%, substantial amounts of residual ashes are generated. These ashes are sinks for numerous toxic constituents, such as heavy metals or salts (Dabo et al., 2009). Slags, on the other hand, are generated as solid by-products during metal production, with annual worldwide production reaching over 50 million tons (Navarro et al., 2010). Depending on the type of ore and the pyrometallurgical process applied, slags contain different amounts of heavy metals and therefore vary in environmental concern (Proctor et al., 2000; Rawlins et al., 2005; Navarro et al., 2010; Stumpe et al., 2012).
In this study, we investigated zinc furnace slags (ZFS), a slag type generally associated with high heavy metal contents. For about 150 years, such technogenic materials were deposited unregulated in landfills; when open space became limited, it was common practice to use technogenic substrates for construction and landscaping (Proctor et al., 2000). Consequently, technogenic substrates were introduced into urban soils without any record, so that worldwide up to 35% of slag material in soils is of unknown origin (Motz and Geiseler, 2001; Mansfeldt and Dohrmann, 2004). Shaped by heavy industry in the past, the Rhine-Ruhr metropolitan area has a long history of technogenic substrate contamination in soils. In a comprehensive study, Meuser (1993) found that out of 240 sampled soils in the area, 71% contained technogenic additions. As risk potentials vary greatly between technogenic substrates, an appropriate risk assessment must identify the source of contamination. Hazardous effects, e.g., heavy metal concentration, are closely correlated with the type of substrate (Proctor et al., 2000; Mansfeldt and Dohrmann, 2004; Rawlins et al., 2005; Navarro et al., 2010). Therefore, it is necessary to develop a method for the accurate identification of technogenic materials in urban soils.

Spectroscopic methods, such as Fourier transform infrared spectroscopy (FTIR), show a high potential to overcome the challenge of identifying different substrate types in soils. In particular, diffuse reflectance FTIR spectroscopy (DRIFT) is becoming increasingly popular in soil science, as it is faster and more cost-effective than traditional laboratory methods such as acid digestion and requires minimal sample preparation (e.g., McCarty et al., 2002; Reeves, 2010; Bellon-Maurel and McBratney, 2011; Soriano-Disla et al., 2014).
* This paper has been recommended for acceptance by B. Nowack.
* Corresponding author. E-mail address: jheil@uni-wuppertal.de (J. Heil).
Environmental Pollution 230 (2017) 574–583
Journal homepage: www.elsevier.com/locate/envpol
http://dx.doi.org/10.1016/j.envpol.2017.06.086
0269-7491/© 2017 Elsevier Ltd. All rights reserved.

Spectroscopic methods allow for a higher sample throughput and/or higher spatial resolution for possible soil mapping (Viscarra Rossel and Behrens, 2010). Spectroscopic methods have been used to estimate qualitative as well as quantitative soil properties. For instance, spectroscopy has been used to assess the composition of soil organic matter (SOM) in different soils (Baes and Bloom, 1989; Demyan et al., 2012; Heller et al., 2015), but also to predict multiple soil chemical, physical, and biological properties from a single spectrum (e.g., Viscarra Rossel et al., 2006), as thoroughly reviewed by Soriano-Disla et al. (2014). Spectroscopy was also used to distinguish between different substrate groups (Stumpe et al.,