1 INTRODUCTION

Multivariate soil-landscape modeling has been widely used to investigate soil carbon patterns and processes and to upscale site-specific observations (McBratney et al., 2003; Grunwald, 2009). Soil carbon variation is complex to model because it is governed by various soil-forming, environmental and anthropogenic factors that operate at distinct scales (Vasques et al., 2012). However, little research has systematically examined which covariates should be used in soil carbon modeling at a specific scale and in a specific region.

With the advent of the information age, remote sensing and proximal sensing products have increasingly provided the soil science community with more accessible datasets that characterize environmental soil-landscapes and even internal soil properties directly (Grunwald, 2009). Meanwhile, the rapid development and introduction of data mining and machine learning techniques have equipped pedometricians with powerful tools that can handle large volumes of data. Tree-based modeling techniques and Support Vector Machines are among the most widely used in ecology and soil science (De'ath and Fabricius, 2000; Prasad et al., 2006; Vasques et al., 2008).

Therefore, the objective of this study was to identify, from a large pool of covariates, the optimal sets of soil, environmental and anthropogenic covariates that best predict soil carbon at the regional scale (Florida, USA, about 150,000 km²), and to reveal the dominant properties and processes controlling soil carbon in Florida.

2 MATERIALS AND METHODS

2.1 Study area

The study area is the State of Florida, located in the southeastern United States, with latitudes from 24°27′ N to 31° N and longitudes from 80°02′ W to 87°38′ W. Florida covers approximately 150,000 km².

The climate of North and Central Florida is humid subtropical.
South Florida has a tropical climate according to the Köppen classification. The dominant soil orders of Florida are Spodosols (29%), Entisols (20%), Ultisols (17%), Alfisols (12%) and Histosols (10%). Overall, soils in Florida are sandy in texture. Land use/land cover consists mainly of Open Water (18%), Pinelands (16%), High Impact Urban (7%), Improved Pasture (7%), and Freshwater Marsh and Wet Prairie (5%). The topography consists of gentle slopes varying from 0 to 5% across almost the whole state. Elevation ranges from sea level up to approximately 114 m in the Panhandle.

2.2 Soil data and environmental covariates

A total of 1,192 topsoil samples (0-20 cm) were collected across Florida between 2008 and 2010 using a random design stratified by the combination of soil suborder and land cover/land use (Figure 1). Total carbon (TC) was analyzed by combus-

Which soil, environmental and anthropogenic covariates for soil carbon models in Florida are needed?

X. Xiong & S. Grunwald
Department of Soil and Water Science, University of Florida, Gainesville, Florida, USA

D.B. Myers
USDA-ARS-Cropping Systems and Water Quality Unit, Columbia, Missouri, USA

J. Kim & W.G. Harris
Department of Soil and Water Science, University of Florida, Gainesville, Florida, USA

N.B. Comerford
North Florida Research and Education Center, University of Florida, Quincy, Florida, USA

ABSTRACT: In soil-landscape modeling, how to select environmental covariates has always been a critical question facing modelers. However, this topic has not been fully investigated. In this study, a total of 1,192 topsoil samples (0-20 cm) were collected across Florida between 2008 and 2010, and a comprehensive pool of 212 environmental covariates was compiled, covering all STEP-AWBH variables (S: soils, T: topography, E: ecology, P: parent materials, A: atmosphere/climate, W: water, B: biota, and H: human).
Data mining and machine learning techniques were used to develop models to predict total soil carbon (TC) stocks. Results showed that soil-water properties, biota, human factors and parent material were the dominant factors controlling TC variation in Florida. A simplified model with approximately 50 predictors performed comparably to the exhaustive model with all 212 predictors.
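
The idea of reducing a large covariate pool to a smaller set of predictors can be illustrated with a minimal sketch. It is not the method used in this study, which relied on tree-based and Support Vector Machine techniques. As a stand-in, the sketch ranks covariates by the absolute Pearson correlation with the target and keeps the top k. The covariate names, the synthetic data, and the functions `pearson`, `rank_covariates` and `select_top_k` are all hypothetical and serve only as an illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_covariates(covariates, target):
    """Return covariate names sorted by decreasing |r| with the target."""
    scores = {name: abs(pearson(values, target))
              for name, values in covariates.items()}
    return sorted(scores, key=scores.get, reverse=True)


def select_top_k(covariates, target, k):
    """Keep the k covariates most strongly correlated with the target."""
    return rank_covariates(covariates, target)[:k]


# Synthetic data (assumption): two informative covariates and two noise columns.
rng = random.Random(0)
n = 200
covariates = {
    "soil_water": [rng.gauss(0, 1) for _ in range(n)],
    "biota_index": [rng.gauss(0, 1) for _ in range(n)],
    "noise_1": [rng.gauss(0, 1) for _ in range(n)],
    "noise_2": [rng.gauss(0, 1) for _ in range(n)],
}
# Hypothetical target standing in for TC; it depends only on the first two covariates.
target = [
    3.0 * w + 1.5 * b + rng.gauss(0, 0.5)
    for w, b in zip(covariates["soil_water"], covariates["biota_index"])
]

selected = select_top_k(covariates, target, k=2)
print(selected)
```

A univariate filter of this kind ignores interactions and nonlinear effects, which is one reason tree-based methods are often preferred for covariate screening in practice.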