Land
Article

Using Machine Learning Algorithms to Estimate Soil Organic Carbon Variability with Environmental Variables and Soil Nutrient Indicators in an Alluvial Soil

Kingsley JOHN 1,*, Isong Abraham Isong 2, Ndiye Michael Kebonye 1, Esther Okon Ayito 2, Prince Chapman Agyeman 1 and Sunday Marcus Afu 2

1 Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food, and Natural Resources, Czech University of Life Sciences, Kamýcká 129, 16500 Prague, Czech Republic; kebonye@af.czu.cz (N.M.K.); agyeman@af.czu.cz (P.C.A.)
2 Department of Soil Science, Faculty of Agriculture, University of Calabar, Calabar P.M.B. 1115, Nigeria; eneaki1@unical.edu.ng (I.A.I.); irenotobong@unical.edu.ng (E.O.A.); sunnymarcus@unical.edu.ng (S.M.A.)
* Correspondence: johnk@af.czu.cz

Received: 4 November 2020; Accepted: 30 November 2020; Published: 2 December 2020

Abstract: Soil organic carbon (SOC) is an important indicator of soil quality and directly determines soil fertility. Hence, understanding its spatial distribution and controlling factors is necessary for efficient and sustainable soil nutrient management. In this study, artificial neural network (ANN), support vector machine (SVM), cubist regression, random forests (RF), and multiple linear regression (MLR) algorithms were used to predict SOC. Sixty soil samples (n = 60) were collected within the research area at a soil depth of 30 cm and analyzed for SOC content using the Walkley–Black method. Of these samples, 80% were used for model training, and 21 auxiliary variables were included as predictors.
The predictors include effective cation exchange capacity (ECEC), base saturation (BS), calcium to magnesium ratio (Ca_Mg), potassium to magnesium ratio (K_Mg), potassium to calcium ratio (K_Ca), elevation, plan curvature, total catchment area, channel network base level, topographic wetness index, clay index, iron index, normalized difference built-up index (NDBI), ratio vegetation index (RVI), soil adjusted vegetation index (SAVI), normalized difference vegetation index (NDVI), normalized difference moisture index (NDMI), and land surface temperature (LST). Mean absolute error (MAE), root-mean-square error (RMSE), and R² were used to assess model performance. The results showed a mean SOC of 1.62% with a coefficient of variation (CV) of 47%. The best-performing model was RF (R² = 0.68), followed by the cubist model (R² = 0.51), SVM (R² = 0.36), ANN (R² = 0.36), and MLR (R² = 0.17). The soil nutrient indicators, topographic wetness index, and total catchment area were considered useful indicators for the spatial prediction of SOC in flat, homogeneous topography. Future studies should include other auxiliary predictors (e.g., soil physical and chemical properties, and lithological data) and cover a broader range of soil types to improve model performance.

Keywords: geostatistics; machine learning; geospatial modeling; predictive mapping; soil fertility indices; environmental covariates

Land 2020, 9, 487; doi:10.3390/land9120487
www.mdpi.com/journal/land

1. Introduction

Globally, soils of the humid tropics are widely used for agriculture. In southeastern Nigeria, these soils have potential that could be exploited for crop production. Unfortunately, they are highly weathered and leached soils formed on alluvial