RESEARCH PAPER

Mapping landslide susceptibility in the Zagros Mountains, Iran: a comparative study of different data mining models

Mohammad Fallah-Zazuli 1 & Alireza Vafaeinejad 2 & Ali Asghar Alesheikh 3 & Mahdi Modiri 4 & Hossein Aghamohammadi 1

Received: 16 April 2019 / Accepted: 20 June 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
In recent years, increasing efforts have been made to predict the time, location, and magnitude of future landslides. This study explores the potential application of four state-of-the-art data mining models (logistic regression, random forest, support vector machine, and Naïve Bayes tree) for the spatially explicit prediction of landslide susceptibility across a landslide-prone landscape in the Zagros Mountains, Iran. Fifteen conditioning factors and 272 historical landslide events were used to develop a geospatial database for the study area. A two-step factor analysis procedure based on multicollinearity analysis and the Gain Ratio technique was performed to measure the predictive utility of the factors and to quantify their contribution to landslide occurrences across the study region. Once the models were successfully trained and validated using several performance metrics (i.e., ROC-AUC, sensitivity, specificity, accuracy, RMSE, and Kappa), they were applied to the entire study region to generate distribution maps of landslide susceptibility. Overall, the random forest model demonstrated the highest training performance (AUC = 0.971; accuracy = 99%; RMSE = 0.120) and the best ability to predict future landslides (AUC = 0.942; accuracy = 87%; RMSE = 0.312), followed by the support vector machine, Naïve Bayes tree, and logistic regression models. The Wilcoxon signed-rank test further confirmed the superiority of the random forest model for mapping landslide susceptibility in the Zagros region.
The insights obtained from this research could be useful for the spatially explicit assessment of landslide-prone landscapes and for obtaining a better understanding of the capability of different predictive models.

Keywords Landslide · Susceptibility mapping · GIS · Data mining

* Alireza Vafaeinejad
a_vafaei@sbu.ac.ir

1 Department of GIS and RS, Faculty of Natural Resources and Environment, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 Faculty of Civil, Water, and Environmental Engineering, Shahid Beheshti University, Tehran, Iran
3 Department of Geospatial Information Systems, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran
4 Department of Urban Planning, Malek-e-Ashtar University of Technology, Tehran, Iran

Earth Science Informatics
https://doi.org/10.1007/s12145-019-00389-w

Introduction
Landslides are a frequently recurring natural hazard in many parts of the world. Every year, several landslide events occur worldwide, often resulting in significant loss of life and severe economic consequences (Glade et al. 2006; Chen et al. 2019). In response to this concern, engineers seek to delineate existing susceptibilities in order to reduce the potential impacts of future landslides. To this end, they need powerful and reliable approaches and tools that enable them to handle the varied, long-term records of historical landslide events (Hong et al. 2015, 2017; Tien Bui et al. 2016a, b; Jaafari et al. 2015a, 2017; Pham et al. 2016, 2018a, b; Pourghasemi and Rahmati 2018; Wang et al. 2019; Dou et al. 2019). The advancement of geographical information system (GIS) and remote sensing (RS) techniques (Amade et al. 2018; Ahmouda et al. 2018) has introduced various spatially explicit models for the purpose of landslide prediction. Among these models, logistic regression (LR) and multi-criteria decision-making (MCDM) have been repeatedly employed as the basic modeling approaches and acknowledged as the most popular techniques for estimating the likelihood of