Proceedings of the 19th International Conference on Soil Mechanics and Geotechnical Engineering, Seoul 2017

Data-driven Classification Approaches for Stability Condition Prediction of Soil Cutting Slopes

Approches de classification de données pour la prévision des conditions de stabilité de pentes d'excavation

Joaquim Tinoco & António Gomes Correia
ISISE - Institute for Sustainability and Innovation in Structural Engineering, School of Engineering, University of Minho, Portugal, jtinoco@civil.uminho.pt

Paulo Cortez
ALGORITMI Research Center/Department of Information Systems, University of Minho, Portugal

David Toll
School of Engineering and Computing Sciences, University of Durham, UK

ABSTRACT: For transportation infrastructures, one of the greatest challenges today is to keep large-scale transportation networks, such as railway networks, operational under all conditions. In this paper we present a tool intended to support the management of maintenance and repair works for a particular component of these infrastructures: slopes. To that end, the high and flexible learning capabilities of artificial neural networks and support vector machines were applied to develop a tool able to identify the stability condition of soil cutting slopes, using as model inputs the information usually collected during routine inspection activities (visual information). This task was addressed following two different strategies: nominal classification and regression. Moreover, to overcome the problem of imbalanced data, three training sampling approaches were explored: no resampling, SMOTE and Oversampling. The results are presented and discussed, comparing the performance of both algorithms as well as the effect of the sampling approaches. A comparison between the nominal classification and regression strategies is also carried out. These results can make a valuable contribution to practical applications at network level.
RÉSUMÉ: For transportation infrastructures, one of the greatest challenges today is to keep large-scale transportation networks, such as railway networks, operational under all conditions. This paper presents a tool to support management related to maintenance and repair works for a geotechnical component of these infrastructures: slopes. To that end, the high and flexible learning capabilities of artificial neural networks and support vector machines were applied in developing a tool able to identify the stability condition of soil cutting slopes, keeping in mind the use of information usually collected during routine visual inspection activities to feed the models. This task was addressed following two different strategies: nominal classification and regression. In addition, to overcome the problem of imbalanced data, three sampling methods were explored: no resampling, SMOTE and oversampling. The results obtained are presented and discussed, comparing the performance of the algorithms as well as the effect of the sampling approaches. A comparison between the nominal classification and regression strategies is also carried out. The results obtained can make a valuable contribution to practical applications at network level.

KEYWORDS: slope stability condition, soil cutting slopes, railway, soft computing, data mining, imbalanced data.

1 INTRODUCTION

To make the best use of the available budgets, decision makers need a set of tools that help them take the best decisions. In the framework of transportation networks, and railways in particular, slopes are perhaps the element whose failure can have the strongest impact at several levels.
Therefore, it is important to develop ways to identify potential problems before they result in failures. Although there are some models and systems to detect slope failures, most of them were developed for natural slopes and present some constraints when applied to engineered (human-made) slopes. They have limited applicability, as most of the existing systems were developed based on particular case studies or using small databases. Another aspect that can limit their applicability is the information required to feed them, such as data taken from complex tests or from expensive monitoring systems.

Some approaches found in the literature for slope failure detection are identified below. Pourkhosravani and Kalantari (2011) summarize the current methods for slope stability evaluation, grouping them into Limit Equilibrium (LE) methods, Numerical Analysis methods, Artificial Neural Networks and Limit Analysis methods. There are also approaches based on finite element methods (Suchomel et al., 2010), reliability analysis (Husein Malkawi et al., 2000), as well as some methods making use of data mining (DM) algorithms (Cheng and Hoang, 2014; Ahangar-Asr et al., 2010; Yao et al., 2008). More recently, a new flexible statistical system was proposed by Pinheiro et al. (2015), based on the assessment of different factors that affect the behavior of a given slope. By weighting the different factors, a final indicator of the slope stability condition is calculated.

As mentioned above, the main limitations of almost all approaches proposed so far are related to their applicability domain or their dependency on information that is difficult to obtain. Indeed, predicting whether a slope will fail or not is a multi-variable problem characterized by high dimensionality.
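To make the factor-weighting idea attributed above to Pinheiro et al. (2015) more concrete, the following is a minimal sketch of how a weighted indicator could be computed. It is purely illustrative and is not the scoring system of Pinheiro et al.: the normalised weighted average, the function name `weighted_indicator`, the factor names, the 0 to 1 score scale and the weights are all assumptions introduced here.

```python
from typing import Mapping


def weighted_indicator(scores: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Return the weight-normalised average of per-factor scores.

    Assumption: the indicator is a simple weighted average; the actual
    aggregation rule of any published system may differ.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical factors: scores on an assumed scale (0 = good, 1 = poor)
# and hypothetical relative weights.
example_scores = {"drainage": 0.2, "vegetation": 0.5, "slope_height": 0.8}
example_weights = {"drainage": 3.0, "vegetation": 1.0, "slope_height": 2.0}

indicator = weighted_indicator(example_scores, example_weights)
print(f"Illustrative stability indicator: {indicator:.2f}")
```

A fixed weighting of this kind is easy to interpret, but the weights have to be chosen by hand for each context, which is one reason a data-driven alternative may be attractive for a multi-variable problem.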
In this work we take advantage of the learning capabilities of flexible data mining classification algorithms, such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). These data mining algorithms were used to fit a large database of soil cutting slopes in order to predict the stability condition of a given slope according to a pre-defined