Testing Occam’s razor to characterize high-order connectivity in pore networks of granular media: Feature selection in machine learning

Joost van der Linden 1, Antoinette Tordesillas 2,⋆, and Guillermo Narsilio 1

1 Department of Infrastructure Engineering, The University of Melbourne, Australia
2 School of Mathematics and Statistics, School of Earth Sciences, The University of Melbourne, Australia

Abstract. A perennial challenge for the characterization and modelling of phenomena involving granular media is that the internal connectivity of, and interactions between, the pores and the particles exhibit hallmarks of complexity: multi-scale and nonlinear interactions that lead to a plethora of patterns at the mesoscale, including fluid flow patterns that ultimately render a permeability of the granular media at the macroscale. A multitude of physical parameters exist to characterize geometry and structure, including pore/particle shape, volume and surface area, while a rich class of complex network parameters quantifies internal connectivity of the pore and particles in the material. A large collection of such variables is likely to exhibit a high degree of redundancy. Here we demonstrate how to use feature selection in machine learning theory to identify the most informative and non-redundant, yet parsimonious set of features that optimally characterizes the interstitial flow properties of porous, granular media, e.g., permeability, from high resolution data.

1 Introduction

Porous, granular materials are complex systems that embody rich patterns and dynamics. Applications involving physical flow, such as hydrocarbon recovery and geothermal energy, rely on estimations of the (coupled) hydraulic, thermal and mechanical material properties.
For instance, significant experimental evidence has shown that transport properties of materials, such as permeability, are strongly influenced by the presence of concentrated zones of inelastic deformation such as shear bands, compaction bands, fractures and joints (e.g., [1]). Localized deformation can strongly influence flow pathways, potentially becoming either barriers or conduits for flow depending on the attendant evolution of local pores (e.g., see [3] and references therein). Many of these patterns occur naturally in frictional soft materials, especially in rocks and soil, due to material discontinuities and heterogeneity [2]. Microstructural grain rearrangements can alter permeability not only through changes in the geometry and size distribution of individual pores but also through changes in their connectivity [3].

⋆ e-mail: atordesi@unimelb.edu.au

Almost invariably, these complex and interrelated processes can only be captured in a high-dimensional multivariate parameter space. Such a dataset is generated in this work to characterize permeability using the data-driven framework introduced in [4]. The framework fuses proven finite-element and discrete-element methodology with modern advances in statistics (machine learning) and complex systems (complex networks), towards a data-driven 3D analysis of multiscale and nonlinear phenomena in granular, porous media. Furthermore, the framework can be used to address the coupled evolution of the solid grain and interstitial pore phases through a study of two classes of interdependent networks in a single platform, viz. one that represents the grain contacts while the other represents the pores, thus advancing the approach in [3] for planar systems.

As shown in [4], the high-dimensional parameter space of variables related to permeability is likely to exhibit a high degree of redundancy.
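To make the notion of redundancy concrete, the following minimal sketch shows one way redundant variables could be detected and down-ranked. It is an illustration under stated assumptions, not the procedure of [4] and not either of the algorithms applied in this paper: absolute Pearson correlation stands in for both relevance and redundancy, the data are synthetic, and all names (`rank_features`, `a_copy`, and so on) are hypothetical.

```python
import numpy as np


def rank_features(X, y):
    """Greedy ranking: at each step, pick the feature with the highest
    relevance to y minus its mean redundancy with the features already
    picked. Both quantities are absolute Pearson correlations, which is
    an assumption made for this sketch only."""
    n_features = X.shape[1]
    # Correlation matrix of all features plus the target (last column).
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    relevance = corr[:n_features, -1]
    redundancy = corr[:n_features, :n_features]

    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    remaining = [j for j in range(n_features) if j != selected[0]]
    while remaining:
        scores = [relevance[j] - redundancy[j, selected].mean()
                  for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in data (not from the paper).
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)                   # informative variable
a_copy = a + 0.01 * rng.normal(size=n)   # near-duplicate of a (redundant)
b = rng.normal(size=n)                   # second, independent informative variable
noise = rng.normal(size=n)               # irrelevant variable
X = np.column_stack([a, a_copy, b, noise])
y = 2.0 * a + 1.0 * b + 0.1 * rng.normal(size=n)

ranking = rank_features(X, y)
print(ranking)
```

In this toy setting the near-duplicate column is highly relevant on its own, yet it is passed over in the second step in favour of the independent variable `b`, because its redundancy with the already selected column cancels its relevance.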
Occam’s razor dictates that only the most relevant and least redundant of these variables should be retained to explain a given phenomenon of interest. To this end, a ranking of the variables in order of relevance and redundancy is crucial for predictive modelling and, ultimately, control. We apply two such ranking algorithms in this paper, quantify the redundancy, and highlight several highly relevant parameters for the permeability.

2 Methods

The methods and dataset used here are based on the methodology outlined in [4]. In this paper, we summarize the corresponding framework. For a comprehensive discussion of all model equations, parameters and assumptions involved, we refer to the aforementioned study. The employed data generation process is shown in Figure 1.

EPJ Web of Conferences 140, 12006 (2017), Powders & Grains 2017. DOI: 10.1051/epjconf/201714012006. © The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

2.1 Discrete element modeling

Our discrete-element model is a simple proxy for sands and sedimentary rocks (i.e., Ottawa sand and sandstone) [2, 5, 6]. Batches of 400 soft spheres are dropped in a rectangular container with periodic boundary conditions for