Testing Occam’s razor to characterize high-order connectivity in pore networks of granular media: Feature selection in machine learning
Joost van der Linden¹, Antoinette Tordesillas²,⋆, and Guillermo Narsilio¹
¹ Department of Infrastructure Engineering, The University of Melbourne, Australia
² School of Mathematics and Statistics, School of Earth Sciences, The University of Melbourne, Australia
Abstract. A perennial challenge for the characterization and modelling of phenomena involving granular media is that the internal connectivity of, and interactions between, the pores and the particles exhibit hallmarks of complexity: multi-scale and nonlinear interactions that lead to a plethora of patterns at the mesoscale, including fluid flow patterns that ultimately determine the permeability of the granular media at the macroscale. A multitude of physical parameters exist to characterize geometry and structure, including pore/particle shape, volume and surface area, while a rich class of complex network parameters quantifies the internal connectivity of the pores and particles in the material. A large collection of such variables is likely to exhibit a high degree of redundancy. Here we demonstrate how to use feature selection in machine learning theory to identify a parsimonious set of the most informative, non-redundant features that optimally characterizes the interstitial flow properties of porous, granular media, e.g., permeability, from high-resolution data.
1 Introduction
Porous, granular materials are complex systems that embody rich patterns and dynamics. Applications involving physical flow, such as hydrocarbon recovery and geothermal energy, rely on estimations of the (coupled) hydraulic, thermal and mechanical material properties. For instance, significant experimental evidence has shown that transport properties of materials, such as permeability, are strongly influenced by the presence of concentrated zones of inelastic deformation such as shear bands, compaction bands, fractures and joints (e.g., [1]). Zones of localized deformation can strongly influence flow pathways, acting as either barriers or conduits for flow depending on the attendant evolution of local pores (e.g., see [3] and references therein). Many of these patterns occur naturally in frictional soft materials, especially in rocks and soil, due to material discontinuities and heterogeneity [2]. Microstructural grain rearrangements can alter permeability not only through changes in the geometry and size distribution of individual pores but also through changes in their connectivity [3].
Almost invariably, these complex and interrelated processes can only be captured in a high-dimensional multivariate parameter space. A dataset of this kind is generated in this work to characterize permeability using the data-driven framework introduced in [4]. The framework fuses proven finite-element and discrete-element methodology with modern advances in statistics (machine learning) and complex systems (complex networks), towards a data-driven 3D analysis of multiscale and nonlinear phenomena in granular, porous media. Furthermore, the framework can be used to address the coupled evolution of the solid grain and interstitial pore phases through a study of two classes of interdependent networks in a single platform, viz. one that represents the grain contacts while the other represents the pores, thus advancing the approach developed in [3] for planar systems.
⋆ e-mail: atordesi@unimelb.edu.au
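To make the grain-contact network concrete, the following minimal sketch builds one from sphere centres and radii and reports each grain's degree (its coordination number), one of the simplest complex network parameters. It is an illustration under stated assumptions, not the construction used in [4]: the helper names `contact_network` and `coordination_numbers` and the tolerance `tol` are hypothetical, two spheres are assumed to be in contact when their centre-to-centre distance is at most the sum of their radii, and periodic boundaries are ignored. Only NumPy is required.

```python
import numpy as np


def contact_network(centers, radii, tol=1e-9):
    """Hypothetical helper: boolean adjacency matrix of a grain-contact network.

    Assumption: spheres i and j are in contact when the distance between
    their centres is <= r_i + r_j (+ tol). Periodic boundaries are ignored.
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Pairwise centre-to-centre distances, shape (n, n)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Contact threshold for every pair: sum of the two radii
    reach = radii[:, None] + radii[None, :]
    adjacency = dist <= reach + tol
    np.fill_diagonal(adjacency, False)  # a grain is not in contact with itself
    return adjacency


def coordination_numbers(adjacency):
    """Node degree of the contact network, i.e. number of contacts per grain."""
    return adjacency.sum(axis=1)


if __name__ == "__main__":
    # Three touching unit spheres in a row, plus one isolated sphere
    centers = [[0, 0, 0], [2, 0, 0], [4, 0, 0], [10, 0, 0]]
    radii = [1.0, 1.0, 1.0, 1.0]
    A = contact_network(centers, radii)
    print(coordination_numbers(A))  # prints [1 2 1 0]
```

A pore network could be represented in the same adjacency form, with pores as nodes and pore throats as edges, so that the same network measures apply to both phases.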
As shown in [4], the high-dimensional parameter space of variables related to permeability is likely to exhibit a high degree of redundancy. Occam’s razor dictates that only the most relevant and least redundant of these variables should be retained to explain a given phenomenon of interest. To this end, a ranking of the variables in order of relevance and redundancy is crucial for predictive modelling and, ultimately, control. We apply two such ranking algorithms in this paper, quantify the redundancy, and highlight several parameters that are highly relevant to permeability.
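The idea of ranking variables by relevance while penalizing redundancy can be illustrated with a greedy, mRMR-style (minimum-redundancy maximum-relevance) selection. This is a hedged sketch, not necessarily either of the two algorithms applied in this paper: the function name `mrmr_rank` is hypothetical, and absolute Pearson correlation is swapped in for mutual information as the measure of both relevance (feature vs. target) and redundancy (feature vs. already-selected features). At each step, the candidate with the largest relevance minus mean redundancy is appended to the ranking.

```python
import numpy as np


def mrmr_rank(X, y, k=None):
    """Hypothetical greedy relevance-minus-redundancy ranking (mRMR-style).

    Relevance  = |corr(feature, target)|
    Redundancy = mean |corr(feature, already-selected feature)|
    Absolute Pearson correlation stands in for mutual information here.
    Returns the indices of the first k features in ranked order.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    k = n_features if k is None else k
    # Correlation matrix of all features plus the target (last row/column)
    C = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    relevance = C[:-1, -1]
    feat_corr = C[:-1, :-1]

    # Start with the single most relevant feature
    selected = [int(np.argmax(relevance))]
    remaining = set(range(n_features)) - set(selected)
    while remaining and len(selected) < k:
        cand = sorted(remaining)
        # Mean redundancy of each candidate with the features chosen so far
        redundancy = feat_corr[np.ix_(cand, selected)].mean(axis=1)
        scores = relevance[cand] - redundancy
        best = cand[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Synthetic data: column 1 nearly duplicates column 0, column 3 is noise
    rng = np.random.default_rng(0)
    n = 500
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    X = np.column_stack([a, a + 0.01 * rng.normal(size=n), b, rng.normal(size=n)])
    y = 2 * a + b + 0.1 * rng.normal(size=n)
    print(mrmr_rank(X, y))
```

In this synthetic example, the near-duplicate of `a` is highly relevant but almost perfectly correlated with whichever copy is picked first, so its score drops and the independent feature `b` (column 2) is ranked second, while the pure-noise column (column 3) comes last.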
2 Methods
The methods and dataset used here are based on the methodology outlined in [4], whose framework we summarize in this section. For a comprehensive discussion of all model equations, parameters and assumptions involved, we refer the reader to that study. The data generation process employed is shown in Figure 1.
2.1 Discrete element modeling
Our discrete-element model is a simple proxy for sands and sedimentary rocks (i.e., Ottawa sand and sandstone) [2, 5, 6]. Batches of 400 soft spheres are dropped in a rectangular container with periodic boundary conditions for
EPJ Web of Conferences 140, 12006 (2017), DOI: 10.1051/epjconf/201714012006
Powders & Grains 2017
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).