The Usage of Modern Data Science in Segmentation and Classification: Machine Learning and Microscopy
Matthew Andrew¹, Sreenivas Bhattiprolu¹, Daniel Butnaru² and Joaquin Correa¹
1. Carl Zeiss X-ray Microscopy, 4385 Hopyard Rd, Pleasanton, CA
2. Carl Zeiss Microscopy, Kistlerhofstrasse 75, Munich, Germany
One of the most challenging stages in any microscopy workflow is transforming images into rich digital models of segmented data. These models enable quantification of features of interest and power data-driven analysis. Frequently, the greyscale output from detectors carries a variety of modality-specific artifacts and noise which, as the resulting images become more complex, cause threshold-based segmentation approaches to fail [1]. When visually inspecting such images, the brain smooths out this noise and recognizes patterns in the data to extract information through the artifacts, but this process has frequently proved hard to automate and capture in computational form. Instead, science has traditionally relied on the hard work of researchers to segment such artifact-ridden images manually. However, as datasets grow larger, more complex and more multidimensional, such manual approaches become increasingly difficult to scale, and if these imaging technologies are to be used at an industrial scale, manual segmentation is unsustainable.
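To make this failure mode concrete, the sketch below applies a single global threshold to a synthetic two-phase image at increasing noise levels. It is an illustration under stated assumptions, not the authors' method: Otsu's method is used here as a representative threshold-based approach (reference [1] is not assumed to use it), and the functions `otsu_threshold`, `synthetic_pores` and `threshold_error`, the phase greylevels (0.2 and 0.8) and the noise levels are hypothetical choices made for this example. Only NumPy is assumed.

```python
import numpy as np


def otsu_threshold(image, bins=256):
    """Global threshold maximising between-class variance (Otsu's method)."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist / hist.sum()                        # normalised histogram
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                            # fraction of pixels at or below each bin
    w1 = 1.0 - w0                                # fraction of pixels above
    m0 = np.cumsum(p * centers)                  # cumulative first moment
    m_total = m0[-1]
    valid = (w0 > 1e-12) & (w1 > 1e-12)          # skip splits that leave a class empty
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (m_total * w0[valid] - m0[valid]) ** 2 / (w0[valid] * w1[valid])
    return float(edges[1:][np.argmax(sigma_b)])  # upper edge of the best split bin


def synthetic_pores(shape=(128, 128), n_pores=12, seed=0):
    """Hypothetical ground truth: random discs ('pores') in a matrix ('grain')."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    truth = np.zeros(shape, dtype=bool)
    for _ in range(n_pores):
        cy, cx = rng.integers(0, shape[0]), rng.integers(0, shape[1])
        r = rng.integers(5, 15)
        truth |= (yy - cy) ** 2 + (xx - cx) ** 2 < r ** 2
    return truth


def threshold_error(noise_sigma, seed=0):
    """Fraction of pixels misclassified by a global Otsu threshold at one noise level."""
    truth = synthetic_pores(seed=seed)
    rng = np.random.default_rng(seed + 1)
    # Pores are dark (0.2), grain is bright (0.8), plus Gaussian detector noise.
    image = np.where(truth, 0.2, 0.8) + rng.normal(0.0, noise_sigma, truth.shape)
    pores = image < otsu_threshold(image)
    return float(np.mean(pores != truth))


if __name__ == "__main__":
    for sigma in (0.05, 0.15, 0.3):
        print(f"noise sigma={sigma:.2f}  misclassified fraction={threshold_error(sigma):.3f}")
```

Because each pixel is assigned from its own greyscale value alone, the misclassified fraction grows as the noise distribution of each phase spreads across the cut-off; no single threshold can recover information that is only visible in a pixel's neighborhood.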
The last 20 years have seen a transformation in a wide range of fields, broadly grouped under the umbrella of “Machine Learning”. While these technologies have transformed many areas of data science, ranging from medical diagnosis to stock market analysis, image analysis for microscopy (outside some specific areas of application) has frequently lagged behind developments in other fields. The power of such algorithms, when applied to segmentation and classification problems in microscopy, lies in their ability to create arbitrary classifiers which operate in a much higher-dimensional space than the image output from a single microscope detector. These additional dimensions may be (spatially and/or temporally correlated) images acquired in different imaging modalities (i.e. using different detectors, energies or techniques to extract different properties from the sample) or derivative images produced by applying a range of filters to the image (e.g. gradient, smoothing or textural filters to extract different local and non-local features).
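The sketch below illustrates classifying pixels in such a higher-dimensional feature space. It is a hedged, minimal example rather than the implementation described here: the abstract does not name its classifier or filter bank, so scikit-learn's `RandomForestClassifier` (a common choice for trainable pixel classification) and SciPy's `gaussian_filter`, `gaussian_gradient_magnitude` and `uniform_filter` are swapped in as assumptions, and the helper names (`feature_stack`, `classify`, `demo`), the filter scales and the synthetic test image are hypothetical. Additional co-registered modalities can be passed as extra entries in the `images` list, each contributing its own block of features.

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier


def feature_stack(images, sigmas=(1.0, 2.0, 4.0)):
    """Per-pixel feature matrix of shape (n_pixels, n_features).

    `images` is a list of co-registered arrays of identical shape, e.g. the same
    field of view acquired with different detectors or energies.
    """
    feats = []
    for img in images:
        img = np.asarray(img, dtype=float)
        feats.append(img)                                          # local greyscale
        for s in sigmas:
            feats.append(ndi.gaussian_filter(img, s))              # non-local average
            feats.append(ndi.gaussian_gradient_magnitude(img, s))  # gradient strength
            size = int(2 * s + 1)
            mean = ndi.uniform_filter(img, size=size)
            mean_sq = ndi.uniform_filter(img * img, size=size)
            feats.append(np.maximum(mean_sq - mean * mean, 0.0))   # local variance (texture)
    return np.stack([f.ravel() for f in feats], axis=1)


def classify(images, labels, sigmas=(1.0, 2.0, 4.0), n_estimators=50, seed=0):
    """Train on annotated pixels (labels > 0) and predict a class for every pixel."""
    X = feature_stack(images, sigmas)
    y = np.asarray(labels).ravel()
    annotated = y > 0                                              # 0 means 'unlabelled'
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(X[annotated], y[annotated])
    return clf.predict(X).reshape(np.shape(labels))


def demo(seed=0, shape=(128, 128)):
    """Hypothetical test image: two regions whose mean greyscale differs by less than the noise."""
    rng = np.random.default_rng(seed)
    truth = np.ones(shape, dtype=int)                              # class 1: brighter region
    truth[:, shape[1] // 2:] = 2                                   # class 2: darker region
    image = np.where(truth == 1, 0.55, 0.45) + rng.normal(0.0, 0.2, shape)
    labels = np.where(rng.random(shape) < 0.02, truth, 0)          # ~2% of pixels annotated
    acc_features = np.mean(classify([image], labels, seed=seed) == truth)
    acc_grey_only = np.mean(classify([image], labels, sigmas=(), seed=seed) == truth)
    return float(acc_features), float(acc_grey_only)


if __name__ == "__main__":
    full, grey = demo()
    print(f"accuracy with filter features: {full:.3f}, greyscale only: {grey:.3f}")
```

In the hypothetical test image the two regions differ in mean greyscale by only half the noise standard deviation, so a classifier trained on raw greyscale alone does little better than chance, whereas the smoothed (non-local) features separate the regions everywhere except within a few pixels of their boundary. Everything is held in memory here for brevity; for large 3D volumes the feature stack grows as n_voxels × n_features, which is where out-of-core processing becomes necessary.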
High-dimensional feature extraction is often computationally expensive. We propose a solution that significantly reduces time-to-knowledge turnaround on commodity workstations. It overcomes these challenges by leveraging widely adopted, proven methods from the data science field together with out-of-core algorithms, achieving the performance needed for time-sensitive analysis. In this paper, we show results enabled by this solution for two applications in geological analysis: lithological classification of heterogeneous rocks and 3D mineralogy. In the first application, we attempt to address one of the most challenging issues when analyzing subsurface geological samples using X-ray microscopy: heterogeneity [2]. Frequently, the resolution required to image the fundamental pore structure of real rocks comes at the expense of a field of view representative of true subsurface heterogeneity. Multi-scale approaches must therefore be used, first characterizing heterogeneity at low resolution before zooming in to image specific locations selected from the macroscopic map. As the microstructure is not explicitly resolved in the large field-of-view scan, each voxel represents some average of the local pore and grain. Machine learning can be used to classify the rock into high- and low-porosity regions based not only on local greyscale, but on non-local greyscale averages and gradients
doi:10.1017/S1431927617001465
Microsc. Microanal. 23 (Suppl 1), 2017
© Microscopy Society of America 2017
https://doi.org/10.1017/S1431927617001465