The Usage of Modern Data Science in Segmentation and Classification: Machine Learning and Microscopy

Matthew Andrew 1, Sreenivas Bhattiprolu 1, Daniel Butnaru 2 and Joaquin Correa 1

1. Carl Zeiss X-ray Microscopy, 4385 Hopyard Rd, Pleasanton, CA
2. Carl Zeiss Microscopy, Kistlerhofstrasse 75, Munich, Germany

One of the most challenging stages in any microscopy workflow is transforming images into rich digital models of segmented data. These models enable quantification of features of interest and power data-driven analysis. Frequently, the greyscale output from detectors carries both a variety of modality-specific artifacts and noise; as the resulting images become more complex, these cause threshold-based segmentation approaches to fail [1]. When visually inspecting such images, the brain smooths out this noise and recognizes patterns in the data, extracting information through the artefacts, but this process has frequently proved hard to automate and capture in a computational form. Instead, science has traditionally relied on researchers manually segmenting such artefact-ridden images; however, as datasets grow larger, more complex and more multidimensional, such manual approaches become increasingly difficult to scale. Moreover, if such imaging technologies are to be used at an industrial scale, manual segmentation is unsustainable. The last 20 years have seen a transformation in a wide range of fields, widely grouped together under the umbrella of "machine learning". While these technologies have transformed many areas of data science, ranging from medical diagnosis to stock market analysis, image analysis for microscopy (outside some specific areas of application) has frequently lagged behind developments in other fields.
The power of such algorithms, when applied to segmentation and classification problems in microscopy, lies in their ability to create arbitrary classifiers that operate in a much higher-dimensional space than the raw image output from a specific microscope detector. These higher-dimensional spaces may comprise (spatially and/or temporally correlated) images acquired in different imaging modalities (i.e. using different detectors, energies or techniques to extract different properties from the sample) or derivative images produced by applying a range of filters (e.g. gradient, smoothing or textural filters that extract different local and non-local features from the image). High-dimensional feature extraction is often computationally expensive; we propose a solution that significantly reduces time-to-knowledge turnaround on commodity workstations. It overcomes these challenges by combining widely adopted methods from the data science field with out-of-core algorithms to achieve the performance needed for time-sensitive analysis. In this paper, we show results enabled by this solution for two applications in the geological analysis space: lithological classification of heterogeneous rocks and 3D mineralogy. In the first application we address one of the most challenging issues when analyzing subsurface geological samples using X-ray microscopy: that of heterogeneity [2]. Frequently, the resolution required to image the fundamental pore structure of real rocks comes at the expense of a field of view representative of true subsurface heterogeneity. Multi-scale approaches must therefore be used, first characterizing heterogeneity at low resolution before finally zooming in to image specific locations selected from the macroscopic map. As the microstructure is not explicitly resolved in the large field-of-view scan, each voxel represents some average of local pore and grain.
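To illustrate the kind of derivative-image feature space described above, the sketch below builds a per-pixel feature stack from smoothing, gradient-magnitude and local-texture filter responses at several scales. This is a minimal 2D sketch using `scipy.ndimage`; the helper name `feature_stack`, the particular filters and the scale choices are our own assumptions for illustration, not the implementation used in this work.

```python
import numpy as np
from scipy import ndimage as ndi

def feature_stack(image, sigmas=(1, 2, 4)):
    """Hypothetical helper: stack filter responses into per-pixel feature vectors."""
    feats = [image.astype(np.float32)]  # raw greyscale as the first feature
    for s in sigmas:
        # non-local average (smoothing)
        feats.append(ndi.gaussian_filter(image, sigma=s))
        # edge strength (gradient magnitude)
        feats.append(ndi.gaussian_gradient_magnitude(image, sigma=s))
        # simple texture measure: local standard deviation in a neighbourhood
        size = 2 * s + 1
        mean = ndi.uniform_filter(image, size=size)
        sq_mean = ndi.uniform_filter(image ** 2, size=size)
        feats.append(np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0)))
    return np.stack(feats, axis=-1)  # shape: (H, W, n_features)

img = np.random.rand(64, 64).astype(np.float32)
X = feature_stack(img)
print(X.shape)  # (64, 64, 10)
```

Each pixel is thereby lifted from a single greyscale value into a ten-dimensional feature vector, which is the higher-dimensional space in which the classifier operates.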
Machine learning can be used to classify the rock into high- and low-porosity regions based not only on local greyscale, but also on non-local greyscale averages and gradients.

doi:10.1017/S1431927617001465 Microsc. Microanal. 23 (Suppl 1), 2017 © Microscopy Society of America 2017
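A minimal sketch of such a two-class porosity classification, assuming sparse hand labels and a random-forest classifier (our choice for illustration; the text does not specify which classifier is used). The synthetic image, the feature scales and the label placement are all hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# synthetic large-FOV scan: top half stands in for a high-porosity (darker)
# region, bottom half for a low-porosity (brighter) region, plus noise
img = np.where(np.arange(128)[:, None] < 64, 0.3, 0.7) + rng.normal(0, 0.05, (128, 128))

# features: local greyscale plus non-local averages and gradient magnitudes
feats = np.stack(
    [img]
    + [ndi.gaussian_filter(img, s) for s in (2, 4, 8)]
    + [ndi.gaussian_gradient_magnitude(img, s) for s in (2, 4, 8)],
    axis=-1,
)

# sparse "manual" labels: a few rows per class stand in for user brush strokes
labels = np.full(img.shape, -1)
labels[:8] = 0    # high-porosity class
labels[-8:] = 1   # low-porosity class
mask = labels >= 0

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(feats[mask], labels[mask])
pred = clf.predict(feats.reshape(-1, feats.shape[-1])).reshape(img.shape)
```

Training on a handful of labelled rows and predicting every voxel is what lets the sparse manual effort scale to the full field of view.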