TOWARDS BETTER SEGMENTATION OF LARGE FLOATING POINT 3D ASTRONOMICAL DATA SETS: FIRST RESULTS

U. Moschini, P. Teeninga, M.H.F. Wilkinson
Johann Bernoulli Institute, University of Groningen, The Netherlands

N. Giese, D. Punzo, J.M. van der Hulst, S.C. Trager
Kapteyn Astronomical Institute, University of Groningen, The Netherlands

ABSTRACT

In any image segmentation task, noise must be separated from the actual information and the relevant pixels grouped into objects of interest, on which measurements can later be made. For large astronomical surveys, this must be done efficiently on floating point data sets with resolutions of the order of Gigapixels. In this paper we illustrate how combining two techniques presented in previous works can help in this task. We summarise the benefits and initial outcomes of combining a parallel algorithm that builds max-trees of floating point data sets with a connected attribute filter that uses a statistical approach to identify structures due to noise, and we apply the combination to the segmentation of 3D radio cubes.¹

Index Terms: radio astronomy, attribute filter, max-tree

1. INTRODUCTION

Big data from space spans many application fields. Examples are single-band or multi-band astronomical surveys of regions of the sky, astronomical radio surveys that often produce three-dimensional data volumes, and remote sensing images from satellites. Such data are big in two senses: their resolution (or simply the number of separate observations) is high, and so is the bit depth of the data type they carry. The focus of this work is on the processing of radio astronomical spectral line data. Radio astronomy studies the radio emission from astronomical objects, which is neither absorbed by dust clouds in galaxies nor affected by the Earth’s atmosphere.
In particular, the radio spectral line emission of galaxies is captured as 3D volumes and contains important information for investigating the distribution and kinematics of gas in galaxies. Such radio cubes carry floating point values and have a high resolution, of the order of Gigavoxels. A better segmentation of the objects means that better and more meaningful measures and statistics can be computed. With upcoming large surveys of the sky, the size of a 3D data cube will increase to the order of Terapixels. Manual extraction of possible sources is no longer feasible, and automatic segmentation methods are needed. Max-trees [2] are a powerful image representation that can help in this task. A max-tree is a tree structure that represents the hierarchy of the connected sets (the connected components of the threshold sets) of any image or volume. Each node corresponds to a connected set. Several attributes can be computed efficiently for every node, and many filtering strategies based on them can be applied to the tree to segment the objects of interest.

Fig. 1: Two merging galaxies (from [1]). Segmentation performed (a) by SExtractor; (b) by the method in [1] with the background estimated by SExtractor; and (c) by the method in [1] with our background estimate. The filament is identified.

¹ Part of this work was funded by the Netherlands Organisation for Scientific Research (NWO) under project number 612.001.110 and by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement nr. 291531.

In the next section, to separate objects from noise efficiently in big volumes, we introduce the combination of a new parallel algorithm to build max-trees of floating point 3D volumes [3] with a connected statistical attribute filter [1]. An example from [1] illustrates the filter on a two-dimensional astronomical image taken from the Sloan Digital Sky Survey DR7 [4].
In the remaining sections, we discuss the extension of the filter to 3D radio cubes and compare the results with the output of SoFiA [5], a source finder used with this kind of data.

Big Data Processing. Proc. of the 2014 conference on Big Data from Space (BiDS’14), doi: 10.2788/1823. European Space Agency-ESRIN, Frascati, Italy, 12–14 November 2014.
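To make the max-tree representation described in the introduction concrete, the following is a minimal sequential sketch in Python. It is not the parallel algorithm of [3] nor the statistical filter of [1]: it assumes a one-dimensional floating point signal with left/right neighbours, uses a standard union-find construction, and computes the area (number of samples) as an example attribute. The function names `find_root`, `build_max_tree` and `node_areas` and the sample signal are hypothetical and chosen for illustration only.

```python
def find_root(zpar, x):
    """Union-find lookup with path compression."""
    root = x
    while zpar[root] != root:
        root = zpar[root]
    while zpar[x] != root:
        zpar[x], x = root, zpar[x]
    return root


def build_max_tree(values):
    """Return (parent, order) for a 1D signal (illustrative sketch).

    parent[p] is the canonical sample of p's own node if p is not
    canonical, otherwise the canonical sample of the parent node.
    The root is its own parent.
    """
    n = len(values)
    # Stable sort by increasing value; floats need no quantisation.
    order = sorted(range(n), key=lambda i: values[i])
    parent = [-1] * n
    zpar = [-1] * n  # union-find forest; -1 marks "not yet visited"
    # Visit samples from the highest to the lowest value.
    for p in reversed(order):
        parent[p] = p
        zpar[p] = p
        for q in (p - 1, p + 1):  # 1D connectivity (assumption)
            if 0 <= q < n and zpar[q] != -1:
                r = find_root(zpar, q)
                if r != p:
                    parent[r] = p
                    zpar[r] = p
    # Canonicalisation: one representative sample per node.
    for p in order:
        q = parent[p]
        if values[parent[q]] == values[q]:
            parent[p] = parent[q]
    return parent, order


def node_areas(values, parent, order):
    """Accumulate the area attribute; return sorted (level, area) pairs."""
    n = len(values)
    area = [1] * n
    root = order[0]
    # Children are always visited before their parents in this order.
    for p in reversed(order):
        if p != root:
            area[parent[p]] += area[p]
    return sorted(
        (values[p], area[p])
        for p in range(n)
        if p == root or values[parent[p]] < values[p]
    )


signal = [0.5, 2.0, 2.0, 1.0, 3.5, 1.0, 0.5]
parent, order = build_max_tree(signal)
nodes = node_areas(signal, parent, order)
print(parent)
print(nodes)
```

Each `(level, area)` pair describes one node: a connected set of samples whose values are at least `level`. An attribute filter would then keep or discard nodes according to such attributes. The extension to 3D volumes changes only the neighbourhood used in the inner loop.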