Astronomy and Computing 25 (2018) 195–202
Journal homepage: www.elsevier.com/locate/ascom

TensorFit: a tool to analyse spectral cubes in a tensor mode

H. Farias, C. Nuñez, M. Solar
Universidad Técnica Federico Santa María, Chile
Corresponding author: H. Farias (humberto.farias@usm.cl)

Article history: Received 31 March 2018; Received in revised form 8 October 2018; Accepted 18 October 2018; Available online 25 October 2018

Abstract

As is already known, modern observatories such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the Very Long Baseline Array (VLBA) generate large-scale data, a trend that will be accentuated by the incorporation of new observatories such as the Square Kilometre Array (SKA). By 2020, archived astronomical data are projected to reach the petabyte scale (60 PB). The Chilean Virtual Observatory (ChiVO) has stored the spectral cubes of ALMA and seeks to offer these data openly to the community, so that downloading and processing of these data can be done at its facilities. To this end, our proposal treats each cube as a higher-order tensor, specifically a 3-way tensor with two spatial dimensions (galactic latitude and longitude) and one velocity dimension. This opens a new approach, and an opportunity for massive analyses of these cubes that would otherwise be computationally prohibitive. Based on this premise, we propose TensorFit, a natural and scalable library to handle spectral cubes in tensor form. The implementation is built on frameworks for parallel and distributed processing of n-dimensional arrays on PyTorch (GPU and CPU). To verify the impact of this proposal, we focus on the benefits of tensor compression, in particular Tucker implementations, which have demonstrated outstanding dimensionality-reduction results on multidimensional data in other scientific domains.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

As is already known, modern observatories such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the Very Long Baseline Array (VLBA) generate large-scale data, a trend that will be accentuated by the incorporation of new observatories such as the Square Kilometre Array (SKA). By 2020, archived astronomical data are projected to reach the petabyte scale (60 PB) (Berriman and Groom, 2011). Within the diversity of this vast volume of generated data, we will focus on data cubes, or spectral cubes. These cubes are the scientific product of spectroscopic observations, which aim for greater precision and depth; this sacrifices field of view, unlike photometry, which covers more field but less depth. A spectral cube is formed by two physical coordinates, Right Ascension (RA) and Declination (Dec), and a third coordinate that, in the case of ALMA, is wavelength or velocity. In addition to ALMA, other instruments, such as the Multi Unit Spectroscopic Explorer (MUSE) at the Very Large Telescope (VLT), also generate Integral Field Unit (IFU) cubes, in the visible wavelength range.

Working with cubes of astronomical data is complex. On the one hand, there is the problem of data size, which has been extensively studied in recent times (Araya et al., 2016; Law et al., 2016; Hassan et al., 2013, 2011); but there is another, equally relevant problem that has not received the same scientific attention: the dimensionality of these cubes. In computer science this problem is known as the curse of dimensionality, a term coined by Bellman (1961). It essentially states that the number of samples needed to estimate an arbitrary function with a given level of precision grows exponentially with the number of dimensions. To put the problem in an astronomical context, consider the cubes of ALMA.
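The cube layout described above (two sky coordinates plus a spectral/velocity axis) can be illustrated with a small synthetic 3-way array. This is a hedged sketch: the axis ordering, sizes, and variable names here are illustrative assumptions, not an ALMA or TensorFit convention, and a random array stands in for real data (in practice cubes would be read from FITS files, e.g. with astropy).

```python
import numpy as np

# Hypothetical synthetic spectral cube with axes (Dec, RA, velocity).
n_dec, n_ra, n_vel = 64, 64, 128
cube = np.random.default_rng(1).random((n_dec, n_ra, n_vel))

# Typical views of a 3-way cube:
spectrum = cube[32, 32, :]     # 1-D spectrum at one sky position
channel_map = cube[:, :, 40]   # 2-D sky image at one velocity channel
moment0 = cube.sum(axis=2)     # integrated-intensity (moment-0) map

print(spectrum.shape, channel_map.shape, moment0.shape)
```

Treating the cube as a single 3-way tensor, rather than as a stack of independent 2-D images, is what makes the tensor decompositions discussed in this paper applicable.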
Here we can find cubes whose dimension along the physical axes is 5120, and ALMA can deliver data cubes with up to 7680 frequency channels, corresponding to the velocity axis. This means almost 80 million elements in a single data cube to be processed.

Deep Learning (Akeret et al., 2017; Ma et al., 2017) and advanced models for search and classification in astronomy (Kremer et al., 2017; Polsterer et al., 2015) form a growing field, given their outstanding results in many fields of knowledge. Using these spectral cubes as input for such models implies a superlative computational complexity, so the objective of the present work is to prepare these cubes for use in machine learning.

These problems were identified by the Chilean Virtual Observatory (ChiVO). ChiVO seeks to offer the data of the main observatories located in Chilean territory; specifically, ChiVO's Data Centre stores observation cycles 0, 1, 2, and 3 of ALMA, a process that will continue until the complete public data of this observatory are archived. These data are stored under the standards of the International Virtual Observatory Alliance (IVOA), so they can be downloaded by the scientific community but also re-processed in the Data Centre. The aim is to look for techniques that will

https://doi.org/10.1016/j.ascom.2018.10.007
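The Tucker compression highlighted in the abstract can be sketched as a truncated higher-order SVD (HOSVD) in plain NumPy. This is an illustrative sketch under stated assumptions, not TensorFit's API: the helper names (`unfold`, `mode_product`, `tucker_hosvd`), the chosen ranks, and the synthetic low-multilinear-rank cube are all assumptions made for the example.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor of the given full shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def mode_product(T, U, mode):
    """n-mode product: multiply tensor T by matrix U along `mode`."""
    shape = list(T.shape)
    shape[mode] = U.shape[0]
    return fold(U @ unfold(T, mode), mode, shape)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: leading singular vectors per mode, then the core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def tucker_reconstruct(core, factors):
    """Rebuild the full tensor from the Tucker core and factor matrices."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

# Synthetic "spectral cube" with multilinear rank (4, 4, 4) by construction.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4, 4))
A, B, C = (rng.standard_normal((n, 4)) for n in (32, 32, 64))
cube = tucker_reconstruct(G, [A, B, C])

core, factors = tucker_hosvd(cube, ranks=(4, 4, 4))
approx = tucker_reconstruct(core, factors)
rel_err = np.linalg.norm(cube - approx) / np.linalg.norm(cube)
stored = core.size + sum(f.size for f in factors)
# Near-exact reconstruction; storage drops from 65 536 values to well under 1%.
print(f"relative error: {rel_err:.2e}, compression: {cube.size / stored:.1f}x")
```

For real cubes the multilinear ranks are chosen to trade reconstruction error against compression; TensorFit itself runs such decompositions on PyTorch tensors (GPU and CPU) rather than NumPy, as described in the abstract.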