Astronomy and Computing 25 (2018) 195–202
TensorFit: a tool to analyse spectral cubes in a tensor mode
H. Farias∗, C. Nuñez, M. Solar
Universidad Técnica Federico Santa María, Chile
article info
Article history:
Received 31 March 2018
Received in revised form 8 October 2018
Accepted 18 October 2018
Available online 25 October 2018
Abstract
Modern observatories such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the Very Long Baseline Array (VLBA) generate large-scale data, a trend that will be accentuated by the incorporation of new observatories such as the Square Kilometre Array (SKA). Archived astronomical data are projected to reach the petabyte scale (≈60 PB) by 2020. The Chilean Virtual Observatory (ChiVO) stores the spectral cubes of ALMA and seeks to offer these data openly to the community, so that both downloading and processing can be carried out at its facilities. To this end, our proposal treats each cube as a higher-order tensor, specifically a 3-way tensor with two spatial dimensions (galactic latitude and longitude) and one velocity dimension. This opens a new approach to, and opportunity for, massive analyses of these cubes that would otherwise be computationally prohibitive. Based on this premise, we propose TensorFit, a natural and scalable library for handling spectral cubes as tensors. The implementation is built on parallel-oriented frameworks for distributed processing of n-dimensional arrays with PyTorch (GPU and CPU). To verify the impact of this proposal, we focus on showing the benefits of tensor compression, in particular Tucker decompositions, which have demonstrated outstanding dimensionality-reduction results on multidimensional data in other scientific domains.
© 2018 Elsevier B.V. All rights reserved.
1. Introduction
Modern observatories such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the Very Long Baseline Array (VLBA) generate large-scale data, a trend that will be accentuated by the incorporation of new observatories such as the Square Kilometre Array (SKA). Archived astronomical data are projected to reach the petabyte scale (≈60 PB) by 2020 (Berriman and Groom, 2011). Within the vast and diverse volume of data generated, we will focus on the data cubes, or spectral cubes. These cubes are the scientific result of spectroscopic observations, which aim for greater precision and depth; this, however, sacrifices field of view, unlike photometry, which covers more field but with less depth. The spectral cubes are formed by two physical coordinates, Right Ascension (RA) and Declination (Dec), and a third coordinate that, in the case of ALMA, is wavelength or velocity. In addition to ALMA, other instruments, such as the Multi Unit Spectroscopic Explorer (MUSE) on the Very Large Telescope (VLT), also generate Integral Field Unit (IFU) cubes, in the visible wavelength range.
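Seen this way, a spectral cube maps directly onto a 3-way array: the spectrum at a sky position is a fibre of the tensor, and a channel map is a slice. The following is a minimal NumPy sketch of this framing on a synthetic cube; the (velocity, Dec, RA) axis order and the sizes are illustrative assumptions for the example, not part of TensorFit:

```python
import numpy as np

# Synthetic stand-in for a spectral cube: a 3-way tensor whose axes
# are velocity (spectral channels), Declination and Right Ascension.
n_vel, n_dec, n_ra = 64, 32, 32
rng = np.random.default_rng(0)
cube = rng.normal(size=(n_vel, n_dec, n_ra))

# The spectrum at one sky position is a mode-1 fibre of the tensor
# (all channels at a fixed Dec/RA pixel) ...
spectrum = cube[:, 10, 20]      # shape (64,)

# ... while a single channel map is a slice at a fixed velocity.
channel_map = cube[5, :, :]     # shape (32, 32)

print(spectrum.shape, channel_map.shape)
```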
∗ Corresponding author.
E-mail address: humberto.farias@usm.cl (H. Farias).

Working with cubes of astronomical data is complex. On the one hand, we have the problem of data size, which has been extensively studied in recent times (Araya et al., 2016; Law et al., 2016; Hassan et al., 2013, 2011); but there is another, equally relevant problem that has not received the same scientific attention: the dimensionality of these cubes. This problem in computer
science is known as the curse of dimensionality, a term coined by
Bellman (1961). It essentially indicates that the number of samples
needed to estimate an arbitrary function with a given level of
precision grows exponentially with the number of dimensions. To
contextualize the problem in astronomy, let us consider the cubes of ALMA. Here we can find cubes whose dimension along the physical axes is 5120, and ALMA can deliver data cubes with up to 7680 frequency channels, corresponding to the velocity axis. This means almost 80 million elements in a single data cube to be processed. Deep learning (Akeret et al., 2017; Ma et al., 2017) and advanced models for search and classification in astronomy (Kremer et al., 2017; Polsterer et al., 2015) constitute a growing field, given their outstanding results in many areas of knowledge. Using these spectral cubes as input to such models implies a superlative computational cost, so the objective of the present work is to prepare these cubes for use in machine learning.
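To illustrate the kind of tensor compression at stake here, the following sketches a truncated higher-order SVD, one standard way to compute a Tucker approximation, in plain NumPy on a small synthetic low-rank cube. The ranks, sizes, and helper names are illustrative assumptions and do not reflect TensorFit's actual API:

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def tucker_hosvd(t, ranks):
    """Truncated HOSVD: factor matrices from the unfoldings' leading
    left singular vectors, then the core by projecting each mode."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = t
    for mode, u in enumerate(factors):
        moved = np.moveaxis(core, mode, 0)
        core = np.moveaxis(np.tensordot(u.T, moved, axes=1), 0, mode)
    return core, factors

# Synthetic low-rank cube (velocity, Dec, RA), built from 3 components.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(60, 3)), rng.normal(size=(40, 3)), rng.normal(size=(40, 3))
cube = np.einsum('ir,jr,kr->ijk', a, b, c)

core, factors = tucker_hosvd(cube, ranks=(3, 3, 3))

# Reconstruct and compare storage cost vs. approximation error.
recon = core
for mode, u in enumerate(factors):
    moved = np.moveaxis(recon, mode, 0)
    recon = np.moveaxis(np.tensordot(u, moved, axes=1), 0, mode)

stored = core.size + sum(f.size for f in factors)
rel_err = np.linalg.norm(cube - recon) / np.linalg.norm(cube)
print(stored, cube.size, rel_err)
```

On an exactly low-rank cube like this one, the truncated factors span the unfoldings' column spaces, so the reconstruction error is at the level of machine precision while the core plus factors store a small fraction of the original entries; on real, noisy cubes the chosen ranks trade error against compression.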
These problems were identified by the Chilean Virtual Observatory (ChiVO). ChiVO seeks to offer the data of the main observatories located in Chilean territory; specifically, ChiVO's Data Centre stores ALMA observation cycles 0, 1, 2, and 3, a process that will continue until the full public data of this observatory are archived. These data are stored under the standards of the International Virtual Observatory Alliance (IVOA), so they can be downloaded by the scientific community but also re-processed in the Data Centre. The aim is to look for techniques that will
https://doi.org/10.1016/j.ascom.2018.10.007
2213-1337/© 2018 Elsevier B.V. All rights reserved.