Use of Machine Learning for Anomaly Detection Problem in Large Astronomical Databases
Konstantin Malanchev (1,2,*), Alina Volnova (3), Matwey Kornilov (1,2,+), Maria Pruzhinskaya (1,++), Emille Ishida (4), Florian Mondon (4), and Vladimir Korolev (5,6)
1 Lomonosov Moscow State University, Sternberg Astronomical Institute, Universitetsky pr. 13, Moscow, 119234, Russia
* malanchev@sai.msu.ru
+ matwey@sai.msu.ru
++ pruzhinskaya@gmail.com
2 National Research University Higher School of Economics, 21/4 Staraya Basmannaya Ulitsa, Moscow, 105066, Russia
3 Space Research Institute of the Russian Academy of Sciences (IKI), 84/32 Profsoyuznaya Street, Moscow, 117997, Russia
4 Université Clermont Auvergne, CNRS/IN2P3, LPC, F-63000 Clermont-Ferrand, France
5 Central Aerohydrodynamic Institute, 1 Zhukovsky st, Zhukovsky, Moscow Region, 140180, Russia
6 Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, 141701, Russia
Abstract. In this work, we address the problem of anomaly detection in large
astronomical databases using machine learning methods. Such a study is motivated
by the large volume of astronomical data, which cannot be processed by human
effort alone. We focus on finding anomalous light curves in the Open Supernova
Catalog. Several types of anomalies are considered: artifacts in the data, cases
of misclassification, and the presence of previously unclassified objects. In a
dataset of ~2000 supernova (SN) candidates, we found several interesting
anomalies: one active galactic nucleus (SN2006kg), one binary microlensing event
(Gaia16aye), representatives of rare classes of SNe such as super-luminous
supernovae, and highly reddened objects.
Keywords: Machine learning; Isolation forest; Gaussian processes; Supernovae;
Transients
1 Introduction
Over the last couple of decades, astronomy has become a source of huge amounts
of data produced by various dedicated surveys and experiments, which require
careful processing to extract valuable information. Gigabytes of data are
collected daily in every domain of the electromagnetic spectrum: in the
high-energy range [1], optics [2, 3], and radio [4], as well as in the
cosmic-particle window [5] and gravitational waves [6, 7]. The search for yet
unknown statistically significant features of astronomical objects,
Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).