Use of Machine Learning for Anomaly Detection Problem in Large Astronomical Databases

Konstantin Malanchev 1,2,*, Alina Volnova 3, Matwey Kornilov 1,2,+, Maria Pruzhinskaya 1,++, Emille Ishida 4, Florian Mondon 4, and Vladimir Korolev 5,6

1 Lomonosov Moscow State University, Sternberg Astronomical Institute, Universitetsky pr. 13, Moscow, 119234, Russia
* malanchev@sai.msu.ru
+ matwey@sai.msu.ru
++ pruzhinskaya@gmail.com
2 National Research University Higher School of Economics, 21/4 Staraya Basmannaya Ulitsa, Moscow, 105066, Russia
3 Space Research Institute of the Russian Academy of Sciences (IKI), 84/32 Profsoyuznaya Street, Moscow, 117997, Russia
4 Université Clermont Auvergne, CNRS/IN2P3, LPC, F-63000 Clermont-Ferrand, France
5 Central Aerohydrodynamic Institute, 1 Zhukovsky st, Zhukovsky, Moscow Region, 140180, Russia
6 Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, 141701, Russia

Abstract. In this work, we address the problem of anomaly detection in large astronomical databases using machine learning methods. Such a study is motivated by the large volume of astronomical data, which cannot be processed by human effort alone. We focus on finding anomalous light curves in the Open Supernova Catalog. Several types of anomalies are considered: artifacts in the data, cases of misclassification, and the presence of previously unclassified objects. In a dataset of ~2000 supernova (SN) candidates, we found several interesting anomalies: one active galactic nucleus (SN2006kg), one binary microlensing event (Gaia16aye), representatives of rare classes of SNe such as super-luminous supernovae, and highly reddened objects.
Keywords: Machine learning; Isolation forest; Gaussian processes; Supernovae; Transients

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Over the last couple of decades, astronomy has become a source of huge amounts of data produced by dedicated surveys and experiments, which require careful processing to extract valuable information. Gigabytes of data are collected daily in every domain of the electromagnetic spectrum: in the high-energy range [1], optical [2, 3], and radio [4], as well as in the cosmic-particle window [5] and gravitational waves [6, 7]. The search for yet unknown statistically significant features of astronomical objects,