Astroinformatics
Proceedings IAU Symposium No. 325, 2016
M. Brescia, S.G. Djorgovski, E. Feigelson, G. Longo & S. Cavuoti, eds.
© International Astronomical Union 2017
doi:10.1017/S1743921317000266

Automatic Source Classification in Digitised First Byurakan Survey

Martin Topinka 1, Areg Mickaelian 2, Roberto Nesci 3 and Corinne Rossi 3

1 Dublin Institute for Advanced Studies, 31 Fitzwilliam Place, Dublin 2, Ireland
email: martin.topinka@gmail.com
2 Byurakan Astrophysical Observatory, Byurakan, Aragatzotn, AM 0213, Armenia
3 Università di Roma ‘La Sapienza’, Piazzale A. Moro 2, 00185 Roma, Italy

Abstract. The Digitised First Byurakan Survey (DFBS) provides low-dispersion optical spectra for about 24 million sources. A two-step machine learning algorithm based on similarities to predefined templates is applied to automatically select different classes of rare objects in the dataset, for example late-type stars, quasars and white dwarfs. Identifying outliers from the groups of common astrophysical objects may lead to the discovery of rare objects, such as gamma-ray burst afterglows.

Keywords. methods: statistical, astronomical data bases: surveys

1. Digitised First Byurakan Survey

The First Byurakan Survey (FBS) was the first systematic objective-prism survey of the extragalactic sky, initiated by Markarian, Lipovetski and Stepanian and carried out between 1965 and 1980 at the Byurakan Astrophysical Observatory with the 1 m Schmidt telescope and a 1.5° prism (Mickaelian et al., 2007). It comprises 2050 Kodak photographic plates, each covering a 4° × 4° field, which together cover 17 000 deg² of the Northern sky and part of the Southern sky (δ > −15°) at high Galactic latitudes (|b| > 15°). Each FBS plate contains about 15,000–20,000 low-dispersion optical spectra, yielding more than 24,000,000 objects in the whole survey. For comparison, SDSS contains about 4,000,000 spectra (SDSS R12). The spectral range is 340–690 nm, with a sensitivity gap near 530 nm that divides the spectra into red and blue parts.
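As a quick consistency check on the survey figures quoted above, the following minimal Python sketch multiplies out the plate count, the plate size and the per-plate number of spectra. The function name `survey_budget` and the dictionary keys are illustrative only and do not come from the survey documentation. The summed plate area and the summed per-plate spectra both come out larger than the quoted sky coverage and object count; one possible reading, which the text itself does not state, is that some sky regions appear on more than one plate.

```python
# Figures quoted in the survey description above.
N_PLATES = 2050                       # number of photographic plates
PLATE_SIDE_DEG = 4.0                  # each plate covers a 4 deg x 4 deg field
SURVEY_AREA_DEG2 = 17_000             # quoted sky coverage
SPECTRA_PER_PLATE = (15_000, 20_000)  # quoted range of spectra per plate


def survey_budget():
    """Multiply out the quoted figures (illustrative helper, not survey code)."""
    plate_area = PLATE_SIDE_DEG ** 2          # area of one plate in deg^2
    total_plate_area = N_PLATES * plate_area  # summed area of all plates
    return {
        "plate_area_deg2": plate_area,
        "total_plate_area_deg2": total_plate_area,
        # Ratio of summed plate area to quoted coverage; a value above 1
        # would be consistent with repeated coverage (an assumption here).
        "mean_coverage": total_plate_area / SURVEY_AREA_DEG2,
        "spectra_low": N_PLATES * SPECTRA_PER_PLATE[0],
        "spectra_high": N_PLATES * SPECTRA_PER_PLATE[1],
    }


if __name__ == "__main__":
    for key, value in survey_budget().items():
        print(f"{key}: {value}")
```

The sketch only restates arithmetic on numbers given in the text; it makes no claim about how the survey team derived the quoted totals.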
The plates have been scanned and digitised, resulting in the Digitised First Byurakan Survey (DFBS). The spectra have been extracted in a catalogue-driven way, using object positions from the USNO-A2 catalogue as a reference down to the plate limit (17 mag). The astrometric solution is accurate to within a positional error of 1″ or less. The 2D spectral boxes have been identified and integrated to yield 1D low-dispersion spectra of 141 pixels each. The DFBS catalogue is available online in the form of a searchable SQL database (DFBS website).

2. Classification

Only a fraction of the USNO-A2 objects found in the DFBS plates has been spectrally classified so far. The machine learning (ML) object classification method proposed in this work is based on finding similarities (similarity measures) between an unknown