Classification of variable objects in LINEAR using Hipparcos variable stars

M. Süveges 1,2*, L. Rimoldini 1,2, F. Barblan 2, M. Spano 2, R. I. Anderson 2, L. Eyer 2, L. Palaversa 2, M. Beck 1,2, P. Dubath 1,2, L. Guy 1,2, I. Lecoeur-Taïbi 1,2, N. Mowlavi 1,2, K. Nienartowicz 1,2, D. Ordóñez-Blanco 1,2, Ž. Ivezić 3, B. Sesar 4, A. C. Becker 3, J. S. Stuart 5

1 ISDC Data Centre for Astrophysics, University of Geneva, Switzerland
2 Department of Astronomy, University of Geneva, Switzerland
3 University of Washington, Department of Astronomy, USA
4 Division of Physics, Mathematics and Astronomy, Caltech, Pasadena, USA
5 Lincoln Laboratory, MIT, Lexington, USA

Abstract

One fundamental pre-processing step in the analysis of the data sets produced by astronomical surveys is the classification of objects. We classify a set of visually selected variable sources from the Lincoln Near-Earth Asteroid Research (LINEAR) survey into variability types using a supervised machine learning algorithm, Random Forest (Breiman, 2001). An important limitation was the absence of a sufficiently well-known training set in LINEAR, which prompted us to use data from the Hipparcos satellite survey. The different characteristics of the two surveys induce biases between regions occupied by the same variability types in the attribute space, unequal occurrence of types, different aliases that influence the period recovery, and bad coverage in the attribute space. We present the classification scheme, some simple strategies to avoid the effects of sample selection and attribute bias, and show our results on the class regions in the attribute space. We present a few examples for the different types of variable stars found in LINEAR.

Keywords: methods: data analysis – methods: statistical – stars: variables – surveys.
* e-mail: Maria.Suveges@unige.ch

1 Introduction

Variable stars are important in many branches of astronomy and astrophysics, such as asteroseismology, stellar and galactic evolution, the determination of fundamental parameters of stellar physics, and cosmology. Large surveys now investigate the sky to an unprecedented depth and produce huge volumes of data for astrophysical studies. We need rapid, efficient, and, most importantly, automated ways to extract objects of the desired kind from these data sets. One way to achieve this is to use a machine-learning algorithm for supervised classification of the objects. Supervised classification methods estimate the type of an object of unknown class from a set of attributes (for example period, amplitude, or colours), by constructing a model based