arXiv:1103.6034v1 [astro-ph.IM] 30 Mar 2011
Mon. Not. R. Astron. Soc. 000, 1–15 (2011) Printed 1 April 2011 (MN LaTeX style file v2.2)

Semi-supervised Learning for Photometric Supernova Classification

Joseph W. Richards 1,2, Darren Homrighausen 3, Peter E. Freeman 3, Chad M. Schafer 3, and Dovi Poznanski 1,4

1 Department of Astronomy, University of California, Berkeley, CA, 94720-7450, USA
2 Department of Statistics, University of California, Berkeley, CA, 94720-7450, USA
3 Department of Statistics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
4 Computational Cosmology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA

Accepted . Received 2011 March 28; in original form 2011 March 28

ABSTRACT
We present a semi-supervised method for photometric supernova typing. Our approach is to first use the nonlinear dimension reduction technique diffusion map to detect structure in a database of supernova light curves and subsequently employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi-supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template-based methods. Applied to supernova data simulated by Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods achieve (cross-validated) 96% Type Ia purity and 86% Type Ia efficiency on the spectroscopic sample, but only 56% Type Ia purity and 48% efficiency on the photometric sample, owing to the spectroscopic followup strategy used in the simulations. To improve the performance on the photometric sample, we search for better spectroscopic followup procedures by studying the sensitivity of machine-learned supernova classification to the specific strategy used to obtain training sets.
With a fixed amount of spectroscopic followup time, we find that, despite collecting data on a smaller number of supernovae, deeper magnitude-limited spectroscopic surveys are optimal for producing training sets. For supernova Ia (II-P) typing, we obtain a 37% (1%) increase in purity and 28% (270%) increase in efficiency of the sample using a 25th magnitude-limited survey instead of the shallower spectroscopic sample used in the original simulations. When redshift information is available, we incorporate it into our analysis using a novel method of altering the diffusion map representation of the SNe. Incorporating host redshifts leads to a 7% improvement in Type Ia purity and 19% improvement in Type Ia efficiency.

Key words: methods: data analysis – methods: statistical – techniques: photometric – supernovae: general – surveys

A web service for the supernova classification method used in this paper can be found at TBA
E-mail: jwrichar@stat.berkeley.edu (JWR)

1 INTRODUCTION

Novel approaches to photometric supernova (SN) classification are in high demand in the astronomical community. The next generation of survey telescopes, such as the Dark Energy Survey (DES; Annis et al. 2011) and the Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008), are expected to observe light curves for a few hundred thousand supernovae (SNe), far surpassing the resources available to spectroscopically confirm the type of each. To fully exploit these large samples, it is imperative to develop methods that can accurately and automatically classify large samples of SNe based only on their photometric light curves.

In order to use Type Ia supernovae as cosmological probes, it is imperative that pure and efficient Type Ia samples are constructed. Yet, classifying SNe from their light curves is a challenging problem. The light flux measurements are often noisy, nonuniform in time, and incomplete. In particular, it is difficult to discern the light curves of Type Ia SNe from those of Type Ib or Ic supernovae, explosive events which result from the core collapse of massive stars. This difficulty can have dire effects on the subsequent cosmological