Y. Shi et al. (Eds.): ICCS 2007, Part I, LNCS 4487, pp. 1130–1137, 2007. © Springer-Verlag Berlin Heidelberg 2007

Active Learning with Support Vector Machines for Tornado Prediction

Theodore B. Trafalis¹, Indra Adrianto¹, and Michael B. Richman²

¹ School of Industrial Engineering, University of Oklahoma, 202 West Boyd St, Room 124, Norman, OK 73019, USA
ttrafalis@ou.edu, adrianto@ou.edu
² School of Meteorology, University of Oklahoma, 120 David L. Boren Blvd, Suite 5900, Norman, OK 73072, USA
mrichman@ou.edu

Abstract. In this paper, active learning with support vector machines (SVMs) is applied to the problem of tornado prediction. This method is used to predict which storm-scale circulations yield tornadoes based on the radar-derived Mesocyclone Detection Algorithm (MDA) and near-storm environment (NSE) attributes. The main goal of active learning is to choose the instances, or data points, that are important to or influence our model, so that they can be labeled and included in the training set. We compare this method to passive learning with SVMs, where the next instances to be included in the training set are selected at random. The preliminary results show that active learning can achieve high performance and significantly reduce the size of the training set.

Keywords: Active learning, support vector machines, tornado prediction, machine learning, weather forecasting.

1 Introduction

Most conventional learning methods use static data in the training set to construct a model or classifier. The ability of a learning method to update the model dynamically, using new incoming data, is important. One method that has this ability is active learning. The objective of active learning for classification is to choose the instances, or data points, to be labeled and included in the training set. In many machine learning tasks, collecting and/or labeling data to create a training set is costly and time-consuming.
Rather than selecting and labeling data at random, it is better to label the data that are important to or influence our model or classifier. In tornado prediction, labeling data is considered costly and time-consuming, since we need to verify which storm-scale circulations produce tornadoes on the ground. Tornado events can be verified from evidence on the ground, including photographs, videos, damage surveys, and eyewitness reports. Based on this tornado verification, we then determine and label which circulations produce tornadoes and which do not. Therefore, applying active learning to tornado prediction, so as to minimize the number of labeled instances required and to update the classifier using only the most informative instances in the training set, would be beneficial.
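
The contrast between active and passive learning described above can be illustrated with a minimal sketch. It rests on several assumptions that this section does not confirm. The classifier is a linear SVM trained with a Pegasos-style subgradient method. The query rule picks the unlabeled instance closest to the current separating hyperplane, a common heuristic that is not necessarily the strategy used in this paper. The data are synthetic two-dimensional blobs rather than MDA and NSE attributes. All function names (`train_linear_svm`, `most_uncertain`, `learn`, `make_blobs`) are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.
    Labels must be +1 or -1. The bias is stored as the last weight."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            xi = list(X[i]) + [1.0]                # append constant bias feature
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [(1.0 - eta * lam) * a for a in w]  # shrink (regularization)
            if margin < 1.0:                        # hinge loss is active
                w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w

def decision(w, x):
    """Signed score of x; its magnitude grows with distance from the hyperplane."""
    return sum(a * b for a, b in zip(w, list(x) + [1.0]))

def most_uncertain(w, X, candidates):
    """Assumed query rule: the candidate with the smallest |decision value|."""
    return min(candidates, key=lambda i: abs(decision(w, X[i])))

def learn(pool_X, pool_y, n_queries, active=True, seed=0):
    """Grow the labeled set one instance at a time, actively or at random."""
    rng = random.Random(seed)
    labeled = [pool_y.index(1), pool_y.index(-1)]   # one seed example per class
    unlabeled = [i for i in range(len(pool_X)) if i not in labeled]
    for _ in range(n_queries):
        w = train_linear_svm([pool_X[i] for i in labeled],
                             [pool_y[i] for i in labeled])
        if active:
            pick = most_uncertain(w, pool_X, unlabeled)
        else:
            pick = rng.choice(unlabeled)            # passive learning baseline
        labeled.append(pick)                        # "ask the oracle" for its label
        unlabeled.remove(pick)
    w = train_linear_svm([pool_X[i] for i in labeled],
                         [pool_y[i] for i in labeled])
    return w, labeled

def accuracy(w, X, y):
    hits = sum(1 for xi, yi in zip(X, y)
               if (1 if decision(w, xi) >= 0 else -1) == yi)
    return hits / len(X)

def make_blobs(n, seed):
    """Synthetic stand-in data: two Gaussian blobs with alternating labels."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = 1 if k % 2 == 0 else -1
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    pool_X, pool_y = make_blobs(200, seed=1)
    test_X, test_y = make_blobs(200, seed=2)
    for mode in (True, False):
        w, labeled = learn(pool_X, pool_y, n_queries=10, active=mode)
        print("active" if mode else "passive", len(labeled),
              accuracy(w, test_X, test_y))
```

In this sketch the only difference between the two modes is how the next instance is picked; the label lookup in `pool_y` stands in for the costly ground verification step, which is paid only for the instances actually queried.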