Y. Shi et al. (Eds.): ICCS 2007, Part I, LNCS 4487, pp. 1130–1137, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Active Learning with Support Vector Machines for
Tornado Prediction
Theodore B. Trafalis¹, Indra Adrianto¹, and Michael B. Richman²
¹ School of Industrial Engineering, University of Oklahoma, 202 West Boyd St, Room 124,
Norman, OK 73019, USA
ttrafalis@ou.edu, adrianto@ou.edu
² School of Meteorology, University of Oklahoma, 120 David L. Boren Blvd, Suite 5900,
Norman, OK 73072, USA
mrichman@ou.edu
Abstract. In this paper, active learning with support vector machines (SVMs) is
applied to the problem of tornado prediction. The method is used to predict
which storm-scale circulations yield tornadoes, based on radar-derived
Mesocyclone Detection Algorithm (MDA) and near-storm environment (NSE)
attributes. The main goal of active learning is to choose the instances, or data
points, that are important to or have influence on the model, so that they can be
labeled and included in the training set. We compare this method to passive
learning with SVMs, where the next instances to be added to the training set are
selected at random. The preliminary results show that active learning can achieve
high performance and significantly reduce the size of the training set.
Keywords: Active learning, support vector machines, tornado prediction, machine
learning, weather forecasting.
1 Introduction
Most conventional learning methods use static data in the training set to construct a
model or classifier. The ability of learning methods to update the model dynamically,
using new incoming data, is important. One method that has this ability is active
learning. The objective of active learning for classification is to choose the instances
or data points to be labeled and included in the training set. In many machine learning
tasks, collecting and/or labeling data to create a training set is costly and
time-consuming. Rather than selecting and labeling data at random, it is better to
label the data that are important to, or have influence on, the model or classifier.
In tornado prediction, labeling data is costly and time-consuming because
we need to verify which storm-scale circulations produce tornadoes on the ground.
Tornado events can be verified from evidence on the ground, including photographs,
videos, damage surveys, and eyewitness reports. Based on this verification, we
then determine and label which circulations produce tornadoes and which do not.
Applying active learning to tornado prediction would therefore be beneficial: it
minimizes the number of instances that must be labeled and uses only the most
informative instances in the training set to update the classifier.
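The contrast between active and passive selection described in this section can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes a pool-based setting, a linear SVM trained by Pegasos-style stochastic subgradient descent, and a common query rule that picks the unlabeled instance closest to the current separating hyperplane. The function names, the parameter values, and the synthetic two-class pool (which stands in for MDA and NSE attributes) are all hypothetical.

```python
import random


def decision(w, b, x):
    """Signed score w.x + b; its magnitude is proportional to the distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Illustrative linear soft-margin SVM trained by Pegasos-style
    stochastic subgradient descent (an assumption, not the paper's solver)."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = y[i] * decision(w, b, X[i]) < 1  # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regularizer
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def query_active(w, b, pool_X, unlabeled):
    """Active rule: pick the unlabeled index whose point lies closest to the hyperplane."""
    return min(sorted(unlabeled), key=lambda i: abs(decision(w, b, pool_X[i])))


def query_passive(unlabeled, rng):
    """Passive baseline: pick an unlabeled index uniformly at random."""
    return rng.choice(sorted(unlabeled))


def run_learning(pool_X, pool_y, initial, n_queries, active, seed=0):
    """Grow the training set one queried instance at a time, retraining each round."""
    rng = random.Random(seed)
    labeled = list(initial)
    unlabeled = set(range(len(pool_X))) - set(labeled)
    for _ in range(n_queries):
        w, b = train_linear_svm([pool_X[i] for i in labeled],
                                [pool_y[i] for i in labeled])
        if active:
            i = query_active(w, b, pool_X, unlabeled)
        else:
            i = query_passive(unlabeled, rng)
        labeled.append(i)  # the label pool_y[i] is used only from this point on
        unlabeled.remove(i)
    w, b = train_linear_svm([pool_X[i] for i in labeled],
                            [pool_y[i] for i in labeled])
    return labeled, (w, b)


def make_pool(n=60, seed=1):
    """Synthetic two-class pool (hypothetical data, not MDA/NSE attributes)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1  # indices 0 and 1 cover both classes
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_pool()
    for active in (True, False):
        labeled, (w, b) = run_learning(X, y, initial=[0, 1], n_queries=10, active=active)
        correct = sum(1 for xi, yi in zip(X, y) if yi * decision(w, b, xi) > 0)
        print("active" if active else "passive", len(labeled), correct / len(X))
```

In this sketch the two strategies differ only in the query step, so any difference in accuracy for the same number of labeled instances comes from which instances were chosen. Whether the active rule helps on a given dataset is an empirical question; the toy pool here makes no claim about tornado data.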