Anytime Query-Tuned Kernel Machine Classifiers via Cholesky Factorization

Dennis DeCoste
Machine Learning Systems Group
Jet Propulsion Laboratory / California Institute of Technology
4800 Oak Grove Drive, Pasadena, CA 91109, USA
dennis.decoste@jpl.nasa.gov

Abstract

We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste, 2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.

1 Introduction

Support vector machines (SVMs) and other kernel methods have shown much recent promise (Scholkopf & Smola, 2002). However, their widespread use on large-scale tasks remains hindered by query-time costs often much higher than those of other methods, such as decision trees and neural networks. For example, an SVM recently achieved the lowest error rates on the MNIST benchmark digit recognition task (DeCoste & Scholkopf, 2002), but classified much more slowly than the previous best method (a neural network), due to the many support vectors (around 20,000) in each digit recognizer. It is also troubling that classification costs are identical for every query example, even for "easy" examples that other methods (e.g. decision trees) can classify relatively quickly.

We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher Discriminant classifiers based on a new computational geometry method for anytime output bounds (DeCoste, 2002). Unlike related approximation methods such as "reduced sets" (e.g. (Burges, 1996; Scholkopf et al., 1999; Scholkopf et al., 1998; Burges & Scholkopf, 1997; Romdhani et al., 2001)), our approach guarantees preservation of all classifications of the original kernel machine. Furthermore, unlike related "exact simplification" methods (Downs et al., 2001), we also achieve "proportionality to difficulty": our classification time tends to be inversely proportional to a query's distance from the discriminant hyperplane.

This new paper improves upon (DeCoste, 2002) in two key ways. First, Section 3 introduces a simple linear algebra formulation based on Cholesky factorization,