Proceedings of ACL-08: HLT, Short Papers (Companion Volume), pages 237–240, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics

splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications

Yoav Goldberg and Michael Elhadad
Ben Gurion University of the Negev
Department of Computer Science
POB 653 Be'er Sheva, 84105, Israel
{yoavg,elhadad}@cs.bgu.ac.il

Abstract

We present a fast, space-efficient and non-heuristic method for calculating the decision function of polynomial kernel classifiers for NLP applications. We apply the method to the MaltParser system, resulting in a Java parser that parses over 50 sentences per second on modest hardware without loss of accuracy (a 30-fold speedup over existing methods). The method implementation is available as the open-source splitSVM Java library.

1 Introduction

Over the last decade, many natural language processing tasks have been cast as classification problems. These are then solved by off-the-shelf machine-learning algorithms, resulting in state-of-the-art results. Support Vector Machines (SVMs) have gained popularity as they consistently outperform other learning algorithms for many NLP tasks.

Unfortunately, once a model is trained, the decision function for kernel-based classifiers such as SVM is expensive to compute, and its cost can grow linearly with the size of the training data. In contrast, the computational complexity of the decision functions of most non-kernel-based classifiers does not depend on the size of the training data, making them orders of magnitude faster to compute. For this reason, research effort has been directed at speeding up the classification process of polynomial-kernel SVMs (Isozaki and Kazawa, 2002; Kudo and Matsumoto, 2003; Wu et al., 2007). Existing accelerated SVM solutions, however, either require large amounts of memory or resort to heuristics, computing only an approximation to the real decision function.
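To make the cost concrete, the following is a minimal sketch (not the splitSVM implementation, and all names are hypothetical) of the naive polynomial-kernel decision function f(x) = Σ_i α_i y_i (x · sv_i + c)^d + b over sparse binary feature vectors. Note that the loop runs over every support vector, which is why its cost grows with the training data:

```java
// Naive polynomial-kernel SVM decision function over sparse binary vectors.
// A sketch for illustration only; class and parameter names are hypothetical.
public class PolyKernelDecision {

    // Sparse binary vectors are sorted arrays of active feature indices,
    // so the dot product is just the count of shared indices (merge scan).
    static int dot(int[] a, int[] b) {
        int i = 0, j = 0, shared = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { shared++; i++; j++; }
            else if (a[i] < b[j]) i++;
            else j++;
        }
        return shared;
    }

    // f(x) = b + sum_i alphaY[i] * (x . sv_i + c)^d
    // alphaY[i] holds the product of the Lagrange multiplier and the label y_i.
    static double decide(int[][] svs, double[] alphaY,
                         double b, double c, int d, int[] x) {
        double sum = b;
        for (int i = 0; i < svs.length; i++) {
            sum += alphaY[i] * Math.pow(dot(svs[i], x) + c, d);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] svs = {{0, 2}, {1}};          // two support vectors
        double[] alphaY = {0.5, -1.0};
        double f = decide(svs, alphaY, 0.1, 1.0, 2, new int[]{0, 1, 2});
        System.out.println("f(x) = " + f);
    }
}
```

The per-classification work is proportional to the number of support vectors, which in practice grows with the training set; the methods surveyed above, and this paper, aim to remove that dependence.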
This work aims at speeding up the decision function computation for low-degree polynomial kernel classifiers while using only a modest amount of memory and still computing the exact function. This is achieved by taking into account the Zipfian nature of natural language data, and structuring the computation accordingly. On a sample application (replacing the libsvm classifier used by MaltParser (Nivre et al., 2006) with our own), we observe a speedup factor of 30 in parsing time.

2 Background and Previous Work

In classification-based NLP algorithms, a word and its context are considered a learning sample, and encoded as feature vectors. Usually, context data includes the word being classified (w_0), its part-of-speech (PoS) tag (p_0), and the word forms and PoS tags of neighbouring words (w_-2,...,w_+2, p_-2,...,p_+2, etc.). Computed features such as the length of a word or its suffix may also be added. A feature vector (F) is encoded as an indexed list of all the features present in the training corpus. A feature f_i of the form w_+1=dog means that the word following the one being classified is 'dog'. Every learning sample is represented by an n = |F| dimensional binary vector x, where x_i = 1 iff the feature f_i is active in the given sample, and 0 otherwise. n is the number of different features being considered. This encoding leads to vectors with extremely high dimensions, mainly because of the lexical features w_i.

SVM is a supervised binary classifier. The result of the learning process is the set SV of Sup-