The Effects of Quantization on Support Vector Machines

Davide Anguita, Giovanni Bozza
DIBE – Department of Biophysical and Electronic Engineering
University of Genoa
Via Opera Pia 11A, 16145 Genoa, Italy
E-mail: anguita@dibe.unige.it, giovanni_bozza@yahoo.it

Abstract— We apply here a probabilistic method to predict the effect of quantizing the parameters of a Support Vector Machine. Thanks to the particular structure of the SVM, the dependency of the output on the quantization noise can be predicted with good accuracy, and a simple closed–form formula can be derived, without imposing any hard–to–verify assumption.

I. INTRODUCTION

In recent years, several hardware implementations of the Support Vector Machine (SVM) have appeared in the literature [1], [2]. The purpose of these implementations is the development of embedded systems, especially targeting applications where the use of general–purpose processors is not effective. An example of this kind of system is reported in [3], where an SVM–based hardware for voice–recognition tasks, with very low–power requirements, is detailed.

One of the main constraints of application–specific hardware is the limited precision used to store the variables of the implemented algorithm. In general–purpose computing systems, the variables are usually stored as 32– or 64–bit floating–point numbers, which seldom pose any practical constraint on the implementation. On the contrary, in specialized digital hardware, the use of fixed–point numbers and limited–precision registers (e.g. 8 or 16 bit) is more commonplace, due to speed and silicon–area concerns. The finite register length and low–precision computations give rise to artifacts, which can be regarded as quantization noise and can severely modify the expected system behavior. Similar considerations also apply to analog devices, where the quantization effect is replaced by thermal noise and other unavoidable effects [4], which limit the precision of the variables.
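To make the notion of limited precision concrete, the following minimal sketch (our illustration, not taken from the paper) shows how a value is mapped to a b–bit fixed–point representation: rounding to the nearest multiple of the quantization step and saturating to the representable range. The function name and parameters are hypothetical.

```python
def quantize_fixed_point(x, frac_bits, total_bits):
    """Round x to the nearest multiple of 2**-frac_bits and saturate
    to the two's-complement range representable with total_bits."""
    step = 2.0 ** -frac_bits
    q = round(x / step) * step
    max_val = (2 ** (total_bits - 1) - 1) * step   # largest code
    min_val = -(2 ** (total_bits - 1)) * step      # most negative code
    return max(min_val, min(max_val, q))

x = 0.70710678                          # 1/sqrt(2) in floating point
xq = quantize_fixed_point(x, 6, 8)      # 8-bit register, 6 fractional bits
# the rounding error is bounded by half the step, i.e. 2**-7
print(xq, abs(x - xq))
```

The rounding error is at most half the quantization step, a bound that the probabilistic analysis below treats as the support of the quantization noise.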
In the neural network community, this phenomenon has already been studied in the past, when several hardware implementations of Multi–Layer Perceptrons (MLPs) and other networks were proposed. The main goal is finding a relation between the amount of quantization noise and the deviation of the network output from the theoretical one (without quantization). There are mainly two approaches to performing such an analysis: the first one considers the quantization effect as a noise affecting the behavior of the network, which can be studied by means of its stochastic properties (e.g. mean and variance). In this case, the main underlying assumption is that the noise is some orders of magnitude smaller than the signals carrying the information inside the network. Furthermore, additional assumptions must be introduced to find closed–form formulas expressing the effect of the quantization on the network output [5], [6]. The second approach, instead, relies on a worst–case analysis, performed by propagating the quantization effect from the input to the output of the network and bounding its value using, for example, interval arithmetic [7]. The advantage over the probabilistic approach is the avoidance of any a–priori assumption but, on the other hand, the result can be somewhat pessimistic due to the difficulty of finding very tight bounds.

This last method has been applied to SVMs and similar kernel machines in the past [1], while, to the best knowledge of the authors, the probabilistic approach has never been used for this purpose. In this paper we apply the probabilistic method to study the quantization effect in SVMs. The following section details the main result, and Section III reports some experimental results, which compare the theoretical predictions to the actual effects of quantization noise on the SVM output.

II.
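The worst–case approach can be illustrated with a small interval–arithmetic sketch (our simplification, not the method of [7]): each quantized value is known only up to half the quantization step, and the uncertainty is propagated through a weighted sum, yielding an interval guaranteed to contain the true output.

```python
def interval_weighted_sum(weights, values, delta):
    """Each value carries a quantization uncertainty of +/- delta/2.
    Returns the interval [lo, hi] containing every possible output
    of the weighted sum, regardless of the actual rounding errors."""
    lo = hi = 0.0
    for w, v in zip(weights, values):
        a = w * (v - delta / 2)
        b = w * (v + delta / 2)
        lo += min(a, b)   # sign of w decides which endpoint is smaller
        hi += max(a, b)
    return lo, hi

# exact (unquantized) output is 0.5*1.0 - 1.0*0.25 + 2.0*(-0.5) = -0.75
lo, hi = interval_weighted_sum([0.5, -1.0, 2.0], [1.0, 0.25, -0.5], 2 ** -4)
```

The interval width grows with the sum of the absolute weights, which is why the worst–case bound can be loose when many terms accumulate, as noted above.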
QUANTIZATION EFFECT ANALYSIS

Let us consider a Support Vector Machine with Gaussian kernel

y(x) = \sum_{i=1}^{n_p} y_i \alpha_i e^{-\gamma \| x - x_i \|^2}    (1)

where x = (x_1, \ldots, x_{n_i})^T is the input vector, n_p is the number of patterns (or the number of support vectors, if the solution is sparse), and x_i = (x_{i,1}, \ldots, x_{i,n_i})^T and y_i ∈ {−1, +1} are the i–th pattern and the i–th target, respectively. The parameters of the network are the Lagrange multipliers α_i, and γ is the kernel width.

There are several variables that can be affected by the quantization of their values: the parameters α, the support vectors x_i, and the result of the exponential function computation. The value of γ is of no concern, because it can easily be constrained to be a power of two, so that it can be represented exactly [1]. The support vectors are, in effect, a subset of the entire training set: therefore they can already be quantized before the learning phase, giving rise to an exact representation of their values. For the same reason, we do not address here the quantization of the exponential function. Note, in fact, that any kernel can be quantized before the learning phase of the
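The effect under study can be sketched numerically as follows (our illustration, with made–up data: patterns, targets, and multipliers are random, and γ is a power of two so that it is exact in fixed point, as noted above). The sketch evaluates Eq. (1) with exact and with quantized multipliers α_i and reports the resulting output deviation.

```python
import math
import random

def svm_output(x, patterns, targets, alphas, gamma):
    """Evaluate Eq. (1): sum over patterns of y_i * alpha_i * exp(-gamma*||x-x_i||^2)."""
    return sum(y * a * math.exp(-gamma * sum((xj - xij) ** 2
                                             for xj, xij in zip(x, xi)))
               for xi, y, a in zip(patterns, targets, alphas))

def quantize(v, frac_bits):
    """Round v to the nearest multiple of 2**-frac_bits (no saturation)."""
    step = 2.0 ** -frac_bits
    return round(v / step) * step

random.seed(0)
patterns = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(10)]
targets = [random.choice([-1, 1]) for _ in range(10)]
alphas = [random.uniform(0, 1) for _ in range(10)]
gamma = 0.5                       # power of two: exactly representable

x = [0.1, -0.2]
y_exact = svm_output(x, patterns, targets, alphas, gamma)
y_quant = svm_output(x, patterns, targets,
                     [quantize(a, 8) for a in alphas], gamma)
print(abs(y_exact - y_quant))     # deviation caused by quantizing alpha
```

Since the Gaussian kernel is bounded by one and each quantized α_i differs from its exact value by at most 2^-9, the output deviation here cannot exceed n_p · 2^-9; the probabilistic analysis aims at a much sharper, closed–form characterization of the typical deviation.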