Linear Parametric Noise Models for Least Squares Support Vector Machines

Tillmann Falck, Johan A.K. Suykens and Bart De Moor

Abstract: In the identification of nonlinear dynamical models it may happen that not only the system dynamics have to be modeled but that the noise also has a dynamic character. We show how to adapt Least Squares Support Vector Machines (LS-SVMs) to take advantage of a known or unknown noise model. We furthermore investigate a convex approximation based on overparametrization to estimate a linear autoregressive noise model jointly with a model for the nonlinear system. Considering a noise model can improve generalization performance. We discuss several properties of the proposed scheme on synthetic data sets and finally demonstrate its applicability on real world data.

I. INTRODUCTION

The objective in system identification [1] of nonlinear systems [2], [3] is to estimate a model for a dynamical system from observational data. For linear as well as nonlinear systems, the model structure is of particular interest because it determines how flexibly the model can explain the data. For nonlinear systems, NARX and NFIR structures are the most widely used because the corresponding estimation problems are linear in the parameters; the estimation is then convex, provided a convex objective is used. Generalizations of more advanced model structures such as ARMAX or Box-Jenkins (BJ) to nonlinear systems exist, but even in the linear setting their identification is a nonconvex problem.

In this paper we consider NARX models extended by a linear ARMA model for the noise. This structure is depicted in Figure 1. We will denote this hybrid structure as ARMA-NARX. Note that in a NARMAX model the estimated noise is used as an additional input to the nonlinear system and can thus have nonlinear dynamics. The ARMA-NARX model, by contrast, is simply tailored towards colored noise instead of assuming a white spectrum as in NARX models.
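To make the ARMA-NARX structure concrete, the following is a minimal simulation sketch, not the authors' implementation. The nonlinearity `narx_step`, the choice of one output lag and the current input, the ARMA(1,1) coefficients, and all numerical values are illustrative assumptions; only the overall structure (a NARX part plus additive noise colored by a linear ARMA filter) follows the description above.

```python
import numpy as np

def narx_step(y_prev, u_now):
    """Hypothetical nonlinearity f(y_{t-1}, u_t); chosen only for illustration."""
    return 0.5 * np.tanh(y_prev) + np.sin(u_now)

def arma_noise(e, a1, c1):
    """Color white noise e with an ARMA(1,1) filter:
    r_t = -a1 * r_{t-1} + e_t + c1 * e_{t-1}."""
    r = np.zeros_like(e)
    r[0] = e[0]
    for t in range(1, len(e)):
        r[t] = -a1 * r[t - 1] + e[t] + c1 * e[t - 1]
    return r

def simulate_arma_narx(n=2000, a1=-0.7, c1=0.3, noise_std=0.1, seed=0):
    """Simulate y_t = f(y_{t-1}, u_t) + r_t with ARMA-colored noise r_t.
    Feeding back the measured (noisy) output is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, size=n)        # input signal
    e = noise_std * rng.standard_normal(n)    # white driving noise
    r = arma_noise(e, a1, c1)                 # colored noise
    y = np.zeros(n)
    y[0] = r[0]
    for t in range(1, n):
        y[t] = narx_step(y[t - 1], u[t]) + r[t]
    return u, y, r, e

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

u, y, r, e = simulate_arma_narx()
rho_colored = lag1_autocorr(r)   # clearly positive for the colored noise
rho_white = lag1_autocorr(e)     # close to zero for the white noise
```

Comparing `rho_colored` with `rho_white` shows the property a plain NARX model ignores: the disturbance on the output is correlated over time, so the residuals of a fit that assumes white noise would not be white.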
We consider two cases. In the first case we assume that the noise model is known. This information can easily be integrated into the estimation problem and can improve the performance of the resulting model. This approach has already been explored in [4]; there, if the noise model is not known a priori, its parameters are tuned as hyperparameters of the nonlinear model. For this case, we restrict ourselves to generalizing the results from AR to ARMA models.

The second case jointly estimates an AR noise model and the NARX part. This is a nonconvex, nonlinear problem. The main contribution of this paper is to propose a convex relaxation to this problem. This complements [4] with an effective way to obtain estimates for unknown noise models.

[Fig. 1: Block diagram of a nonlinear model (Du = 0, Dy = 1) consisting of a NARX part and a linear noise model, here denoted as ARMA-NARX. Signals shown: u_t, ŷ_t, e_t, r_t, y_t, y_{t-1}; blocks: NARX, H(z), z^{-1}.]

The relaxation is based on the overparametrization technique [11], [12], which was introduced for a special class of structured nonlinear systems called Hammerstein systems. The idea is to relax nonconvex bilinear products by replacing them with new independent variables. This leads to a convex formulation and, in the context of identifying Hammerstein systems using LS-SVMs, has been applied successfully in [13], [14].

To model the nonlinear system we employ Least Squares Support Vector Machines (LS-SVMs), which are based on the methodology of Support Vector Machines (SVMs) [5], [6]. Both belong to the class of kernel based models, which also includes, e.g., splines [7] and Gaussian processes [8].

Tillmann Falck, Johan Suykens and Bart De Moor are with the SCD group of the Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium. Email: {tillmann.falck,johan.suykens,bart.demoor}@esat.kuleuven.be
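The overparametrization idea described above can be illustrated on a toy problem. The sketch below is not the LS-SVM formulation of this paper: it assumes a hypothetical explicit feature map `features`, an AR(1) noise model, no output lags in the nonlinear part, and ordinary least squares in place of a regularized kernel estimate. All names and numerical values are assumptions made for illustration. It only shows how the bilinear product between the noise coefficient `a` and the model weights `w` is replaced by a free vector `v`, which makes the estimation problem linear in all unknowns.

```python
import numpy as np

def features(u):
    """Hypothetical explicit feature map phi(u) = [u, u^2]
    (a stand-in for a kernel-induced feature map)."""
    return np.column_stack([u, u ** 2])

rng = np.random.default_rng(1)
n = 2000
w_true = np.array([1.0, -0.5])   # assumed weights of the nonlinear model
a_true = -0.8                    # assumed AR(1) noise: r_t + a * r_{t-1} = e_t

# Generate data y_t = w^T phi(u_t) + r_t with AR(1)-colored noise r_t.
u = rng.uniform(-1.0, 1.0, size=n)
e = 0.1 * rng.standard_normal(n)
r = np.zeros(n)
for t in range(1, n):
    r[t] = -a_true * r[t - 1] + e[t]
y = features(u) @ w_true + r

# Substituting r_{t-1} = y_{t-1} - w^T phi(u_{t-1}) gives
#   y_t = -a * y_{t-1} + w^T phi(u_t) + (a * w)^T phi(u_{t-1}) + e_t,
# which is bilinear in (a, w). Overparametrization replaces the product
# a * w by an independent vector v, so the model is linear in (a, w, v).
phi_now = features(u[1:])
phi_prev = features(u[:-1])
X = np.column_stack([y[:-1], phi_now, phi_prev])
theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)

a_hat = -theta[0]     # estimate of the AR coefficient
w_hat = theta[1:3]    # estimate of the model weights
v_hat = theta[3:5]    # free variable standing in for a * w
```

In this sketch nothing forces `v_hat` to equal `a_hat * w_hat` exactly; the relaxation drops that constraint in exchange for convexity, and how well the two agree on a given data set indicates how tight the relaxation is.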
In LS-SVMs the inequality constraints of SVMs are replaced by equality constraints and the L1-loss on the residuals by the sum of squares. For regression problems this has the advantage that the solution can be obtained from a linear system instead of a quadratic program (QP). Disadvantages of this scheme are the lack of sparsity in the solution and the lack of inherent robustness. Sparsity can be obtained, especially for large scale data sets, by approximating the feature map on a subsample and then solving the primal problem. This is called Fixed-Size LS-SVM [9]. If needed, robustness can be achieved by reweighting the residuals [10].

This paper is structured as follows. In Section II we show how to integrate a known ARMA noise model with an LS-SVM based nonlinear model. The joint convex estimation of an AR(P) noise model with the nonlinear model based on overparametrization is covered in Section III. Experimental results on synthetic as well as real data illustrating the pro-