Tuning parameter selection for underdetermined Reduced-Rank Regression

Magnus O. Ulfarsson, Member, IEEE, Victor Solo, Fellow, IEEE

Abstract—Multivariate regression is one of the most widely applied multivariate statistical methods, with uses across a range of disciplines. However, the number of parameters grows rapidly with dimension, and reduced-rank regression (RRR) is a well-known approach to dimension reduction. Traditional RRR applies only to overdetermined systems. For the increasingly common underdetermined systems, this issue can be managed by regularization, e.g. with a quadratic penalty. A significant problem is then the choice of the two tuning parameters: one discrete, the rank; the other continuous, the Tikhonov penalty parameter. In this paper we resolve this problem via Stein's unbiased risk estimator (SURE). We compare SURE to cross-validation and apply it to both simulated and real data sets.

Index Terms—Reduced-rank regression, model selection, Stein's unbiased risk estimation (SURE).

I. INTRODUCTION

The multivariate regression model (MRM) [1] is given by $Y = XB + \epsilon$, where $Y = [y_t^T]$ is a known $T \times M_y$ matrix containing $T$ observations on $M_y$ response variables, $X = [x_t^T]$ is a known $T \times M_x$ matrix of $T$ observations on $M_x$ predictor variables, $B$ is an unknown $M_x \times M_y$ regression coefficient matrix, and $\epsilon = [\epsilon_t^T]$ is a noise matrix with $\epsilon_t \sim \mathcal{N}(0, \sigma^2 I_{M_y})$. The MRM has a long history in statistics [2], [1], and a frequency-domain version has been developed [3]. The MRM occurs widely in signal processing [4], often in the context of ill-conditioned inverse problems, e.g. magnetoencephalography (MEG) [5]. In ill-conditioned inverse or underdetermined problems, some regularization is needed to obtain a stable solution. Tikhonov regularization [6] accomplishes this by imposing a quadratic penalty on the regression coefficients. LASSO [7] is an alternative method that uses an $\ell_1$ penalty to encourage a sparse solution.
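In the notation above, these two regularized estimators can be written as penalized least-squares problems. The following display is our own sketch with a generic tuning parameter $\lambda > 0$, not a formula from the cited works:

```latex
\hat{B}_{\mathrm{Tik}} = \arg\min_{B}\; \|Y - XB\|_F^2 + \lambda \|B\|_F^2,
\qquad
\hat{B}_{\mathrm{LASSO}} = \arg\min_{B}\; \|Y - XB\|_F^2 + \lambda \sum_{i,j} |b_{ij}|.
```

The quadratic (Frobenius-norm) penalty shrinks all coefficients smoothly, whereas the $\ell_1$ penalty can set individual entries of $B$ exactly to zero.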
It can often yield a more interpretable solution than ridge regression. Another type of sparseness regularization is the l 0 penalty [8] that gives maximum spar- sity. However the resulting cost function is not convex. A compromise between LASSO and l 0 penalty is the l q penalty, 0 <q< 1 [9] which can, in some settings, perform better than other sparsity promoting methods. Currently there is much research interest in penalized least squares methods, both for developing extensions of the pre- viously mentioned methods and fast algorithms for solving them. Notable examples are [10], [11], [12]. M.O. Ulfarsson is with the Department of Electrical and Computer Engi- neering, University of Iceland, Reykjavik, 111 Iceland (e-mail: mou@hi.is). V. Solo is with the School of Electrical Engineering, University of New South Wales , Sydney, NSW 2052, Australia (e-mail: v.solo@unsw.edu.au). In many applications, the number of parameters in B can be large relative to the number of observed data points. Multivariate reduced-rank regression (RRR) [2], [1], [3], [13] deals with this issue by allowing the rank of B to be reduced, i.e. rank(B)= r< min(M x ,M y ). In that case the regression matrix B can be written as a product of an M x × r matrix G and an M y × r matrix F , i.e. B = GF T . Thus the RRR model is written as Y = XGF T + ǫ. (1) RRR has been successfully used in signal processing [14], for instance in frequency estimation [15], array processing [4]. Another way to deal with the reduced-rank is given in [16] where a penalty is imposed on the singular values of the ridge regression matrix. In [17] a sparse version of RRR was proposed where an l 1 penalty was imposed on the G and F matrix in the RRR formulation. Other relevant papers are [18], [19]. The classical RRR framework assumes the overdetermined scenario where the number of samples T exceed the number of variables M x . 
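To make the classical overdetermined case concrete: with identity error-covariance weighting, the rank-$r$ solution of (1) is the ordinary least-squares fit projected onto the top-$r$ right singular vectors of the fitted values. The sketch below illustrates this standard recipe; the function name and toy data are ours, not from the paper.

```python
import numpy as np

def rrr(X, Y, r):
    """Classical overdetermined reduced-rank regression (identity weighting).

    Minimizes ||Y - X B||_F^2 subject to rank(B) <= r by computing the
    OLS solution and projecting it onto the top-r right singular
    vectors of the fitted values X B_ols.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)        # M_x x M_y
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    Vr = Vt[:r].T                                        # M_y x r
    G = B_ols @ Vr                                       # M_x x r
    F = Vr                                               # M_y x r
    return G, F                                          # B = G F^T

# Toy check: with a true rank-1 B and no noise, rank-1 RRR recovers B.
rng = np.random.default_rng(0)
T, Mx, My, r = 200, 5, 4, 1
B_true = rng.standard_normal((Mx, 1)) @ rng.standard_normal((1, My))
X = rng.standard_normal((T, Mx))
Y = X @ B_true
G, F = rrr(X, Y, r)
print(np.allclose(G @ F.T, B_true, atol=1e-6))           # True
```

Note that `lstsq` requires $T \ge M_x$ for a unique OLS fit, which is exactly the overdetermined assumption that breaks down in the underdetermined setting addressed by this paper.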
In [20] RRR was reformulated by appending a Tikhonov regularization penalty to the RRR cost function. We call such methods Tikhonov reduced-rank regression (TRRR). The paper [21] focused on the problem of collinearity among the predictor variables and introduced a method called ridge RRR (rRRR), which is identical to TRRR. It compared rRRR with other multivariate methods such as ordinary least squares, principal component regression [22], Curds and Whey [23], and partial least squares [24], and showed that rRRR outperformed these methods in terms of MSE in most settings. Additionally, [21] presented a kernel version of rRRR. A nonlinear kernel version of RRR was proposed in [25].

An important problem associated with any model is the selection of tuning parameters. A classical method applicable to a great variety of models is cross-validation (CV) [26]. However, for high-dimensional models cross-validation is often impractical due to its computational complexity. Two very common methods for tuning parameter selection are the Akaike information criterion (AIC) [27] and the Bayesian information criterion (BIC) [28]. These methods often perform well, but in some settings, e.g. small sample sizes, their performance deteriorates. Various extensions of these methods have been proposed, especially in the random matrix theory regime $T/M_y \to \gamma$ as $T \to \infty$ [29]. Stein's unbiased risk estimator (SURE) [30] has proven to be a powerful general-purpose tool for tuning parameter selection in non-linear ill-conditioned inverse problems [31], [32], [33],
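For concreteness, the cross-validation baseline amounts to a grid search over the two tuning parameters, the rank $r$ and the ridge parameter $\lambda$. The sketch below uses one common TRRR-style variant, a ridge fit followed by rank truncation; the exact estimator in [20], [21] may differ in details, and the function names are ours.

```python
import numpy as np

def ridge_rrr(X, Y, r, lam):
    """Ridge regression followed by rank-r truncation (one TRRR-style
    variant; assumed for illustration, not the paper's exact estimator)."""
    Mx = X.shape[1]
    B_ridge = np.linalg.solve(X.T @ X + lam * np.eye(Mx), X.T @ Y)
    _, _, Vt = np.linalg.svd(X @ B_ridge, full_matrices=False)
    Vr = Vt[:r].T
    return B_ridge @ Vr @ Vr.T

def cv_select(X, Y, ranks, lams, K=5, seed=0):
    """K-fold CV over the (rank, lambda) grid; returns the pair with
    the smallest held-out squared prediction error."""
    T = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(T), K)
    best, best_err = None, np.inf
    for r in ranks:
        for lam in lams:
            err = 0.0
            for hold in folds:
                train = np.setdiff1d(np.arange(T), hold)
                B = ridge_rrr(X[train], Y[train], r, lam)
                err += np.sum((Y[hold] - X[hold] @ B) ** 2)
            if err < best_err:
                best, best_err = (r, lam), err
    return best
```

The cost of this search, K refits per grid point, is precisely the computational burden that motivates replacing CV with an analytic criterion such as SURE.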