American Journal of Intelligent Systems 2014, 4(4): 142-147
DOI: 10.5923/j.ajis.20140404.03
Finding Optimal Value for the Shrinkage Parameter in
Ridge Regression via Particle Swarm Optimization
Vedide Rezan Uslu¹, Erol Egrioglu²·*, Eren Bas³

¹ Department of Statistics, University of Ondokuz Mayis, Samsun, 55139, Turkey
² Department of Statistics, Marmara University, Istanbul, 34722, Turkey
³ Department of Statistics, Giresun University, Giresun, 28000, Turkey
Abstract  A multiple regression model rests on a set of standard assumptions. If the data do not satisfy these assumptions, problems arise that have serious undesired effects on the parameter estimates. One of these problems is multicollinearity, which means that there is a nearly perfect linear relationship between the explanatory variables used in a multiple regression model. This undesirable problem is generally handled by methods such as ridge regression, which gives biased parameter estimates. Ridge regression shrinks the ordinary least squares estimate of the vector of regression coefficients towards the origin, introducing a bias but providing a smaller variance. However, the choice of the shrinkage parameter k in ridge regression is another serious issue. In this study, a new algorithm based on particle swarm optimization is proposed to find the optimal shrinkage parameter.
Keywords Ridge regression, Optimal shrinkage parameter, Particle swarm optimization
1. Introduction
Linear regression is a classic statistical method. Like other statistical techniques, it relies on a number of assumptions, and these assumptions are often not realistic in real-world applications. Statisticians therefore check the assumptions and, when they do not hold for the data, apply more advanced statistical techniques. Ridge regression is one such technique: when the data suffer from a multicollinearity problem, ridge regression can provide a solution. In this study, a new ridge regression method is introduced.
Consider a linear multiple regression model

Y = Xβ + ε    (1)

where Y is the n×1 vector of observations of the dependent variable, X is the n×p′ matrix of observations of the explanatory variables with full rank p′, β is the p′×1 vector of unknown parameters and ε is the n×1 vector of random errors, where p′ = p + 1 and p is the number of explanatory variables in the model. It is assumed that each random error has zero mean and a constant variance σ², and that the errors are uncorrelated.
* Corresponding author: erole@omu.edu.tr (Erol Egrioglu)
Moreover, it is assumed that the columns of X are not linearly dependent on each other.
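As a concrete illustration of model (1), the following Python sketch (an assumed example, not part of the original paper; sample size, coefficients and noise level are illustrative) simulates data from the model and computes the ordinary least squares estimate (X'X)⁻¹X'Y.

    import numpy as np

    # Minimal sketch of model (1): Y = X*beta + epsilon, with n
    # observations, p explanatory variables and p' = p + 1 columns
    # in X (an intercept column plus the p explanatory variables).
    rng = np.random.default_rng(0)
    n, p = 50, 3
    beta_true = np.array([2.0, 0.5, -1.0, 3.0])      # p' = 4 parameters

    X = np.column_stack([np.ones(n),                 # intercept column
                         rng.normal(size=(n, p))])   # explanatory variables
    eps = rng.normal(scale=1.0, size=n)              # zero mean, constant variance
    Y = X @ beta_true + eps

    # Ordinary least squares estimate: beta_hat = (X'X)^{-1} X'Y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta_hat)                                  # close to beta_true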
Let us denote the columns of X as X_1, X_2, …, X_p. If there is a relationship

∑_{j=1}^{p} t_j X_j = 0    (2)

for a set of numbers t_1, t_2, …, t_p, not all zero, this relation is called the multicollinearity problem in multiple regression analysis.
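The relation in (2) can be made concrete with a small sketch. Assuming, purely for illustration, three columns with X_3 ≈ 2X_1 − X_2 (so (t_1, t_2, t_3) = (2, −1, −1) nearly satisfies (2)), the following Python code shows that X'X is then close to singular:

    import numpy as np

    # Assumed example of relation (2): X3 is almost a linear
    # combination of X1 and X2, so t1*X1 + t2*X2 + t3*X3 ≈ 0
    # for (t1, t2, t3) = (2, -1, -1).
    rng = np.random.default_rng(1)
    n = 50
    X1 = rng.normal(size=n)
    X2 = rng.normal(size=n)
    X3 = 2 * X1 - X2 + rng.normal(scale=1e-3, size=n)  # near-exact dependency
    X = np.column_stack([X1, X2, X3])

    eigvals = np.linalg.eigvalsh(X.T @ X)
    print("smallest eigenvalue:", eigvals[0])             # close to zero
    print("condition number:", eigvals[-1] / eigvals[0])  # very large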
The presence of multicollinearity has a number of potentially serious effects on the ordinary least squares estimates of the unknown parameters. The most serious one is that it inflates the variances and covariances of the least squares estimates of the regression coefficients. This implies that different samples taken at the same levels of the X's could lead to completely different estimates of the model parameters.
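This sampling instability is easy to demonstrate. The simulation below (a hypothetical sketch; the sample size, degree of collinearity and noise level are assumptions) keeps X fixed and nearly collinear, redraws only the errors, and shows that the resulting OLS estimates vary wildly from sample to sample:

    import numpy as np

    # Assumed simulation: fixed, nearly collinear X; only the
    # random errors change between samples, yet the OLS estimates
    # fluctuate strongly, reflecting their inflated variances.
    rng = np.random.default_rng(2)
    n = 50
    X1 = rng.normal(size=n)
    X2 = X1 + rng.normal(scale=1e-2, size=n)         # X2 nearly equals X1
    X = np.column_stack([X1, X2])
    beta_true = np.array([1.0, 1.0])

    estimates = []
    for _ in range(1000):
        Y = X @ beta_true + rng.normal(size=n)       # same X, new errors
        estimates.append(np.linalg.lstsq(X, Y, rcond=None)[0])
    estimates = np.array(estimates)

    print("std of estimates:", estimates.std(axis=0))      # very large
    print("largest |estimate|:", np.abs(estimates).max())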
Multicollinearity can also produce least squares estimates of the β's that are too large in absolute value.
When the columns of the X matrix are centered and scaled, the matrix X'X becomes the correlation matrix of the explanatory variables and X'Y is the vector of the correlation coefficients of the dependent variable with each explanatory variable. If the columns of X are orthogonal, X'X is the identity matrix. In the presence of multicollinearity, X'X becomes ill-conditioned, which means that it is nearly singular and its determinant is