Iterative Selection Using Orthogonal Regression Techniques

Bradley Turnbull¹, Subhashis Ghosal¹* and Hao Helen Zhang²

¹ Department of Statistics, North Carolina State University, Raleigh, NC, USA
² Department of Mathematics, University of Arizona, Tucson, AZ, USA

* Correspondence to: Subhashis Ghosal (sghosal@stat.ncsu.edu)

Received 15 October 2013; revised 1 November 2013; accepted 2 November 2013
DOI: 10.1002/sam.11212
Published online in Wiley Online Library (wileyonlinelibrary.com).

Abstract: High dimensional data are nowadays encountered in various branches of science, and variable selection techniques play a key role in analyzing such data. Generally, two approaches to variable selection in the high dimensional setting are considered: forward selection methods and penalization methods. In the former, variables are introduced into the model one at a time depending on their ability to explain variation, and the procedure is terminated at some stage according to a stopping rule. In penalization techniques such as the least absolute shrinkage and selection operator (LASSO), an optimization procedure is carried out with a carefully chosen added penalty function, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection has been introduced. The motivation comes from the fact that penalization techniques like the LASSO give rise to closed-form expressions when used in one dimension, just like the least squares estimator; hence such a procedure can be repeated in a forward selection setting until it converges. The resulting procedure selects sparser models than comparable methods without compromising predictive power. However, when the regressor is high dimensional, it is typical that many predictors are highly correlated. We show that in such situations the stability and computational efficiency of the procedure can be improved further by introducing an orthogonalization step. At each selection step, variables potentially available for selection are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in reducing the model dimension further and also leads to improved predictive power. We also consider an aggressive version of STORM, in which a potential predictor is permanently removed from further consideration if its regression coefficient is estimated as zero at any stage. We carry out a detailed simulation study to compare the newly proposed method with existing ones, and we analyze a real dataset. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 557–564, 2013

Keywords: forward selection; orthogonalization; high dimensional regression; LASSO

1. INTRODUCTION

In modern applications of statistics and data mining, linear regression models with extremely high dimensional regressors are commonly encountered. Typically the dimension of the regressor variable far exceeds the available sample size, posing serious challenges in the analysis of such data. In particular, the data matrix becomes singular and the least squares estimator is not uniquely defined. Usually, the majority of the regressor variables are not relevant, leading to a sparse structure in the model. However, it is not known beforehand which variables are actually relevant for the response variable.
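As a quick numerical illustration of this breakdown (a minimal Python sketch with simulated data, not an example from the paper), note that the rank of the normal-equations matrix X'X can never exceed the sample size:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10, 50                       # many more predictors than observations
    X = rng.standard_normal((n, p))

    # rank(X'X) <= min(n, p) = 10, so the 50 x 50 matrix X'X is singular and
    # the normal equations X'X b = X'y do not have a unique solution.
    print(np.linalg.matrix_rank(X.T @ X))   # prints 10

Any vector in the null space of X (here 40-dimensional) can be added to a solution of the normal equations without changing the fitted values, so infinitely many coefficient vectors fit the data equally well; some form of selection or penalization is needed to single one out.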
This problem is addressed through a variable selection step, which screens the variables before they can enter the model. The variable selection step in fact allows fairly accurate estimation of the regression function in such high dimensional, low sample size (HDLSS) situations. Variable selection has many other benefits, such as the ability to work with a sparse model, which has much better interpretability than a regression model with a large number of predictors.

Variable selection methods mainly fall into two categories. The first consists of recursive selection methods such as forward selection, backward selection, and stepwise selection. In a forward selection procedure, variables are added one by one to build up the model, and a stopping rule is applied, based on a criterion such as the mean squared error (MSE), adjusted R², Mallows' C_p metric, the prediction sum of squares (PRESS), the Akaike information criterion (AIC), or the Bayesian information criterion (BIC); see ref. [1] for their definitions. This strategy often runs into problems in HDLSS situations. For high dimensional data, Bühlmann [2] and Wang [3] studied forward regression.
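To fix ideas, here is a minimal sketch of the classical forward selection procedure just described, using a BIC-based stopping rule; the function name forward_select_bic and the choice of BIC over the other criteria listed above are illustrative assumptions, not the construction of refs. [1-3]:

    import numpy as np

    def forward_select_bic(X, y):
        """Greedy forward selection; stop when BIC no longer improves."""
        n, p = X.shape

        def bic(cols):
            # Least squares fit of y on an intercept plus the chosen columns.
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            return n * np.log(rss / n) + np.log(n) * (len(cols) + 1)

        selected, current = [], bic([])
        while len(selected) < min(p, n - 2):      # keep the model estimable
            remaining = (j for j in range(p) if j not in selected)
            best_bic, best_j = min((bic(selected + [j]), j) for j in remaining)
            if best_bic >= current:               # stopping rule: BIC worsens
                break
            selected.append(best_j)
            current = best_bic
        return selected

Each step refits only small models, so the greedy search stays computationally cheap even when p is large; the difficulty in HDLSS settings is rather that the selection path becomes unstable, which is one motivation for the penalized and orthogonalized variants developed in this paper.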