Statistics and Its Interface Volume 13 (2020) 151–156

Bi-level variable selection in high dimensional Tobit models

Hailin Huang, Jizi Shangguan, Yuanzhang Li, and Hua Liang

To study variable selection for high dimensional Tobit models, we reformulate Tobit models as single-index models. We combine group variable selection procedures for single-index models with univariate regression methods for Tobit models to achieve variable selection for Tobit models with group structures taken into consideration. The procedure is computationally efficient and easily implemented. Finite sample experiments show its promising performance. We also illustrate its utility by analyzing a dataset from an HIV/AIDS study.

Keywords and phrases: Group structure, Group Lasso, Single-index models, Tobit models.

1. INTRODUCTION

With advances in high throughput technologies, many medical studies are now complemented with biomarker information for each patient. Identification of important biomarkers can lead to a better understanding of the mechanism behind disease development, and thus facilitate clinical diagnosis and prognosis. Sometimes the number of biomarkers is larger than the number of observations, which raises high dimensional problems and brings challenges in data analysis. Moreover, the response may be fixed censored because of a detection limit (Haab, Dunham and Brown, 2001; Van der Pouw Kraan et al., 1995). For instance, in measuring viral load in HIV/AIDS studies, the half maximal inhibitory concentration (IC50) values in blood serum cannot be measured when they fall below the detection limit. Conventional Tobit models (Tobin, 1958) for fixed censored responses and their associated estimation methods cannot be directly applied in such settings. In addition, among the many biomarkers/covariates investigated, perhaps only a few are associated with the response variable of interest.
Thus, variable selection or dimension reduction is always recommended along with the estimation procedure. To explore the relationship between a fixed censored response variable and a set of high dimensional covariates, we propose a new method that accommodates high dimensional data with fixed censored responses.

The authors thank a referee for valuable suggestions and comments. Liang's research was partially supported by NSF grant DMS-1620898. Corresponding author.

Among the many variable selection techniques developed, penalized selection methods have attracted extensive attention. Penalization methods put penalties on the regression coefficients, which reduces model complexity and can lead to better model fitting. Among the most popular penalization methods are Lasso (Tibshirani, 1996), MCP (Zhang, 2010), and SCAD (Fan and Li, 2001). These methods and their variants have also been widely used in high dimensional data analysis (Fan and Li, 2002; Huang, Breheny and Ma, 2012; Gui and Li, 2005). The above methods tackle variable selection problems at the level of individual covariates. However, prior knowledge may introduce group structures as well. For example, in biomarker analysis, biomarkers belonging to the same functional group may behave similarly. Motivated by this, it may be more desirable to take the grouping structure into account in the variable selection procedure. For this purpose, researchers have proposed group variable selection methods and bi-level variable selection methods for covariates that can be grouped. The former type focuses on selecting important groups, whereas the latter targets selecting important groups as well as identifying important members within those groups (Breheny and Huang, 2009).
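The distinction between individual-level and group-level selection can be illustrated with a minimal sketch based on the standard proximal (thresholding) operators of the Lasso and group Lasso penalties. This is not the procedure proposed in this article; the function names, the toy coefficient vector, the group labels, and the tuning value 0.5 are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(beta, lam):
    """Lasso-type step: shrink each coefficient toward zero individually."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def group_soft_threshold(beta, groups, lam):
    """Group-Lasso-type step: shrink the Euclidean norm of each group.

    A group is zeroed out as a whole when its norm is at most
    lam * sqrt(group size); otherwise all its members are kept.
    """
    out = np.zeros_like(beta, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        thresh = lam * np.sqrt(idx.sum())
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * beta[idx]
    return out

# Hypothetical coefficients: group 0 holds one strong and one weak
# member, group 1 holds two weak members.
beta = np.array([3.0, 0.1, 0.2, 0.1])
groups = np.array([0, 0, 1, 1])

lasso_fit = soft_threshold(beta, 0.5)                # ignores grouping
group_fit = group_soft_threshold(beta, groups, 0.5)  # all-in or all-out per group
```

In this toy example the individual-level step zeroes every weak coefficient but makes no use of the grouping, while the group-level step removes group 1 entirely and keeps both members of group 0, including the weak one. Bi-level selection aims to select the important groups and also discard such weak members inside a selected group.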
Some representative examples for these two types of methods include group Lasso, group SCAD, group bridge Lasso, and group exponential Lasso (Yuan and Lin, 2006; Huang, Breheny and Ma, 2012; Wang, Chen and Li, 2007; Breheny and Huang, 2009; Breheny, 2015; Huang et al., 2009). Although variable selection methods that take group structures into consideration have been extensively studied in various parametric, semiparametric and nonparametric models, further work is still needed for Tobit models. Motivated by the group Lasso method of Yuan and Lin (2006), Liu, Wang and Wu (2013) proposed a group Lasso for Tobit models. However, their method works only for low dimensional Tobit models and cannot separate noisy covariates from significant ones within a group; that is, it fails to perform bi-level variable selection. As far as we know, there are no methods that perform bi-level variable selection for high dimensional Tobit models. In this article, we propose a bi-level variable selection method for high dimensional Tobit (abbreviated as BHTobit)