Emphasizing Minority Class in LDA for Feature Subset Selection on High-Dimensional Small-Sized Problems

Feng Yang, K.Z. Mao, Gary Kee Khoon Lee, and Wenyin Tang

Abstract—Although mostly used for pattern classification, linear discriminant analysis (LDA) can also serve in feature selection as an effective measure of the separating ability of a feature subset. When applied to feature selection on high-dimensional small-sized (HDSS) data, which generally exhibit class imbalance, LDA encounters four problems: singularity of the scatter matrix, overfitting, overwhelming of the minority class by the majority class, and prohibitive computational complexity. In this study, we propose an LDA-based feature selection method, minority class emphasized linear discriminant analysis (MCE-LDA), with a new regularization technique that addresses the first three problems. Unlike conventional forms of regularization, which give equal or greater emphasis to the majority class, the proposed regularization places more emphasis on the minority class, with the expectation of improving overall performance by alleviating both the overwhelming of the minority class by the majority class and overfitting in the minority class. To reduce computational overhead, an incremental implementation of LDA-based feature selection is introduced. Comparative studies with other forms of regularization of LDA, as well as with other popular feature selection methods, on five HDSS problems show that MCE-LDA produces feature subsets with excellent performance in both classification and robustness. Further experimental results on true positive rate (TPR) and true negative rate (TNR) also verify the effectiveness of the proposed technique in alleviating the overwhelming and overfitting problems.
Index Terms—Feature subset selection, regularized linear discriminant analysis, class emphasis, classification, robustness

1 INTRODUCTION

IN pattern recognition for high-dimensional small-sized (HDSS) problems, i.e., problems with many more variables (features) than independent observations (samples), feature selection generally serves as a crucial step before constructing a classification model, owing to the so-called curse of dimensionality [1]. In the past few years, feature selection on HDSS data (e.g., gene microarray data, which are typically HDSS) has received considerable attention, with the goals of identifying target-related features or building compact models for pattern classification [2], [3], [4], [5], [6]. For the first goal, ranking-based feature selection (or feature ranking) methods, which generally evaluate features on an individual basis, are often used. For the second goal, set-based feature selection (or feature subset selection) methods should be used, because they account for feature interactions, which are also an inherent characteristic of a pattern classification model.

A typical set-based feature selection algorithm runs recursively, embedding two major components in its recursive procedure. The first component is a strategy for searching or generating candidate feature subsets; the second is an evaluation criterion that measures the goodness of the candidate feature subsets generated. For the first component, sequential forward search (SFS) and sequential backward search (SBS) are usually employed. For the second component, two types of evaluation criteria, namely classifier-dependent and classifier-independent measures, are generally utilized. Since set-based feature selection often aims to select a feature subset with which to construct a pattern classifier with good classification performance, measures that reflect classification performance are often used as feature evaluation criteria.
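The two-component structure described above can be sketched as follows. This is a minimal, generic illustration of sequential forward search with a pluggable subset-evaluation criterion; the function names and the toy criterion are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of sequential forward search (SFS) with a pluggable
# subset-evaluation criterion. Names here are illustrative, not from the paper.
import numpy as np

def sfs(X, y, evaluate, k):
    """Greedily grow a feature subset of size k.

    X: (n_samples, n_features) data matrix
    y: class labels
    evaluate: callable(X_subset, y) -> score, higher is better
    """
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        # Score every candidate subset obtained by adding one more feature.
        scores = [(evaluate(X[:, selected + [j]], y), j) for j in remaining]
        best_score, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy classifier-independent criterion for demonstration only:
# sum of absolute class-mean differences over the chosen features.
def toy_criterion(Xs, y):
    m0 = Xs[y == 0].mean(axis=0)
    m1 = Xs[y == 1].mean(axis=0)
    return float(np.abs(m1 - m0).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 3] += 2.0            # make feature 3 clearly discriminative
subset = sfs(X, y, toy_criterion, k=2)
print(subset)                  # feature 3 should be among the selected
```

Swapping `toy_criterion` for a classifier-dependent measure (e.g., a cross-validation error with the sign flipped) yields the wrapper-style variants discussed in the text; SBS is the same skeleton run in reverse, starting from the full set and removing one feature per step.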
For example, counting-based error estimates such as the leave-one-out error and the k-fold cross-validation error have been used for feature selection [7], [8], [9]. However, the study in [10] found that leave-one-out and k-fold cross-validation error estimation may not be ideal for feature subset selection on HDSS data. This is because counting-based error estimators are discrete functions in nature, so more than one feature subset may lead to the same classification error; the problem becomes even more serious under high dimensionality and small sample size, and in turn results in great selection uncertainty. This so-called ties problem can be avoided by using continuous functions as evaluation criteria, such as Bayesian error estimation [11], [12], [13], [14], [15], loss functions of regression [16], [17], [18], [19], [20], and support vector machine (SVM)-based criteria [21], [22], [23], [24], [25].

Linear discriminant analysis (LDA) is mostly used as a classifier for pattern classification; however, it can also serve as an effective measure of the separating ability of a feature subset for feature subset selection. Instead of using the classification error of an LDA classifier as the evaluation criterion, LDA-based feature selection evaluates features based on the ratio of between-class difference to within-class scatter of the samples projected onto the dimension that maximizes this ratio. LDA projection involves inverting the scatter matrix. For HDSS data, however, the scatter

F. Yang and G.K.K. Lee are with the Department of Computing Science, Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore. E-mail: {yangf, leekk}@ihpc.a-star.edu.sg.
K.Z. Mao and W. Tang are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. E-mail: {ekzmao, wenyin}@ntu.edu.sg.
Manuscript received 11 Jul. 2013; revised 15 Jan. 2014; accepted 3 Apr. 2014. Date of publication 28 Apr. 2014; date of current version 1 Dec. 2014. Recommended for acceptance by K. Chang. Digital Object Identifier no. 10.1109/TKDE.2014.2320732
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 1, JANUARY 2015. 1041-4347 © 2014 IEEE.
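The LDA-based evaluation criterion described in the introduction, the ratio of between-class difference to within-class scatter along the maximizing projection, can be sketched for the two-class case, where the optimal direction has the textbook closed form w ∝ S_w⁻¹(m₁ − m₀). This is a generic Fisher-criterion sketch, not the paper's regularized MCE-LDA; the small ridge term is an illustrative stand-in for the regularization the paper discusses.

```python
# Sketch of the two-class Fisher/LDA criterion for scoring a feature subset:
# J = max_w (w^T S_b w) / (w^T S_w w), attained at w = S_w^{-1} (m1 - m0),
# giving J proportional to (m1 - m0)^T S_w^{-1} (m1 - m0).
# Generic textbook LDA, not the paper's minority-class-emphasized variant.
import numpy as np

def fisher_score(Xs, y):
    X0, X1 = Xs[y == 0], Xs[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    d = m1 - m0
    # Pooled within-class scatter matrix of the candidate subset.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # A tiny ridge keeps S_w invertible -- exactly the singularity issue
    # that arises on HDSS data when the subset grows large.
    Sw += 1e-6 * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, d)
    return float(d @ w)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = np.array([0] * 15 + [1] * 15)
X[y == 1, 0] += 3.0               # feature 0 separates the classes
print(fisher_score(X[:, [0]], y) > fisher_score(X[:, [1]], y))  # True
```

Plugging `fisher_score` into a search strategy such as SFS yields an LDA-based subset selector of the kind the introduction describes; the paper's contribution replaces the uniform ridge with a regularizer that emphasizes the minority class.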