Robust Feature Selection by Weighted Fisher Criterion for Multiclass Prediction in Gene Expression Profiling

Jianhua Xuan 1, Yibin Dong 1, Javed Khan 2, Eric Hoffman 3, Robert Clarke 4, Yue Wang 5

1 Department of EECS, The Catholic University of America, Washington, DC 20064
2 Pediatric Oncology Branch, National Cancer Institute, Gaithersburg, MD 20877
3 Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC 20010
4 Lombardi Cancer Center, Georgetown University, Washington, DC 20007
5 Department of ECE, Virginia Polytechnic Institute and State University, Alexandria, VA 22314

Abstract

This paper presents a robust feature selection approach for multiclass prediction with application to microarray studies. First, individually discriminatory genes (IDGs) are identified using a weighted Fisher criterion (wFC). Second, jointly discriminatory genes (JDGs) are selected by a sequential search method according to their joint class separability. To combat the small-sample-size effect on feature selection, leave-one-out procedures are incorporated into both the IDG and JDG selection steps to improve the robustness of the approach. By applying this approach to a microarray study of small round blue cell tumors (SRBCTs) of childhood, we demonstrate that our robust feature selection method can successfully identify a subset of genes with superior classification performance for multiclass prediction.

1. Introduction

Molecular analysis of clinical heterogeneity in cancer treatment has been difficult, in part because it has historically relied on specific biological insights or focused largely on particular genes with known functions, rather than on systematic and unbiased approaches for recognizing tumor subtypes and associated biomarkers [1]. The recent development of gene microarrays provides an opportunity to take a genome-wide approach to predicting therapy outcome. By surveying mRNA expression levels for thousands of genes in a single experiment, it is now possible to read the molecular signature of an individual patient's tumor. When such signatures are analyzed with computer algorithms, new classes of cancer emerge that transcend distinctions based on histological appearance alone, along with new insights into disease mechanisms and diagnostic or therapeutic targets that move beyond correlation, classification, and prediction [2]. For example, recent studies demonstrate that global gene expression profiling of human tumors can provide molecular phenotyping that reveals tumor subtypes not evident from traditional histopathological methods [1, 2]. Although such global views are likely to reveal previously unrecognized patterns of gene regulation and to generate new hypotheses warranting further study, widespread use of microarray profiling is limited by the need for further technology development, particularly computational bioinformatics tools (e.g., statistical pattern recognition methods) not supplied with the instruments.

Although statistical pattern recognition has been used successfully in many applications (see the excellent review by A. K. Jain [3]), microarray profiling data challenge the field in nearly every technical aspect, including feature selection, clustering, and classification. The main challenge is the so-called "curse of dimensionality": the sample size (typically 10-100 in a microarray study) is much smaller than the number of features (up to 10,000 genes).
The curse of dimensionality leads to the notorious "peaking phenomenon," in which adding more features eventually degrades the performance of a classifier [3]. All commonly used classifiers, including artificial neural networks (ANNs), can suffer from the curse of dimensionality. An important study of small-sample-size effects in statistical pattern recognition provides a set of valuable recommendations for classifier design and performance evaluation [4]. To alleviate the curse of dimensionality, feature selection is used to reduce dimensionality, i.e., to keep the number of features as small as possible [3, 5]. The feature selection problem can be defined as selecting the subset of features that leads to the smallest classification error. Feature selection may be carried out by an exhaustive search in which all possible subsets of a fixed size are examined and the subset with the smallest classification error is retained. Unfortunately, this selection process is computationally prohibitive [3, 5]. The mainstream of research on feature selection has therefore been directed toward sequential, suboptimal search methods [6]. In practice, the control parameters of the search algorithm can be allowed to "float" so as to best approximate the optimal solution, namely the sequential floating search strategy. SFFS (sequential forward floating selection) and SBFS (sequential backward floating selection) have been shown in comprehensive studies to perform almost as well as the optimal search, at only a fraction of its computational cost.
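To make the floating search strategy concrete, the sketch below outlines a plain SFFS loop. It is a minimal illustration under stated assumptions: the class-separability criterion J is a placeholder (a regularized trace-ratio Fisher score), not the weighted Fisher criterion developed later in this paper, and the names fisher_criterion and sffs are illustrative only.

```python
import numpy as np

def fisher_criterion(X, y, subset):
    """Placeholder separability criterion J(subset): trace(Sw^-1 Sb)
    computed on the selected columns of X. Any joint class-separability
    measure can be substituted here."""
    Xs = X[:, list(subset)]
    mean_all = Xs.mean(axis=0)
    d = Xs.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = Xs[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += Xc.shape[0] * diff @ diff.T
    Sw += 1e-6 * np.eye(d)  # regularize: Sw is near-singular with few samples
    return np.trace(np.linalg.solve(Sw, Sb))

def sffs(X, y, target_size, J=fisher_criterion):
    """Sequential forward floating selection of `target_size` features."""
    n_features = X.shape[1]
    selected = []
    best = {}  # best criterion value seen so far for each subset size
    while len(selected) < target_size:
        # Inclusion step: add the single feature that most improves J
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: J(X, y, selected + [f]))
        selected = selected + [f_add]
        best[len(selected)] = max(best.get(len(selected), -np.inf),
                                  J(X, y, selected))
        # Conditional exclusion step: drop features ("float" back) while the
        # reduced subset beats the best subset previously found at that size
        while len(selected) > 2:
            f_drop = max(selected,
                         key=lambda f: J(X, y, [g for g in selected if g != f]))
            trial = [g for g in selected if g != f_drop]
            score = J(X, y, trial)
            if score > best.get(len(trial), -np.inf):
                selected, best[len(trial)] = trial, score
            else:
                break
    return selected, J(X, y, selected)
```

For instance, with an expression matrix X (samples by genes) and class labels y as NumPy arrays, sffs(X, y, 10) would return ten feature indices chosen for their joint separability, together with the criterion value of that subset.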