Intelligent Data Analysis 23 (2019) 241–253
DOI 10.3233/IDA-173720
IOS Press

Gene selection for enhanced classification on microarray data using a weighted k-NN based algorithm

Elías Ventura-Molina a, Antonio Alarcón-Paredes b, Mario Aldape-Pérez c, Cornelio Yáñez-Márquez a and Gustavo Adolfo Alonso b,*

a Centro de Investigación en Computación, Instituto Politécnico Nacional. Av. Juan de Dios Bátiz, Esq. Miguel Othón de Mendizábal, Col. Nueva Industrial Vallejo, Gustavo A. Madero, 07738, Ciudad de México, México
b Facultad de Ingeniería, Universidad Autónoma de Guerrero. Av. Lázaro Cárdenas s/n, Ciudad Universitaria Zona Sur, 39087, Chilpancingo, Guerrero, México
c Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, México. Av. Juan de Dios Bátiz, Col. Nueva Industrial Vallejo, 07700, Ciudad de México, México

* Corresponding author: Gustavo Adolfo Alonso, Laboratory of Computing Technologies and Electronics, School of Engineering, Universidad Autónoma de Guerrero, Av. Lázaro Cárdenas s/n, Ciudad Universitaria Zona Sur, Chilpancingo, Guerrero 39087, México. Tel.: +52 1 747 112 2838; E-mail: gsilverio@uagro.mx.

1088-467X/19/$35.00 © 2019 – IOS Press and the authors. All rights reserved

Abstract. Feature selection is a common solution to microarray analysis. Previous approaches either select features with classical statistical tests that can be tuned with a classifier, or use regularization penalties incorporated in the cost function. Here we propose a feature ranking and weighting scheme instead, which combines statistical techniques with a weighted k-NN classifier using a modified forward selection procedure. We demonstrate that the classification accuracy of our proposal exceeds that of existing methods on a range of public microarray gene expression datasets. The proposed method is also compared to state-of-the-art feature selection algorithms by means of the Friedman test. Although many feature selection techniques have been used for genomic data, the experimental results show the classification superiority of our method on most of the gene expression datasets presented here.

Keywords: Computational genomics, microarray data analysis, feature selection, feature ranking, feature weighting, k-nearest neighbors

1. Introduction

The wide use of gene expression technologies, such as microarrays, makes it possible to screen thousands of genes over multiple observations. In general, a microarray is a high-dimensional structure consisting of few samples (n) and thousands of genes (p). Gene expression information helps to monitor and measure relevant data for understanding different biological information, and it facilitates analysis in specific contexts such as cancer diagnosis or the classification of different tumor types [7,18,21]. Due to the nature