International Journal of Foundations of Computer Science
Vol. 26, No. 3 (2015) 321–341
© World Scientific Publishing Company
DOI: 10.1142/S0129054115500185

Novel Randomized Feature Selection Algorithms

Subrata Saha* and Sanguthevar Rajasekaran
Department of Computer Science and Engineering
University of Connecticut, Storrs, CT, USA
*subrata.saha@engr.uconn.edu
rajasek@engr.uconn.edu

Rampi Ramprasad
Department of Materials Science and Engineering
University of Connecticut, Storrs, CT, USA
rampi@ims.uconn.edu

Received 11 June 2014
Accepted 3 November 2014
Communicated by Sartaj Sahni

Feature selection is the problem of identifying a subset of the most relevant features in the context of model construction. This problem has been well studied and plays a vital role in machine learning. In this paper we present three randomized algorithms for feature selection. They are generic in nature and can be applied with any learning algorithm. The proposed algorithms can be thought of as a random walk in the space of all possible subsets of the features. We demonstrate the generality of our approaches using three different applications. The simulation results show that our feature selection algorithms outperform some of the best-known algorithms in the current literature.

Keywords: Feature selection (FS); machine learning; data integrator (DI); gene selection algorithm (GSA); kernel ridge regression (KRR); sequential forward search (SFS).

1. Introduction

Feature selection is defined as the process of selecting a subset of the most relevant features from a set of features. It involves discarding irrelevant, redundant, and noisy features. Feature selection is also known as variable selection, attribute selection, or variable subset selection in the fields of machine learning and statistics. The concept of feature selection is different from that of feature extraction.
Feature extraction creates new features from the set of original features by employing a variety of methods, such as forming linear combinations of features or projecting features from the original space into a transformed space. We can summarize the usefulness of feature selection as follows: (1) Shorter training times: when irrelevant and redundant features are eliminated, the learning time decreases; (2) Improved model creation: the model built is more accurate and efficient; and (3) Enhanced generalization: it produces simpler and more generalized models.
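To make the distinction concrete, the following minimal sketch (with a toy data matrix and an arbitrary, hypothetical projection matrix, neither of which appears in the paper) contrasts the two operations: selection keeps a subset of the original columns unchanged, while extraction produces new columns as linear combinations of the originals.

```python
import numpy as np

# Toy data: 5 samples, 4 features (illustrative values only).
X = np.arange(20, dtype=float).reshape(5, 4)

# Feature SELECTION: keep a subset of the original columns
# (here we hypothetically assume features 0 and 2 were judged relevant).
selected = [0, 2]
X_selected = X[:, selected]   # the original feature values survive unchanged

# Feature EXTRACTION: create new features as linear combinations of the
# originals, i.e., a projection into a 2-dimensional transformed space.
W = np.array([[0.5, 0.1],
              [0.5, 0.1],
              [0.0, 0.7],
              [0.0, 0.1]])     # arbitrary projection matrix for illustration
X_extracted = X @ W            # new features, not the original ones

print(X_selected.shape)        # (5, 2)
print(X_extracted.shape)       # (5, 2)
```

Note that after selection each retained column is still interpretable as one of the original features, whereas after extraction each column mixes several original features, which is why selection is preferred when interpretability matters.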