Combining SVMs with Various Feature Selection Strategies

Yi-Wei Chen and Chih-Jen Lin
Department of Computer Science, National Taiwan University, Taipei 106, Taiwan

Summary. This article investigates the performance of combining support vector machines (SVM) with various feature selection strategies. Some of them are filter-type approaches: general feature selection methods independent of SVM. Others are wrapper-type methods: modifications of SVM that can be used to select features. We apply these strategies while participating in the NIPS 2003 Feature Selection Challenge and rank third as a group.

1 Introduction

Support Vector Machine (SVM) (Boser et al. 1992; Cortes and Vapnik 1995) is an effective classification method, but it does not directly provide feature importance. In this article we combine SVM with various feature selection strategies and investigate their performance. Some of them are "filters": general feature selection methods independent of SVM. That is, these methods select important features first, and then SVM is applied for classification. On the other hand, some are wrapper-type methods: modifications of SVM that choose important features as well as conduct training/testing. We apply these strategies while participating in the NIPS 2003 Feature Selection Challenge. Overall we rank third as a group and are the winner of one data set.

In the NIPS 2003 Feature Selection Challenge, the main judging criterion is the balanced error rate (BER), defined as

    BER ≡ (1/2) ( (# positive instances predicted wrong / # positive instances)
                + (# negative instances predicted wrong / # negative instances) ).   (1)

For example, assume a test data set contains 90 positive and 10 negative instances. If all instances are predicted as positive, then the BER is 50%, since the first term of (1) is 0/90 but the second is 10/10. There are other judging criteria, such as the number of features and probes, but throughout the competition we focus on obtaining the smallest BER.
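To make definition (1) concrete, the following is a minimal sketch of computing the BER from label vectors; the function name and the ±1 label convention are illustrative choices, not part of the challenge's official scoring code:

```python
def balanced_error_rate(y_true, y_pred):
    """Balanced error rate (BER) as in Eq. (1): the average of the
    per-class error rates on positive (+1) and negative (-1) instances."""
    pos_err = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    neg_err = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p != -1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    return 0.5 * (pos_err / n_pos + neg_err / n_neg)

# Worked example from the text: 90 positive and 10 negative test
# instances, all predicted positive -> BER = 0.5 * (0/90 + 10/10) = 0.5.
y_true = [1] * 90 + [-1] * 10
y_pred = [1] * 100
print(balanced_error_rate(y_true, y_pred))  # 0.5
```

Note how the per-class averaging keeps a trivial majority-class predictor at 50% BER even on heavily imbalanced test data, which is exactly why the challenge uses BER rather than plain error rate.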