Journal of Computer Science 10 (11): 2232-2239, 2014, Science Publications
ISSN: 1549-3636
© 2014 A. Adel et al. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license
doi:10.3844/jcssp.2014.2232.2239
Published Online 10 (11) 2014 (http://www.thescipub.com/jcs.toc)

A COMPARATIVE STUDY OF COMBINED FEATURE SELECTION METHODS FOR ARABIC TEXT CLASSIFICATION

Aisha Adel, Nazlia Omar and Adel Al-Shabi

Knowledge Technology Group, Centre for AI Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

Received 2014-04-09; Revised 2014-10-15; Accepted 2014-11-11

Corresponding Author: Nazlia Omar, Knowledge Technology Group, Centre for AI Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

ABSTRACT

Text classification is an important task because of the huge number of electronic documents now available. One of its main problems is the high dimensionality of the feature space. Researchers have proposed many algorithms for selecting relevant features from text. These algorithms have been studied extensively for English text, while studies for Arabic are still limited. This study investigates the performance of five widely used feature selection methods, namely Chi-square, Correlation, GSS Coefficient, Information Gain and Relief F. It also introduces an approach that combines feature selection methods based on the average weight of the features. The experiments use Naïve Bayes and Support Vector Machine classifiers to classify a published Arabic corpus. The results show that, among the individual methods, Information Gain gives the best results. They also show that the combination of multiple feature selection methods outperforms the best results obtained by the individual methods.
Keywords: Feature Selection, Combination Method, Arabic Text Classification

1. INTRODUCTION

With the rapid growth of the Internet, the volume of news and information available on the web is growing exponentially. This explosion of information makes analyzing and processing documents manually a very difficult task. As a consequence, text classification has gained importance in the hierarchical organization of these documents. The fundamental goal of text classification is to assign texts to appropriate classes. One of its problems is the huge number of features, which reduces classification performance and increases processing time. Feature selection is used to reduce the feature space by selecting the most relevant features (Maldonado and L’Huillier, 2013).

Many feature selection methods have been proposed and investigated to improve the performance of English text classification. However, work on feature selection for the Arabic language is limited, and most studies of Arabic text classification investigate the efficiency of classification algorithms without paying enough attention to how feature selection can improve classification accuracy (Al-Salemi and Ab Aziz, 2010; Hawashin et al., 2013; Saad, 2011). Our motivation for this research is to enhance the robustness of the finally selected feature subsets of the class and to get rid of the noisy and redundant features because there is another subset which supplies the same information