International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 12, December 2020, pp. 2871-2881, Article ID: IJARET_11_12_268
Available online at http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=12
Journal Impact Factor (2020): 10.9475 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: 10.34218/IJARET.11.12.2020.268
© IAEME Publication, Scopus Indexed
http://www.iaeme.com/IJARET/index.asp, editor@iaeme.com

FEATURE EXTRACTION AND SELECTION TECHNIQUES FOR TEXT CLASSIFICATION: A SURVEY

Pramod Sunagar
Department of Computer Science & Engineering, Ramaiah Institute of Technology, Bangalore-560054 and affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India

Anita Kanavalli
Department of Computer Science & Engineering, Ramaiah Institute of Technology, Bangalore-560054 and affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India

Nanda Devi Shetty
Department of Computer Science & Engineering, Ramaiah Institute of Technology, Bangalore-560054 and affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India

ABSTRACT
Feature selection and feature extraction are frequently used to reduce the computational burden of text classification problems. An extraction method is introduced that, for each class, summarizes the characteristics of the sample documents, so that the new features bring together information on the amount of evidence contained in a document. To construct these abstract features, the high-dimensional features of the documents are projected onto a new feature space whose number of dimensions equals the number of classes. This paper explores how various feature extraction methods for text data influence text classification results. Two Bag of Words extraction methods are studied, namely the Count Vector and TF-IDF approaches.
An embedding method, GloVe, is also investigated. The effectiveness of classifiers is compared on standard text classification test sets. The findings show that the choice of extraction method has a substantial effect on the resulting classifications, but that no single approach consistently outperforms the others. Instead, the findings indicate that GloVe gives the best results on the retrieval-oriented measures, while the Bag of Words approach gives the best results on the precision measures. While the main emphasis is on TF-IDF and word embedding methods, various other feature extraction methods are also discussed.
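To make the difference between the two Bag of Words variants named in the abstract concrete, the following is a minimal sketch and not the implementation evaluated in the survey. It assumes whitespace tokenization, lowercasing, and the textbook weighting tf * log(N / df); the toy documents and the function names `build_vocabulary`, `count_vectors`, and `tfidf_vectors` are hypothetical and chosen only for illustration. Library implementations typically apply smoothing and normalization, so their values will differ.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Return a sorted list of the distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def count_vectors(docs, vocab):
    """Count Vector features: raw term frequency of each vocabulary term."""
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[term] for term in vocab])
    return vectors


def tfidf_vectors(docs, vocab):
    """TF-IDF features using tf * log(N / df), one common textbook variant."""
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    # It is never zero because the vocabulary is built from these documents.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
        for row in counts
    ]


# Toy corpus (illustrative only, not data from the survey).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
vocab = build_vocabulary(docs)
bow = count_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)
```

In this sketch the Count Vector gives "the" the largest weight in the first document because it occurs twice, whereas TF-IDF ranks "cat" above "the" because "the" also appears in another document and is therefore less discriminative.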