http://www.iaeme.com/IJARET/index.asp 2871 editor@iaeme.com
International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 12, December 2020, pp. 2871-2881, Article ID: IJARET_11_12_268
Available online at http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=12
Journal Impact Factor (2020): 10.9475 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: 10.34218/IJARET.11.12.2020.268
© IAEME Publication Scopus Indexed
FEATURE EXTRACTION AND SELECTION TECHNIQUES FOR TEXT CLASSIFICATION: A SURVEY
Pramod Sunagar
Department of Computer Science & Engineering
Ramaiah Institute of Technology, Bangalore-560054 and affiliated to Visvesvaraya
Technological University, Belagavi, Karnataka, India
Anita Kanavalli
Department of Computer Science & Engineering
Ramaiah Institute of Technology, Bangalore-560054 and affiliated to Visvesvaraya
Technological University, Belagavi, Karnataka, India
Nanda Devi Shetty
Department of Computer Science & Engineering
Ramaiah Institute of Technology, Bangalore-560054 and affiliated to Visvesvaraya
Technological University, Belagavi, Karnataka, India
ABSTRACT
Feature selection and feature extraction are widely used to reduce the computational
burden of text classification. An extraction method is presented that, for each class,
summarizes the characteristics of the sample documents, so that each new feature
aggregates the amount of evidence a document contains for one class. The
high-dimensional document features are projected onto a new feature space whose
dimension equals the number of classes, yielding these abstract features. This paper
explores how different feature extraction methods for text data affect the results of
text classification. Two Bag of Words extraction methods are studied: count vectors
and TF-IDF. A word embedding method, GloVe, is also investigated. The effectiveness
of classifiers built on each representation is compared on standard text classification
test sets. The findings show that the choice of extraction method has a substantial
effect on the resulting classifications, but no single approach consistently outperforms
the others. Instead, GloVe yields the best recall, whereas the Bag of Words
representations yield the best precision. Although the main emphasis is on TF-IDF and
word embedding methods, a range of other feature extraction methods is also discussed.
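To make the difference between the two Bag of Words representations concrete, the following sketch computes count vectors and TF-IDF vectors for a toy corpus. It is a minimal illustration under stated assumptions: the whitespace tokenizer, the helper names (tokenize, build_vocabulary, count_vectors, tfidf_vectors) and the unsmoothed weighting idf = ln(N / df) are choices made here for clarity, not the exact configuration used in the paper's experiments.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace. Real pipelines
    # usually also strip punctuation and may remove stop words.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of every distinct token in the corpus; fixes the column order.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vectors(docs, vocab):
    # Bag of Words count vectors: raw term frequency of each vocabulary term.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_vectors(docs, vocab):
    # TF-IDF: raw count (tf) scaled by idf = ln(N / df), where df is the number
    # of documents containing the term. This is the unsmoothed textbook variant.
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return [[tf * w for tf, w in zip(row, idf)] for row in counts]


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    vocab = build_vocabulary(docs)
    print(vocab)
    print(count_vectors(docs, vocab))
    print(tfidf_vectors(docs, vocab))
```

For the corpus above, the vocabulary is ['cat', 'dog', 'ran', 'sat', 'the'] and the count vector of the first document is [1, 0, 0, 1, 1]. Under TF-IDF, 'the' receives a weight of 0 in every document because it occurs in all of them, which is exactly the down-weighting of uninformative terms that distinguishes TF-IDF from plain counts. Library implementations such as scikit-learn's TfidfVectorizer use a smoothed idf and L2 normalization by default, so their values will differ from this sketch.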