WEC: Weighted Ensemble of Text Classifiers

Ashish Upadhyay, Tien Thanh Nguyen, Stewart Massie, John McCall
School of Computing Science and Digital Media
Robert Gordon University
Aberdeen, UK
{a.upadhyay, t.nguyen11, s.massie, j.mccall}@rgu.ac.uk

Abstract—Text classification is one of the most important tasks in the field of Natural Language Processing. Many approaches focus on two main aspects: generating an effective representation; and selecting and refining algorithms to build the classification model. Traditional machine learning methods represent documents in vector space using features such as term frequencies, which have limitations in handling the order and semantics of words. Meanwhile, although achieving many successes, deep learning classifiers require substantial resources in terms of labelled data and computational complexity. In this work, a weighted ensemble of classifiers (WEC) is introduced to address the text classification problem. Instead of using majority vote as the combining method, we propose to associate each classifier's prediction with a different weight when combining classifiers. The optimal weights are obtained by minimising a loss function on the training data with the Particle Swarm Optimisation algorithm. We conducted experiments on 5 popular datasets and report the classification accuracy and macro F1 score of each algorithm. WEC was run with several different combinations of traditional machine learning and deep learning classifiers to show its flexibility and robustness. Experimental results confirm the advantage of WEC, especially on smaller datasets.

Index Terms—Text Classification, Ensemble Method, Ensemble of Classifiers, Multiple Classifiers, Particle Swarm Optimisation.

I. INTRODUCTION

Text classification is one of the most popular tasks in Natural Language Processing (NLP); it involves assigning a sentence or document one category from a list of pre-defined categories.
There are various real-world applications, ranging from classifying a review's sentiment to classifying news or research articles into topics in online libraries [1]. In recent years, much research on this topic has focused on two aspects: representation of the text; and choice of classifier to approximate the relations between the text representation and the categories.

Before the wave of Deep Neural Networks (DNNs), statistical representations of text, such as n-grams and Bag-of-Words based Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) features, were frequently used. To utilise these representations, machine learning algorithms such as Naïve Bayes (NB) and Support Vector Machine (SVM) are frequently employed as the classification model. Although these traditional classification models can achieve good performance, they only use the presence of words in a document and do not address the order or semantics of the words, which is a drawback of traditional machine learning approaches.

Word Embedding algorithms such as word2vec and GloVe [2], [3] introduced a new approach to NLP in which a vector representation of words is used to capture their semantics. The dense representation of a word in high-dimensional space tries to place words with similar meanings in the same cluster and to increase the distance to words with dissimilar meanings. These word embeddings are fed into different types of DNNs, such as Convolutional Neural Networks (CNNs) [4], [5], Long Short-Term Memory (LSTM) networks [6] and Recurrent Neural Networks (RNNs) [7], with various backbone architectures to learn the relationship between the representation and its associated class label in the training data.
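As an illustration of the traditional pipeline described above, the following is a minimal sketch using scikit-learn with invented toy data; the corpus, parameters, and classifier settings are assumptions for demonstration, not the experimental setup of this paper.

```python
# Sketch of a traditional TF-IDF + classifier pipeline
# (illustrative toy data; not the paper's experimental setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

train_texts = ["great movie, loved it", "terrible plot and acting",
               "wonderful film", "boring and bad"]
train_labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words TF-IDF features: word order and semantics are discarded.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

nb = MultinomialNB().fit(X_train, train_labels)
svm = LinearSVC().fit(X_train, train_labels)

X_test = vectorizer.transform(["loved the film"])
print(nb.predict(X_test)[0], svm.predict(X_test)[0])
```

Note that the test sentence is classified purely from the presence of training-vocabulary terms ("loved", "film"); any rephrasing with unseen synonyms would yield an empty feature vector, which is exactly the limitation of presence-based features noted above.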
Several advanced versions of word embeddings, such as ELMo [8] and BERT [9], capture the context of a word in the sentence by using language models (LMs): pre-trained LMs with RNN or Transformer architectures generate deep contextualised representations of full sentences.

Despite many successes with DNNs compared to traditional algorithms on text classification problems, DNNs have their own drawbacks in terms of high resource requirements, both in the amount of labelled data and in computational training time. In many real-world applications where acquiring labelled data is an integral part of a process, employing deep learning algorithms in the initial phase is not effective due to the lack of labelled data.

In this work, we propose a novel weighted ensemble of different algorithms and statistical text representations to classify text data. We construct an ensemble of text classifiers in which each classifier is obtained by training a different learning algorithm (traditional or deep learning) on a specific representation (i.e. set of features) extracted from the text sentence or document. The selected classifiers are combined to obtain the final collaborated prediction. In the proposed combining method, each classifier puts a different weight on its predictions, which reflects its contribution to the collaborated prediction. We propose to search for the optimal combining weights by minimising a 0-1 loss function on the training data.

The main contributions of our work are: introduction of Weighted Ensemble for Text Classification (WEC), a novel ensemble model based on a weighted combining method to address the text classification problem;

978-1-7281-6929-3/20/$31.00 ©2020 IEEE
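The combining scheme described above can be sketched as follows. The weight search uses a deliberately simplified Particle Swarm Optimisation; the swarm size, inertia and acceleration coefficients, function names, and toy probability matrices are all illustrative assumptions, not the paper's implementation.

```python
# Sketch of a weighted ensemble combiner whose weights are found by a
# simplified PSO minimising 0-1 loss on training data.
# All settings below are illustrative, not the paper's configuration.
import numpy as np

rng = np.random.default_rng(0)

def combine(probas, weights):
    """Weighted sum of per-classifier class-probability matrices,
    then argmax over classes.
    probas: (n_classifiers, n_samples, n_classes); weights: (n_classifiers,)."""
    return np.tensordot(weights, probas, axes=1).argmax(axis=1)

def zero_one_loss(probas, weights, y):
    """Fraction of training samples the weighted ensemble misclassifies."""
    return np.mean(combine(probas, weights) != y)

def pso_weights(probas, y, n_particles=20, n_iter=50):
    """Search weight vectors in [0, 1]^k with a basic PSO update rule."""
    k = probas.shape[0]
    pos = rng.random((n_particles, k))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_loss = np.array([zero_one_loss(probas, p, y) for p in pos])
    gbest = pbest[pbest_loss.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, k))
        # Inertia + cognitive (personal best) + social (global best) terms.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 1)
        loss = np.array([zero_one_loss(probas, p, y) for p in pos])
        improved = loss < pbest_loss
        pbest[improved], pbest_loss[improved] = pos[improved], loss[improved]
        gbest = pbest[pbest_loss.argmin()].copy()
    return gbest

# Toy example: three classifiers' predicted probabilities
# for 4 samples over 2 classes.
probas = rng.dirichlet(np.ones(2), size=(3, 4))
y = np.array([0, 1, 0, 1])
w = pso_weights(probas, y)
print("weights:", w, "train 0-1 loss:", zero_one_loss(probas, w, y))
```

Compared with majority voting, this scheme lets a strong classifier dominate the collaborated prediction when its larger weight lowers the training 0-1 loss, while a weak classifier can be down-weighted towards zero rather than removed.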