WEC: Weighted Ensemble of Text Classifiers

Ashish Upadhyay, Tien Thanh Nguyen, Stewart Massie, John McCall
School of Computing Science and Digital Media
Robert Gordon University
Aberdeen, UK
{a.upadhyay, t.nguyen11, s.massie, j.mccall}@rgu.ac.uk

Abstract—Text classification is one of the most important tasks in the field of Natural Language Processing. Many approaches focus on two main aspects: generating an effective representation; and selecting and refining algorithms to build the classification model. Traditional machine learning methods represent documents in vector space using features such as term frequencies, which have limitations in handling the order and semantics of words. Meanwhile, although achieving many successes, deep learning classifiers require substantial resources in terms of labelled data and computational complexity. In this work, a weighted ensemble of classifiers (WEC) is introduced to address the text classification problem. Instead of using majority vote as the combining method, we propose to associate each classifier's prediction with a different weight when combining classifiers. The optimal weights are obtained by minimising a loss function on the training data with the Particle Swarm Optimisation algorithm. We conducted experiments on 5 popular datasets and report the classification accuracy and macro F1 score of each algorithm. WEC was run with several different combinations of traditional machine learning and deep learning classifiers to show its flexibility and robustness. Experimental results confirm the advantage of WEC, especially on smaller datasets.

Index Terms—Text Classification, Ensemble Method, Ensemble of Classifiers, Multiple Classifiers, Particle Swarm Optimisation.

I. INTRODUCTION

Text classification is one of the most popular tasks in Natural Language Processing (NLP); it involves assigning a sentence or document one category from a list of pre-defined categories.
There are various real-world applications, ranging from classifying a review's sentiment to classifying news or research articles into topics in online libraries [1]. In recent years, much research on this topic has focused on two aspects: representation of the text; and choice of classifier to approximate the relations between the text representation and the categories.

Before the wave of Deep Neural Networks (DNNs), statistical representations of text, such as n-grams and Bag-of-Words based Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) features, were frequently used. To utilise these representations, machine learning algorithms such as Naïve Bayes (NB) and Support Vector Machine (SVM) are frequently employed as the classification model. Although these traditional classification models can achieve good performance, they only use the presence of words in a document and do not address the order or semantics of the words, which is a drawback of traditional machine learning approaches.

Word Embedding algorithms such as word2vec and GloVe [2], [3] introduced a new approach to NLP in which a vector representation of words is used to capture their semantics. The dense representation of a word in high-dimensional space tries to place words with similar meanings in the same cluster and to increase the distance to words with dissimilar meanings. These word embeddings are fed into different types of DNNs, such as Convolutional Neural Networks (CNNs) [4], [5], Long Short-Term Memory (LSTM) networks [6] and Recurrent Neural Networks (RNNs) [7], with various backbone architectures to learn the relationship between the representation and its associated class label in the training data.
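As an illustration of the traditional pipeline described above, the following is a minimal sketch using scikit-learn with invented toy data; the corpus, parameters, and classifier settings are assumptions for demonstration, not the experimental setup of this paper.

```python
# Sketch of a traditional TF-IDF + classifier pipeline
# (illustrative toy data; not the paper's experimental setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

train_texts = ["great movie, loved it", "terrible plot and acting",
               "wonderful film", "boring and bad"]
train_labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words TF-IDF features: word order and semantics are discarded.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

nb = MultinomialNB().fit(X_train, train_labels)
svm = LinearSVC().fit(X_train, train_labels)

X_test = vectorizer.transform(["loved the film"])
print(nb.predict(X_test)[0], svm.predict(X_test)[0])
```

Note that the test sentence is classified purely from the presence of training-vocabulary terms ("loved", "film"); any rephrasing with unseen synonyms would yield an empty feature vector, which is exactly the limitation of presence-based features noted above.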
Several advanced versions of word embeddings, such as ELMo [8] and BERT [9], capture the context of a word in the sentence by using language models (LMs): pre-trained LMs with RNN or Transformer architectures generate deep contextualised representations of full sentences.

Despite many successes with DNNs compared to traditional algorithms on text classification problems, DNNs have their own drawbacks in terms of high resource requirements, both in the amount of labelled data and in computational training time. In many real-world applications where acquiring labelled data is an integral part of a process, employing deep learning algorithms in the initial phase is not effective due to the lack of labelled data.

In this work, we propose a novel weighted ensemble of different algorithms and statistical text representations to classify text data. We construct an ensemble of text classifiers in which each classifier is obtained by training a different learning algorithm (traditional or deep learning) on a specific representation (i.e. set of features) extracted from the text sentence or document. The selected classifiers are combined to obtain the final collaborated prediction. In the proposed combining method, each classifier puts a different weight on its predictions, which reflects its contribution to the collaborated prediction. We propose to search for the optimal combining weights by minimising a 0-1 loss function on the training data.

The main contributions of our work are: introduction of Weighted Ensemble for Text Classification (WEC), a novel ensemble model based on a weighted combining method to address the text classification problem;

978-1-7281-6929-3/20/$31.00 ©2020 IEEE
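The combining scheme described above can be sketched as follows. The weight search uses a deliberately simplified Particle Swarm Optimisation; the swarm size, inertia and acceleration coefficients, function names, and toy probability matrices are all illustrative assumptions, not the paper's implementation.

```python
# Sketch of a weighted ensemble combiner whose weights are found by a
# simplified PSO minimising 0-1 loss on training data.
# All settings below are illustrative, not the paper's configuration.
import numpy as np

rng = np.random.default_rng(0)

def combine(probas, weights):
    """Weighted sum of per-classifier class-probability matrices,
    then argmax over classes.
    probas: (n_classifiers, n_samples, n_classes); weights: (n_classifiers,)."""
    return np.tensordot(weights, probas, axes=1).argmax(axis=1)

def zero_one_loss(probas, weights, y):
    """Fraction of training samples the weighted ensemble misclassifies."""
    return np.mean(combine(probas, weights) != y)

def pso_weights(probas, y, n_particles=20, n_iter=50):
    """Search weight vectors in [0, 1]^k with a basic PSO update rule."""
    k = probas.shape[0]
    pos = rng.random((n_particles, k))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_loss = np.array([zero_one_loss(probas, p, y) for p in pos])
    gbest = pbest[pbest_loss.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, k))
        # Inertia + cognitive (personal best) + social (global best) terms.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 1)
        loss = np.array([zero_one_loss(probas, p, y) for p in pos])
        improved = loss < pbest_loss
        pbest[improved], pbest_loss[improved] = pos[improved], loss[improved]
        gbest = pbest[pbest_loss.argmin()].copy()
    return gbest

# Toy example: three classifiers' predicted probabilities
# for 4 samples over 2 classes.
probas = rng.dirichlet(np.ones(2), size=(3, 4))
y = np.array([0, 1, 0, 1])
w = pso_weights(probas, y)
print("weights:", w, "train 0-1 loss:", zero_one_loss(probas, w, y))
```

Compared with majority voting, this scheme lets a strong classifier dominate the collaborated prediction when its larger weight lowers the training 0-1 loss, while a weak classifier can be down-weighted towards zero rather than removed.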