WEC: Weighted Ensemble of Text Classifiers
Ashish Upadhyay, Tien Thanh Nguyen, Stewart Massie, John McCall
School of Computing Science and Digital Media
Robert Gordon University
Aberdeen, UK
{a.upadhyay, t.nguyen11, s.massie, j.mccall}@rgu.ac.uk
Abstract—Text classification is one of the most important
tasks in the field of Natural Language Processing. There are
many approaches that focus on two main aspects: generating an
effective representation; and selecting and refining algorithms
to build the classification model. Traditional machine learning
methods represent documents in vector space using features such
as term frequencies, which have limitations in handling the order
and semantics of words. Meanwhile, although achieving many
successes, deep learning classifiers require substantial resources
in terms of labelled data and computational complexity. In this
work, a weighted ensemble of classifiers (WEC) is introduced to
address the text classification problem. Instead of using majority
vote as the combining method, we propose to associate each
classifier’s prediction with a different weight when combining
classifiers. The optimal weights are obtained by minimising a
loss function on the training data with the Particle Swarm
Optimisation algorithm. We conducted experiments on five
popular datasets and report classification performance using
classification accuracy and macro F1 score. WEC was run with
several different combinations of traditional machine learning
and deep learning classifiers to show its flexibility and robustness.
Experimental results confirm the advantage of WEC, especially
on smaller datasets.
Index Terms—Text Classification, Ensemble Method, Ensemble
of Classifiers, Multiple Classifiers, Particle Swarm Optimisation.
I. INTRODUCTION
Text classification is one of the most popular tasks in
Natural Language Processing (NLP), which involves assigning
a sentence or document one category from a list of predefined
categories. There are various real-world applications
ranging from classifying a review’s sentiment to classifying
news/research articles into various topics in online libraries
[1]. In recent years, there has been much research on this
topic which mainly focuses on two aspects: representation of
the text; and choice of classifier to approximate the relations
between text representation and categories.
Before the wave of Deep Neural Networks (DNNs), statistical
representations of text, such as n-gram features and Bag-of-Words
based Term Frequency (TF) and Term Frequency-Inverse
Document Frequency (TF-IDF) features, were frequently used.
In order to utilise these representations, machine learning
algorithms such as Naïve Bayes (NB) and Support Vector
Machine (SVM) were frequently employed as the classification
model.
model. Although these traditional classification models can
achieve good performance, they are only able to use the
presence of a word in the document and do not address the
order or semantics of the words which is a drawback of
traditional machine learning approaches.
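As an illustration of the statistical representation described above (a sketch not taken from the paper; the toy documents are made up), TF-IDF can be computed directly from term and document frequencies:

```python
# Minimal TF-IDF sketch: tf(t, d) * log(N / df(t)), stdlib only.
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (tf[t] / len(d)) * math.log(n / df[t]) for t in tf})
    return vectors

docs = [doc.split() for doc in ["good movie", "bad plot", "good plot twist"]]
vecs = tfidf(docs)
# "good" occurs in 2 of 3 documents, so its weight is damped relative to
# rarer terms such as "movie" or "bad".
```

Note that terms occurring in every document get weight log(1) = 0, which is exactly the "presence only" limitation: the representation carries no information about word order.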
Word Embedding algorithms such as word2vec and GloVe
[2], [3] introduced a new approach to NLP in which the
vectored representation of words can be used to capture the
semantics. These dense representations place words with
similar meanings close together in a high-dimensional vector
space and push words with dissimilar meanings further
apart. These word embeddings are fed
into different types of DNNs such as Convolutional Neural
Networks (CNNs) [4], [5], Long short-term memory (LSTM)
[6] and Recurrent Neural Networks (RNNs) [7] with various
backbone architectures to learn the relationship between the
representation and its associated class label in the training
data. More advanced approaches such as
ELMo [8] and BERT [9] capture the context of a word
in its sentence, using pre-trained language models (LMs) with
RNN or Transformer architectures to generate deep
contextualised representations of full sentences.
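The geometric intuition behind embeddings can be sketched with cosine similarity (the vectors below are invented for illustration and are not real word2vec or GloVe output):

```python
# Cosine similarity: how aligned two word vectors are in embedding space.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional embeddings (made up for this sketch).
emb = {
    "good":  [0.9, 0.8, 0.1, 0.0],
    "great": [0.8, 0.9, 0.2, 0.1],
    "bad":   [-0.7, -0.8, 0.1, 0.2],
}
sim_pos = cosine(emb["good"], emb["great"])  # similar meanings: high
sim_neg = cosine(emb["good"], emb["bad"])    # dissimilar meanings: low
```

In a trained embedding space, words with similar meanings yield a similarity near 1, which is what lets downstream DNN classifiers generalise across synonyms that a TF-based representation treats as unrelated.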
Despite many successes with DNNs compared to the tradi-
tional algorithms on text classification problems, DNNs have
their own drawbacks in terms of high resource requirements
in relation to the amount of labelled data and computational
training time. In many real-world applications where acquiring
labelled data is an integral part of a process, employing deep
learning algorithms in the initial phase is not effective due to
the lack of labelled data. In this work, we propose a novel
weighted ensemble of different algorithms and statistical text
representations to classify text data. We construct an ensemble
of text classifiers in which each classifier is obtained by
training a different learning algorithm (traditional or deep
learning algorithms) on a specific representation (i.e. set of
features) extracted from the text sentence or document. The
selected classifiers are combined to obtain the final ensemble
prediction. In the proposed combining method, each
classifier's prediction is assigned a weight that reflects
its contribution to the ensemble prediction. We propose to
search for the optimal combining weights by minimising a 0-1
loss function on the training data.
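The weighted combining step can be sketched as follows. The per-classifier probability outputs and the weights are hypothetical; in the paper the weights are found by Particle Swarm Optimisation, which is not reproduced here:

```python
# Weighted combination of classifier outputs for one document.
# Rows: base classifiers; columns: class probabilities over 3 classes.
preds = [
    [0.6, 0.3, 0.1],  # e.g. an SVM on TF-IDF features
    [0.2, 0.5, 0.3],  # e.g. a Naive Bayes classifier
    [0.7, 0.2, 0.1],  # e.g. a CNN on word embeddings
]
# Combining weights, one per classifier (fixed here for illustration;
# the paper optimises these with PSO against a 0-1 loss).
w = [0.5, 0.2, 0.3]

n_classes = len(preds[0])
combined = [
    sum(w[i] * preds[i][c] for i in range(len(preds)))
    for c in range(n_classes)
]
label = max(range(n_classes), key=lambda c: combined[c])
```

With a uniform weight vector this reduces to plain probability averaging (soft voting); the weight search is what lets stronger classifiers dominate the ensemble decision.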
The main contributions of our work are:
• introduction of Weighted Ensemble for Text
Classification (WEC), a novel ensemble model based
on a weighted combining method to address the text
classification problem;
978-1-7281-6929-3/20/$31.00 ©2020 IEEE