information
Article
Text Mining and Sentiment Analysis of Newspaper Headlines
Arafat Hossain 1, Md. Karimuzzaman 1, Md. Moyazzem Hossain 1,* and Azizur Rahman 2,*
Citation: Hossain, A.; Karimuzzaman, M.; Hossain, M.M.; Rahman, A. Text Mining and
Sentiment Analysis of Newspaper Headlines. Information 2021, 12, 414.
https://doi.org/10.3390/info12100414
Academic Editor: Byung-Won On
Received: 1 April 2021
Accepted: 12 August 2021
Published: 9 October 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and
conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
1 Department of Statistics, Jahangirnagar University, Savar, Dhaka 1342, Bangladesh;
arafathossen239@gmail.com (A.H.); karimuzzaman.statju@gmail.com (M.K.)
2 School of Computing, Mathematics and Engineering, Charles Sturt University,
Wagga Wagga, NSW 2678, Australia
* Correspondence: hossainmm@juniv.edu (M.M.H.); azrahman@csu.edu.au (A.R.)
Abstract: Text analytics is widely used to extract information and patterns from text.
However, no study has attempted to illustrate the patterns and priorities of newspaper
headlines in Bangladesh using a combination of text analytics techniques. This paper
examines the pattern of words that appeared on the front page of a well-known
English-language daily newspaper in Bangladesh, The Daily Star, in 2018 and 2019, and
uses those word patterns to interpret the likely social and political context of that
period. The study employs three widely used text mining techniques: word clouds,
sentiment analysis, and cluster analysis. The word clouds reveal that election, kill,
cricket, and Rohingya-related terms appeared more than 60 times in 2018, whereas BNP,
poll, kill, AL, and Khaleda appeared more than 80 times in 2019. These frequencies point
to the country’s passion for cricket, political turmoil, and Rohingya-related issues.
Furthermore, sentiment analysis reveals that words expressing fear and negative emotions
appeared more than 600 times, whereas words expressing anger, anticipation, sadness,
trust, and positive emotions appeared more than 400 times in both years. Finally,
cluster analysis shows that election, politics, deaths, digital security act, Rohingya,
and cricket-related words are similar and fall into one group in 2019, whereas rape,
deaths, road, and fire-related words clustered in 2018 alongside a similar-appearing
group. Overall, the analysis demonstrates how vividly text mining depicts Bangladesh’s
social, political, and law-and-order situation, particularly during election season and
the country’s cricket craze, and it supports the value of text mining for efficiently
gaining an overall view of a country during a particular period.
Keywords: newspaper; headlines pattern and context; word cloud; cluster analysis; sentiment
analysis; Bangladesh
1. Introduction
Text mining is a technique for extracting information from text by recognizing patterns
and trends. The terms text mining, text analytics, and text analysis all refer to the
process of retrieving information through lexical resources, tagging or annotation, and
techniques such as association, visualization, and prediction. After basic natural
language processing (NLP) was developed in the 1960s, the adoption of techniques such as
dimension reduction, latent factor identification, and database text processing
contributed to a new era of information retrieval. Moreover, the topic model or latent
semantic analysis and machine learning algorithms appear to have given the field a more
substantial base after the 1990s. Sentiment analysis and opinion mining methods have
since emerged for analysing human sentiment in text, and they engage fields including
computer science, statistics, linguistics, and social science. Additionally, successful
applications to the analysis of journals, social network services, and online customer
reviews, along with email filtering, product suggestions, fraud detection, search
engines, and bankruptcy prediction, have increased the significance of text mining
across many domains [1–7].
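As a rough illustration of retrieving information through a lexical resource, the following Python sketch counts term frequencies (the quantity a word cloud visualizes) and tallies emotion labels from a lexicon. It is only a minimal sketch under stated assumptions: the headlines, the stop-word list, the toy lexicon, and the function names `tokenize`, `term_frequencies`, and `emotion_counts` are all hypothetical and are not taken from the study's data or code.

```python
import re
from collections import Counter

# Hypothetical headlines, invented for illustration (not the study's data).
HEADLINES = [
    "Election violence kills two",
    "Cricket team wins series",
    "Fire kills three in city market",
]

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "of", "on", "to", "and", "for"}

# Toy emotion lexicon mapping a word to emotion labels (assumed entries,
# not copied from any published lexicon).
TOY_LEXICON = {
    "violence": {"fear", "anger", "negative"},
    "kills": {"fear", "sadness", "negative"},
    "fire": {"fear", "negative"},
    "wins": {"joy", "positive"},
}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_frequencies(headlines):
    """Count how often each token occurs across all headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(tokenize(headline))
    return counts


def emotion_counts(headlines, lexicon):
    """Add one to each emotion label for every token found in the lexicon."""
    counts = Counter()
    for headline in headlines:
        for token in tokenize(headline):
            counts.update(lexicon.get(token, ()))
    return counts


if __name__ == "__main__":
    print(term_frequencies(HEADLINES).most_common(3))
    print(dict(emotion_counts(HEADLINES, TOY_LEXICON)))
```

In this sketch a token contributes to every emotion label attached to it, so a single word can raise several counts at once. No stemming is applied here, so "kills" and "kill" would be counted as separate terms.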
Information 2021, 12, 414. https://doi.org/10.3390/info12100414 https://www.mdpi.com/journal/information