Research Article
Summarizing Online Movie Reviews: A Machine Learning
Approach to Big Data Analytics
Atif Khan,¹ Muhammad Adnan Gul,¹ M. Irfan Uddin,² Syed Atif Ali Shah,³ Shafiq Ahmad,⁴ Muhammad Dzulqarnain Al Firdausi,⁴ and Mazen Zaindin⁵
¹Department of Computer Science, Islamia College Peshawar, Peshawar, Pakistan
²Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
³Faculty of Engineering and Information Technology, Northern University, Nowshehra, Pakistan
⁴King Saud University, College of Engineering, Department of Industrial Engineering, Riyadh, Saudi Arabia
⁵King Saud University, College of Science, Department of Statistics and Operations Research, Riyadh, Saudi Arabia
Correspondence should be addressed to Shafiq Ahmad; ashafiq@ksu.edu.sa
Received 23 February 2020; Accepted 7 May 2020; Published 1 August 2020
Academic Editor: Shaukat Ali
Copyright © 2020 Atif Khan et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Information on the web is growing at an exponential pace, and online movie reviews are becoming a substantial information resource
for online users. However, users post millions of movie reviews on a regular basis, and it is not feasible for readers to summarize
them manually. Movie review classification and summarization is a challenging task in natural language processing. Therefore,
an automatic approach is needed to summarize the vast number of movie reviews so that users can quickly
distinguish the positive and negative aspects of a movie. This study proposes an approach for movie review classification and
summarization. For movie review classification, a bag-of-words feature extraction technique is used to extract unigrams, bigrams,
and trigrams as a feature set from the given review documents and to represent the review documents as a vector space model. Next, the
Naïve Bayes algorithm is employed to classify the movie reviews (represented as feature vectors) into positive and negative
reviews. For movie review summarization, a Word2vec feature extraction technique is used to extract features from the
classified movie review sentences, and a semantic clustering technique is then used to cluster semantically related review sentences.
Different text features are used to calculate the salience score of each review sentence in the clusters. Finally, the top-ranked sentences
are chosen based on the highest salience scores to produce an extractive summary of the movie reviews. Experimental results reveal that
the proposed machine learning approach is superior to other state-of-the-art approaches.
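To make the classification step concrete, the following is a minimal sketch, not the authors' implementation, of bag-of-words n-gram extraction (unigrams, bigrams, and trigrams) followed by a multinomial Naïve Bayes classifier. The whitespace tokenizer, the add-one (Laplace) smoothing, the names `ngrams` and `NaiveBayes`, and the four toy reviews are all assumptions made for illustration; the study's actual preprocessing and data may differ.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=3):
    """Return all 1..n_max-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts (hypothetical helper class)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.feat_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total = sum(self.feat_counts[label].values())
            for feat in ngrams(doc):
                if feat not in self.vocab:
                    continue  # ignore n-grams never seen in training
                # add-one smoothing (an assumption of this sketch)
                count = self.feat_counts[label][feat]
                score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented purely for illustration.
train_docs = [
    "a great and moving film",
    "great acting and a great story",
    "a dull and boring plot",
    "boring film with terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("a great story"))
print(model.predict("boring plot"))
```

Working in log space avoids numeric underflow when many n-gram probabilities are multiplied, and skipping unseen n-grams keeps a single out-of-vocabulary phrase from dominating the decision.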
1. Introduction
With the expansion of Web 2.0, which emphasizes the involvement
of users, many websites, such as the Internet Movie Database (IMDB) and
Amazon, encourage their users to write reviews for the
products they liked or purchased in order to enhance the
shopping experience and satisfaction of customers. Online
sellers often ask their customers to provide opinions or
reviews of the products or services they purchased online. The
number of reviews received by a product increases quickly as
millions of customers post reviews, which
results in information overload. This information overload
makes it challenging for a potential customer to scan
every review of a product in order to quickly decide
whether to purchase it. At the same time, it is
also hard for service providers, online merchants, and product
manufacturers to keep track of the huge number of reviews
posted by customers about their services or products [1].
In order to overcome the challenge of information overload,
an automatic review classification and summarization system
is needed [2].
In this study, we focus on the movie review domain.
Summarizing the thousands of reviews
received by a movie can help viewers (customers) to
swiftly scan the summary and promptly make a decision
Hindawi
Scientific Programming
Volume 2020, Article ID 5812715, 14 pages
https://doi.org/10.1155/2020/5812715