International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-9 Issue-1, May 2020
Retrieval Number: A1945059120/2020©BEIESP
DOI:10.35940/ijrte.A1945.059120
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Modern Multi-Document Text Summarization
Techniques
Yash Asawa, Vignesh Balaji, Ishan Isaac Dey
Abstract: Text summarization is the technique by which a
source document is simplified, valuable information is distilled,
and an abridged version is produced. Over the last decade, the
focus has shifted from single-document to multi-document
summarization, and despite significant progress in the domain,
challenges such as sentence ordering and fluency remain. In this
paper, a thorough comparison of several multi-document text
summarization techniques, including machine learning based,
graph based, and game-theory based approaches, is presented.
The paper condenses and interprets the numerous approaches
along with the merits and limitations of each technique. The
benchmark datasets of this domain and their features are also
examined. This survey aims to distinguish the various
summarization algorithms based on properties that prove
valuable in generating consistent, coherent, information-rich
summaries with reduced redundancy. The conclusions presented
here can be used to identify the advantages of the surveyed
approaches, which will help future researchers study this domain
and will provide important data for further analysis in a more
systematic and comprehensive manner. With the aid of this
paper, researchers can identify areas that offer scope for
improvement and thereafter devise novel or possibly hybrid
techniques in multi-document summarization.
Index Terms: Abstractive, Extractive, Multi-document
summarization, Text Summarization
I. INTRODUCTION
To retrieve information, people generally use web search
engines such as Google, Bing, and Yahoo. Since the amount of
material on the web is growing quickly, it is not simple for
users to discover pertinent and fitting information according to
their requirements. When a user submits a query to an Internet
search engine, the response in most cases consists of many
thousands of documents, and the user must confront the tedious
task of finding the appropriate information in this ocean of
responses. This issue is known as "Data Overloading" [1].
The essential objective of multi-document summarization
techniques is to create summaries that provide extensive
coverage, little redundancy in the information, and strong
consistency between sentences [2]. In other words, the
important content is extracted from each data source and then
restructured to generate summaries for multiple documents.
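A standard way to operationalize this trade-off between coverage and redundancy, not specific to any one surveyed system, is Maximal Marginal Relevance (MMR) style greedy selection. The sketch below is purely illustrative: the bag-of-words cosine similarity, the lambda weight, and the toy sentences are assumptions for demonstration, not details taken from the surveyed papers.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_summary(sentences, k=2, lam=0.5):
    """Greedy MMR: lam trades relevance to the pooled document set
    (coverage) against similarity to already-selected sentences
    (redundancy)."""
    bows = [Counter(s.lower().split()) for s in sentences]
    doc = sum(bows, Counter())  # all documents pooled together
    selected = []
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for i, bow in enumerate(bows):
            if i in selected:
                continue
            relevance = cosine(bow, doc)
            redundancy = max((cosine(bow, bows[j]) for j in selected),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]
```

With near-duplicate inputs, the redundancy penalty steers the second pick toward the dissimilar sentence rather than the duplicate, which is exactly the "less redundancy, more coverage" behavior described above.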
Revised Manuscript Received on April 21, 2020.
* Correspondence Author
Saravanakumar Kandasamy, Vellore Institute of Technology, Vellore,
India. ksaravanakumar@vit.ac.in
Yash Asawa, Student, Bachelor’s degree, Computer Science and
Engineering, Vellore Institute of Technology, Vellore, India.
Vignesh Balaji, Student, Bachelor’s degree, Computer Science and
Engineering, Vellore Institute of Technology, Vellore, India.
Ishan Dey, Student, Bachelor’s degree, Computer Science and
Engineering, Vellore Institute of Technology, Vellore, India.
A number of research studies have addressed multi-document
text summarization over the last ten years, but so far only
four survey papers [3][4][5][6] have covered this area.
Even though those papers do a decent job of covering the
approaches in our target domain, most of the novel techniques
and models were presented after those surveys were published.
Over the last few years, the summarization domain has
developed considerably because of the new, efficient models
and approaches that have been published [6].
Text summarization can be categorized in numerous ways,
based on the number of input sources, how the summary is
generated, the purpose driving the generation of the summary,
the language in which the documents are presented to the
system, and their category. Some earlier work in this field
presented a fuzzy-system approach for the generation of
textual summaries and aimed to validate its performance by
employing it for the assessment of text, but it lagged behind
because it could not fully match the results of a neural model
driven entirely by data.
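To make the fuzzy-system idea concrete, a minimal Mamdani-style sketch follows. The two input features, the triangular membership functions, and the two rules are entirely hypothetical illustrations of how such a system can grade sentence importance; the surveyed work does not publish its actual rule base.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_importance(tf_score, position_score):
    """Toy fuzzy inference with two hypothetical rules:
       R1: IF tf is HIGH AND position is HIGH THEN importance is HIGH
       R2: IF tf is LOW  OR  position is LOW  THEN importance is LOW
       Defuzzified as a weighted average of the rule consequents
       (HIGH = 1.0, LOW = 0.0). Inputs are assumed to lie in [0, 1]."""
    tf_high = tri(tf_score, 0.4, 1.0, 1.6)
    tf_low = tri(tf_score, -0.6, 0.0, 0.6)
    pos_high = tri(position_score, 0.4, 1.0, 1.6)
    pos_low = tri(position_score, -0.6, 0.0, 0.6)
    r1 = min(tf_high, pos_high)  # fuzzy AND -> min
    r2 = max(tf_low, pos_low)    # fuzzy OR  -> max
    total = r1 + r2
    return r1 / total if total else 0.5
```

The contrast with a fully data-driven neural model is visible here: every membership function and rule must be chosen by hand, which is precisely why such systems struggle to match models trained end-to-end on large corpora.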
Another paper presented a deep learning methodology for this
problem, proposing the use of embedded word vectors to
represent the hidden semantic relations between words. The
vectors were obtained from models pre-trained on extensive
amounts of data; the requirement for an exhaustively inclusive
corpus and extensive training proved to be an issue [8].
Another paper proposed the use of eight significant
pre-established properties to calculate a score for each
sentence, with a fuzzy inference system used to enhance the
quality of a summary created by a general statistical approach.
However, no data or description is available for the rule sets
that were used, except that they were manually generated [9].
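Since the exact eight properties and rule sets of [9] are not publicly described, the sketch below illustrates the general statistical scoring approach with three hypothetical features (sentence position, length, and keyword overlap) combined by a weighted sum; the feature choices and weights are assumptions for illustration only.

```python
from collections import Counter

def score_sentences(sentences, weights=(0.4, 0.2, 0.4)):
    """Score each sentence by a weighted sum of three illustrative
    features (stand-ins for the eight undisclosed features of [9]):
      f1: position -- earlier sentences score higher
      f2: length   -- normalized by the longest sentence
      f3: keywords -- fraction of words among the overall most
                      frequent terms"""
    tokens = [s.lower().split() for s in sentences]
    freq = Counter(w for t in tokens for w in t)
    top = {w for w, _ in freq.most_common(5)}
    max_len = max(len(t) for t in tokens)
    n = len(sentences)
    scores = []
    for i, t in enumerate(tokens):
        f1 = 1.0 - i / n
        f2 = len(t) / max_len
        f3 = sum(1 for w in t if w in top) / len(t)
        scores.append(weights[0] * f1 + weights[1] * f2 + weights[2] * f3)
    return scores
```

In the approach of [9], a fuzzy inference system would then refine these raw statistical scores; the sketch stops at the scoring stage because the rule sets were never published.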
In general, humans with the relevant expertise can use their
intelligence and domain knowledge to model such systems. In
many cases, however, this proves to be a very difficult task,
and hence one early study proposed an automatic, data-driven
modelling approach [10]. A more recent study proposed a
rule-generation mechanism that incorporates expert knowledge
along with properties gathered from data [11]. Another
important issue in text summarization is optimization, which
led a group of researchers to propose a nature-inspired,
multi-criteria optimization model based on the Artificial Bee
Colony (ABC) algorithm [12].
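As an illustration of this optimization framing, the sketch below runs a greatly simplified ABC-style search over k-sentence selections with a hypothetical multi-criteria fitness (coverage minus average pairwise redundancy). The colony parameters, the fitness function, and the collapsing of the employed and onlooker phases into one greedy step are simplifications for demonstration and are not the actual model of [12].

```python
import random
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fitness(select, bows, doc):
    """Hypothetical multi-criteria objective: coverage of the pooled
    documents minus average pairwise redundancy among chosen sentences."""
    chosen = [i for i, s in enumerate(select) if s]
    if not chosen:
        return 0.0
    coverage = cosine(sum((bows[i] for i in chosen), Counter()), doc)
    red = [cosine(bows[i], bows[j]) for i in chosen for j in chosen if i < j]
    return coverage - (sum(red) / len(red) if red else 0.0)

def abc_summarize(sentences, k=2, colony=6, iters=30, limit=5, seed=0):
    """Simplified ABC search over binary k-sentence selection vectors
    (assumes k < len(sentences))."""
    rng = random.Random(seed)
    n = len(sentences)
    bows = [Counter(s.lower().split()) for s in sentences]
    doc = sum(bows, Counter())

    def random_source():
        sel = [0] * n
        for i in rng.sample(range(n), k):
            sel[i] = 1
        return sel

    def neighbour(sel):  # swap one chosen sentence for an unchosen one
        chosen = [i for i, s in enumerate(sel) if s]
        rest = [i for i, s in enumerate(sel) if not s]
        new = sel[:]
        new[rng.choice(chosen)] = 0
        new[rng.choice(rest)] = 1
        return new

    sources = [random_source() for _ in range(colony)]
    trials = [0] * colony
    for _ in range(iters):
        # employed/onlooker phases collapsed: greedily try a neighbour
        for i in range(colony):
            cand = neighbour(sources[i])
            if fitness(cand, bows, doc) > fitness(sources[i], bows, doc):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # scout phase: abandon stagnant food sources
        for i in range(colony):
            if trials[i] >= limit:
                sources[i], trials[i] = random_source(), 0
    best = max(sources, key=lambda s: fitness(s, bows, doc))
    return [sentences[i] for i, s in enumerate(best) if s]
```

The design point this sketch captures is that summarization criteria which conflict (coverage up, redundancy down) can be folded into a single fitness function and searched with a population-based metaheuristic instead of a greedy sentence-by-sentence rule.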