International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-9 Issue-1, May 2020
Retrieval Number: A1945059120/2020©BEIESP | DOI: 10.35940/ijrte.A1945.059120
Published By: Blue Eyes Intelligence Engineering & Sciences Publication

Modern Multi-Document Text Summarization Techniques

Yash Asawa, Vignesh Balaji, Ishan Isaac Dey

Abstract: Text summarization is the technique by which a source document is simplified, its valuable information distilled, and an abridged version produced. Over the last decade the focus has shifted from single-document to multi-document summarization, and despite significant progress in the domain, challenges such as sentence ordering and fluency remain. This paper presents a thorough comparison of several multi-document text summarization techniques, including machine-learning-based, graph-based and game-theory-based approaches, and condenses and interprets their numerous methods, merits and limitations. The benchmark datasets of this domain and their features have also been examined. This survey aims to distinguish the various summarization algorithms by properties that prove valuable for generating highly consistent, coherent summaries with reduced redundancy and high information richness. The conclusions presented here can be used to identify the advantages of the surveyed approaches, which will help future researchers in their study of this domain and provide important data for further analysis in a more systematic and comprehensive manner. With the aid of this paper, researchers can identify the areas that present scope for improvement and thereafter devise novel or possibly hybrid techniques in multi-document summarization.

Index Terms: Abstractive, Extractive, Multi-document summarization, Text Summarization

I.
INTRODUCTION

To retrieve data, people generally use web search engines such as Google, Bing and Yahoo. Since the amount of material on the web is growing rapidly, it is not simple for users to discover relevant and fitting information according to their requirements. When a user submits a query to an Internet search engine, the response on most occasions is many thousands of documents, and the user must confront the tedious task of finding the appropriate information in this ocean of responses. This issue is known as "Data Overloading" [1].

Revised Manuscript Received on April 21, 2020.
*Correspondence Author: Saravanakumar Kandasamy, Vellore Institute of Technology, Vellore, India. Email: ksaravanakumar@vit.ac.in
Yash Asawa, Student, Bachelor’s degree, Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
Vignesh Balaji, Student, Bachelor’s degree, Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
Ishan Dey, Student, Bachelor’s degree, Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.

The essential objective of multi-document summarization techniques is to create summaries that provide extensive coverage, low redundancy and strong consistency between sentences [2]. In other words, the important content is extracted from each data source and then restructured to generate a summary of multiple documents.

A number of research studies have addressed multi-document text summarization over the last ten years, but so far only four survey papers [3][4][5][6] have been published on this topic. Even though those papers cover the approaches in our target domain well, most of the novel techniques and models were presented after the surveys were published.
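The coverage–redundancy trade-off described above can be made concrete with a small extractive sketch. The following minimal example uses Maximal Marginal Relevance (MMR), a standard greedy selection criterion that rewards relevance to the document set while penalizing similarity to already chosen sentences. The sample documents, the bag-of-words cosine similarity and the parameter values (k, lam) are purely illustrative assumptions, not the method of any specific paper surveyed here.

```python
# Minimal extractive multi-document summarization sketch using
# Maximal Marginal Relevance (MMR). Illustrative only.
import math
from collections import Counter

def vectorize(sentence):
    """Bag-of-words term-frequency vector for a sentence."""
    return Counter(w.lower().strip(".,") for w in sentence.split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_summarize(documents, k=2, lam=0.7):
    """Greedily pick k sentences balancing relevance to the centroid of
    all documents against redundancy with already selected sentences."""
    sentences = [s.strip() for doc in documents
                 for s in doc.split(".") if s.strip()]
    vectors = [vectorize(s) for s in sentences]
    centroid = vectorize(" ".join(documents))
    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, v in enumerate(vectors):
            if i in chosen:
                continue
            relevance = cosine(v, centroid)
            redundancy = max((cosine(v, vectors[j]) for j in chosen),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]

docs = [
    "Heavy rain flooded the city. Rescue teams were deployed quickly.",
    "The city was flooded after heavy rain. Schools remained closed.",
]
print(mmr_summarize(docs, k=2))
```

Because the redundancy term suppresses near-duplicate sentences, the two near-identical flooding sentences will not both be selected, which is exactly the behaviour multi-document summarizers need when sources overlap heavily.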
Over the last few years, the summarization domain has developed rapidly because of the many new, efficient models and approaches that have been published [6]. Text summarization systems can be classified in several ways: by the number of input sources, by how the summary is generated, by the purpose driving the generation of the summary, by the language in which the documents are presented to the system, and by their category.

Some earlier work in this field presented a fuzzy-system approach for the generation of textual summaries and aimed to validate its performance by employing it for the assessment of text, but it lagged behind because it could not fully match the results of a neural model completely driven by data. Another paper proposed a deep-learning methodology for this problem, using embedded vectors (word embeddings) to represent the hidden semantic relations between words; the vectors were acquired from models pre-trained on extensive amounts of data. The requirement of an exhaustively inclusive corpus and extensive training proved to be an issue [8].

Another paper proposed the use of eight significant pre-established properties to calculate a score for each sentence, with a fuzzy inference system used to enhance the value of a summary created by a general statistical approach. However, no data or description is available for the sets of rules that were used, except that they were manually generated [9]. In general, humans with the relevant expertise can use their intelligence and domain knowledge to model such systems, but in many cases this proves very difficult, and hence one of the early studies proposed an automatic modelling approach that utilizes data [10]. In a more recent study, a rule-generation mechanism that incorporates expert knowledge, along with properties gathered from data, was proposed [11].
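To illustrate the property-based scoring idea in the spirit of the feature-driven systems discussed above: the cited work uses eight properties combined by a fuzzy inference system, whereas the toy sketch below uses only three common features (sentence position, normalized term frequency, and title overlap) combined by a plain weighted sum. The feature choice, the weights and the example text are all assumptions for demonstration, not the published method.

```python
# Toy property-based sentence scoring: three hand-picked features
# combined by a weighted sum (a crude stand-in for the fuzzy
# inference systems described in the surveyed papers).
from collections import Counter

def score_sentences(document, title, weights=(0.4, 0.3, 0.3)):
    """Return sentences ranked by a weighted combination of features."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    word_freq = Counter(w.lower().strip(",") for s in sentences
                        for w in s.split())
    max_freq = max(word_freq.values())
    title_words = {w.lower() for w in title.split()}
    scores = []
    for idx, sent in enumerate(sentences):
        words = [w.lower().strip(",") for w in sent.split()]
        # Feature 1: earlier sentences tend to carry more weight.
        position = 1.0 - idx / len(sentences)
        # Feature 2: average normalized term frequency of the sentence.
        tf = sum(word_freq[w] for w in words) / (len(words) * max_freq)
        # Feature 3: overlap between the sentence and the title.
        overlap = len(title_words & set(words)) / len(title_words)
        w1, w2, w3 = weights
        scores.append((w1 * position + w2 * tf + w3 * overlap, sent))
    return sorted(scores, reverse=True)

doc = ("The storm hit the coast. Thousands lost power. "
       "Officials promised quick repairs.")
ranked = score_sentences(doc, "Storm hits coast")
print(ranked[0][1])
```

A summary is then formed by taking the top-scoring sentences; swapping the weighted sum for learned weights or fuzzy rules is what distinguishes the more sophisticated approaches surveyed here.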
Another important issue in text summarization is optimization, which led a group of researchers to propose a nature-inspired, multi-criteria optimization model based on the Artificial Bee Colony (ABC) algorithm [12].
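The optimization view can be made concrete: select a subset of sentences that maximizes coverage of content words minus a redundancy penalty, under a word budget. The cited work solves such a multi-criteria objective with ABC swarm search; the randomized search below is only a crude stand-in meant to show the shape of the objective, and the sentences, weights (alpha, beta) and budget are all illustrative assumptions.

```python
# Summarization as subset selection: maximize word coverage minus a
# redundancy penalty under a length budget. Randomized search here is
# a simple stand-in for swarm methods such as Artificial Bee Colony.
import random

def objective(selected, sentence_words, alpha=1.0, beta=0.5):
    """Coverage of distinct words minus a penalty for repeated words."""
    if not selected:
        return 0.0
    covered = set().union(*(sentence_words[i] for i in selected))
    overlap = sum(len(sentence_words[i]) for i in selected) - len(covered)
    return alpha * len(covered) - beta * overlap

def random_search_summary(sentences, budget=12, iters=200, seed=0):
    """Sample random sentence subsets within the word budget and keep
    the one with the best objective value."""
    rng = random.Random(seed)
    words = [set(s.lower().split()) for s in sentences]
    best, best_val = [], float("-inf")
    for _ in range(iters):
        candidate = [i for i in range(len(sentences)) if rng.random() < 0.5]
        if sum(len(sentences[i].split()) for i in candidate) > budget:
            continue  # discard candidates that exceed the budget
        val = objective(candidate, words)
        if val > best_val:
            best, best_val = candidate, val
    return [sentences[i] for i in sorted(best)]

sentences = ["rain flooded the city", "the city flooded",
             "schools closed early"]
print(random_search_summary(sentences, budget=8))
```

ABC refines exactly this kind of search by having "employed", "onlooker" and "scout" bees share and exploit good candidate solutions instead of sampling blindly, which is why it converges faster than the naive loop above.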