Information Processing and Management

EdgeSumm: Graph-based framework for automatic text summarization

Wafaa S. El-Kassas a,⁎, Cherif R. Salama a,b, Ahmed A. Rafea b, Hoda K. Mohamed a

a Department of Computer and Systems Engineering, Faculty of Engineering, Ain Shams University, Egypt
b Department of Computer Science and Engineering, American University in Cairo, Egypt

ARTICLE INFO
Keywords: Automatic text summarization; Extractive text summarization; Graph representation model; Single-document summarization; EdgeSumm

ABSTRACT
Searching the Internet for a certain topic can become a daunting task because users cannot read and comprehend all the resulting texts. Automatic Text Summarization (ATS) in this case is clearly beneficial because manual summarization is expensive and time-consuming. To enhance ATS for single documents, this paper proposes a novel extractive graph-based framework “EdgeSumm” that relies on four proposed algorithms. The first algorithm constructs a new text graph model representation from the input document. The second and third algorithms search the constructed text graph for sentences to be included in the candidate summary. When the resulting candidate summary still exceeds a user-required limit, the fourth algorithm is used to select the most important sentences. EdgeSumm combines a set of extractive ATS methods (namely graph-based, statistical-based, semantic-based, and centrality-based methods) to benefit from their advantages and overcome their individual drawbacks. EdgeSumm is general for any document genre (not limited to a specific domain) and unsupervised, so it does not require any training data. The standard datasets DUC2001 and DUC2002 are used to evaluate EdgeSumm using the widely used automatic evaluation tool: Recall-Oriented Understudy for Gisting Evaluation (ROUGE). EdgeSumm gets the highest ROUGE scores on DUC2001.
For DUC2002, the evaluation results show that the proposed framework outperforms the state-of-the-art ATS systems by achieving improvements of 1.2% and 4.7% over the highest scores in the literature for the metrics of ROUGE-1 and ROUGE-L respectively. In addition, EdgeSumm achieves very competitive results for the metrics of ROUGE-2 and ROUGE-SU4.

⁎ Corresponding author.
E-mail addresses: wafaa.elkassas@gmail.com (W.S. El-Kassas), cherif.salama@eng.asu.edu.eg (C.R. Salama), rafea@aucegypt.edu (A.A. Rafea), hoda.korashy@eng.asu.edu.eg (H.K. Mohamed).
https://doi.org/10.1016/j.ipm.2020.102264
Received 21 January 2020; Received in revised form 4 February 2020; Accepted 7 April 2020
Information Processing and Management 57 (2020) 102264
0306-4573/ © 2020 Elsevier Ltd. All rights reserved.

1. Introduction

The Internet has an exponentially increasing amount of textual data. When users search the Internet for a certain topic, they get a huge amount of textual content as results. Users cannot read and comprehend all the potentially long documents in the search results. As a result, it becomes urgent to help users by summarizing textual content. Manual summarization is a costly and time-consuming process. As such, Automatic Text Summarization (ATS) systems are the key solution to this dilemma. Before ATS systems, users had to read the full documents to decide whether these documents are useful or not. Nowadays, users need to get important information from different documents or Internet resources in the shortest time (Ma & Wu, 2014). ATS systems are crucial to provide users with the most important information in the input documents. Of course, if a user needs more details, he can still consult the original