Information Processing and Management
EdgeSumm: Graph-based framework for automatic text summarization
Wafaa S. El-Kassas (a,⁎), Cherif R. Salama (a,b), Ahmed A. Rafea (b), Hoda K. Mohamed (a)
(a) Department of Computer and Systems Engineering, Faculty of Engineering, Ain Shams University, Egypt
(b) Department of Computer Science and Engineering, American University in Cairo, Egypt
ARTICLE INFO
Keywords:
Automatic text summarization
Extractive text summarization
Graph representation model
Single-document summarization
EdgeSumm
ABSTRACT
Searching the Internet for a certain topic can become a daunting task because users cannot read
and comprehend all the resulting texts. Automatic text summarization (ATS) is clearly beneficial
in this case because manual summarization is expensive and time-consuming. To enhance
ATS for single documents, this paper proposes a novel extractive graph-based framework,
“EdgeSumm”, that relies on four proposed algorithms. The first algorithm constructs a new text
graph model representation from the input document. The second and third algorithms search
the constructed text graph for sentences to be included in the candidate summary. When the
resulting candidate summary still exceeds a user-required limit, the fourth algorithm is used to
select the most important sentences. EdgeSumm combines a set of extractive ATS methods
(namely graph-based, statistical-based, semantic-based, and centrality-based methods) to benefit
from their advantages and overcome their individual drawbacks. EdgeSumm is general for any
document genre (not limited to a specific domain) and unsupervised, so it does not require any
training data. The standard datasets DUC2001 and DUC2002 are used to evaluate EdgeSumm
using the widely used automatic evaluation tool Recall-Oriented Understudy for Gisting
Evaluation (ROUGE). EdgeSumm gets the highest ROUGE scores on DUC2001. For DUC2002, the
evaluation results show that the proposed framework outperforms the state-of-the-art ATS
systems by achieving improvements of 1.2% and 4.7% over the highest scores in the literature for
the metrics of ROUGE-1 and ROUGE-L, respectively. In addition, EdgeSumm achieves very
competitive results for the metrics of ROUGE-2 and ROUGE-SU4.
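For readers unfamiliar with the ROUGE metrics named in the abstract, the following minimal sketch illustrates the idea behind ROUGE-1 and ROUGE-L recall. It is not the official ROUGE toolkit and not the evaluation setup used for EdgeSumm: the function names, the whitespace tokenizer, and the sentence-level LCS formulation are simplifying assumptions, and the official tool offers additional options (such as stemming) and a summary-level variant of ROUGE-L.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def rouge_1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    if not ref:
        return 0.0
    # Each reference word is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length (sentence-level simplification)."""
    ref = tokenize(reference)
    if not ref:
        return 0.0
    return lcs_length(tokenize(candidate), ref) / len(ref)
```

For instance, with the hypothetical candidate "the cat sat on the mat" and reference "the cat lay on the mat", five of the six reference unigrams are matched, so both functions return 5/6.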
1. Introduction
The Internet has an exponentially increasing amount of textual data. When users search the Internet for a certain topic, they get a
huge amount of textual content as results. Users cannot read and comprehend all the potentially long documents in the search results.
As a result, it becomes urgent to help users by summarizing textual content. Manual summarization is a costly and time-consuming
process. As such, Automatic Text Summarization (ATS) systems are the key solution to this dilemma. Before ATS systems, users had to
read the full documents to decide whether these documents are useful or not. Nowadays, users need to get important information
from different documents or Internet resources in the shortest time (Ma & Wu, 2014). ATS systems are crucial to provide users with
the most important information in the input documents. Of course, if a user needs more details, he can still consult the original
https://doi.org/10.1016/j.ipm.2020.102264
Received 21 January 2020; Received in revised form 4 February 2020; Accepted 7 April 2020
⁎ Corresponding author.
E-mail addresses: wafaa.elkassas@gmail.com (W.S. El-Kassas), cherif.salama@eng.asu.edu.eg (C.R. Salama), rafea@aucegypt.edu (A.A. Rafea),
hoda.korashy@eng.asu.edu.eg (H.K. Mohamed).
Information Processing and Management 57 (2020) 102264
0306-4573/ © 2020 Elsevier Ltd. All rights reserved.