International Journal of English Linguistics; Vol. 7, No. 4; 2017
ISSN 1923-869X E-ISSN 1923-8703
Published by Canadian Center of Science and Education

Addressing the Problem of Coherence in Automatic Text Summarization: A Latent Semantic Analysis Approach

Abdulfattah Omar 1&2
1 Department of English, College of Science and Humanities, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
2 Department of English, Faculty of Arts, Port Said University, Urban, Egypt
Correspondence: Abdulfattah Omar, Department of English, College of Science and Humanities, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia. E-mail: a.abdelfattah@psau.edu.sa
Received: March 13, 2017 Accepted: April 1, 2017 Online Published: July 15, 2017
doi:10.5539/ijel.v7n4p33 URL: http://doi.org/10.5539/ijel.v7n4p33

Abstract
This article addresses the problem of coherence in the automatic summarization of prose fiction texts. Despite increasing advances in summarization theory, applications, and industry, many problems remain unresolved in relation to the application of summarization theory to literature. This can be attributed in part to the peculiar nature of literary texts, which are not amenable to standard summarization processes. This study, therefore, seeks to bridge the gap between literature and summarization theory by proposing a summarization system based on semantic approaches for extracting more meaningful and coherent summaries. Given that a lack of coherence within summaries has negative implications for understanding the original texts, it follows that more effective methods should be developed for extracting coherent summaries. To this end, a hybrid of statistical (TF-IDF) and semantic (Latent Semantic Analysis, LSA) methods was used to derive the most distinctive features and extract summaries from 10 English novellas.
For evaluation purposes, both intrinsic and extrinsic methods were used to determine the quality of the extracted summaries. Results indicate that integrating LSA into feature extraction methods achieves better summarization performance in terms of the coherence of the extracted summaries.
Keywords: automatic summarization, cohesion, coherence, extraction, Latent Semantic Analysis, TF-IDF

1. Introduction
The recent explosive growth of digital and online texts has posed a number of challenges for text summarization research. Traditionally, summarization has been a paper-based process that followed what can be described as the philological method: researchers and professionals read source texts and composed their own summaries by selecting what they judged to be the most significant sentences in these texts. The advent of electronic text, however, has raised many questions about the reliability and effectiveness of these traditional methods. The sheer size of digital corpora, together with the complexity of the data abstracted from them, makes it imperative to develop more reliable methods that can deal with these challenges effectively.
Recognizing the ineffectiveness of manual and traditional methods, researchers are increasingly turning to computational and machine-based methods for carrying out summarization tasks. In recent years, different methods have been proposed in the study of automatic text summarization (ATS), including extraction-based, abstraction-based, and aided summarization methods. Extraction methods, however, remain the most widely used so far. These have largely been based on generating summaries in the form of generic extracts; that is, the resulting document summary is a sequence of fragments of the original text. Generally speaking, these methods depend on statistical/quantitative weighting methods for extracting the most distinctive words and phrases.
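The extraction approach described above can be illustrated with a minimal sketch. It is not the system proposed in this article; it only shows, under simplifying assumptions, how a statistical weighting such as TF-IDF can rank sentences and return the top ones as an extract. The function names `tokenize` and `tfidf_extract`, the parameter `k`, and the choice to treat each sentence as a separate document for the IDF computation are all hypothetical and introduced for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def tfidf_extract(sentences, k=2):
    """Return the k highest-scoring sentences, kept in their original order.

    Assumption: each sentence counts as one 'document' when computing IDF.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: the number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of (length-normalized term frequency) * IDF over distinct terms.
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    # Pick the k best sentence indices, then restore source order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    text = [
        "The cat sat.",
        "The cat sat.",
        "A whale sang loudly.",
    ]
    print(tfidf_extract(text, k=1))
```

Because each sentence is scored in isolation, nothing in a scheme like this rewards sentences that relate to one another.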
One problem with this kind of summarization, however, is that sentence relevance is not always judged accurately. Summaries of this kind therefore suffer from a serious lack of sentence relevance, and as a result the extracted summaries are not coherent. This problem is even more challenging in the summarization of literary texts, including novels and short fiction, where extracted summaries cannot adequately express what the texts are about. In most cases, the summaries do not reveal the development of the action and do not give the reader the expected information about a given text. The implication is that more coherent summaries are required for finding the main ideas of a text as well as