Abstract—Automatic text summarization has received a great deal of attention over the past couple of decades, especially with the proliferation of the Internet and new technologies. Arabic as a language still lacks research in the field of Information Retrieval. In this paper, we explore lexical cohesion using lexical chains for an extractive summarization system for Arabic documents.

Keywords—Summarization, Arabic language, lexical cohesion, lexical chains

I. INTRODUCTION

A summary is a "text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that" [21]. Summarization dates back to the late fifties, when the first attempts relied entirely on statistical approaches [15]. Sentences consisting of high-frequency words were given a higher weight than the others, indicating their importance. Beyond this approach, many others were devised to tackle the problem of summarization [4], [16], [12]. Cue phrases and the lead method are two of them: the former extracts sentences containing particular words or phrases, e.g., 'significant' or 'In this paper'; the latter extracts the first sentences of paragraphs, assuming they contain the main idea. These methods rely on shallow features to indicate which sentences are important enough to be included in the summary. Other approaches look at deeper levels, such as the similarity that occurs when two words share a common stem, the thesaural relationships that identify the different semantic relations existing between words, or Rhetorical Structure Theory, which identifies the relationships between text units [23].

There have been a few studies on summarizing Arabic documents. Lakhas attempted to generate summaries using a hybrid approach [6].
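The shallow features described above (word frequency, cue phrases, and the lead method) can be combined into a simple sentence scorer. The following is a minimal sketch under stated assumptions, not the method of any cited system: the function names, the cue phrase list, the bonus weights, and the naive sentence splitter are all hypothetical, and no stop-word removal or stemming is performed.

```python
import re
from collections import Counter

# Hypothetical cue phrases, taken from the examples given in the text.
CUE_PHRASES = ("significant", "in this paper")


def split_sentences(paragraph):
    """Naive splitter on '.', '!', '?' and the Arabic question mark (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?؟])\s+", paragraph) if s.strip()]


def tokenize(text):
    """Lowercased word tokens; \\w matches Unicode letters, including Arabic."""
    return re.findall(r"\w+", text.lower())


def score_sentences(paragraphs, cue_bonus=1.0, lead_bonus=1.0):
    """Return (sentence, score) pairs in document order."""
    sentences = []  # (sentence, is_first_sentence_of_paragraph)
    for paragraph in paragraphs:
        for i, sentence in enumerate(split_sentences(paragraph)):
            sentences.append((sentence, i == 0))

    # Document-wide word frequencies.
    freq = Counter(w for s, _ in sentences for w in tokenize(s))

    scored = []
    for sentence, is_lead in sentences:
        words = tokenize(sentence)
        # Frequency feature: average document frequency of the sentence's words.
        score = sum(freq[w] for w in words) / max(len(words), 1)
        # Cue phrase feature: bonus if any cue phrase occurs in the sentence.
        if any(cue in sentence.lower() for cue in CUE_PHRASES):
            score += cue_bonus
        # Lead feature: bonus for the first sentence of a paragraph.
        if is_lead:
            score += lead_bonus
        scored.append((sentence, score))
    return scored


def summarize(paragraphs, n=2):
    """Pick the n highest-scoring sentences and return them in document order."""
    scored = score_sentences(paragraphs)
    top = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)[:n]
    return [scored[i][0] for i in sorted(top)]
```

Calling `summarize(paragraphs, n=2)` on a list of paragraph strings returns the two best-scoring sentences in their original order. Averaging the word frequencies, rather than summing them, is a design choice made here to avoid favoring long sentences.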
The Lakhas system relied on shallow approaches: frequency calculations, indicative expressions (cue words), the lead method, and the title method. The system was evaluated in DUC 2004 (Document Understanding Conference).

Systems that produce user-focused summaries, such as those developed by El-Haj et al. [10], generate query-based and concept-based summaries. Arabic Query Based Text Summarization System (AQBTSS) is a query-based single-document summarizer that generates summaries relevant to the user query. Each sentence in the document is compared against the user query, and only the relevant sentences are extracted. The other system, Arabic Concept Based Text Summarization System (ACBTSS), is a concept-based summarizer that generates a summary by matching each sentence against a set of keywords entered by the user; these words represent a specific concept. The system uses a vector space model (VSM) that makes use of two measures, term frequency (tf) and inverse document frequency (idf), to weight sentences.

Hamza Zidoum 1 is with the Department of Computer Science, College of Science, Sultan Qaboos University, Muscat, Sultanate of Oman.

A different approach was devised using clustering techniques [24]. In this technique, the root of each word is extracted and placed in the appropriate cluster. Words are assigned weights based on the number of words in the cluster to which they belong. In addition, the approach makes use of cue words, which can increase the weight of a sentence. The system then extracts the sentences with the highest scores; the number of sentences extracted depends on the size of the document.

Summarization can be described as a two-step process: (1) building a source representation from the original text, and (2) converting the source representation to an intermediate representation. The input to a summarization system can be textual data or other types of multimedia such as audio, video, or images.
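The query-based extraction with tf and idf weights described earlier in this section can be illustrated as follows. This is a minimal sketch and not the implementation of the systems in [10]: treating each sentence as a document for the idf computation, the smoothed idf formula, the cosine similarity measure, the threshold value, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; no stemming or root extraction is attempted."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(sentences):
    """Build one sparse tf-idf vector (a dict) per sentence and return the idf table."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    # Document frequency: number of sentences in which each term occurs.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed idf (an assumption) so that very common terms keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def query_summary(sentences, query, threshold=0.1):
    """Keep, in document order, the sentences whose similarity to the query exceeds the threshold."""
    vectors, idf = tfidf_vectors(sentences)
    # Query terms that never occur in the document are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    return [s for s, v in zip(sentences, vectors) if cosine(q, v) > threshold]
```

A concept-based variant in the spirit of ACBTSS could pass the user's keyword set as the `query` argument. For Arabic text, a realistic system would normalize and stem or root-extract the tokens first; that step is omitted here.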
Summarization can also be performed on single documents or on multiple documents, written in a single language or in more than one language (multilingual summarization). The output of a summarization system can be categorized into two groups: extracts and abstracts. Extract summaries consist of sentences taken from the original document, whereas abstract summaries paraphrase sections of the text or are formed from newly generated sentences. The latter requires language generation techniques and poses additional challenges.

Presenting a user with an adequate summary requires capturing the main theme of the document. This can be accomplished by looking at the related words in the document. Lexical cohesion is created by semantically related words and can be represented by lexical chains, which group semantically related words together.

Computing Lexical Chains for Automatic Arabic Text Summarization
Hamza Zidoum 1, Ahmed Al-maamari 1, Nasser Al-Amri 1, Ahmed Al-Yahyai 1, and Said Al-Ramadhani 1
Int'l Journal of Computing, Communications & Instrumentation Engg. (IJCCIE) Vol. 2, Issue 1 (2015) ISSN 2349-1469 EISSN 2349-1477
http://dx.doi.org/10.15242/IJCCIE.E0915025

In comparison to English, and despite the recent interest due to geopolitical issues, Arabic still lacks research in the field of Information Retrieval. One factor that makes automatic processing of Arabic challenging is the Arabic script itself: it lacks dedicated letters to represent short vowels, letters change form depending on their position in the word, capitalization is absent, and punctuation is minimal. Arabic words can be ambiguous as