ABHATH AL-YARMOUK: "Basic Sci. & Eng." Vol. 22, No. 1, 2013, pp. 75-95
© 2012 by Yarmouk University, Irbid, Jordan.

Keyword Extraction Based on Word Co-Occurrence Statistical Information for Arabic Text

Mohammed Al-Kabi*, Hassan Al-Belaili**, Bilal Abul-Huda** and Abdullah H. Wahbeh***

* Faculty of Sciences & IT, Zarqa University, Zarqa - Jordan
** Department of Computer Information Systems, Yarmouk University, Irbid, Jordan.
*** College of Business and Information System, Dakota State University, Madison, SD, USA

Received on Jan. 22, 2012. Accepted for publication on June 24, 2012.

Abstract

Keyword extraction has many useful applications, including indexing, summarization, and categorization. In this work we present a keyword extraction system for Arabic documents that uses term co-occurrence statistical information, an approach that has been used in other systems for the English and Chinese languages. The technique is based on extracting the most frequent terms and building a co-occurrence matrix that shows how often each term occurs together with each frequent term. If the co-occurrence distribution of a term with the frequent terms is biased, then the term is important and is likely to be a keyword. The degree of bias between a term and the set of frequent terms is measured using χ². Therefore, terms with high χ² values are likely to be keywords. The χ² method adopted in this study is compared with another, novel method based on term frequency - inverted term frequency (TF-ITF), which is tested here for the first time. Two datasets were used to evaluate the system performance. Results show that the χ² method is better than TF-ITF: the precision and recall of χ² in the first experiment were 0.58 and 0.63 respectively, and in the second experiment the χ² accuracy was 64%. These results show that the χ² method can be applied to Arabic documents and that its performance is acceptable compared with other techniques.
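As an illustration of the scoring summarized in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that co-occurrence is counted per sentence and that the expected count for a term w and a frequent term g is n_w * p_g, where n_w is the total number of terms in sentences containing w and p_g is the share of all terms that lie in sentences containing g. These definitions, the function name `chi_square_keywords`, and the parameter `num_frequent` are assumptions made for the example; the abstract does not state them.

```python
from collections import Counter


def chi_square_keywords(sentences, num_frequent=3):
    """Rank terms by a chi-square co-occurrence bias score.

    sentences: list of token lists, assumed to be already tokenized,
    stemmed, and stripped of stop words (preprocessing is not shown).
    Returns (term, score) pairs sorted by descending score.
    """
    term_freq = Counter(t for s in sentences for t in s)
    # The set of frequent terms (hypothetical cutoff: top `num_frequent`).
    frequent = [t for t, _ in term_freq.most_common(num_frequent)]
    total_terms = sum(len(s) for s in sentences)

    # cooc[w][g]: number of sentences in which w and g appear together.
    cooc = {w: Counter() for w in term_freq}
    # n[w]: total number of terms in the sentences that contain w.
    n = Counter()
    for s in sentences:
        unique = set(s)
        for w in unique:
            n[w] += len(s)
            for g in unique:
                if g != w:
                    cooc[w][g] += 1

    scores = {}
    for w in term_freq:
        score = 0.0
        for g in frequent:
            if g == w:
                continue
            # Assumed expected co-occurrence: n_w * p_g, p_g = n_g / total.
            expected = n[w] * n[g] / total_terms
            if expected > 0:
                score += (cooc[w][g] - expected) ** 2 / expected
        scores[w] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input with placeholder tokens; real input would be Arabic terms.
example = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["e", "d"]]
ranked = chi_square_keywords(example, num_frequent=2)
for term, score in ranked:
    print(term, round(score, 3))
```

Terms at the top of the returned list are the ones whose co-occurrence with the frequent terms deviates most from the assumed expectation, which is the sense in which the abstract calls a term "likely to be a keyword".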
Keywords: Keyword extraction, Arabic keyword extraction, Information retrieval, Natural language processing.

Introduction

Today, the Internet contains a huge amount of electronic information such as papers, articles and news. This volume makes effective ways to retrieve and filter the desired information a necessity. Many search engines can retrieve the most relevant documents, but there is still a need to show a brief description of the retrieved information, especially when a human is unable to summarize such a large amount of information. Keyword extraction techniques are important for information seekers, since they allow them to decide what they should read just by looking at the suggested keywords. The need for keyword extraction comes from the huge growth of the Internet: the amount of information is increasing rapidly in many different languages [1], and documents in the Arabic language are part of this growth. Therefore, there is an increasing need for the retrieval, filtering and mining of Arabic documents on the World Wide Web.