Intricacies of an Automatic Text Summarizer

Lincy Meera Mathews #1, Dr E Sathiyamoorthy *2
# Department of Information Science & Engineering, MSRIT Bangalore; 1 lincymm99@gmail.com
* Associate Professor, School of Information Science & Technology, Vellore University, Vellore; 2 esathiyamoorthy@vit.ac.in

Abstract— An Automatic Text Summarizer is a computer program that creates an abstract of a text document. This abstract must nevertheless contain all the salient features of the original document. This paper covers the functional modules that make up a complete automatic text summarizer, highlights the trends and challenges in text summarization, and surveys selected text summarization techniques.

Keywords: Text summarizer, Natural language processing, Extract, Abstract, Hybrid

I. INTRODUCTION
Human beings now have access to an abundance of information on the internet. However, accessing the information that is relevant to a particular user is still a challenge. In view of this, text summarization [1] has become one of the most important and well-researched tools for retrieving digital information. Text summarizers were mainly developed to:
1) Improve the quality of text mining techniques such as classification, clustering and regression, since the output of these techniques is highly dependent on the quality of the summarized text document.
2) Reduce the time researchers and academics spend reading, by giving them access to quality abstracts of digital documents.
3) Provide immediate access to relevant and important facts. Humans tend to overlook important and critical facts or sentences, whereas a text summarizer is designed to cover the important facts of a document automatically.
Section 2 presents an outline of, and background on, the area of text summarization. Section 3 investigates the methodologies and the modules required in a text summarizer.
Section 4 presents a survey of text summarization tools and their techniques. Section 5 covers the challenges faced by text summarizers. Section 6 concludes the paper.

II. ABOUT TEXT SUMMARIZER
A. Definition and Aim:
Hovy, E.H. [2] defines a summary as a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s). The goal of a text summarizer program is to summarize a text document by:
1) Distilling the most important facts in the document.
2) Covering all the salient aspects and topics of the document.
3) Narrowing down to the most precise and complete statement that represents a topic, paragraph or sentence.
4) Including only the most necessary statements, without redundancy, ambiguity or errors in the representation of information.
A text document necessarily contains a title and sentences, which are formed by concatenating groups of words. It may or may not also contain subtopics (also called subheaders), paragraphs (connected sequences of sentences) and a conclusion. Drawing on each of these units, the text summarizer must use the information it retrieves to formulate the sentences that best represent the document.
The summary need not be a function of the text document alone; it can be a function of both the document and the user's knowledge. In other words, the summarization should depend on the user's depth of knowledge. A novice user should receive a simplified version of the document, since he or she may not be familiar with the jargon used, whereas a knowledgeable user may be interested in the core of the topic covered by the document.

Lincy Meera Mathews et.al / International Journal of Engineering and Technology (IJET) ISSN : 0975-4024 Vol 5 No 3 Jun-Jul 2013 2871
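Hovy's definition and the four goals of a text summarizer in Section II can be made concrete with a minimal extractive sketch. It is an illustration under stated assumptions, not the authors' method: sentences are scored with a simple word-frequency heuristic (a stand-in for goal 1), the summary is capped at half of the original word count (the hypothetical `max_ratio` parameter, following Hovy's length bound), and a sentence is skipped when most of its content words already appear in a chosen sentence (the hypothetical `overlap_limit` parameter, approximating goal 4). The function names, the stop-word list and both thresholds are assumptions made for this example.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
    "it", "that", "this", "on", "for", "by", "be", "as", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_ratio: float = 0.5, overlap_limit: float = 0.6) -> str:
    """Extractive summary: frequency-scored sentences, length-capped, de-duplicated."""
    sentences = split_sentences(text)
    # Document-wide frequency of each content word (a proxy for importance, goal 1).
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        # Average frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Hovy's bound: the summary may use at most max_ratio of the original words.
    budget = max_ratio * len(text.split())
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)

    chosen: list[int] = []
    used = 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length > budget:
            continue  # would exceed the length bound
        words = set(content_words(sentences[i]))
        if words and any(
            len(words & set(content_words(sentences[j]))) / len(words) > overlap_limit
            for j in chosen
        ):
            continue  # mostly repeats a chosen sentence (goal 4: no redundancy)
        chosen.append(i)
        used += length
    # Restore original document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    sample = (
        "Text summarization condenses a document. "
        "Text summarization condenses a document. "
        "A summary keeps the salient facts of the document. "
        "Weather today is sunny. "
        "Summaries help readers find salient facts quickly."
    )
    print(summarize(sample))
    # prints: Text summarization condenses a document. A summary keeps the salient facts of the document.
```

On the sample text, the duplicated first sentence is kept only once and the off-topic sentence about the weather is dropped, while the output stays within half of the original word count. Keeping each stated goal as a separate, tunable step makes the trade-offs visible; a practical summarizer would replace the naive splitter, stop-word list and frequency score with proper natural language processing components.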