International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 05 Issue: 06 | June-2018 | www.irjet.net
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1182

Automated Document Summarization and Classification Using Deep Learning

Krushna Sharma¹, Avinash Gaikwad², Swapnil Patil³, Pradeep Kumar⁴, D.P. Salapurkar⁵

¹,²,³,⁴ B.E. (Computer Engineering), Sinhgad College of Engineering, Pune, Maharashtra, India
⁵ Assistant Professor, Dept. of Computer Engineering, Sinhgad College of Engineering, Pune, Maharashtra, India

Abstract - The exponential growth of the Internet has led to a great deal of interest in developing useful and efficient tools and software to assist users in searching the web for relevant documents. Document classification is generally defined as the content-based assignment of one or more predefined categories to documents. It appears in many applications, including email filtering and news monitoring. Classifying these documents manually is not feasible, and existing automated classification methods have drawbacks such as low accuracy and dependence on humans for document tagging. The proposed system uses deep learning methods to speed up the classification process and recommend relevant documents. The proposed deep learning algorithm, 'Recurrent Neural Network with Convolutional Neural Network', helps in constructing a robust classifier model using a variety of data for training. This classifier can then be adapted to classify documents in a business database automatically.

Key Words: Summarization, Classification, Neural Network, Deep Learning, Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Recurrent Convolutional Neural Network (RCNN).

1. INTRODUCTION

The automated categorization (or classification) of texts into predefined categories has witnessed booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of pre-classified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (in which domain experts define a classifier manually) are very good effectiveness, considerable savings in expert labor, and straightforward portability to different domains.

Until the late '80s the most popular approach to text categorization (TC), at least in the "operational" (i.e., real-world applications) community, was a knowledge engineering (KE) one, which consists of manually defining a set of rules that encode expert knowledge on how to classify documents under the given categories. In the '90s this approach increasingly lost popularity (especially in the research community) in favor of the machine learning (ML) paradigm, in which a general inductive process automatically builds a text classifier by learning, from a set of pre-classified documents, the characteristics of the categories of interest. The advantages of this approach are an accuracy comparable to that achieved by human experts and considerable savings in expert labor, since no intervention from either knowledge engineers or domain experts is needed to construct the classifier or to port it to a different set of categories.

The proposed system implements a Recurrent Neural Network (RNN) together with a Convolutional Neural Network (CNN) to build the classifier model. Only the summary of each document is used in the classification phase, which speeds up the training phase considerably.
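The summarize-then-classify idea described above can be illustrated with a minimal sketch. It is not the RNN/CNN model proposed in this paper: as a stand-in, it uses a frequency-based extractive summarizer and a multinomial naive Bayes classifier with Laplace smoothing, so that the inductive step (learning category characteristics from pre-classified documents) stays visible in a few lines. The names `STOPWORDS`, `tokenize`, `summarize`, `SummaryClassifier` and the toy training documents are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "in", "of", "and", "for", "as", "to", "is"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Extractive summary: keep the sentences whose words are most frequent.

    This is a simple stand-in summarizer, not the method used in the paper.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


class SummaryClassifier:
    """Naive Bayes classifier trained only on document summaries (a stand-in)."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(summarize(doc)))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, document):
        # Words never seen during training are ignored.
        words = [w for w in tokenize(summarize(document)) if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy pre-classified documents (invented for illustration only).
TRAINING = [
    ("The team won the match. The striker scored a goal in the final match.", "sports"),
    ("The coach praised the players. The players trained hard for the next match.", "sports"),
    ("The bank raised interest rates. Markets fell as interest rates rose.", "finance"),
    ("Investors sold shares. The stock market and shares dropped sharply.", "finance"),
]

model = SummaryClassifier().fit([d for d, _ in TRAINING], [c for _, c in TRAINING])
print(model.predict("The team won the final match."))
```

Because each training document is reduced to its summary before word counts are collected, the classifier sees far fewer tokens per document, which is the source of the training speed-up mentioned above. A neural classifier could replace `SummaryClassifier` without changing the surrounding pipeline.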
1.1 Background and Basics

With the dramatic growth of the Internet, people are overwhelmed by the tremendous amount of online information and documents. This expanding availability of documents has demanded exhaustive research in the area of automatic text summarization. A summary is defined as "a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually, significantly less than that". Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning.

In recent years, numerous approaches have been developed for automatic text summarization and applied widely in various domains. For example, search engines generate snippets as previews of documents. Other examples include news websites, which produce condensed descriptions of news topics, usually as headlines, to facilitate browsing, and knowledge-extraction approaches. Automatic text summarization attracted attention as early as the 1950s. Important research at that time focused on summarizing scientific documents.

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer