Proceedings of the 13th INDIACom; INDIACom-2019; IEEE Conference ID: 46181
2019 6th International Conference on “Computing for Sustainable Global Development”, 13th - 15th March, 2019
Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)

An Algorithmic Approach based on Principal Component Analysis for Aspect-based Opinion Summarization

Surbhi Bhatia
Manav Rachna International Institute of Research and Studies, Faridabad, India
surbhibhatia1988@yahoo.com

Rosy Madan
Manav Rachna International Institute of Research and Studies, Faridabad, India
madaan.rosy@gmail.com

Saneh Lata Yadav
KR Mangalam University, Gurugram, India
ersnehlata70@gmail.com

Komal Kumar Bhatia
Computer Science and Engineering department, J.C. Bose University of Science and Technology, YMCA University of Science and Technology, Faridabad
komal_bhatia1@rediffmail.com

Abstract: Summarization condenses a text as far as possible while preserving its significant properties, so that the important information can still be obtained from it. This paper addresses the aspect-based opinion summarization problem. It proposes a novel unsupervised method that combines theories of rational awareness with sentence dependency trees to identify aspects, and it generates summaries with an extractive technique based on Principal Component Analysis. The advantages of the proposed method lie in greater computational efficiency, better data understanding, robustness, and the handling of sparse data. Experiments are carried out on dissimilar datasets containing numerous opinions, and a comparison with previous approaches demonstrates the success of the work. Results on the Opinosis dataset are reported using the ROUGE tool. For subjective evaluation, three randomly chosen individuals were asked to write reference summaries, which are compared with the system-generated summaries.
Keywords: Text document, Extractive Summarization, Opinion Words, Principal Component Analysis, Condensed Text, Statistical Analysis.

I. INTRODUCTION

Web 3.0 platforms share an enormous amount of information through which people exchange their opinions and views. Many review sites let customers post opinionated reviews as free text. Identifying aspects from free text is an interesting task in sentiment analysis. Aspect-based opinion summarization covers two main areas of opinion mining: aspect identification and opinion summarization. A novel algorithm is proposed for extracting aspects from customer reviews using dependency relations and linguistic features.

Text summarization is an interesting task in Natural Language Processing (NLP), a field formed at the intersection of computer science and linguistics. In other words, applying NLP to processed data allows thoughts and notions to be exchanged between humans and computers. Generating a summary requires multiple opinions and is based on feature selection [1], feature rating [2], and identifying sentences that contain features [3].

Target entities or aspects describe the theme of a review. For instance, in the sentence “this is a clean restaurant with good food,” “restaurant” is the target entity and “food” is an aspect, with “clean” and “good” as their respective opinion words. A novel approach is proposed that covers the complete domain area for features and their opinion words. Many researchers have worked on extracting target entities from ConceptNet [4][5]. ConceptNet assertions are used to identify relevant concepts, and these concepts and their synonyms are taken from WordNet [6]. In many works, the words of each English part of speech are grouped into sets of synonyms, called synsets, using lexical relations.
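As an illustration of the aspect-opinion word pairs discussed above, the following minimal Python sketch processes the restaurant example sentence. It is not the proposed dependency-based algorithm: it swaps in a much simpler adjacency rule that attaches each opinion word to the next noun that follows it. The word lists `OPINION_WORDS` and `CANDIDATE_NOUNS` and the function `extract_pairs` are hypothetical, hand-written for this one sentence.

```python
# Hypothetical toy lexicons, hand-written for the single example sentence.
# A real system would obtain these from a POS tagger or dependency parser.
OPINION_WORDS = {"clean", "good"}
CANDIDATE_NOUNS = {"restaurant", "food"}


def extract_pairs(sentence):
    """Pair each opinion word with the nearest candidate noun that follows it.

    This adjacency rule is an assumption made for illustration only;
    it stands in for the grammatical dependency relations used in the paper.
    """
    tokens = [t.strip('.,!?"“”').lower() for t in sentence.split()]
    pairs = []
    pending = []  # opinion words still waiting for a noun
    for tok in tokens:
        if tok in OPINION_WORDS:
            pending.append(tok)
        elif tok in CANDIDATE_NOUNS:
            pairs.extend((tok, opinion) for opinion in pending)
            pending = []
    return pairs


pairs = extract_pairs("This is a clean restaurant with good food.")
print(pairs)  # [('restaurant', 'clean'), ('food', 'good')]
```

The adjacency rule works here only because each opinion word directly precedes the noun it modifies; sentences such as "the food was good" would need the grammatical relations that the proposed method relies on.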
Many researchers have argued that the main goal of opinion mining is to examine and evaluate the sentiments that people express on the World Wide Web (WWW) [7][8][9]. To achieve greater, almost complete coverage of concepts, the present work extends the rule-based methodology by using grammatical relations in sentences to produce aspect-opinion word pairs, after filtering the terms in the dataset that are meaningful for the particular domain. Because the syntactic relations are in one-to-one correspondence, grammatical dependencies help identify aspect-opinion word pairs more easily and effectively.

Copy Right © INDIACom-2019; ISSN 0973-7529; ISBN 978-93-80544-32-8

Extractive summarization in this work relies on Principal Component Analysis (PCA). PCA is a statistical technique that uses an orthogonal transformation to convert a set of correlated data values into a set of linearly uncorrelated values known as principal components. The work applies PCA to text summarization by reducing the number of dimensions in the data (aspects) and