Proceedings of the 13th INDIACom; INDIACom-2019; IEEE Conference ID: 46181
2019 6th International Conference on “Computing for Sustainable Global Development”, 13th - 15th March, 2019
Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)
An Algorithmic Approach based on Principal
Component Analysis for Aspect-based Opinion
Summarization
Surbhi Bhatia
Manav Rachna International Institute of Research and
Studies, Faridabad, India
surbhibhatia1988@yahoo.com
Rosy Madan
Manav Rachna International Institute of Research and
Studies, Faridabad, India
madaan.rosy@gmail.com
Saneh Lata Yadav
KR Mangalam University, Gurugram, India
ersnehlata70@gmail.com
Komal Kumar Bhatia
Computer Science and Engineering department
J.C. Bose University of Science and Technology, YMCA
University of Science and Technology, Faridabad.
komal_bhatia1@rediffmail.com
Abstract— Summarization condenses a text as far as possible
while preserving its significant properties, so that the
important information can still be gained from it. A novel
approach is developed in which summaries are generated with an
extractive technique based on Principal Component Analysis.
The advantages of the proposed method lie in greater
computational efficiency, data understanding, robustness and
the handling of sparse data. This paper addresses the
aspect-based opinion summarization problem by proposing a
novel unsupervised method that combines theories of rational
awareness with sentence dependency trees to identify aspects.
Experiments are carried out on dissimilar datasets consisting
of numerous opinions, and comparison with previous approaches
demonstrates the success of the work. The results on the
Opinosis dataset are reported using the ROUGE tool. For
subjective evaluation, three randomly chosen individuals were
contacted for reference summaries, which are compared with the
system-generated gold summaries.
Keywords—Text document, Extractive Summarization, Opinion
Words, Principal Component Analysis, Condensed Text, Statistical
Analysis.
I. INTRODUCTION
Web 3.0 platforms share an enormous amount of information
as people exchange their opinions and views. Many review
sites let customers post opinionated reviews as free text.
Identifying aspects in such free text is an interesting
task in sentiment analysis. Aspect-based opinion
summarization covers two main areas of opinion mining:
aspect identification and opinion summarization. A novel
algorithm is proposed for extracting aspects from customer
reviews using dependency relations and linguistic features.
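To illustrate what extracting aspects through dependency relations can look like, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a hand-written dependency parse of the review sentence "this is a clean restaurant with good food", with relation labels in the style of Universal Dependencies; the names PARSE and aspect_opinion_pairs are hypothetical and introduced only for illustration. The sketch pairs each noun with the adjective that modifies it.

```python
# Hypothetical, hand-written dependency parse of the review sentence
# "this is a clean restaurant with good food".
# Each triple is (head, relation, dependent). A real system would obtain
# these triples from a dependency parser instead of typing them in.
PARSE = [
    ("restaurant", "nsubj", "this"),
    ("restaurant", "cop", "is"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "clean"),
    ("restaurant", "nmod", "food"),
    ("food", "case", "with"),
    ("food", "amod", "good"),
]


def aspect_opinion_pairs(parse):
    """Return (noun, adjective) pairs linked by an adjectival-modifier relation."""
    return [(head, dep) for head, rel, dep in parse if rel == "amod"]


pairs = aspect_opinion_pairs(PARSE)
print(pairs)
```

A single relation type is used here to keep the sketch short; a rule-based extractor would normally combine several relations and linguistic features to cover more sentence patterns.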
Summarizing text is an interesting task in Natural
Language Processing (NLP), a field formed at the intersection
of computer science and linguistics. In other words, applying
NLP to processed data allows thoughts and notions to be
exchanged between humans and computers. Multiple opinions are
required to generate a summary, which is based on feature
selection [1], feature rating [2], and identifying sentences
that contain features [3]. Target entities or aspects describe
the theme of the review. For instance, consider “this is a
clean restaurant with good food.” In this sentence,
“restaurant” is the target entity and “food” is an aspect, with
“clean” and “good” as the respective opinion words. A novel
approach is proposed that covers the complete domain area for
features and their opinion words. Many researchers have
worked to extract target entities from ConceptNet [4] [5].
ConceptNet assertions are used to identify relevant concepts,
and these concepts and their synonyms are taken from
WordNet [6]. In many works, English words are grouped by part
of speech into sets of synonyms, called synsets, using
lexical relations. Many researchers have argued that the main
motive of opinion mining is to examine and evaluate the
sentiments stated by people on the World Wide Web (WWW)
[7][8][9]. Further, for greater and almost complete coverage
of all concepts, the present work extends the rule-based
methodology by using grammatical relations between sentences
to present the aspect-opinion word pairs after filtering the
meaningful terms in the dataset relative to the particular
domain. The syntactic relations are one-to-one
correspondences, so grammar dependencies help in identifying
aspect-opinion word pairs more easily and effectively.
Extractive summarization here relies on Principal Component
Analysis (PCA), a statistical technique that uses an
orthogonal transformation to convert a set of correlated data
values into linearly uncorrelated values known as principal
components. The work applies PCA to text summarization by
reducing the number of dimensions in the data (aspects) and
Copy Right © INDIACom-2019; ISSN 0973-7529; ISBN 978-93-80544-32-8 75