Segregation of Similar and Dissimilar Live RSS News Feeds Based on Similarity Measures
Avani Sakhapara, Dipti Pawade, Hardik Chapanera, Harshal Jani and Darpan Ramgaonkar
Abstract
News in the form of text is widely available on the Internet from a large number of
sources. Because there are so many sources, the same story is usually carried by most
of them. As a result, a user who wants a thorough update on the day's headlines ends
up reading the same news several times from different sources. We have therefore
developed a system that clusters news items based on their similarity. The news is
extracted from the RSS links provided by the user. Using similarity measures such as
Edit Distance, Jaccard Similarity, Cosine Similarity and WordNet Similarity, the
system presents a summary of the identical and the different news feeds from the
various sources, and we measure its effectiveness.
Keywords: RSS news feeds · Jaccard · Cosine · Edit Distance · WordNet
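The three string-based measures named in the abstract can be illustrated on a pair of headlines. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes headlines are split into lowercase alphanumeric word tokens, normalises Edit Distance by the length of the longer headline, and omits WordNet Similarity because that requires an external lexical database. All function names are invented for this example.

```python
"""Illustrative headline-similarity measures (a sketch, not the authors' code)."""
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a headline and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance into [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of word tokens."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[term] for term, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return 0.0 if norm == 0 else dot / norm
```

For the hypothetical headlines "Heavy rain floods Mumbai streets" and "Heavy rain lashes Mumbai streets", four of the six distinct words are shared, so `jaccard_similarity` returns 4/6 ≈ 0.67, while `cosine_similarity` returns 4/5 = 0.8 because each headline contains five words with frequency one.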
1 Introduction and Related Work
E-newspapers of various news agencies can be accessed over the Internet. These
news websites give readers instant updates on incidents and events through feeds
called Really Simple Syndication (RSS) feeds. A user who wants a thorough update
on the different news headlines ends up reading the same news from different
sources. In this paper, we address the problem of news redundancy across RSS
feeds. Our objective is to design a system that extracts live RSS news feeds from
the different sources selected by the user, analyzes their headlines, and groups
them into identical news and dissimilar news.
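The pipeline described above (extract the feeds, analyze the headlines, group them) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes the feeds have already been downloaded as RSS 2.0 XML strings (no network access), uses word-level cosine similarity as the only measure, and applies a greedy grouping with an arbitrary threshold of 0.5. The names `extract_headlines`, `group_headlines`, `THRESHOLD`, `FEED_A` and `FEED_B` are invented for this example.

```python
"""Sketch: extract RSS headlines and split them into similar vs. dissimilar news.

Assumptions (not from the paper): feeds are RSS 2.0 strings already in memory,
similarity is word-level cosine, and THRESHOLD is an arbitrary cut-off.
"""
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

THRESHOLD = 0.5  # hypothetical cut-off; would need tuning on real feeds


def extract_headlines(rss_xml: str, source: str) -> list[tuple[str, str]]:
    """Return (source, title) pairs for every <item><title> in an RSS document."""
    root = ET.fromstring(rss_xml)
    return [(source, item.findtext("title", default="").strip())
            for item in root.iter("item")]


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the word-frequency vectors of two headlines."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(c * vb[t] for t, c in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return 0.0 if norm == 0 else dot / norm


def group_headlines(headlines, threshold=THRESHOLD):
    """Greedy grouping: each headline joins the first group whose first (seed)
    headline is at least `threshold` similar; otherwise it starts a new group."""
    groups: list[list[tuple[str, str]]] = []
    for entry in headlines:
        for group in groups:
            if cosine(group[0][1], entry[1]) >= threshold:
                group.append(entry)
                break
        else:  # no break: nothing similar yet, so start a new group
            groups.append([entry])
    similar = [g for g in groups if len(g) > 1]         # same story, several items
    dissimilar = [g[0] for g in groups if len(g) == 1]  # reported only once
    return similar, dissimilar


# Hypothetical sample feeds standing in for two user-selected sources.
FEED_A = """<rss version="2.0"><channel>
<item><title>Heavy rain floods Mumbai streets</title></item>
<item><title>Stock markets close higher</title></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel>
<item><title>Heavy rain lashes Mumbai streets</title></item>
<item><title>New metro line opens in Pune</title></item>
</channel></rss>"""

if __name__ == "__main__":
    items = extract_headlines(FEED_A, "A") + extract_headlines(FEED_B, "B")
    similar, dissimilar = group_headlines(items)
    print("Similar:", similar)
    print("Dissimilar:", dissimilar)
```

With these sample feeds, the two rain headlines (cosine 0.8) end up in one group of similar news, while the stock-market and metro headlines share no words with anything else and are reported as dissimilar. Greedy grouping against a seed headline is only one simple choice; comparing against every group member, or using a different measure, would change which items are merged.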
Many researchers have worked in this area. Burkepile and Fizzano [1] introduced an
artificial-immune-based system which analyzes the content of the news
A. Sakhapara (✉) · D. Pawade · H. Chapanera · H. Jani · D. Ramgaonkar
Department of IT, K.J. Somaiya College of Engineering, Mumbai, India
e-mail: avanisakhapara@somaiya.edu
© Springer Nature Singapore Pte Ltd. 2019
V. E. Balas et al. (eds.), Data Management, Analytics and Innovation,
Advances in Intelligent Systems and Computing 839,
https://doi.org/10.1007/978-981-13-1274-8_26