Segregation of Similar and Dissimilar Live RSS News Feeds Based on Similarity Measures

Avani Sakhapara, Dipti Pawade, Hardik Chapanera, Harshal Jani and Darpan Ramgaonkar

A. Sakhapara (✉) · D. Pawade · H. Chapanera · H. Jani · D. Ramgaonkar
Department of IT, K.J. Somaiya College of Engineering, Mumbai, India
e-mail: avanisakhapara@somaiya.edu

© Springer Nature Singapore Pte Ltd. 2019
V. E. Balas et al. (eds.), Data Management, Analytics and Innovation, Advances in Intelligent Systems and Computing 839, https://doi.org/10.1007/978-981-13-1274-8_26

Abstract News in text form is widely available from many different sources on the Internet. With so many sources, the same news is often carried by most of them. A user who wants a thorough update on different news headlines ends up reading the same news from different sources. We have therefore developed a system to cluster news based on similarity. We extract the news from the RSS link provided by the user. Using similarity measures such as Edit Distance, Jaccard Similarity, Cosine Similarity and WordNet Similarity, we have implemented a system that presents a summary of identical and different news feeds from different sources, and we measure its effectiveness.

Keywords RSS news feeds · Jaccard · Cosine · Edit Distance · WordNet

1 Introduction and Related Work

E-newspapers of various news agencies can be accessed over the Internet. These news websites provide instant updates on incidents and events to readers via feeds called Really Simple Syndication (RSS) feeds. A user who wants a thorough update on different news headlines ends up reading the same news from different sources. In this paper, we address the problem of news redundancy across various RSS feeds. Our objective is to design a system that extracts live RSS news feeds from the different sources selected by the user. It then analyzes the headlines and groups them into identical news and dissimilar news. Many researchers have worked in this area. Burkepile and Fizzano [1] have introduced an Artificial Immune based system which analyzes the content of the news