International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 06 | June 2020 www.irjet.net p-ISSN: 2395-0072

Tweet Summarization using NLP and Sentiment Analysis

Prof. Omprakash Yadav*, Abhay Shinde1, Omkar Palkar2

* Assistant Professor, Department of Computer Engineering, Xavier Institute of Engineering, Mumbai, Maharashtra, India
1,2 B.E. student, Computer Engineering, Xavier Institute of Engineering, Mumbai, Maharashtra, India

Abstract - In recent years, socially generated content has become pervasive on the World Wide Web. Twitter produces an enormous amount of such content by allowing a huge number of users to contribute frequent short messages. These messages concern events happening in the world or are posts about the users themselves. Most of them are reactions describing the same events, which results in redundant tweets. The algorithm used takes a trending phrase or any phrase specified by a user, collects a large number of posts containing the phrase, and provides an automatically created summary of the posts related to the term. The result is a global view of the messages in the form of short summaries of trending terms over a period of time such as an hour or a day.

Key Words: Tweets, fitness value, pbest, gbest, stemming, stopwords, PSO Algorithm.

1. INTRODUCTION

Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. It discovers patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. It is an important process in which intelligent methods are applied to extract data patterns. It is an interdisciplinary subfield of computer science.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

1.1 Flow of the Project

1. Retrieval of tweets
The tweets are extracted from a Twitter account.

2. Pre-Processing
Pre-processing describes any type of processing performed on the tweets to prepare them for a subsequent processing step.

3. Segmentation
The goal of tweet segmentation is to split a tweet into a sequence of semantically meaningful units or other phrases whose words are often used together. For tweet segmentation, the HybridSeg framework is proposed.

4. Clustering of similar tweets
Clustering, also called grouping, arranges objects so that objects in the same group (cluster) are more similar to each other than to those in other groups. In this phase, similar tweets are clustered using the Particle Swarm Optimization (PSO) algorithm.

2. Input and Output

Real-time data in the form of tweets, obtained from the Twitter API.
URLs of the users in the fetched data.
The summarized tweets, which are displayed to the user.
The most favourite/liked/popular tweets related to the search parameters.
The number of retweets available.

© 2020, IRJET Impact Factor value: 7.529 ISO 9001:2008 Certified Journal | Page 7372
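The clustering phase in Section 1.1 can be illustrated with a minimal Python sketch of PSO-based clustering. It assumes the tweets have already been pre-processed and converted to numeric feature vectors, a representation this section does not specify. The fitness value used here (mean distance from each vector to its nearest centroid), the parameters `w`, `c1`, `c2`, and the names `quantization_error` and `pso_cluster` are illustrative assumptions, not the authors' implementation. Each particle holds one candidate set of centroids, `pbest` is the best position a particle has found so far, and `gbest` is the best position found by the whole swarm.

```python
import math
import random


def quantization_error(centroids, points):
    """Fitness value (assumed): mean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def pso_cluster(points, k, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Cluster `points` into k groups with a basic PSO; lower fitness is better."""
    rng = random.Random(seed)
    dim = len(points[0])

    # Each particle is a candidate set of k centroids, initialised from random data points.
    positions = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]

    # pbest: best position seen by each particle; gbest: best position seen by the swarm.
    pbest = [[c[:] for c in pos] for pos in positions]
    pbest_fit = [quantization_error(pos, points) for pos in positions]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + pull toward pbest + pull toward gbest.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d])
                    )
                    positions[i][j][d] += velocities[i][j][d]
            fit = quantization_error(positions[i], points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in positions[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in positions[i]], fit

    # Assign every point to its nearest centroid in the best solution found.
    labels = [min(range(k), key=lambda j: math.dist(p, gbest[j])) for p in points]
    return labels, gbest, gbest_fit


# Toy 2-D vectors standing in for tweet feature vectors (hypothetical data).
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, fitness = pso_cluster(vectors, k=2)
print(labels, round(fitness, 3))
```

In a summarization pipeline, one representative tweet per cluster (for example, the tweet closest to its centroid) could then serve as the summary of that group. That selection step is likewise an assumption and is not described in this section.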