International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 06 | June 2020 www.irjet.net p-ISSN: 2395-0072
Tweet Summarization using NLP and Sentiment Analysis
Prof. Omprakash Yadav*, Abhay Shinde1, Omkar Palkar2
* Assistant Professor, Department of Computer Engineering, Xavier Institute of Engineering, Mumbai,
Maharashtra, India
1,2 B.E. Student, Computer Engineering, Xavier Institute of Engineering, Mumbai, Maharashtra, India
----------------------------------------------------------------------------***---------------------------------------------------------------------------
Abstract - In recent years, socially generated content has
become pervasive on the World Wide Web. Twitter allows a huge
number of users to contribute frequent short messages, which
produces an enormous amount of content. These messages concern
events happening in the world or are personal posts about the
users themselves. Most of them are reactions describing the same
events, which makes the tweets redundant. The algorithm used
takes a trending phrase, or any phrase specified by a user,
collects a large number of posts containing the phrase, and
provides an automatically created summary of the posts related to
it. The result is a global view of the messages in the form of
short summaries of trending terms over a period of time such as
an hour or a day.
KeyWords : Tweets, fitness value, pbest, gbest, stemming,
stopwords, PSO Algorithm.
1. INTRODUCTION
Data mining is the process of finding anomalies, patterns and
correlations within large data sets in order to predict outcomes.
It discovers these patterns using methods at the intersection of
machine learning, statistics, and database systems, and it is an
interdisciplinary subfield of computer science. It is an important
process in which intelligent methods are applied to extract data
patterns. The overall goal of the data mining process is to
extract information from a data set and transform it into a clear
structure for further use.
1.1 Flow of the Project
1. Retrieval of tweets
The tweets are extracted from a Twitter account.
2. Pre-Processing
Pre-processing describes any type of processing
performed on the tweets to prepare them for a later
processing step.
3. Segmentation
The goal of tweet segmentation is to split a tweet into a
sequence of semantically meaningful units, or other phrases
whose words are often used together. For tweet segmentation,
the HybridSeg framework is proposed.
4. Clustering of similar tweets
Clustering means grouping objects in such a way that objects
in the same group (cluster) are more similar to each other than
to those in other groups. In this phase, similar tweets are
clustered using the Particle Swarm Optimization (PSO) algorithm.
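The pre-processing and clustering steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the crude suffix stemmer, the bag-of-words vectors, the fitness function (sum of squared distances to the nearest centroid) and all parameter values (k, w, c1, c2, swarm size) are assumptions chosen for illustration. Segmentation with HybridSeg is omitted, and single word stems stand in for segments. The names pbest and gbest follow the keywords of the paper.

```python
import random
import re

# Tiny illustrative stopword list (an assumption; the paper does not give one).
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for", "at", "it"}

def preprocess(tweet):
    """Lowercase, strip URLs and mentions, drop stopwords, apply a crude suffix stemmer."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    stems = []
    for tok in re.findall(r"[a-z]+", text):
        if tok in STOPWORDS:
            continue
        for suffix in ("ing", "ed", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems

def vectorize(token_lists):
    """Turn token lists into bag-of-words count vectors over a shared vocabulary."""
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        vec = [0.0] * len(vocab)
        for t in toks:
            vec[index[t]] += 1.0
        vectors.append(vec)
    return vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(vectors, centroids):
    """Fitness value of a particle: total squared distance to the nearest centroid (lower is better)."""
    return sum(min(sq_dist(v, c) for c in centroids) for v in vectors)

def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]

def pso_cluster(vectors, k=2, n_particles=10, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each particle is a set of k candidate centroids, initialised from random tweets.
    pos = [[list(rng.choice(vectors)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [fitness(vectors, p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[best]], pbest_fit[best]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard PSO update: inertia + pull towards pbest + pull towards gbest.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = fitness(vectors, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return assign(vectors, gbest), gbest_fit

if __name__ == "__main__":
    tweets = [
        "Watching the match at the stadium #WorldCup",
        "What a match! #WorldCup goals",
        "New phone launched today",
        "The new phone is launching soon",
    ]
    labels, fit = pso_cluster(vectorize([preprocess(t) for t in tweets]), k=2)
    print(labels, fit)
```

One representative tweet per cluster (for example, the one closest to its centroid) could then serve as the summary of that cluster; how the summary is actually selected is not specified in this section.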
2. Input and Output
Input: Real-time data in the form of tweets, obtained from the
Twitter API, together with the URLs of the users in the fetched
data.
Output: The summarized tweets, which are displayed to the user;
the most liked (favourite/popular) tweets related to the search
parameters; and the number of retweets available.
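The input and output fields described above can be modelled with a small record type. The following Python sketch is only an illustration: the class name `Tweet`, its field names, and the ranking rule (likes first, retweets as tie-breaker) are hypothetical and are not taken from the Twitter API or from the paper.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    # Hypothetical field names; the Twitter API uses its own schema.
    text: str
    user_url: str
    likes: int
    retweets: int

def top_tweets(tweets, n=3):
    """Return the n most popular tweets, ranked by likes, then by retweets."""
    return sorted(tweets, key=lambda t: (t.likes, t.retweets), reverse=True)[:n]

if __name__ == "__main__":
    fetched = [
        Tweet("Great match tonight", "https://example.com/u/a", likes=12, retweets=3),
        Tweet("New phone launched", "https://example.com/u/b", likes=40, retweets=9),
        Tweet("Traffic is terrible", "https://example.com/u/c", likes=12, retweets=7),
    ]
    for t in top_tweets(fetched, n=2):
        print(t.likes, t.retweets, t.text)
```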
© 2020, IRJET Impact Factor value: 7.529 ISO 9001:2008 Certified Journal | Page 7372