International Journal of Computer Science Trends and Technology (IJCST), Volume 4 Issue 6, Nov - Dec 2016
ISSN: 2347-8578, www.ijcstjournal.org, Page 115

A Novel Approach for Extraction and Analysis of Tweets

Siddu P. Algur [1], Rashmi H. Patil [2], Prashant Bhat [3]
Department of Computer Science, Rani Channamma University, Belagavi - India

ABSTRACT
Online social media networks are becoming increasingly popular as places where users share rich and timely information about real-world events such as sports, films and political issues. Twitter is one of the most popular online social media networks and delivers up-to-date news and information to users throughout the world. The information generated on Twitter contains a lot of irrelevant data, which makes it difficult to extract the relevant data on a day-to-day basis and to analyze it. In this paper, an attempt is made to demonstrate the extraction and analysis of tweets on a particular topic using an extendible toolkit. The extracted data are analyzed using NodeXL, which allows users to quickly generate useful network statistics, metrics and visualizations in the context of a database.

Keywords: Online Social Media, NodeXL, Degree of Centrality, Twitter.

I. INTRODUCTION
The roots of social media run deep. Interacting with family and friends across long distances has been a human concern for centuries. Through social media, people communicate comfortably and strengthen their relationships. Social media sites give users the chance to create a profile and make friends with other users. The first blogging sites became popular and created a social media sensation that continues today. There are now many social media platforms; some of those used worldwide are Facebook, Twitter, LinkedIn, Google+ and MySpace. Among these, Twitter is a very popular and fast-growing network.
Twitter is a popular social network created by the programmers Jack Dorsey, Evan Williams and Biz Stone. On March 21, 2006, Dorsey sent the first tweet, “just setting up my twttr”, which marked the beginning of the service's rise. Users now express their feelings in 140 characters or less; 140 is the character limit for a single tweet. Twitter today has millions of users, who share their opinions on current issues in politics, society, the environment, sports, education, business, the film industry and so on. It is one of the social media platforms that spreads news all over the world. On Twitter, users can form tweet networks: they can follow one another, and they can retweet. The network connections are visible in the text of each tweet, or can be obtained by requesting from Twitter the list of users that follow the author of each tweet.

Today, a wealth of social media data reaches us as a steady stream in various formats. Tweets contain a rich set of information, such as the unique numerical ID attached to each tweet, the IDs of all replies, the URL of the author if a website is referenced, the number of followers and other technical information that can be analyzed.

On Twitter, analyzing trending and non-trending topics is a challenging task. Topic detection is a fundamental building block for monitoring and summarizing the information that originates from social sources. The objective of this study is to extract and analyze topics from Twitter data. The first step is to define explicitly what constitutes a Twitter topic. The next step is to extract tweets based on that explicitly defined topic. One of the most common searches fetches the tweets that contain a particular query term. The tweets are then analyzed using the degree centrality approach, and the results are analyzed using NodeXL.
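The extraction and degree centrality steps described above can be illustrated with a minimal sketch. It is not the toolkit or the NodeXL workflow used in this paper. The sample tweets, the `@mention` pattern used to recover network connections from tweet text, and the helper names `filter_by_topic`, `mention_edges` and `degree_centrality` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (author, text) pairs standing in for extracted tweets.
TWEETS = [
    ("alice", "RT @bob: Great match today #cricket"),
    ("bob", "Great match today #cricket"),
    ("carol", "@bob did you watch the #cricket final?"),
    ("dave", "Traffic is terrible this morning"),
    ("erin", "@bob @carol cricket tickets are sold out"),
]

# Assumed pattern for user mentions (also matches the "RT @user" retweet form).
MENTION = re.compile(r"@(\w+)")


def filter_by_topic(tweets, query):
    """Keep only tweets whose text contains the query term (case-insensitive)."""
    q = query.lower()
    return [(author, text) for author, text in tweets if q in text.lower()]


def mention_edges(tweets):
    """Return the set of undirected author-to-mentioned-user edges found in the text."""
    edges = set()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            user = mentioned.lower()
            if user != author:  # ignore self-mentions
                edges.add(frozenset((author, user)))
    return edges


def degree_centrality(edges):
    """Count the number of distinct links incident upon each node."""
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return dict(degree)


topic_tweets = filter_by_topic(TWEETS, "cricket")
edges = mention_edges(topic_tweets)
degrees = degree_centrality(edges)
print(sorted(degrees.items(), key=lambda kv: -kv[1]))
```

In this toy data the off-topic tweet by `dave` is dropped by the query filter, and `bob` ends up with the highest degree because three other users mention or retweet him. A real pipeline would obtain tweets from Twitter rather than from a hard-coded list.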
The analysis also includes a network graph generated from the connections between a topic and its tweets, with each topic represented by the tweets that are linked to it, i.e. a cluster. Degree centrality is the number of links incident upon a node (i.e. the number of ties that a node has). Two other centrality measures, betweenness centrality and closeness centrality, are also discussed in this paper.

The paper is organized into five sections. Section 2 presents the related work. Section 3 presents the design methodology. Section 4 presents the experimental results. Section 5 presents the conclusion.

II. RELATED WORK
There have been various efforts in previous years to provide flexible, interactive and effective exploratory interfaces for network analysis [7]. The authors of [1] worked on datasets extracted from the microblogging service. They described how a dataset produced using the query term ‘Syria’ can be increased in size to include tweets on the topic of Syria that do not contain that query term. They compare three methods for this task, using the top hashtags