iTop – Interaction Based Topic Centric Community Discovery on Twitter

Denzil Correa denzilc@iiitd.ac.in
Ashish Sureka ashish@iiitd.ac.in
Mayank Pundir mayank09025@iiitd.ac.in

Indraprastha Institute of Information Technology (IIIT-Delhi)
New Delhi, India
http://www.iiitd.ac.in/

ABSTRACT

Automatic detection of communities (cohesive groups of actors in a social network) in online social media platforms, based on user interests and interactions, is a problem that has recently attracted a lot of research attention. Mining user interactions on Twitter to discover such communities is a technically challenging information retrieval task. We present an algorithm, iTop, to discover interaction-based, topic-centric communities by mining user interaction signals (such as @-messages and retweets) which indicate cohesion. iTop takes any topic as an input keyword and exploits local information to infer global topic-centric communities. We evaluate the discovered communities along three dimensions: graph-based (node-edge quality), empirical (Twitter lists) and semantic (frequent n-grams in tweets). We conduct experiments on a publicly available scrape of Twitter provided by InfoChimps via a web service. We perform a case study on two diverse topics, ‘Computer Aided Design (CAD)’ and ‘Kashmir’, to demonstrate the efficacy of iTop. Empirical results from both case studies show that iTop successfully discovers topic-centric, interaction-based communities on Twitter.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Clustering; H.2.8 [Database Applications]: Data Mining

General Terms
Algorithms, Measurement, Experimentation

Keywords
Community Detection, Social Networks, Twitter

1.
RESEARCH MOTIVATION AND AIM

Over the past decade, there has been a swift rise in the number of users who register on online social networking websites like MySpace, Facebook, and YouTube. Registered users on these social networks are provided with various features like reply, comment, subscribe and friendship in order to interact, engage and share information with each other. Such interactions lead to the formation of closely knit user groups, or densely connected clusters of users around specific topics within the social network; these are called communities. The tendency of users to form a community structure is a significant characteristic of any social network. Community discovery (or community detection) in social networks has many practical applications and is therefore of research interest to physicists and computer scientists alike. However, online social networks like Facebook and YouTube have a large user base and host enormous amounts of data. For example, YouTube has more than 800 million unique visitors per month and 100 million social interactions per week, and more than 48 hours of video are uploaded every minute (footnote 1). Due to the existence of such large-scale networks and huge volumes of data, community discovery on social networks is a challenging information extraction (or retrieval) task.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PIKM’12, November 2, 2012, Maui, Hawaii, USA. Copyright 2012 ACM 978-1-4503-1719-1/12/11 ...$10.00.

Currently, Twitter is one of the most widely used and popular social networks.
Twitter is a micro-blog which allows registered users to share images, videos and text in short messages, limited to 140 characters, called tweets. Twitter reports that it has more than 100 million active users, with more than 200 million tweets posted every day. Previous research shows that Twitterers use Twitter for multiple purposes, such as sharing their daily experiences, taking part in conversations, sharing information (in the form of URLs, images, videos), and reporting and assimilating news [10]. Hence, Twitterers with homogeneous interests tend to flock together, i.e., they form cohesive, self-formed communities. Moreover, these communities rally around specific topics which are of prime interest to the Twitterers who are part of that specific group or community [8]. For example, fans of the well-known American singer Beyonce may form a community around the topic ‘Beyonce’, while fans of Lionel Messi (Argentina and Barcelona football player) may form a community around the topic ‘Messi’. Extraction or discovery of these tight-knit, homogeneous communities on Twitter around specific topics of interest has widespread real-life applications across multiple domains. We present two real-world examples to demonstrate our argument:

Footnote 1: http://www.youtube.com/t/press_statistics

1. Digital Marketing – Many businesses use Twitter as a marketing tool and implement roadmaps depend-