RAPID: Real-time Analytics Platform for Interactive Data Mining
Kwan Hui Lim *‡, Sachini Jayasekara *, Shanika Karunasekera *, Aaron Harwood *, Lucia Falzon †, John Dunn † and Glenn Burgess †
* The University of Melbourne and † Defence Science and Technology, Australia
‡ Singapore University of Technology and Design
{kwan.lim@,w.jayasekara@student.,karus@,aharwood@}unimelb.edu.au, {FirstName.LastName}@dst.defence.gov.au

Abstract. Twitter is a popular social networking site that generates a large volume and variety of tweets, so a key challenge is to filter and track relevant tweets and identify the main topics discussed in real time. For this purpose, we developed the Real-time Analytics Platform for Interactive Data mining (RAPID) system, which provides an effective data collection mechanism through query expansion, as well as numerous analysis and visualization capabilities for understanding user interactions, tweeting behaviours, discussion topics, and other social patterns.

Keywords: Twitter, Social Networks, Real-time, Topic Tracking

1 Introduction

Fig. 1. Overview of RAPID System (components shown: User Interface; User Query Processing; Keyword Expansion; Hashtag Clustering; Topic Tracking; Discussion Analysis; Data Retrieval and Analysis; Twitter Streaming API; Data Pre-processing; Data Storage (MongoDB))

Social networking sites, such as Twitter, have become a prevalent communication platform in our daily lives, with discussions ranging from mainstream topics like TV and music to specialized topics like politics and climate change.
Tracking and understanding these discussions provides valuable insights into the general opinions and sentiments towards specific topics and how they change over time. Such insights are useful to researchers, companies, and government organizations alike, for applications such as advertising, marketing, crisis detection, and disaster management. Despite this usefulness, the large volume and wide variety of tweets make it challenging to track and understand the discussions on these topics [2,5]. To address these challenges, we proposed and developed the Real-time Analytics Platform for Interactive Data mining (RAPID) for topic tracking and analysis on Twitter (Figure 1). RAPID offers a unique topic-tracking capability that uses query keyword and user expansion to track topics and related discussions, as well as various analytics capabilities to visualize the collected tweets, users, and topics, and to understand tweeting and interaction behaviours.
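This section does not describe how query keyword and user expansion is actually performed. As a rough illustration only, the sketch below assumes a simple co-occurrence heuristic. Tweets that match the current query contribute their authors to the tracked user set, and hashtags that frequently co-occur with the query become candidate new keywords. The function name `expand_query`, the `(user, text)` input format, the tokenization rule, and the `top_k` parameter are hypothetical assumptions, not RAPID's implementation.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words and hashtags, punctuation dropped.
TOKEN_RE = re.compile(r"#?\w+")


def expand_query(tweets, seed_keywords, top_k=3):
    """Hypothetical sketch of one round of keyword and user expansion.

    tweets: iterable of (user, text) pairs (assumed input format).
    seed_keywords: current query terms (words or hashtags), case-insensitive.
    Returns (new_keywords, users): up to top_k hashtags that co-occur with the
    query but are not already in it, and the users who posted matching tweets.
    """
    seeds = {k.lower() for k in seed_keywords}
    cooccur = Counter()
    users = set()
    for user, text in tweets:
        tokens = set(TOKEN_RE.findall(text.lower()))
        if tokens & seeds:  # the tweet matches at least one query term
            users.add(user)  # user expansion: track this author
            # Keyword expansion: count co-occurring hashtags not yet in the query.
            cooccur.update(t for t in tokens if t.startswith("#") and t not in seeds)
    # Rank by co-occurrence count; break ties alphabetically for determinism.
    ranked = sorted(cooccur.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]], users


if __name__ == "__main__":
    tweets = [
        ("alice", "Heatwave again #climate #auspol"),
        ("bob", "New report on #climate and #energy"),
        ("carol", "Watching the footy tonight #AFL"),
        ("dave", "#energy prices and #climate policy #auspol"),
    ]
    new_keywords, users = expand_query(tweets, ["#climate"], top_k=2)
    print(new_keywords)    # ['#auspol', '#energy']
    print(sorted(users))   # ['alice', 'bob', 'dave']
```

In a streaming setting, the expanded keywords and users could be fed back into the next collection round; how many rounds to run and how to stop topic drift are design choices this sketch leaves open.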