A Methodological Framework for Statistical Analysis of Social Text Streams

Sophia Kleisarchaki¹,², Dimitris Kotzinos¹,³, Ioannis Tsamardinos¹,², and Vassilis Christophides¹,²

¹ Institute of Computer Science, FORTH
{kleisar,kotzino,tsamard,christop}@ics.forth.gr
² Computer Science Department, University of Crete, Greece
³ Department of Geoinformatics and Surveying, TEI Serres, Greece

Abstract. Social media are one of the main sources of user-generated content, providing vast amounts of data on a daily basis and covering a wide range of topics, interests and events. To identify and link meaningful and relevant information, clustering algorithms have been used to partition user-generated content. We have found, however, that these algorithms exhibit various shortcomings when dealing with textual information from social media, which is dynamic and streaming in nature. We therefore explore the idea of estimating the algorithms' parameters from observations of how the clusters' properties (such as centroid, shape and density) evolve. By experimenting with these properties, we propose a methodological framework that detects the evolution of the clusters' centroid, shape and density and explores their role in parameter estimation.

Keywords: twitter, clustering algorithm, centroid, shape, density.

1 Introduction

We are witnessing an unprecedented growth of interest in social media¹, which enable people to achieve near real-time information awareness. Several online networking sites (e.g. Facebook), micro-blogging applications (e.g. Twitter) and social news sites (e.g. Digg) produce vast amounts of user-generated textual content on a daily basis. Identifying topics of conversation in social text streams and monitoring how they evolve over time have attracted both scientific and industrial interest.
¹ en.wikipedia.org/wiki/Socialmedia

Y. Tanaka et al. (Eds.): ISIP 2012, CCIS 146, pp. 101–110, 2013. © Springer-Verlag Berlin Heidelberg 2013

Twitter enables users to post short textual messages (up to 140 characters), known as tweets, to update their followers with their findings, thoughts and comments on various topics. These topics generally cover a wide variety of real-world events [3], ranging from popular, widely known events (e.g., worldwide or national breaking news, sports or music events) to happenings that might receive no coverage in traditional news outlets (e.g., a local social gathering, an annual convention, or a community-specific reunion). According to a recent