Finding Structure in Blogs: Bipartite Networks Analysis
[Invited Presentation, Extended Abstract] ∗ †

Bosiljka Tadić ‡
Department of Theoretical Physics
Jožef Stefan Institute
1000 Ljubljana, Slovenia
bosiljka.tadic@ijs.si

Marija Mitrović §
Department of Theoretical Physics
Jožef Stefan Institute
1000 Ljubljana, Slovenia
marija.mitrovic@ijs.si

∗ Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. VALUETOOLS 2009, October 20-22, 2009 - Pisa, Italy. Copyright 2009 ICST 978-963-9799-70-7/00/0004 $5.00.
† Work supported by FP7 project CYBEREMOTIONS and P1-0044 (Slovenia).
‡ Presenting author.
§ On leave from the Scientific Computing Laboratory, Institute of Physics, Belgrade, Serbia.

ABSTRACT
Temporal patterns of activity on Blogs (posting, reading, commenting, comment-on-comment) contain valuable information about user behavior, which leads to a potentially new type of social clustering in the Blog space. Here we show how the structure in Blog space can be retrieved from the data by mapping it onto a bipartite graph and applying appropriate methods of complex networks, including the spectral analysis of graphs [3, 4]. By analyzing an (almost) complete set of data from the B92 Blogsite since its opening, we demonstrate how user communities emerge in time and what the possible underlying mechanisms of this structure are.

Categories and Subject Descriptors
E.m [Data]: Miscellaneous; I.2.4 [Computing Methodologies]: Artificial Intelligence—Knowledge Representation Formalisms and Methods

Keywords
Complex networks, Blog structure, Cyber communities

1. MOTIVATION
In the recent review "The convergence of social and technological networks" [1], Kleinberg stressed that the “Internet-based data on human interaction connects scientific inquiry like never before”. Underlying the on-line social interaction facilitated by communication technology (e.g., on Blogsites, Facebook, MySpace, Wikipedia, Digg, and others), “there is a broader process at work, a growing pattern of movement through online spaces to form connections with others, build virtual communities, and engage in self-expression” [1]. The social clustering that emerges through these interactions is new, develops quickly, and plays an important role in the everyday life of modern society. In the classical approach, social interactions have been studied by mapping them onto complex networks of connected individuals, who form groups with traditional social meaning. Grouping in the cyber space, however, is different and not necessarily related to conventional friendship, family or business relationships. This implies that other mechanisms drive people’s activity in the cyber space. Recent studies [5] indicate diversity in people’s interest in posts, which might be related to the quality or the emotional or moral content of the posted material and its interference with users’ preferences and personal profiles. Consequently, new scientific methods are needed for the analysis of cyber communities. In our approach, we map large datasets about users and their posts and comments collected from Blogs onto suitably defined bipartite graphs, in which both users and their posts play an equal role. We then analyze user communities [3, 4] and identify the posts which cause their clustering in the Blog space.

2. DATA STRUCTURE
We consider large datasets collected from different Blog sites over extended periods of time (from a few weeks to a few years).
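As an illustration of the mapping onto a bipartite graph outlined in Section 1, the following is a minimal sketch, not the authors' implementation. It assumes each record has the hypothetical form (user ID, post ID, action); the sample records, field layout, and function names are invented for illustration. The projection onto users, weighted by the number of commonly visited posts, is one common way to expose user groupings and is not necessarily the procedure used in this work.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (user_id, post_id, action). The layout and values
# are illustrative assumptions, not the format of any real Blog dataset.
records = [
    ("u1", "p1", "post"),
    ("u2", "p1", "comment"),
    ("u3", "p1", "comment"),
    ("u2", "p2", "post"),
    ("u3", "p2", "comment"),
    ("u3", "p2", "comment"),
]


def build_bipartite(records):
    """Return adjacency maps user -> {post: count} and post -> {user: count}."""
    user_to_posts = defaultdict(lambda: defaultdict(int))
    post_to_users = defaultdict(lambda: defaultdict(int))
    for user, post, _action in records:
        # Links exist only between the two partitions (users and posts);
        # the count records how many actions a user made on a post.
        user_to_posts[user][post] += 1
        post_to_users[post][user] += 1
    return user_to_posts, post_to_users


def project_on_users(post_to_users):
    """Weight each user pair by the number of posts both users acted on."""
    weights = defaultdict(int)
    for users in post_to_users.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)


user_to_posts, post_to_users = build_bipartite(records)
user_weights = project_on_users(post_to_users)
```

In this toy example, users u2 and u3 share two posts and are therefore more strongly linked in the user projection than either is with u1.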
While the Blog sites can differ in internal organization and history, and thus contain different information about bloggers, posts and comments, the common feature of all blogs which enables us to study the social interaction network is the existence of a unique identification of users (every blogger has to be registered under a unique ID). Further details which affect the analysis may vary depending on the blogsite. For instance, bloggers are allowed to write posts (B92 Blog), select a news story (Digg), or just leave a comment on posts written by professionals (BBC Blog). Regarding the subject categories, most Blogsites have predetermined categories of posts; the exception is the B92 Blog, where users write their stories without any obligation to adhere to a predefined category. Consequently, the internal structure with user communities on B92 posts emerges in a self-organized manner through user interactions on posts and comment-