Graph-Based Analyses of Large-Scale Social Data

Philippe Cudré-Mauroux¹ and Saket Sathe²
¹ Massachusetts Institute of Technology (MIT), USA
² Swiss Federal Institute of Technology (EPFL), Switzerland

Abstract. In this paper, we attempt to analyze the semantic mediation layer of large-scale online networks in a holistic way using graph-theoretic tools. We model Peer-to-Peer data networks as graphs, derive a necessary condition to foster semantic interoperability in the large in such graphs, and test our heuristics in the context of an existing bioinformatic portal with hundreds of different schemas.

1 Introduction

Although much effort has recently been devoted to the creation of sophisticated schemes to relate pairs of schemas or ontologies through mappings [RB01], it is still far from clear how large-scale semantic systems built on such mappings evolve or how they can be characterized. For example, even if a lack of schema mappings clearly limits the quality of the overall semantic consensus in a given system, the exact relationship between the two is unknown. Is there a minimum number of mappings required to foster semantic interoperability in a network of information-sharing parties? Given a large set of schemas and schema mappings, can we somehow predict the impact of a query issued locally?

This paper represents a first attempt to look at the problem from a macroscopic point of view. Our contribution is two-fold: first, we develop a model capturing the problem of semantic interoperability in large-scale decentralized environments. Second, we identify recent graph-theoretic results and show how they can be extended to be applicable to our problem. More specifically, we derive a necessary condition to foster semantic interoperability in the large and present a method for evaluating the degree of propagation of a query issued locally.
We also evaluate our methods on a real graph representing several hundred interconnected bioinformatic schemas.

The rest of this paper is organized as follows: we start by introducing a general layered representation for distributed semantic systems. Section 3 is devoted to the formal model with which we analyze semantic interoperability in the large. The main theoretical results related to semantic interoperability and semantic component sizes are detailed in Section 4 and Section 5. Section 6 explores weighted graphs, while Section 7 describes our findings related to the analysis of a real bioinformatic semantic network. Finally, we discuss practical applications of our main results from a decentralized perspective before concluding.