Estimating Completeness in Streaming Graphs

Malay Bhattacharyya
Department of C.S.E., University of Kalyani
malaybhattacharyya@klyuniv.ac.in

Supratim Bhattacharya
Department of C.S.E., University of Kalyani
bhattacharya.supratim@gmail.com

Sanghamitra Bandyopadhyay
Machine Intelligence Unit, Indian Statistical Institute
sanghami@isical.ac.in

Corresponding author. (c) 2014, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2014 Joint Conference (March 28, 2014, Athens, Greece) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

ABSTRACT
Finding the completeness of a graph is important from various aspects. Considering the massive growth and dynamics of real-life networks, we readdress this problem in a streaming setting. We approach the problem of verifying the completeness of a graph by estimating the eigenvalues of a sketch of its adjacency matrix. Here, we provide the first approximation algorithm for estimating the completeness of a bipartite graph in the streaming model. The approach is further generalized to arbitrary simple graphs. We employ some useful recent results on $\ell_1$ heavy eigen-hitters to construct algorithms that work in linear time and consume sublinear space. The algorithms have also been implemented and tested on a couple of networks. We illustrate the effectiveness of the proposed approaches in analyzing social, biological and other real-life networks.

Categories and Subject Descriptors
E.1 [Data Structures]: Graphs and Networks; F.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems; G.2.2 [Discrete Mathematics]: Graph Theory

General Terms
Theory, Design, Analysis

Keywords
Streaming model, complete graphs, heavy eigen-hitter

1. INTRODUCTION
Graphs and networks are suitable descriptors of various real-life environments like social activity, professional collaboration, web activity, etc. [12, 16]. They reflect local and global relationships between the objects they model. Studying how these objects interact with each other is useful from different perspectives. A graph is complete if all of its objects are connected to each other [5]. We are often interested in finding out whether a graph is complete or not. Verifying the completeness of a graph consumes quadratic space and time with respect to its order. Considering the massive growth and dynamics of real-life networks, this becomes time/space inefficient. Therefore, designing sublinear algorithms is very important in massive data analytics [15].

Due to the explosive growth in the volume of real-life datasets (the emergence of big data), many computational problems have been redefined to overcome the bottlenecks of time/space complexity. In this paper, we readdress the problem of verifying the completeness of a graph in a streaming model. In streaming models, the data are available as a sequence of items (a stream) and cannot be stored entirely [20]. Therefore, we have to examine the data within a few passes (possibly a single one), as the available memory is also limited. Again, the processing time per item has to be sublinear. This imposes a new kind of uncertainty in computing beyond approximation and randomization.

Here, we consider that the adjacency matrix of a graph is available as a stream. Adopting a turnstile model, we estimate the completeness of the corresponding graph based on the $\ell_1$ norm. Initially, we study the problem for a bipartite graph in the streaming model and then generalize it to arbitrary simple graphs. We employ some recent approximation results on $\ell_1$ heavy eigen-hitters to find the top k eigenvalues [3].
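As a point of reference for the turnstile setting described above, the following is a minimal sketch of the exact baseline that a streaming algorithm aims to avoid. It is not the sketch-based method of this paper. It assumes each stream item is a triple (i, j, delta) that adds a signed change to entry (i, j) of the adjacency matrix; the names `edge_update`, `materialize` and `is_complete` are hypothetical and chosen only for illustration. The full matrix is stored, so both space and time are quadratic in the number of vertices.

```python
from typing import Iterable, List, Tuple

# One turnstile update: add `delta` (positive or negative) to entry (i, j).
Update = Tuple[int, int, int]


def edge_update(u: int, v: int, delta: int) -> List[Update]:
    """Hypothetical helper: an undirected edge change touches (u, v) and (v, u)."""
    return [(u, v, delta), (v, u, delta)]


def materialize(n: int, stream: Iterable[Update]) -> List[List[int]]:
    """Build the full n x n adjacency matrix from the stream (quadratic space)."""
    a = [[0] * n for _ in range(n)]
    for i, j, delta in stream:
        a[i][j] += delta
    return a


def is_complete(a: List[List[int]]) -> bool:
    """Exact check (quadratic time): zero diagonal, all other entries equal to 1."""
    n = len(a)
    return all(
        a[i][j] == (0 if i == j else 1) for i in range(n) for j in range(n)
    )


# Triangle K3 in which the edge {0, 2} is inserted and then deleted.
stream = (
    edge_update(0, 1, +1)
    + edge_update(1, 2, +1)
    + edge_update(0, 2, +1)
    + edge_update(0, 2, -1)
)
before = is_complete(materialize(3, stream))

# Re-inserting the deleted edge restores completeness.
stream += edge_update(0, 2, +1)
after = is_complete(materialize(3, stream))

print(before, after)  # prints: False True
```

The deletion in the example is what distinguishes the turnstile model from an insert-only stream: the final matrix depends on the net effect of all updates, so the answer cannot be fixed until the stream ends.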
The proposed algorithms run in linear time and consume space proportional to $k^2$ and the error parameters. We also demonstrate the effectiveness of the approaches in analyzing social and other real-life networks.

The current paper is organized as follows. Some background details and motivating applications are included in section 2 and section 3, respectively. Section 4 describes the state of the art. Some theoretical results are provided in section 5, and based on these the proposed method is presented in section 6. Section 7 and section 8 cover some empirical results and discussions. Finally, section 9 concludes the paper.

2. PRELIMINARIES
Let us introduce some formal notations and standard definitions that will be used throughout the paper. We assume that |S| denotes the size (cardinality) of a set S. A graph is a doublet G = (V, E), where V denotes the set of vertices and $E \subseteq V \times V$ denotes the set of edges. The term graph