Geo-Social Co-location Mining

Michael Weiler*, Klaus Arthur Schmid*, Nikos Mamoulis+, Matthias Renz*
* Institute for Informatics, Ludwig-Maximilians-Universität München
+ Department of Computer Science, University of Hong Kong
{weiler,schmid,renz}@dbs.ifi.lmu.de, nikos@cs.hku.hk

ABSTRACT
Modern technology for capturing geo-spatial information, combined with a new user mentality of utilizing this technology to voluntarily share information, produces a huge flood of geo-spatial and geo-spatio-temporal data. This location information, enriched with social information, is a new source for discovering new and useful knowledge. This work introduces geo-social co-location mining, the problem of finding social groups that are frequently found at the same location. This problem has applications in the social sciences, allowing researchers to study interactions between social groups and permitting social-link prediction. It can be divided into two sub-problems. The first sub-problem, finding spatial co-location instances, requires properly addressing the inherent uncertainty in geo-social network data, which is a consequence of generally very sparse check-in data, and thus very sparse trajectory information. For this purpose, we propose a probabilistic model to estimate the probability that a user is located at a given location at a given time, creating the notion of probabilistic co-locations. The second sub-problem, mining the resulting probabilistic co-location instances, requires efficient methods for large databases having a high degree of uncertainty. Our approach solves this problem by extending solutions for probabilistic frequent itemset mining. Our experimental evaluation, performed on real (but anonymized) geo-social network data, shows the high efficiency of our approach and its ability to find new social interactions.

1. INTRODUCTION
Spatial features describe the presence or absence of geographic object types at different locations.
Examples of spatial features include plant species, animal species, road types, cancers, crime, and business types, or features of individuals, such as personal preferences, or simply their ID. A spatial co-location pattern represents a subset of spatial features whose instances are frequently located in a spatial neighborhood. For example, “botanists may have found that there are orchids in 80% of the area where the middle-wetness green-broad-leaf forest grows” (example taken from [26]). Spatial co-location patterns may yield important insights for many applications. For example, a mobile service provider may be interested in services frequently requested by geographical neighbors, and thus gain sales promotion data. Other application domains include Earth science, public health, biology, transportation, and geo-social networks.

Traditional solutions for the problem of frequent co-location mining [26] consider classical spatial data, where each data record has a (certain) spatial location. In this project, which we wish to discuss with a broad audience at GeoRich’15, we want to take the problem of spatial co-location mining into a new context by considering spatio-temporal data, i.e., trajectory data of individuals. Thus, the problem now is to find groups of users who frequently co-locate in geo-space over time, creating the notion of geo-social co-location mining.
There is already an abundance of public data sets that can be mined, including data sets from geo-social networks [7] and from social networks using geo-tags, such as Twitter. Frequent co-location mining on such data may yield interesting patterns, such as “Members of LMU and HKU are frequently found at the same location, while members of some other university are often found in solitude or among themselves”. In such an application, each instance of a co-location corresponds to an (l, t, S) triple, where S denotes the set of individuals that have been at the same location l at the same time t.

The problem of geo-social co-location mining introduces two major new challenges which have not been sufficiently covered in existing work on traditional co-location mining. Firstly, the temporal dimension leads to very large sets of co-location instances, since every location and time pair leads to a possibly non-empty co-location instance. Secondly, existing solutions do not consider the uncertainty which is inherent in spatial data: spatial data may be imprecise (e.g., due to measurement errors), data can be obsolete (e.g., when the most recent position update is already minutes old), data may originate from unreliable sources (such as crowd-sourcing), or it may be blurred to prevent privacy threats and to protect user anonymity [8]. For example, the oval regions in Figure 1 may correspond to individual persons, while the color of each person may represent the individual’s affiliations. Here, the location of each person is a conservative approximation based on the user’s GPS history. It is important to note that we are considering historic data. Thus, for a given point of time t, both past and future GPS positions of a user may be available.
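To make the (l, t, S) notation concrete when user positions are uncertain, the following minimal Python sketch enumerates candidate co-location instances for one point of time and attaches a probability to each. It is an illustration only, not the model proposed in this work: the user names, locations, probabilities, the threshold `min_prob`, and both function names are hypothetical. The sketch also assumes that users move independently, so that the probability of a group being at a location is the product of the individual probabilities.

```python
from itertools import combinations
from math import prod

# Hypothetical per-user location distributions for a single point of time t.
# All names and numbers are illustrative assumptions, not data from the paper.
location_probs = {
    "alice": {"bar": 0.7, "cinema": 0.3},
    "bob": {"bar": 0.5, "cinema": 0.5},
    "carol": {"bar": 0.1, "cinema": 0.9},
}


def colocation_probability(users, location, probs):
    """Probability that all given users are at `location`.

    Assumes independence between users (a simplification made here).
    """
    return prod(probs[u].get(location, 0.0) for u in users)


def probabilistic_instances(t, probs, min_prob=0.2, max_size=3):
    """Enumerate (l, t, S, p) tuples with co-location probability p >= min_prob."""
    locations = sorted({l for dist in probs.values() for l in dist})
    users = sorted(probs)
    result = []
    for l in locations:
        # Only groups of at least two users can form a co-location.
        for size in range(2, max_size + 1):
            for group in combinations(users, size):
                p = colocation_probability(group, l, probs)
                if p >= min_prob:
                    result.append((l, t, frozenset(group), p))
    return result


instances = probabilistic_instances("22:00", location_probs)
for l, t, S, p in instances:
    print(l, t, sorted(S), round(p, 3))
```

Enumerating every user subset is exponential in the number of users and is done here only for readability. The sketch also shows why the temporal dimension is costly: the enumeration has to be repeated for every point of time.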
(Footnote 1: A probabilistic model to estimate the position of a mobile user given past and future observations can be found in [21].)

Given these approximations, it becomes possible to estimate which point of interest each user is currently visiting, yielding probability distributions as shown in the table in Figure 1 for the depicted point of time (22:00) and for a point of time one hour later.

Given such data, we can immediately envision a number of useful applications:

Find groups of people often co-locating. In the setting de-