Exploiting Spatio-Temporal User Behaviors for User Linkage
Wei Chen
School of Computer Science and
Technology, Soochow University,
China
wchzhg@gmail.com
Hongzhi Yin*
School of ITEE, The University of
Queensland, Brisbane, Australia
db.hongzhi@gmail.com
Weiqing Wang
School of ITEE, The University of
Queensland, Brisbane, Australia
weiqingwang@uq.edu.au
Lei Zhao
School of Computer Science and
Technology, Soochow University,
China
zhaol@suda.edu.cn
Wen Hua
School of ITEE, The University of
Queensland, Brisbane, Australia
w.hua@uq.edu.au
Xiaofang Zhou
School of ITEE, The University of
Queensland, Brisbane, Australia
zxf@itee.uq.edu.au
ABSTRACT
Cross-device and cross-domain user linkage has attracted
considerable attention recently. An important branch of this
work achieves user linkage with spatio-temporal data
generated by ubiquitous GPS-enabled devices. The task is
twofold: how to extract the representative features of a user,
and how to measure the similarities between users with the
extracted features. To tackle the problem, we propose a novel
model, STUL (Spatio-Temporal User Linkage), that consists of
two components. 1) Extract users’ spatial features with a
density-based clustering method, and extract users’ temporal
features with the Gaussian Mixture Model. To link user pairs
more precisely, we assign different weights to the extracted
features, down-weighting the common features and highlighting
the discriminative features. 2) Propose novel approaches to
measure the similarities between users based on the extracted
features, and return the user pairs whose similarity scores
are higher than a predefined threshold. We have conducted
extensive experiments on three real-world datasets, and the
results demonstrate the superiority of our proposed STUL over
the state-of-the-art methods.
KEYWORDS
Cross-domain; User linkage; Spatio-temporal behaviors
*This author is the corresponding author.
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first
page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions
from permissions@acm.org.
CIKM’17, November 6–10, 2017, Singapore.
© 2017 ACM. ISBN 978-1-4503-4918-5/17/11...$15.00
DOI: http://dx.doi.org/10.1145/3132847.3132898
1 INTRODUCTION
The proliferation of GPS-enabled devices and mobile
techniques has led to the emergence of a large amount of
spatio-temporal information. For example, vehicles equipped
with GPS generate many trajectories, each consisting of a
sequence of points sampled at short time intervals, to keep
track of moving objects. Meanwhile, the widespread adoption
of location-based social networks, such as Facebook, Twitter,
and Foursquare, has generated massive discrete check-in
data [20], as many users share their status associated with
locations and timestamps. The availability of spatio-temporal
information offers a good opportunity to model users’
spatio-temporal behaviors [23][18]. On the other hand, user
linkage, which aims at connecting the same users across
different platforms, has attracted much attention. User
linkage benefits a wide range of real applications, such as
prediction [13][21], data fusion [28], recommendation [19][22],
etc. This paper focuses on leveraging the increasingly
available spatio-temporal information in user linkage.
However, to the best of our knowledge, only one existing
work utilizes users’ spatial and temporal features
simultaneously to achieve user linkage [14]. In that work,
locations and times are divided into bins, and each
spatio-temporal record is associated with a bin (r, t), where
r is a region and t is a time interval. The similarities
between users are inferred from their co-occurrences in each
bin. Nonetheless, time and space are intrinsically continuous.
Discretization of time and space inevitably leads to
information loss, especially for points near the bin
boundaries. Assume that u0 is a user on platform A, while u1
and u2 are two users on platform B. To simplify the problem,
we assume that each user u0, u1 and u2 has only one activity
record, v0, v1 and v2 respectively. The distributions of these
activity records in space and time are given in Figure 1(a)
and 1(b) respectively. Based on [14], u0 and u2 have a higher
probability of being linked together, as they co-occur in
both the spatial bin r1 and the temporal bin t1. However,
compared with u2, u1 is more similar to u0 in terms of both
the spatial distribution in Figure 1(a) and the temporal
distribution in Figure 1(b). Thus, the discretization-based
method cannot capture the similarity between features that
fall into different bins. Besides, discretization of time and
space always raises the question of how to select the region
or time interval size, and any single size is invariably too
small for some regions and too large for others.
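The boundary effect described above can be illustrated with a minimal Python sketch. It is not the method of [14] and does not reproduce the data behind Figure 1. The coordinates of v0, v1 and v2, the unit bin sizes, and the helper names bin_of, co_occur and gaps are all hypothetical, chosen only so that v0 and v2 share a bin while v1 lies just across the boundary.

```python
import math

def bin_of(x, y, t, cell=1.0, slot=1.0):
    """Map a record (x, y, t) to a discrete bin: (grid cell, time slot).

    cell and slot are assumed bin sizes, not values taken from [14].
    """
    region = (math.floor(x / cell), math.floor(y / cell))
    interval = math.floor(t / slot)
    return region, interval

def co_occur(a, b, cell=1.0, slot=1.0):
    """True if two records fall into the same (region, interval) bin."""
    return bin_of(*a, cell, slot) == bin_of(*b, cell, slot)

def gaps(a, b):
    """Continuous spatial gap and temporal gap between two records."""
    spatial = math.hypot(a[0] - b[0], a[1] - b[1])
    temporal = abs(a[2] - b[2])
    return spatial, temporal

# Hypothetical single records (x, y, t) for the three users.
v0 = (0.95, 0.95, 0.95)  # u0 on platform A, close to a bin boundary
v1 = (1.05, 1.05, 1.05)  # u1 on platform B, just across the boundary
v2 = (0.05, 0.05, 0.05)  # u2 on platform B, same bin as v0 but far away

print("v0 and v1 share a bin:", co_occur(v0, v1))
print("v0 and v2 share a bin:", co_occur(v0, v2))
print("gaps(v0, v1):", gaps(v0, v1))
print("gaps(v0, v2):", gaps(v0, v2))
```

Under these assumed values, bin co-occurrence pairs u0 with u2, although v1 is much closer to v0 in both space and time. Shrinking or enlarging the bins only moves the boundary; other records would then straddle it.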