© Springer Nature Switzerland AG 2020
S. A. Costa et al. (eds.), Mathematics (Education) in the Information Age,
Mathematics in Mind, https://doi.org/10.1007/978-3-030-59177-9_2
Chapter 2
An Unsupervised Approach to User
Characterization in Online Learning
and Social Platforms
Dan Vilenchik
A Short History of User Characterization
Making sense of data that is automatically collected from online platforms, such as
social media or e-learning platforms, is a challenging task: the data is massive,
multidimensional, noisy, and heterogeneous (it comes from individuals who behave
differently). In this chapter we focus on a task central to all online social
platforms: user characterization. Examples include automatically identifying a
spammer or a bot on Twitter, or a disengaged student on an e-learning platform.
Understanding the nature and patterns of interaction between members of a
social network is a long-standing research topic. Back in the 1950s, Katz and Felix
Lazarsfeld (1957) studied the problem of identifying influential people in social
networks. Two decades later, Freeman’s seminal work (Freeman 1978) defined three
key indices of centrality: degree (the number of friends), closeness (the average
number of hops from a user to all other users in the network), and betweenness (the
fraction of shortest paths that pass through this user), fueling a torrent of
theoretical and experimental work in the area. The subject has become even more
attractive to researchers and industry as the role of online social networks (OSNs)
has grown dramatically in recent years, creating new business opportunities for
marketeers.
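To make the three centrality indices concrete, here is a minimal Python sketch that computes them on a small friendship graph. The graph, the user names, and the function names (`bfs_counts`, `centrality_indices`) are hypothetical and chosen only for illustration. The sketch assumes an undirected, connected network with at least three users. It reports closeness as the average hop count, following the description above (closeness is often reported as the reciprocal of this quantity), and it normalizes betweenness by the number of pairs of other users.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Breadth-first search from `source`.

    Returns (dist, sigma): the hop distance to every reachable user and
    the number of distinct shortest paths from `source` to that user.
    """
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:              # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def centrality_indices(graph):
    """Degree, average hop count, and normalized betweenness per user.

    Assumes an undirected, connected graph with at least three users.
    """
    nodes = sorted(graph)
    n = len(nodes)
    info = {s: bfs_counts(graph, s) for s in nodes}

    degree = {v: len(graph[v]) for v in nodes}
    avg_hops = {
        v: sum(info[v][0][t] for t in nodes if t != v) / (n - 1)
        for v in nodes
    }

    betweenness = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = info[s]
        for v in nodes:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff the distances add up
            if dist_s[v] + dist_v[t] == dist_s[t]:
                betweenness[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    pairs = (n - 1) * (n - 2) / 2          # pairs of users other than v
    betweenness = {v: b / pairs for v, b in betweenness.items()}
    return degree, avg_hops, betweenness


# Hypothetical toy network (adjacency sets), for illustration only.
toy_graph = {
    "ann": {"bob"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

degree, avg_hops, betweenness = centrality_indices(toy_graph)
```

In this toy network, "bob" and "dan" each have three friends and each lie on half of the shortest paths between the other users, whereas "cat" has two friends but lies on none of them. The three indices can therefore rank users differently.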
The task of characterizing users of OSNs is typically approached as a supervised
learning classification problem. A target variable is defined, e.g. the ethnicity and
political affiliation of a user (Pennacchiotti and Popescu 2011), gender, age, regional
origin (Rao et al. 2010), occupational class (Preotiuc-Pietro et al. 2015), etc. Next,
data is collected from the network (typically using some sort of crawling
procedure), and relevant features are extracted from each user account. Finally, one of a
host of machine learning algorithms is trained to perform the task.
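The supervised pipeline described above (define a target, extract features per account, train a classifier) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the two features, the labels, the toy accounts, and the choice of a nearest-centroid classifier, which stands in for whichever learning algorithm a given study actually uses. It is not taken from any of the cited works.

```python
from collections import defaultdict

# Hypothetical labeled accounts: (posts_per_day, fraction_of_posts_with_links)
# paired with a hypothetical target label.
training_accounts = [
    ((40.0, 0.9), "spammer"),
    ((55.0, 0.8), "spammer"),
    ((3.0, 0.1), "regular"),
    ((5.0, 0.2), "regular"),
]


def fit_centroids(samples):
    """Training step: average the feature vectors of each label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in sums[label])
        for label in sums
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


centroids = fit_centroids(training_accounts)
prediction = predict(centroids, (48.0, 0.7))   # a new, unlabeled account
```

In practice the features would be rescaled so that no single one dominates the distance, and the trained model would be evaluated on held-out labeled accounts. Both steps are omitted here for brevity.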
D. Vilenchik (*)
School of Electrical and Computer Engineering, Ben-Gurion University, Beersheba, Israel
e-mail: vilenchi@bgu.ac.il