A Gamification Framework for Sensor Data Analytics

Alexandra L’Heureux, Katarina Grolinger, Wilson A. Higashino, Miriam A. M. Capretz
Department of Electrical and Computer Engineering
Western University
London, ON, Canada N6A 5B9
{alheure2, kgroling, whigashi, mcapretz}@uwo.ca

Abstract—The Internet of Things (IoT) enables connected objects to capture, communicate, and collect information over the network through a multitude of sensors, setting the foundation for applications such as smart grids, smart cars, and smart cities. In this context, large scale analytics is needed to extract knowledge and value from the data produced by these sensors. The ability to perform analytics on these data, however, is highly limited by the difficulties of collecting labels. Indeed, the machine learning techniques used to perform analytics rely upon data labels to learn and to validate results. Historically, crowdsourcing platforms have been used to gather labels, yet they cannot be directly used in the IoT because of poor human readability of sensor data. To overcome these limitations, this paper proposes a framework for sensor data analytics which leverages the power of crowdsourcing through gamification to acquire sensor data labels. The framework uses gamification as a socially engaging vehicle and as a way to motivate users to participate in various labelling tasks. To demonstrate the framework proposed, a case study is also presented. Evaluation results show the framework can successfully translate gamification events into sensor data labels.

Keywords-Internet of Things; Sensor Data; Gamification; Data Analytics; Machine Learning; Crowdsourcing

I. INTRODUCTION

The Internet of Things (IoT) [1] is an ecosystem powered by sensors and microchips, which enables connection and communication among real-world objects, environments, software, and people.
Through this network of things, sensors and devices are capturing and exchanging enormous amounts of data and fuelling the Big Data movement.

The functionality of the IoT depends upon four fundamental steps [2]: data acquisition, information extraction, knowledge extraction, and action-taking. Data analytics techniques, through the use of machine learning algorithms, can be used to extract information and knowledge from raw data. Supervised and unsupervised machine learning algorithms, however, rely strongly on data labels for proper functioning.

Data labels are defined as a representation of the ground truth or gold standard [3] of a data sample. Supervised machine learning algorithms are entirely dependent upon labels to learn and extract knowledge from data, and their performance is directly related to label quality [4]. On the other hand, unsupervised machine learning algorithms extract patterns or discover similarities from data without prior access to labels [5]. In this case, however, labels are still important to validate algorithm accuracy.

The lack of labels describing the contextual information surrounding sensor data readings is one of the root challenges for data analytics within the IoT. This is especially true in the case of human activity data: the data captured by sensors during the performance of human tasks that affect sensor readings. For example, an electricity consumption sensor can capture variations in consumption when someone turns off a light or plugs in a device. Although gathered within the IoT, this type of sensor data is often processed and analyzed by field experts because such data cannot be easily reconciled and interpreted without prior domain and contextual knowledge. For instance, simply by looking at electricity consumption time-series data, it is hard to determine which device was turned on or off.
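The ambiguity described above can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the readings are synthetic, the 60 W ratings, the 150 W baseline, the threshold, and the helper name `step_changes` are all hypothetical and do not come from the framework or its case study. It shows that two different devices with the same power draw can leave the same signature in an aggregate consumption trace.

```python
from typing import List, Tuple


def step_changes(readings: List[float], threshold: float = 20.0) -> List[Tuple[int, float]]:
    """Return (index, delta) for every jump between consecutive readings
    whose magnitude exceeds `threshold` (in watts)."""
    changes = []
    for i in range(1, len(readings)):
        delta = readings[i] - readings[i - 1]
        if abs(delta) > threshold:
            changes.append((i, delta))
    return changes


# Synthetic, made-up traces (watts): a 60 W lamp and a 60 W charger,
# each switched on at step 3 and off at step 6 over a 150 W baseline.
baseline = 150.0
lamp_trace = [baseline] * 3 + [baseline + 60.0] * 3 + [baseline] * 2
charger_trace = [baseline] * 3 + [baseline + 60.0] * 3 + [baseline] * 2

lamp_events = step_changes(lamp_trace)
charger_events = step_changes(charger_trace)

print(lamp_events)
# The detected events are identical, so the trace alone cannot say
# which device caused them.
print(lamp_events == charger_events)
```

In this toy setting, only a label recorded alongside the readings (for example, "lamp turned on") would distinguish the two cases, which is the kind of contextual information the text identifies as missing.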
In contrast, other types of data, such as images and social media posts, are human interpretable: an untrained user can correctly identify what the data represent, enabling better data labelling and consequently data analytics.

Crowdsourcing, a solution that leverages the power of crowds to perform tasks at a low cost [6], has been used for labelling. In the machine learning context, these tasks ask a large number of users to manually identify and label specific data such as images or tweets. The Mechanical Turk service [7] is an implementation of crowdsourcing. It enables researchers to post tasks to be performed, and in exchange for their participation, users receive financial compensation. Such crowdsourcing frameworks are adequate for labelling tasks where humans are more effective than computers, such as identifying images. However, in the case of sensor data, typical crowdsourcing frameworks are often ineffective due to poor human readability of the data. Users cannot simply look at sensor data and effectively extract information on what human activity the sensors are measuring.

This paper proposes a framework for sensor data analytics that leverages the power of crowds to enable sensor data labelling through gamification [8]. In the proposed framework, a game is designed to collect the labels needed for data analytics by asking users to perform specific tasks within the game. Each of these tasks is associated with labels, which are then automatically applied to the sensor readings. For example, to label electricity consumption data, a mobile game could ask users to perform tasks such as turning lights on