Cloud based Social and Sensor Data Fusion

Surender Reddy Yerva, EPFL, Lausanne, Switzerland, surenderreddy.yerva@epfl.ch
Hoyoung Jeung, SAP Research, Brisbane, Australia, hoyoung.jeung@sap.com
Karl Aberer, EPFL, Lausanne, Switzerland, karl.aberer@epfl.ch

Abstract: As mobile cloud computing facilitates a wide spectrum of smart applications, the need to fuse the various types of data available in the cloud is growing rapidly. In particular, social and sensor data lie at the core of such applications, but they are typically processed separately. This paper explores the potential of fusing social and sensor data in the cloud and presents a practical case: a travel recommendation system that offers the predicted mood of people at the place and time users wish to travel. The system is built upon a conceptual framework that blends heterogeneous social and sensor data for integrated analysis, extracting weather-dependent mood information from Twitter and meteorological sensor data streams. To handle massive streaming data, the system employs various cloud-serving systems, such as Hadoop, HBase, and GSN. Using this scalable system, we performed heavy ETL as well as filtering jobs, resulting in 12 million tweets over four months. We then derived a rich set of findings through the data fusion, demonstrating that our approach is effective and scalable and that it can serve as an important basis for fusing social and sensor data in the cloud.

I. INTRODUCTION

Mobile phones are increasingly becoming multi-sensor devices, accumulating large volumes of data related to our daily lives. At the same time, mobile phones also serve as a major channel for recording people's activities on social-networking services on the Internet.
These trends raise the potential of jointly analyzing sensor and social data in mobile cloud computing, where applications running in the cloud are accessed from thin mobile clients, providing virtually unlimited processing power and promising cross-device platform compatibility.

The two popular data types, social and sensor data, are in fact complementary in various kinds of data processing and analysis. Participatory sensing, for instance, makes it possible to collect people-sensed data via social network services (e.g., Twitter) in areas where physical sensors are unavailable. Conversely, sensor data can offer precise context information, leading to more effective analysis of social data. The potential of blending social and sensor data is therefore high; nevertheless, the two are typically processed separately in mobile cloud applications, and this potential has not been investigated sufficiently.

In this paper, we explore the possibility of fusing social and sensor data in the cloud while dealing with massive data streams. To this end, we present a travel recommendation system as a practical case of the fusion, which offers information on people's moods with respect to the predicted weather at the place and time users wish to travel. The recommendation system combines various components for effective, large-scale social and sensor data fusion. We summarize the salient features of the system below.

• First, we propose a conceptual framework for integrating and analyzing heterogeneous social and sensor data. Specifically, the framework first transforms tweets into data points in a mood space that consists of 12 subspaces, each of which corresponds to a mood (e.g., happy). We then derive the probability of each mood in the mood space from a large number of tweet data points accumulated over time.
The system computes and maintains the mood probability information separately according to day (e.g., Monday), place (e.g., London), and weather (e.g., sunny), which are the major dimensions in query processing.
• Second, we present a scalable fusion system that implements the conceptual framework, extracting weather-dependent mood information from real-time Twitter and meteorological sensor data. Our travel recommendation system is built upon a combination of several well-known systems typically used for large-scale data storage and analysis in the cloud, such as Hadoop [1], HBase [2], and GSN [3]. This allows us to perform ETL jobs as well as analytic processing over massive streaming data.
• Third, we offer an in-depth analysis of our data-fusion approach based on comprehensive experimental results obtained from 12 million tweets as well as meteorological sensor readings collected over four months. The results reveal various findings, including the degree of happiness according to a particular weather type, day, and location. Furthermore, we show statistically that our mood estimation based on the fusion is effective and accurate.

We believe that the approach proposed in this paper can set a firm milestone in scalable social and sensor data fusion, serving as an important foundation for further studies in mobile cloud computing.

The rest of the paper is organized as follows. Section II summarizes the related work. Section III describes in detail the theoretical framework for fusing social and sensor data, while Section IV presents the technical details as well as the data collections used in our travel recommendation system. Section V offers an experimental analysis of the data fusion, followed by the conclusions in Section VI.
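
As a rough illustration of the first contribution listed above, the following Python sketch keeps per-(day, place, weather) mood counts and turns them into probabilities. It is only a minimal sketch under stated assumptions: the relative-frequency estimate, the function name `mood_probabilities`, the tuple layout of the input, and the mood label "sad" are all hypothetical and are not taken from the paper, whose actual formulation is given in Section III.

```python
from collections import Counter, defaultdict


def mood_probabilities(observations):
    """Estimate mood probabilities per (day, place, weather) group.

    `observations` is an iterable of (day, place, weather, mood) tuples,
    one per tweet that has already been mapped to a mood.
    Assumption: probabilities are plain relative frequencies.
    """
    counts = defaultdict(Counter)
    for day, place, weather, mood in observations:
        counts[(day, place, weather)][mood] += 1

    probabilities = {}
    for key, mood_counts in counts.items():
        total = sum(mood_counts.values())
        probabilities[key] = {m: n / total for m, n in mood_counts.items()}
    return probabilities


# Hypothetical sample data; "sad" is an assumed mood label.
sample = [
    ("Monday", "London", "sunny", "happy"),
    ("Monday", "London", "sunny", "happy"),
    ("Monday", "London", "sunny", "sad"),
    ("Monday", "London", "rainy", "sad"),
]
probs = mood_probabilities(sample)
```

Keying the table by the three query dimensions means that a lookup such as `probs[("Monday", "London", "sunny")]` returns the mood distribution for that combination directly, without rescanning the tweets.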