Efficient Change Detection for High Dimensional Data Streams

Spiros V. Georgakopoulos
Department of Computer Science and Biomedical Informatics
University of Thessaly
Lamia, Greece
spirosgeorg@dib.uth.gr

Sotiris K. Tasoulis
Department of Applied Mathematics
Liverpool John Moores University
Liverpool, United Kingdom
S.Tasoulis@ljmu.ac.uk

Vassilis P. Plagianakos
Department of Computer Science and Biomedical Informatics
University of Thessaly
Lamia, Greece
vpp@dib.uth.gr

Abstract—Recent advances in cloud computing, together with access to ever-increasing computational power, have led to the processing of data derived from mobile devices being undertaken remotely. This is indispensable when the data are high dimensional, since the mobile device has to share its processing resources with additional services. However, developing efficient algorithms could allow various types of analysis to be performed locally, avoiding the need for a constantly connected device. In this work, we present a methodology that combines lightweight dimensionality reduction and change detection techniques. The experimental results demonstrate its strong performance and, consequently, its usefulness in several tasks.

Index Terms—High Dimensional Data, Data Streams, Cumulative Sum, Incremental Principal Component Analysis.

1. Introduction

In recent years, within the field of sensor networks, various wearable sensors have been used to collect information about the human body. Furthermore, advances in Artificial Intelligence and Machine Learning allow these data to be processed [1], [2] in an attempt to aid medical treatment, social welfare, sports, etc. In many cases, smartphone devices, which have a variety of built-in sensors, are used to collect these data. However, as the data dimensionality grows, the limited memory and computational power of mobile devices, such as smartphones, Raspberry Pi boards, or Unmanned Aerial Vehicles, hinder efficient data processing.
To deal with this problem, the wireless network capabilities of the devices are used and the data are processed on remote servers or, more recently, on cloud infrastructure with high computational power [3]. Nevertheless, this approach gives rise to a new series of problems [4], such as network connectivity, device energy consumption, etc. In this work, we provide a methodology that fits within the low memory and computational capabilities of smartphones.

To test our approach, we use the publicly available dataset “Human Activities and Postural Transitions” (HAPT) [5], which is a time series dataset characterized by high dimensionality. To this end, we employ an online dimensionality reduction technique to reduce the original space to a 1-dimensional space, coupled with a lightweight statistical method for time series analysis. Our aim is to capture, in real time, a specific state in the signal every time it appears, using only the smartphone device.

The rest of the paper is structured as follows: In Section 2, we provide information regarding the dataset used. In Section 3, background material on dimensionality reduction and classification methods is provided. In Section 4, we present the proposed methodology and the experimental results. Finally, Section 5 contains concluding remarks and pointers for future work.

2. Dataset

As a case study to examine our methodology, we use a multivariate time series dataset [6], constructed from a series of basic human activities recorded using the sensor signals of a smartphone. To assemble the dataset, experiments were carried out with a group of 30 volunteers aged 19-48 years. All participants wore a smartphone (Samsung Galaxy S II) on their waist during the experiments. 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50 Hz using the built-in accelerometer and gyroscope.
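As a minimal illustration of how a 50 Hz sensor stream of this kind can be cut into fixed-width, overlapping windows, the following Python sketch uses the window width (2.56 s) and overlap (50%) reported for this dataset. The function name `sliding_windows`, its parameters, and the all-zero six-channel stream are hypothetical choices made for illustration only. This is not the dataset authors' preprocessing code, and it omits any filtering of the signals.

```python
SAMPLING_RATE_HZ = 50   # capture rate reported for the dataset
WINDOW_SECONDS = 2.56   # window width reported for the dataset
OVERLAP = 0.5           # fraction shared by consecutive windows


def sliding_windows(samples, rate_hz=SAMPLING_RATE_HZ,
                    window_s=WINDOW_SECONDS, overlap=OVERLAP):
    """Split a sequence of readings into fixed-width, overlapping windows.

    Hypothetical helper: a trailing partial window is discarded.
    """
    width = round(window_s * rate_hz)              # 2.56 s * 50 Hz = 128 readings
    step = max(1, round(width * (1.0 - overlap)))  # 50% overlap -> 64 readings
    return [samples[start:start + width]
            for start in range(0, len(samples) - width + 1, step)]


# Assumed input: 10 s of six-channel readings
# (3 accelerometer axes + 3 gyroscope axes), filled with zeros here.
stream = [(0.0,) * 6 for _ in range(10 * SAMPLING_RATE_HZ)]
windows = sliding_windows(stream)
```

With these values, each window holds 128 readings and consecutive windows start 64 readings apart, so only one new half-window of raw data has to be buffered before the next window is available.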
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec with 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated into body acceleration and gravity using a Butterworth low-pass filter. The gravitational force is assumed to have only low frequency components; therefore, a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of 561 features was obtained by calculating variables from the time and frequency domains.

3. Background Methods

In this section, we briefly review the basic tools used in the proposed methodology. In particular, we present the