Journal of Ambient Intelligence and Humanized Computing
https://doi.org/10.1007/s12652-018-0685-7

ORIGINAL RESEARCH

Predicting unusual energy consumption events from smart home sensor network by data stream mining with misclassified recall

Simon Fong 1 · Jiaxue Li 1 · Wei Song 2 · Yifei Tian 2 · Raymond K. Wong 3 · Nilanjan Dey 4

Received: 22 June 2017 / Accepted: 14 December 2017
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
With the popularity and affordability of ZigBee wireless sensor technology, IoT-based smart control systems for home appliances have become prevalent in smart home applications. From the data analytics point of view, one important objective of analyzing such IoT data is to gain insights into energy consumption patterns and thereby fine-tune the energy efficiency of appliance usage. The data analytics usually functions at the back end, crunching over a large archive of big data accumulated over time to learn the overall pattern from the sensor data feeds. The other objective of the analytics, which may often be more crucial, is to predict and identify whether an abnormal consumption event is about to happen, for example a sudden draw of energy that leads to a hot spot in a city's power grid or a blackout at home. This dynamic prediction is usually done at the operational level, on a moving data stream, by data stream mining methods. In this paper, an improved version of the very fast decision tree (VFDT) is proposed, which learns from misclassified results in order to filter noisy data out of the learning process and maintain sharp classification accuracy of the induced prediction model. Specifically, a new technique called misclassified recall (MR), which is a pre-processing step for self-rectifying misclassified instances, is formulated. In energy data prediction, most misclassified instances are due to data transmission errors or faulty devices.
The former case happens intermittently, whereas errors from the latter cause may persist for a long time. By caching the data at the MR pre-processor, the one-pass online model learning can be effectively shielded from intermittent problems in the wireless sensor network; likewise, the stored data can be investigated afterwards should the problem persist. Simulation experiments are conducted over a dataset for predicting exceptional appliance energy use in a low-energy building. The reported results validate the efficacy of the new methodology, VFDT + MR, in comparison with a collection of popular data stream mining algorithms from the literature.

Keywords IoT smart home · Energy prediction · Data stream mining · Classification

* Wei Song sw@ncut.edu.cn
Simon Fong ccfong@umac.mo
Jiaxue Li mb75431@umac.mo
Yifei Tian tianyifei0000@sina.com
Raymond K. Wong wong@cse.unsw.edu.au
Nilanjan Dey neelanjan.dey@gmail.com

1 Department of Computer and Information Science, University of Macau, Taipa, Macau SAR, People’s Republic of China
2 Department of Digital Media Technology, North China University of Technology, Beijing, People’s Republic of China
3 School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
4 Department of Information Technology, Techno India College of Technology, Kolkata, India