Bugs or Anomalies? Sequence Mining based Debugging in Wireless Sensor Networks

Kefa Lu, Qing Cao, Michael Thomason
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville, Tennessee 37919
Email: {klu3, cao, thomason}@utk.edu

Abstract—WSN applications are prone to bugs and failures due to their typical characteristics, such as being extensively distributed, heavily concurrent, and resource restricted. In this paper, we propose and develop a flexible and iterative WSN debugging system based on sequence mining techniques. First, we develop a data structure called the vectorized Probabilistic Suffix Tree (vPST), an elastic model that extracts sequential information from program runtime traces and stores it in compact suffix tree based vectors. Then, we build a novel WSN debugging system by integrating vPST with Support Vector Machines (SVM), a robust and generic classifier for both linear and nonlinear data classification tasks. Finally, we demonstrate that the vPST-SVM debugging system is efficient, flexible, and generic through three different test cases, two on the LiteOS operating system and one on the TinyOS operating system.

I. INTRODUCTION

In the past decade, Wireless Sensor Networks (WSNs) have been widely developed and deployed for various purposes, such as environmental monitoring and data collection [1], [2], [3]. However, WSN applications still suffer from numerous types of bugs and frequent failures [3], [4], due to their typical characteristics, such as distributed architecture, concurrent execution model, and strict resource limitations. It is difficult to debug WSN applications efficiently, because many of them are context sensitive and event driven, and it is usually infeasible to fully control their operating context and triggering events. In addition, many WSN bugs are transient and irreproducible [5].
Therefore, designing and developing robust WSN debugging systems remains a major challenge for WSN researchers and developers.

In this paper, we design, implement, and evaluate a flexible and generic debugging system based on sequential data analysis and outlier detection techniques. Our approach is based on two theoretical models, the vectorized Probabilistic Suffix Tree (vPST) model and the Support Vector Machine (SVM) model. The original PST model is a flexible probabilistic model that can efficiently extract sequential information from sequences and store it in compact suffix tree data structures [6], while SVM is a robust and generic classification technique that can solve both linear and nonlinear classification problems [7]. By extending PST to vPST, we retain not only the sequential information but also the most significant substructures within sequences, in compact and simple vectors. SVM can then be applied to these vectors to detect outliers in the sequences. By combining the vPST model and the SVM classifier with an efficient tracing subsystem, we find that the resulting technique is highly effective at locating real bugs.

Our contributions in this paper are two-fold. First, we extend the PST model to the vPST model, which provides researchers with a new methodology for extracting and analyzing sequential information. Specifically, the vPST model breaks sequences into pieces and stores them in meaningful data structures. Second, we propose the vPST-SVM system, which is especially helpful for detecting transient bugs. The whole debugging process is iterative, meaning that the user can adjust debugging settings dynamically to achieve the best results. We evaluate the whole system by comparing prediction results for various test cases on different operating systems, incrementally changing the vPST depth across iterative debugging cycles.

The rest of this paper is organized as follows.
In Section II, we briefly discuss related work, including some proposed WSN debugging systems. In Section III, we describe our vPST model and our iterative vPST-SVM anomaly detection approach in detail. In Section IV, we describe our system design and implementation. In Section V, we present three test cases for system evaluation. Section VI concludes this paper with a discussion.

978-1-4673-2433-5/12/$31.00 ©2012 IEEE

II. RELATED WORK

In the past decade, many different WSN debugging systems have been proposed [8], [9], [10]. However, some of them are not easily portable because they depend strongly on specific operating systems [10], [8]. Others are restricted to source code analysis or simulation trace analysis [8], [9]. There is still a significant lack of efficient debugging systems that can take full advantage of runtime traces from real deployments. In fact, many tricky bugs caused by race conditions or inappropriately controlled concurrency can only be triggered in real deployments under specific circumstances.

There have been some efforts to develop trace based WSN debugging systems using data mining techniques. Dustminer [11] is based on frequent pattern mining. It has three significant drawbacks. First, it is limited to frequent pattern mining, so it fails to detect bugs that generate only infrequent patterns. Second, it requires a lot of human effort to figure out clear