Journal of Digital Information Management, Volume 16, Number 5, October 2018

A Novel Framework for Context-aware Outlier Detection in Big Data Streams

Hussien Ahmad, Salah Dowaji
Damascus University
Syrian Arab Republic
hussien824@gmail.com
sdowaji@gmail.com

ABSTRACT: Outlier and anomaly detection has long been a critical problem in many fields. Although it has been investigated deeply in data mining, the problem has become more difficult and more critical in the Big Data era, since the volume, velocity and variety of data change drastically and the types of outliers have grown more complicated. In such an environment, where real-time outlier detection and analysis over data streams is a necessity, existing solutions are no longer effective or sufficient. While many existing algorithms and approaches consider the content of the data stream, few consider the context and conditions in which that content was produced. In this paper, we propose a novel framework for contextual outlier detection in big data streams which injects the contextual attributes into the stream content as a primary input for outlier detection, rather than using the stream content alone or applying contextual detection to content anomalies only. The detection algorithm incorporates two approaches: a supervised detection method and an unsupervised one, which allows the detection process to adapt to normal changes in stream behavior over time. The detected outliers are either both content and contextual outliers or contextual outliers only. The proposed contextual detection approach prunes false positive outliers and, at the same time, detects outliers that content-only detection would miss (false negatives). Moreover, in this framework, the detection engine preserves both the outliers and the context values in which they were detected, for use in the engine's self-training and in outlier modeling, in order to enhance outlier prediction accuracy.
Subject Categories and Descriptors
H.2 [Database Management]; H.2.8 [Database Applications]: Data mining

General Terms: Data Mining, Anomaly Detection, Contextual Detection

Keywords: Outlier Detection, Anomaly Detection, Context-Aware Outlier, Outliers Modeling, Big Data Analytics, Big Data Stream

Received: 17 April 2018, Revised 3 June 2018, Accepted 10 June 2018

DOI: 10.6025/jdim/2018/16/5/213-222

1. Introduction

Outlier detection and analysis is an important data mining problem that aims to find anomalous points and behavior in data sets (Zhang, 2008). This problem has been investigated deeply in a broad set of disciplines, such as sensor networks, network intrusion detection, web log analysis, medical diagnosis, and the banking and insurance industries. The importance of outlier detection is attributed to the fact that anomalous behavior often translates into important or critical information. For example, anomalous behavior in computer network traffic might mean that there is an intrusion and that sensitive data is being sent to an unauthorized destination (Dokas et al., 2002), and an abnormal MRI might indicate the existence of tumors (Seo & Milanfar, 2009).