A systematic approach for detecting and clustering distributed cyber scanning

Elias Bou-Harb ⇑, Mourad Debbabi, Chadi Assi

Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada

Article info

Article history:
Received 8 February 2013
Received in revised form 13 May 2013
Accepted 12 September 2013
Available online 29 September 2013

Keywords:
Cyber scanning detection
Detrended Fluctuation Analysis
Unsupervised data clustering

Abstract

We present an approach composed of two techniques that respectively tackle the problems of detecting corporate cyber scanning and clustering distributed reconnaissance activity. The first technique is based on a non-attribution anomaly detection approach that focuses on what is being scanned rather than who is performing the scanning. The second technique adopts a statistical time series approach that observes the correlation status of a traffic signal to perform the identification and clustering. To empirically validate both techniques, we utilize and examine two real network traffic datasets and implement two experimental environments. The first dataset comprises unsolicited one-way telescope/darknet traffic, while the second dataset was captured in our lab through a customized setup. The results show, on one hand, that for a class C network with 250 active hosts and 5 monitored servers, the training period of the proposed detection technique required a stabilization time of less than 1 s and a state memory of 80 bytes. Moreover, in comparison with Snort's sfPortscan technique, it was able to detect 4215 unique scans and yielded zero false negatives. On the other hand, the proposed clustering technique is able to correctly identify and cluster the scanning machines with high accuracy, even in the presence of legitimate traffic. We further validate this clustering technique by formulating the presented scenario as a machine learning problem. Specifically, we compare our proposed technique with an unsupervised data clustering technique that adopts the k-means and the expectation maximization approaches. The results validate our clustering technique, rendering it feasible for adoption.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

The ever-increasing population and embrace of cyberspace have been a great asset both socially and economically. However, recent events have demonstrated that cyberspace can be subjected to amplified, debilitating and disruptive attacks that might lead to severe security issues with drastic consequences. In general, cyberspace can facilitate distributed denial of service attacks [1], advanced persistent threats [2], zero-day exploits [3] and cyber terrorism/warfare [4,5]. Despite efforts to protect cyberspace, the latest report from Ottawa's Auditor General highlighted that only limited progress has been made in improving the cyber security of crucial networks [6]. Cyber scanning, the task of probing enterprise networks or Internet-wide services in search of vulnerabilities or ways to infiltrate IT assets, has been a growing cyber security concern. This is because cyber scanning is commonly the primary stage of an intrusion attempt, enabling an attacker to remotely locate, target, and subsequently exploit vulnerable systems. It is thus a core technique and the main enabler of the above-mentioned cyber attacks. Fig. 1 depicts a general anatomy of a cyber attack in which cyber scanning plays a major role. Indeed, the capability to detect, identify and attribute such scanning activity and its components is an important task, as it would aid in preventing or mitigating the actual cyber attack.

1389-1286/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.comnet.2013.09.008
⇑ Corresponding author. Tel.: +1 5146495049.
E-mail address: e_bouh@encs.concordia.ca (E. Bou-Harb).

Computer Networks 57 (2013) 3826–3839
Contents lists available at ScienceDirect
Computer Networks
journal homepage: www.elsevier.com/locate/comnet