International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 22 (2017) pp. 12762-12766
© Research India Publications. http://www.ripublication.com

Comparative Analysis for Detecting DNS Tunneling Using Machine Learning Techniques

¹Mahmoud Sammour, ²Burairah Hussin, ³Mohd Fairuz Iskandar Othman
Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Malaysia.
¹ORCID: 0000-0002-6860-2804

Abstract

DNS tunneling is one of the issues that have concerned the information security community over the last decade. Such malicious activity represents a genuine threat to many corporations, where a considerable amount of network traffic may carry embedded DNS tunnels. The threats caused by such tunneling range from full remote control to file transfer or even a full IP tunnel. Different approaches have therefore been proposed for detecting DNS tunneling, such as firewalls and intrusion detection systems. However, these approaches are limited to specific types of tunneling. Researchers have consequently turned to machine learning techniques because of their ability to analyze traffic and predict the occurrence of DNS tunneling. Nonetheless, there are many machine learning techniques to choose from. This paper provides a comparative study of three machine learning techniques: Support Vector Machine (SVM), Naïve Bayes (NB) and the J48 decision tree. A benchmark dataset for DNS tunneling has been used in the experiment in order to facilitate the comparison. Experimental results showed that SVM outperformed the other classifiers in detecting DNS tunneling, achieving an f-measure of 83%.

Keywords: Domain Name System, Tunneling, Support Vector Machine, Naïve Bayes, Decision Tree, Classification

INTRODUCTION

The Domain Name System (DNS) is one of the important protocols and plays a vital role in web activities such as browsing and emailing.
It allows applications to use names such as example.com instead of a difficult-to-remember IP address [1]. Many organizations do not consider DNS a source of threats because it is not associated with data transfer. Nonetheless, many companies are vulnerable to numerous types of threats through DNS [2], because a considerable amount of their traffic is exposed to them.

Nowadays, many utilities are available for tunneling over DNS. Most of these utilities aim at gaining free Wi-Fi access at sites that restrict access via HTTP [3]. However, serious threats can accompany such free Wi-Fi access, in the form of malicious activities carried over the DNS tunnel. Using DNS tunneling, a compromised internet host can be fully controlled remotely through the tunnel channel. In addition, the tunnel can carry other activities such as operating system commands, file transfer or even a full IP tunnel. Feederbot [4] and Moto [5] are examples of DNS tunneling tools known to use DNS as a communication method. These threats have motivated the information security community to provide robust methods that are able to detect DNS tunneling [6].

Various DNS tunneling detection methods have been proposed. They can be categorized into two main classes: Traffic Analysis and Payload Analysis. The first class analyzes the overall traffic, from which significant features can be identified such as the volume of DNS traffic, the number of hostnames per domain, location and domain history. The second class analyzes the payload of a single request in order to identify features such as domain length, number of bytes and content.
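As an illustration of the payload-analysis features named above, the following is a minimal Python sketch that derives a few per-query features from a DNS query name. The function names `shannon_entropy` and `payload_features` and the feature keys are hypothetical, and character entropy is used here only as one possible stand-in for the "content" feature; the paper does not specify how its features were computed.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of the characters in text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def payload_features(qname: str) -> dict:
    """Compute simple payload features for a single DNS query name.

    These features are illustrative assumptions, not the paper's feature set.
    """
    name = qname.rstrip(".").lower()          # drop the optional trailing root dot
    labels = name.split(".") if name else []  # "www.example.com" -> 3 labels
    return {
        "name_length": len(name),                                  # domain length
        "label_count": len(labels),                                # number of labels
        "longest_label": max((len(l) for l in labels), default=0), # longest single label
        "entropy": shannon_entropy(name.replace(".", "")),         # rough content measure
    }


if __name__ == "__main__":
    print(payload_features("www.example.com"))
```

Tunneled queries often encode data in long, random-looking labels, so features of this kind are a natural input to a rule or a classifier; which features actually separate the classes depends on the dataset.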
Analyzing the features related to DNS tunneling has led the research community to utilize rule-based approaches, in which both traffic and payload are examined in terms of selected features. Once a predefined condition is met, the traffic is identified as DNS tunneling. However, because curating rules manually is a complex and tedious task, researchers have turned to Machine Learning Techniques (MLTs). The key characteristic of machine learning is a statistical model that is able to identify significant rules automatically [7-9]. In addition, the emergence of annotated datasets such as JSON [10], which contains network connections with predefined labels (e.g. Tunneled or Legitimate), has increased the focus on machine learning. This is because MLTs require annotated historical data: a model is trained on such data and can then be tested on new data. There are numerous types of MLTs, such as Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN) and others. With this variety, it is challenging to identify the classifier that best fits the task of detecting DNS tunneling. This paper presents a comparative analysis of DNS tunneling detection using three MLT classifiers: SVM, NB and DT.
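To make the contrast between hand-written rules and learned rules concrete, the sketch below learns a single threshold on query-name length from labeled examples. This is a one-level decision tree (a decision stump), far simpler than the J48 classifier evaluated in this paper, and it is not the authors' method. The function names `learn_threshold` and `classify` are hypothetical, and the training values are made up purely for illustration.

```python
def learn_threshold(samples):
    """Find the name-length threshold that best separates labeled samples.

    samples: list of (name_length, label) pairs, where label is
    "tunneled" or "legitimate". The learned rule predicts "tunneled"
    when name_length > threshold. Returns (threshold, training_accuracy).
    """
    values = sorted({length for length, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum(
            (length > threshold) == (label == "tunneled")
            for length, label in samples
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(samples)


def classify(name_length, threshold):
    """Apply the learned rule to a new query-name length."""
    return "tunneled" if name_length > threshold else "legitimate"


# Made-up toy data for illustration only; NOT taken from the paper's dataset.
TRAINING = [
    (15, "legitimate"), (22, "legitimate"), (18, "legitimate"), (30, "legitimate"),
    (64, "tunneled"), (80, "tunneled"), (120, "tunneled"), (52, "tunneled"),
]

if __name__ == "__main__":
    threshold, accuracy = learn_threshold(TRAINING)
    print(threshold, accuracy)
    print(classify(90, threshold))
```

A rule-based system would fix the threshold by hand; here it is derived from the annotated data, which is the property that motivates the use of MLTs. Real classifiers combine many features and must be evaluated on held-out data rather than on the training set.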