International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 22 (2017) pp. 12762-12766
© Research India Publications. http://www.ripublication.com
Comparative Analysis for Detecting DNS Tunneling Using Machine
Learning Techniques
¹Mahmoud Sammour, ²Burairah Hussin, ³Mohd Fairuz Iskandar Othman
Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Malaysia.
¹Orcid: 0000-0002-6860-2804
Abstract
DNS tunneling is one of the issues that have concerned the
information security community over the last decade. Such
malicious activity represents a genuine threat to many
corporations, where a considerable amount of network
traffic could carry embedded DNS tunnels. The threats
posed by such tunneling range from full
remote control to file transfer or even a full IP tunnel.
Therefore, different approaches have been proposed for
detecting DNS tunneling, such as firewalls and intrusion
detection systems. However, these approaches are limited to
specific types of tunneling. Consequently, researchers have turned
to machine learning techniques because of their ability to
analyze and predict the occurrence of DNS tunneling.
Nonetheless, there are many machine learning techniques to
choose from. This paper provides a
comparative study of three machine learning techniques:
SVM, NB and the J48 decision tree. A benchmark dataset for DNS
tunneling has been used in the experiments to facilitate
the comparison. Experimental results showed that SVM
outperformed the other classifiers in
detecting DNS tunneling, achieving an F-measure of 83%.
Keywords: Domain Name System, Tunneling, Support Vector
Machine, Naïve Bayes, Decision Tree, Classification
INTRODUCTION
The Domain Name System (DNS) is one of the important protocols
and plays a vital role in web activities such as browsing
and emailing. It allows applications
to use names such as example.com instead of a
difficult-to-remember IP address [1]. Many organizations do not consider
DNS a source of threats because it is not associated with
data transfer. Nonetheless, many companies could be vulnerable
to numerous types of threats through DNS [2], owing
to the considerable amount of traffic that is exposed to
DNS threats.
Nowadays, many utilities are available for
tunneling over DNS. Most of these utilities aim at gaining free
Wi-Fi access at sites that restrict access via HTTP [3].
However, serious threats can arise alongside
free Wi-Fi access, in the form of
malicious activities carried over the DNS
tunnel. Using DNS tunneling, full remote control of a
compromised internet host can be established through the channel. In
addition, other activities can be conducted via the DNS
tunnel, such as executing operating system commands, file transfer or
even a full IP tunnel. Feederbot [4] and Moto [5] are examples
of DNS tunneling tools known to use DNS as a communication
method.
All of the aforementioned threats have motivated the information
security community to provide robust methods for
detecting DNS tunneling [6]. Various
DNS tunneling detection methods have been proposed; these
methods can be categorized into two main classes: Traffic
Analysis and Payload Analysis. The first class analyzes
the overall traffic, from which significant features can be
identified, such as the volume of DNS traffic, the number of hostnames
per domain, location and domain history. The second class
analyzes the payload of a single request in order to
identify features such as domain length, number of bytes and
content.
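The two feature classes can be made concrete with a short Python sketch. This is only an illustration under stated assumptions, not the feature extraction used in the experiments: the function names are hypothetical, character entropy is an assumed stand-in for the "content" feature, and the domain is naively taken to be the last two labels of the query name.

```python
import math
from collections import Counter, defaultdict

def payload_features(qname: str) -> dict:
    """Payload analysis: features computed from a single DNS query name."""
    name = qname.rstrip(".").lower()
    labels = name.split(".")
    counts = Counter(name.replace(".", ""))
    total = sum(counts.values())
    # Assumption: Shannon entropy of the characters as a proxy for "content".
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values()) if total else 0.0
    return {
        "length": len(name),                      # domain length in characters
        "num_bytes": len(name.encode("utf-8")),   # size of the name in bytes
        "num_labels": len(labels),
        "longest_label": max(len(label) for label in labels),
        "entropy": entropy,
    }

def hostnames_per_domain(qnames) -> dict:
    """Traffic analysis: number of distinct hostnames seen under each domain.

    Assumption: the domain is the last two labels (no public-suffix handling).
    """
    seen = defaultdict(set)
    for q in qnames:
        labels = q.rstrip(".").lower().split(".")
        seen[".".join(labels[-2:])].add(".".join(labels))
    return {domain: len(hosts) for domain, hosts in seen.items()}
```

A tunneling client typically encodes data in long, high-entropy labels and generates many distinct hostnames under one domain, which is why both views of the traffic are informative.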
Analyzing the features related to DNS tunneling has
led the research community to utilize rule-based approaches, in
which both traffic and payload are analyzed in terms of
certain features. Once a predefined condition is met,
the traffic is identified as DNS tunneling. However,
because manual curation of rules is complex and tedious,
researchers have turned to Machine Learning Techniques
(MLT). The key characteristic of machine learning lies
in the statistical model, which can identify significant
rules automatically [7-9]. In addition, with the emergence of
annotated datasets such as the JSON [10], which contains
network connections with predefined labels (e.g. Tunneled or
Legitimate), the focus on machine learning has expanded.
This is because MLT requires annotated historical data
on which a model can be trained.
Once trained, the model can be tested on new data.
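The contrast between a hand-curated rule and a rule learned from annotated data can be sketched as follows. This is a minimal illustration, not the method evaluated in this paper: the threshold of 100 characters, the function names and the six training samples are all made up for the example, and the learner is a decision stump (a one-level decision tree) over a single feature, the query name length.

```python
def hand_written_rule(name_length, threshold=100):
    """Rule-based detection: the threshold is fixed by a human (hypothetical value)."""
    return "Tunneled" if name_length > threshold else "Legitimate"

def learn_threshold(lengths, labels):
    """Learn the cut point that misclassifies the fewest annotated samples.

    Needs at least two distinct length values in the training data.
    """
    values = sorted(set(lengths))
    # Candidate cut points lie midway between consecutive observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x > t) != (y == "Tunneled") for x, y in zip(lengths, labels))

    return min(candidates, key=errors)

def classify(name_length, threshold):
    """Apply the learned rule to a new, unseen sample."""
    return "Tunneled" if name_length > threshold else "Legitimate"

# Made-up annotated training data: query name lengths with their labels.
train_lengths = [15, 18, 22, 120, 150, 135]
train_labels = ["Legitimate"] * 3 + ["Tunneled"] * 3

learned = learn_threshold(train_lengths, train_labels)
prediction = classify(140, learned)
```

The learned rule has the same form as the hand-written one, but its threshold comes from the labeled data rather than from manual curation, which is the property that makes annotated datasets a prerequisite.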
In fact, there are numerous types of MLT, such as Support Vector
Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), K-nearest
Neighbor (KNN) and others. With this variety, it is
challenging to identify the most suitable classifier
for detecting DNS tunneling. This paper
presents a comparative analysis of
DNS tunneling detection using three MLT classifiers:
SVM, NB and DT.
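Since the abstract reports the comparison in terms of F-measure, the sketch below shows how that score can be computed from the predictions of competing classifiers. It assumes the standard definition (the harmonic mean of precision and recall) with "Tunneled" as the positive class. The two prediction lists are made-up placeholders for hypothetical classifiers and are not results from this study.

```python
def f_measure(y_true, y_pred, positive="Tunneled"):
    """F-measure: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up ground truth and predictions, for illustration only.
y_true = ["Tunneled", "Tunneled", "Tunneled", "Legitimate", "Legitimate"]
predictions = {
    "classifier_a": ["Tunneled", "Tunneled", "Legitimate", "Legitimate", "Legitimate"],
    "classifier_b": ["Tunneled", "Legitimate", "Legitimate", "Tunneled", "Legitimate"],
}

scores = {name: f_measure(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
```

Scoring every classifier on the same labeled test set with the same metric is what makes the comparison meaningful.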