Extracting IDS Rules from Honeypot Data: A Decision Tree Approach
Pedro Henrique Matheus da Costa Ferreira, Leandro Nunes de Castro
Natural Computing Laboratory, Graduate Program in Electrical Engineering
Mackenzie Presbyterian University, Brazil
Email: phmatheus@msn.com, lnunes@makenzie.br
ABSTRACT
This work uses data collected by honeypots to
create rules and signatures for intrusion detec-
tion systems. The rules are extracted from deci-
sion trees constructed based on the data of a real
honeypot installed on an internet connection
without any filter. The results of the experiments
showed that the extraction of rules for an intru-
sion detection system is possible using data min-
ing techniques, in particular the decision tree
algorithm. The proposed technique allows the
analyst to summarize the data into a tree, from
which he/she can identify problems and extract
rules that help reduce or even mitigate the
security problems pointed out by the honeypot.
KEYWORDS
Honeypot, Intrusion Detection System,
Data Mining, Decision Tree, Dionaea.
1 INTRODUCTION AND MOTIVATION
Over the past ten years there has been an
exponential increase in the number of devices
connected to the Internet [1], which has created
a new and fertile ground for cyber criminals.
They see a perfect opportunity to exploit system
flaws, the lack of technical training of network
administrators, and the failure of companies to
recognize that information security is vital to
the health of the business [2].
One of the main difficulties of a network
administrator is keeping the network safe from
external attacks. According to [3], the attacks
reported by companies in the last two years are
divided as follows: 43% are malicious code
injection attacks through SQL; 19.95% are
attacks targeted only at companies or services
provided by companies (APT); botnets represent
18.81%; and, finally, denial of service (DoS)
attacks account for 18.24%.
Still according to this study, organizations face
an average of 66 cyber attacks per week that
cause some sort of damage to business.
Organizations in Germany and the United States
experience the highest weekly averages, 82 and
79 attacks, respectively. Brazil and Hong Kong
have the lowest, with 47 and 54 attacks per
week, respectively.
This scenario motivated several studies, such as
[4], which proposed the first intrusion detection
system, and [5], which launched the first
honeypot. The work in [6] proposed the creation
of virtual honeypots. These works seek to create
tools that assist in the protection of computing
assets by detecting intruders or creating traps
to monitor malicious activities.
This work proposes the application of a data
mining technique based on the C4.5 decision
tree algorithm to a dataset obtained from
attacks targeting a Dionaea honeypot. After
applying the technique, it was possible to
generate rules for the intrusion detection
system (IDS). The method also reduced the volume
of data to be analyzed, allowing the network
administrator to have an analytical overview of
the captured information.
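The general idea of turning a decision tree into rules can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes purely categorical attributes and uses the gain ratio criterion associated with C4.5, but omits continuous attributes, missing values and pruning. The attribute names (dst_port, transport), the class labels (alert, ignore) and the records themselves are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on one categorical attribute."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    """Return a leaf (class label) or a node (attribute, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority  # no attribute separates the classes any further
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        children[value] = build_tree(sub_rows, sub_labels, remaining)
    return (best, children)

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into a (conditions, label) pair."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, children = tree
    rules = []
    for value, subtree in children.items():
        rules.extend(extract_rules(subtree, conditions + ((attr, value),)))
    return rules

def format_rule(conditions, label):
    """Render a rule as readable text (not the syntax of any real IDS)."""
    clause = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
    return f"IF {clause} THEN {label}"

# Hypothetical toy records, invented for illustration only.
rows = [
    {"dst_port": "445", "transport": "tcp"},
    {"dst_port": "445", "transport": "tcp"},
    {"dst_port": "80", "transport": "tcp"},
    {"dst_port": "5060", "transport": "udp"},
    {"dst_port": "80", "transport": "tcp"},
]
labels = ["alert", "alert", "ignore", "alert", "ignore"]

tree = build_tree(rows, labels, ["dst_port", "transport"])
rules = extract_rules(tree)
for conditions, label in rules:
    print(format_rule(conditions, label))
```

Each root-to-leaf path becomes one candidate rule, so the analyst reviews a few paths instead of every captured record. Translating such a rule into the syntax of a particular IDS is a separate step that this sketch does not cover.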
This paper is organized as follows. Section 2
provides a brief review of honeypots and Dio-
naea. Section 3 presents a case study for the
Paris dataset, including database details, pre-
The Proceedings of the International Conference in Information Security and Digital Forensics, Thessaloniki, Greece, 2014
ISBN: 978-1-941968-03-1 ©2014 SDIWC 97