Extracting IDS Rules from Honeypot Data: A Decision Tree Approach

Pedro Henrique Matheus da Costa Ferreira, Leandro Nunes de Castro
Natural Computing Laboratory, Graduate Program in Electrical Engineering
Mackenzie Presbyterian University, Brazil
Email: phmatheus@msn.com, lnunes@makenzie.br

ABSTRACT

This work uses data collected by honeypots to create rules and signatures for intrusion detection systems. The rules are extracted from decision trees built from the data of a real honeypot installed on an unfiltered Internet connection. The experimental results show that rules for an intrusion detection system can be extracted using data mining techniques, in particular the decision tree algorithm. The proposed technique allows the analyst to summarize the data into a tree, in which he/she can identify problems and extract rules that help reduce or even mitigate the security problems pointed out by the honeypot.

KEYWORDS

Honeypot, Intrusion Detection System, Data Mining, Decision Tree, Dionaea.

1 INTRODUCTION AND MOTIVATION

Over the past ten years there has been an exponential increase in the number of devices connected to the Internet [1], which has created new and fertile ground for cyber criminals. They see in system failures, in the lack of technical training of network administrators, and in companies' failure to recognize that information security is vital to the health of the business [2], the perfect opportunity to exploit these flaws.

One of the main difficulties of a network administrator is to keep the network safe from external attacks. According to [3], the attacks reported by companies in the last two years are divided as follows: 43% are malicious code injection attacks through SQL; 19.95% are attacks targeted only at companies or services provided by companies (APT); botnets represent 18.81%; and, finally, denial of service (DoS) attacks account for 18.24%.
According to the same study, organizations face an average of 66 cyber attacks per week that cause some sort of damage to the business. Organizations in Germany and the United States experience the highest weekly averages, 82 and 79 attacks, respectively. Brazil and Hong Kong have the lowest averages, 47 and 54 attacks per week, respectively.

This scenario has motivated a number of studies, such as [4], which proposed the first intrusion detection system, and [5], which launched the first honeypot. The work in [6] proposed the creation of virtual honeypots. These works seek to create tools that help protect computing assets by detecting intruders or by setting traps to monitor malicious activities.

This work proposes applying a data mining technique based on the C4.5 decision tree algorithm to a dataset of attacks targeting a Dionaea honeypot. Applying the technique made it possible to generate rules for the IDS. The method also reduced the volume of data to be analyzed, giving the network administrator an analytical overview of the captured information.

This paper is organized as follows. Section 2 provides a brief review of honeypots and Dionaea. Section 3 presents a case study for the Paris dataset, including database details, pre-

The Proceedings of the International Conference in Information Security and Digital Forensics, Thessaloniki, Greece, 2014 ISBN: 978-1-941968-03-1 ©2014 SDIWC 97