Neural Networks 32 (2012) 275–284
Neural Networks
2012 Special Issue
Application of growing hierarchical SOM for visualisation of network forensics traffic data
E.J. Palomo a,∗, J. North b, D. Elizondo b, R.M. Luque a, T. Watson b
a Department of Computer Science, University of Malaga, Malaga, Spain
b Cyber Security Centre, Department of Computer Technology, De Montfort University, Leicester, United Kingdom
article info
Keywords:
Network forensics
Hierarchical self-organisation
Data clustering
Data visualisation
Feature extraction
abstract
Digital investigation methods are becoming more and more important due to the proliferation of digital
crimes and crimes involving digital evidence. Network forensics is a research area that gathers evidence
by collecting and analysing network traffic data logs. This analysis can be a difficult process, especially
because of the high variability of attacks and the large amount of data. Therefore, software tools that
can help with these digital investigations are in great demand. In this paper, a novel approach to analysing
and visualising network traffic data based on growing hierarchical self-organising maps (GHSOM) is
presented. The self-organising map (SOM) has been shown to be successful for the analysis of high-dimensional
input data in data mining applications as well as for data visualisation in a more intuitive
and understandable manner. However, the SOM has some problems related to its static topology and its
inability to represent hierarchical relationships in the input data. The GHSOM tries to overcome these
limitations by generating a hierarchical architecture that is automatically determined according to the
input data and reflects the inherent hierarchical relationships among them. Moreover, the proposed
GHSOM has been modified to correctly treat the qualitative features that are present in the traffic data
in addition to the quantitative features. Experimental results show that this approach can be very useful
for a better understanding of network traffic data, making it easier to search for evidence of attacks or
anomalous behaviour in a network environment.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction
The network has become a staple method of transferring information
to support both personal and business requirements. However,
as different services have been enabled across the network
environment, the potential for cyber-crime has grown with them.
Unfortunately, not only are criminals exploiting this medium to an
unprecedented degree, but we are now also facing the potential for
cyber-warfare or cyber-terrorism.
Digital devices can often be configured to record the traffic
and data fed to them in the form of logs. The preservation and
extraction of this information in a manner which preserves its
integrity and soundness is digital forensics. This information and
its interpretation can be used in criminal courts as a means of both
defence and prosecution (Kruse & Heiser, 2001). Although digital
forensics can take many different forms, this paper specifically
looks at a sub-field of forensics involving analysing network traffic.
Network forensics typically involves analysing any available audit
trails for the specific streams identifying the offending activity
(Mukkamala & Sung, 2003). These audit trails can be built by
reconstructive analysis of the log files, which can be produced by
many different devices and software services on the network,
including routers, firewalls, web servers and databases.
∗ Correspondence to: Department of Computer Science, E.T.S.I. Informatica,
University of Malaga, Bulevar Louis Pasteur, 35, 29071, Malaga, Spain. Tel.: +34 952
132 847; fax: +34 952 131 397.
E-mail address: ejpalomo@lcc.uma.es (E.J. Palomo).
Although it can be seen that this kind of analysis is desirable, it
is a non-trivial task (Roussev & III, 2004). There are several reasons
for this. One is the amount of data that must be analysed to find
potentially very small telltale signs. This is not limited to the
number of records, but extends to the number of different features
each record may contain. The other main reason for the difficulty
in identifying the offending data lies in the pattern that the data takes.
When analysing datasets, two distinct analyses
can be done: the first looks for known data patterns which
correspond to attacks that have been seen before, and the second
looks for attacks which have not been seen or identified
before. This paper concentrates on identifying attacks which may
or may not have been seen before, meaning that the form of the
data patterns to be identified is not known.
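The distinction between the two kinds of analysis can be illustrated with a minimal sketch. It is not the method proposed in this paper: the signature set, the record fields, the feature values and the threshold are all hypothetical, and nearest-neighbour distance stands in for any unsupervised notion of unusualness.

```python
from math import dist

# Hypothetical signatures of previously seen attacks: (protocol, port) pairs.
KNOWN_SIGNATURES = {("tcp", 23), ("udp", 1434)}

def matches_signature(record):
    """Known-pattern analysis: exact lookup against previously seen attacks."""
    return (record["protocol"], record["port"]) in KNOWN_SIGNATURES

def anomaly_score(features, baseline):
    """Unknown-pattern analysis: Euclidean distance to the nearest baseline record."""
    return min(dist(features, b) for b in baseline)

# Illustrative numeric features: (duration in seconds, kilobytes transferred).
baseline = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8)]

record = {"protocol": "tcp", "port": 80, "features": (9.0, 40.0)}
is_known = matches_signature(record)           # no signature matches this record
score = anomaly_score(record["features"], baseline)
is_unusual = score > 5.0                       # threshold chosen arbitrarily for this sketch
```

Here the record matches no stored signature, yet it lies far from every baseline record, so only the second kind of analysis flags it.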
The identification of information, or patterns, in large sets
of data falls within the fields of data mining and feature
extraction. Unsupervised learning techniques are a subset of
these fields which enable the identification and grouping of
doi:10.1016/j.neunet.2012.02.021