Database Intrusion Detection using Weighted
Sequence Mining
Abhinav Srivastava¹, Shamik Sural¹ and A.K. Majumdar²
¹School of Information Technology
²Department of Computer Science & Engineering
Indian Institute of Technology, Kharagpur, 721302, India
Email: {abhinavs@sit, shamik@sit, akmj@cse}.iitkgp.ernet.in
Abstract— Data mining is widely used to identify interesting,
potentially useful and understandable patterns from a large
data repository. With many organizations focusing on web-
based on-line transactions, the threat of security violations
has also increased. Since a database stores valuable
information of an application, its security has started getting
attention. An intrusion detection system (IDS) is used to
detect potential violations in database security. In every
database, some of the attributes are considered more
sensitive to malicious modifications than others. We
propose an algorithm for finding dependencies among
important data items in a relational database management
system. Any transaction that does not follow these
dependency rules is identified as malicious. We show that
this algorithm can detect modification of sensitive attributes
quite accurately. We also suggest an extension to the Entity-
Relationship (E-R) model to syntactically capture the
sensitivity levels of the attributes.
Index Terms— Data dependency, Weighted Sequence
mining, Intrusion detection, E-R Model
I. INTRODUCTION
Over the last few years, data mining has attracted a lot
of attention due to the increased generation, transmission and
storage of high-volume data and a pressing need for
extracting useful information and knowledge from it
[1]. Data mining refers to a collection of methods by
which large sets of stored data are filtered, transformed,
and organized into meaningful information sets [2]. It
also applies many existing computational techniques
from statistics, machine learning and pattern recognition.
In recent years, researchers have started looking into the
possibility of using data mining techniques in the
emerging field of computer security, especially in the
challenging problem of intrusion detection.
Intrusion is commonly defined as a set of actions that
attempt to violate the integrity, confidentiality or
availability of a system. Intrusion Detection is the process
of tracking important events occurring in a computer
system and analyzing them for possible presence of
intrusions [3]. Intrusion Detection Systems (IDSs) are the
software or hardware products that automate this
monitoring and analysis process. Intrusion detection
assumes that all prevention techniques have been
compromised and that an intruder may already have entered
the system. Hence, an intrusion detection system is
considered to be the second line of defense. In general,
there are two types of attacks: (i) inside and (ii) outside.
Inside attacks are the ones in which an intruder has all the
privileges to access the application or system but he
performs malicious actions. Outside attacks are the ones
in which the intruder does not have proper rights to
access the system. He attempts to first break in and then
perform malicious actions. Detecting inside attacks is
usually more difficult than detecting outside attacks.
Intrusion detection systems determine whether a set of
actions constitutes an intrusion on the basis of one or more
models of intrusion. A model classifies a sequence of
states or actions as "good" (no intrusion) or "bad"
(possible intrusions). There are mainly two models,
namely, anomaly detection and misuse detection. The
anomaly detection model bases its decision on the profile
of a user's normal behavior. It analyzes a user's current
session and compares it with the profile representing his
normal behavior. An alarm is raised if a significant
deviation is found when the session data is compared with
the user's profile. This type of system is well suited for
the detection of previously unknown attacks. Its main
disadvantages are that it may not be able to describe what
the attack is and that it may sometimes have a high false
positive rate. In contrast, a misuse detection model bases
its decision on a comparison of the user's session or
commands with the rules or signatures of attacks
previously used by attackers. For example, a signature rule
for a password-guessing attack can be "there are more than
6 failed login attempts within 4 minutes". The main advantage of
misuse detection is that it can accurately and efficiently
detect occurrence of known attacks. However, these
systems are not capable of detecting attacks whose
signatures are not available.
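As an illustration only, the example signature rule quoted above ("more than 6 failed login attempts within 4 minutes") can be sketched as a sliding-window check. The function name find_guessing_alarms, its parameters, and the representation of failed logins as sorted timestamps in seconds are assumptions made for this sketch; they are not part of the approach proposed in this paper.

```python
from collections import deque


def find_guessing_alarms(failed_login_times, max_failures=6, window_seconds=240):
    """Return the timestamps at which the signature rule fires.

    failed_login_times: timestamps (in seconds) of failed logins for one
    account, assumed to be sorted in ascending order (hypothetical input).
    The rule fires when more than max_failures failures fall inside a
    window of window_seconds ending at the current failure.
    """
    recent = deque()  # failures inside the current window
    alarms = []
    for t in failed_login_times:
        recent.append(t)
        # Discard failures older than the window that ends at time t.
        while t - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) > max_failures:
            alarms.append(t)
    return alarms


# Seven failures in three minutes match the signature at the seventh attempt.
burst = [0, 30, 60, 90, 120, 150, 180]
print(find_guessing_alarms(burst))   # [180]

# Seven failures spread one minute apart never exceed six per window.
slow = [0, 60, 120, 180, 240, 300, 360]
print(find_guessing_alarms(slow))    # []
```

The second call shows the limitation noted above: an attacker who stays just under the threshold does not match the signature, so no alarm is raised.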
In this paper, we propose a new approach for database
intrusion detection using a data mining technique which
takes the sensitivity of the attributes into consideration in
the form of weights. Sensitivity of an attribute signifies
how important the attribute is, for tracking against
8 JOURNAL OF COMPUTERS, VOL. 1, NO. 4, JULY 2006
© 2006 ACADEMY PUBLISHER