Database Intrusion Detection using Weighted Sequence Mining

Abhinav Srivastava 1, Shamik Sural 1 and A.K. Majumdar 2
1 School of Information Technology
2 Department of Computer Science & Engineering
Indian Institute of Technology, Kharagpur, 721302, India
Email: {abhinavs@sit, shamik@sit, akmj@cse}.iitkgp.ernet.in

Abstract: Data mining is widely used to identify interesting, potentially useful and understandable patterns from a large data repository. With many organizations focusing on web-based on-line transactions, the threat of security violations has also increased. Because a database stores an application's valuable information, database security has started to receive attention. An intrusion detection system (IDS) is used to detect potential violations in database security. In every database, some of the attributes are considered more sensitive to malicious modifications than others. We propose an algorithm for finding dependencies among important data items in a relational database management system. Any transaction that does not follow these dependency rules is identified as malicious. We show that this algorithm can detect modification of sensitive attributes quite accurately. We also suggest an extension to the Entity-Relationship (E-R) model to syntactically capture the sensitivity levels of the attributes.

Index Terms: Data dependency, Weighted Sequence mining, Intrusion detection, E-R Model

I. INTRODUCTION

Over the last few years, data mining has attracted a lot of attention due to the increased generation, transmission and storage of high-volume data and an imminent need for extracting useful information and knowledge from it [1]. Data mining refers to a collection of methods by which large sets of stored data are filtered, transformed, and organized into meaningful information sets [2]. It also applies many existing computational techniques from statistics, machine learning and pattern recognition.
In recent years, researchers have started looking into the possibility of using data mining techniques in the emerging field of computer security, especially in the challenging problem of intrusion detection. Intrusion is commonly defined as a set of actions that attempt to violate the integrity, confidentiality or availability of a system. Intrusion detection is the process of tracking important events occurring in a computer system and analyzing them for the possible presence of intrusions [3]. Intrusion Detection Systems (IDSs) are the software or hardware products that automate this monitoring and analysis process. In intrusion detection, it is assumed that all the prevention techniques have been compromised and that an intruder has potentially entered the system. Hence, an intrusion detection system is considered to be the second line of defense.

In general, there are two types of attacks: (i) inside and (ii) outside. Inside attacks are the ones in which an intruder has all the privileges needed to access the application or system but uses them to perform malicious actions. Outside attacks are the ones in which the intruder does not have proper rights to access the system and must first break in before performing malicious actions. Detecting inside attacks is usually more difficult than detecting outside attacks.

Intrusion detection systems determine whether a set of actions constitutes an intrusion on the basis of one or more models of intrusion. A model classifies a sequence of states or actions as "good" (no intrusion) or "bad" (possible intrusion). There are mainly two models, namely, anomaly detection and misuse detection. The anomaly detection model bases its decision on the profile of a user's normal behavior. It analyzes a user's current session and compares it with the profile representing that user's normal behavior. An alarm is raised if a significant deviation is found when the session data is compared with the user's profile.
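The profile comparison described above can be illustrated with a minimal sketch. It is not the method proposed in this paper: the profile representation (relative frequencies of actions), the distance measure (total variation distance), the threshold value and all function names are assumptions chosen only for illustration.

```python
from collections import Counter


def build_profile(history):
    """Relative frequency of each action over a user's past sessions.

    `history` is a list of sessions; each session is a list of action names.
    This frequency-based profile is an assumed, simplified representation.
    """
    counts = Counter(action for session in history for action in session)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def deviation(profile, session):
    """Total variation distance (0 to 1) between the profile and a session."""
    if not session:
        return 0.0
    counts = Counter(session)
    total = len(session)
    actions = set(profile) | set(counts)
    return 0.5 * sum(
        abs(profile.get(a, 0.0) - counts[a] / total) for a in actions
    )


def is_anomalous(profile, session, threshold=0.5):
    """Raise an alarm when the deviation exceeds a hypothetical threshold."""
    return deviation(profile, session) > threshold


# Hypothetical usage: a user who normally reads and occasionally updates.
history = [["select", "select", "update"], ["select", "update", "select"]]
profile = build_profile(history)
normal_session = ["select", "update", "select"]
odd_session = ["delete", "delete", "drop"]
```

In this sketch a session that mirrors past behavior yields a deviation near 0, while a session made up entirely of previously unseen actions yields a deviation of 1. Where to set the threshold is exactly the trade-off between missed attacks and false alarms.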
An anomaly detection system is well suited for the detection of previously unknown attacks. Its main disadvantage is that it may not be able to describe what the attack is and may sometimes have a high false positive rate. In contrast, a misuse detection model makes its decision by comparing a user's session or commands with the rules or signatures of attacks previously used by attackers. For example, a signature rule for the password-guessing attack can be "there are more than 6 failed login attempts within 4 minutes". The main advantage of misuse detection is that it can accurately and efficiently detect occurrences of known attacks. However, these systems are not capable of detecting attacks whose signatures are not available.

In this paper, we propose a new approach for database intrusion detection using a data mining technique which takes the sensitivity of the attributes into consideration in the form of weights. Sensitivity of an attribute signifies how important the attribute is, for tracking against

8 JOURNAL OF COMPUTERS, VOL. 1, NO. 4, JULY 2006 © 2006 ACADEMY PUBLISHER