Using Multivariate Analysis of Variance Algorithm in Keystroke Identification

M. Analoui 1, A. Mirzaei 2, H. Davarpanah 3
1- Ass. Professor, Computer Eng. Dept., IUST, Tehran, Iran, analoui@iust.ac.ir
2- Graduate student, Computer Eng. Dept., IUST, Tehran, Iran, ab_mirzaei@iust.ac.ir
3- Graduate student, Computer Eng. Dept., IUST, Tehran, Iran, davarpanah@iust.ac.ir

Abstract
This paper presents the use of the Multivariate Analysis of Variance method in keystroke identification. The proposed method produces a new feature set from the originally measured features. The new feature set has two major advantages. First, it is much smaller than the original feature set. Second, the new features are orthogonal. We apply a Nearest Neighbor algorithm to identify the users. Although this algorithm is faster and less complex than other classification algorithms, the results are very good. Multivariate Analysis of Variance (MANOVA) is an algorithm that is usually used for canonical analysis. It looks for the linear combination of the original variables that gives the largest separation between groups. The algorithm produces orthogonal features, and therefore the effect of correlation on the distance measure is reduced.

Keywords: Keystroke, Identification, Multivariate Analysis of Variance, Nearest Neighbor algorithm

1. Introduction
Securing computer networks and managing access to information play an important role in planning and designing a new system. If we keep out users who do not have permission to use the system, we can hope to have a safe system. Useful metrics for user recognition fall into three groups:
• What the user enters into the system to identify himself/herself, such as a username, password, PIN code, and so on.
• What the user presents to the system to gain access, such as a card, token, and so on.
• What the user is, that is, the user’s voice, fingerprint, retina, and so on.
The first two groups can be transferred to others (voluntarily or involuntarily). The third, because it cannot be transferred, is the strongest metric in spite of its high implementation cost. Typing identification systems belong to the third group. Such a system has two main advantages over other biometric systems. The keyboard is the default input device, so the identification process needs no new hardware. Moreover, the user's characteristics can be measured during the user's entire interaction with the system.

User recognition is divided into user verification and user identification. In user verification we try to determine whether the user who has entered is truly the person he/she claims to be, so the output of the system is true or false. In user identification we want to determine which person has entered the system.

Several studies have been carried out on user verification. In these studies, features are extracted from the login name and password. [1-6] used statistical methods, and their final accuracy was 97%. [4] assumed a normal probability distribution function for the features. In [7] [8], fuzzy logic is used for user verification. These studies used typing difficulty as one feature, and the error rates reached about 7% to 9%. Some neural network architectures have also been used and achieved the best results [9] [10]. [11] presents a suite of techniques for password authentication using fuzzy logic, statistical methods, and several hybrid combinations of these approaches.
Int’l Symposium on Telecommunications (IST2003), 16-18 August, 2003, Isfahan-Iran
The best result was achieved,